@phdthesis{Bauer2021, author = {Bauer, Andr{\'e}}, title = {Automated Hybrid Time Series Forecasting: Design, Benchmarking, and Use Cases}, doi = {10.25972/OPUS-22025}, url = {http://nbn-resolving.de/urn:nbn:de:bvb:20-opus-220255}, school = {Universit{\"a}t W{\"u}rzburg}, year = {2021}, abstract = {These days, we are living in a digitalized world. Both our professional and private lives are pervaded by various IT services, which are typically operated using distributed computing systems (e.g., cloud environments). Due to the high level of digitalization, the operators of such systems are confronted with fast-paced and changing requirements. In particular, cloud environments have to cope with load fluctuations and the resulting rapid and unexpected changes in computing resource demands. To face this challenge, so-called auto-scalers, such as the threshold-based mechanism in Amazon Web Services EC2, can be employed to enable elastic scaling of the computing resources. However, despite this opportunity, business-critical applications are still run with highly overprovisioned resources to guarantee a stable and reliable service operation. This strategy is pursued due to the lack of trust in auto-scalers and the concern that inaccurate or delayed adaptations may result in financial losses. To adapt the resource capacity in time, the future resource demands must be "foreseen", as reacting to changes once they are observed introduces an inherent delay. In other words, accurate forecasting methods are required to adapt systems proactively. A powerful approach in this context is time series forecasting, which is also applied in many other domains. The core idea is to examine past values and predict how these values will evolve as time progresses. According to the "No-Free-Lunch Theorem", there is no algorithm that performs best for all scenarios. Therefore, selecting a suitable forecasting method for a given use case is a crucial task. Simply put, each method has its benefits and drawbacks, depending on the specific use case. The choice of the forecasting method is usually based on expert knowledge, which cannot be fully automated, or on trial-and-error. In both cases, this is expensive and prone to error. Although auto-scaling and time series forecasting are established research fields, existing approaches cannot fully address the mentioned challenges: (i) In our survey on time series forecasting, we found that publications on time series forecasting typically consider only a small set of (mostly related) methods and evaluate their performance on a small number of time series with only a few error measures while providing no information on the execution time of the studied methods. Therefore, such articles cannot be used to guide the choice of an appropriate method for a particular use case; (ii) Existing open-source hybrid forecasting methods that take advantage of at least two methods to tackle the "No-Free-Lunch Theorem" are computationally intensive, poorly automated, designed for a particular data set, or they lack a predictable time-to-result. Methods exhibiting a high variance in the time-to-result cannot be applied in time-critical scenarios (e.g., auto-scaling), while methods tailored to a specific data set introduce restrictions on the possible use cases (e.g., forecasting only annual time series); (iii) Auto-scalers typically scale an application either proactively or reactively. 
Even though some hybrid auto-scalers exist, they lack sophisticated solutions to combine reactive and proactive scaling. For instance, resources are only released proactively while resource allocation is entirely done in a reactive manner (inherently delayed); (iv) The majority of existing mechanisms do not take the provider's pricing scheme into account while scaling an application in a public cloud environment, which often results in excessive charged costs. Even though some cost-aware auto-scalers have been proposed, they only consider the current resource demands, neglecting their development over time. For example, resources are often shut down prematurely, even though they might be required again soon. To address the mentioned challenges and the shortcomings of existing work, this thesis presents three contributions: (i) The first contribution, a forecasting benchmark, addresses the problem of limited comparability between existing forecasting methods; (ii) The second contribution, Telescope, provides an automated hybrid time series forecasting method addressing the challenge posed by the "No-Free-Lunch Theorem"; (iii) The third contribution, Chamulteon, provides a novel hybrid auto-scaler for coordinated scaling of applications comprising multiple services, leveraging Telescope to forecast the workload intensity as a basis for proactive resource provisioning. In the following, the three contributions of the thesis are summarized: Contribution I - Forecasting Benchmark: To establish a level playing field for evaluating the performance of forecasting methods in a broad setting, we propose a novel benchmark that automatically evaluates and ranks forecasting methods based on their performance in a diverse set of evaluation scenarios. The benchmark comprises four different use cases, each covering 100 heterogeneous time series taken from different domains. The data set was assembled from publicly available time series and was designed to exhibit much higher diversity than the data sets of existing forecasting competitions. Besides proposing a new data set, we introduce two new measures that describe different aspects of a forecast. We applied the developed benchmark to evaluate Telescope. Contribution II - Telescope: To provide a generic forecasting method, we introduce a novel machine learning-based forecasting approach that automatically retrieves relevant information from a given time series. More precisely, Telescope automatically extracts intrinsic time series features and then decomposes the time series into components, building a forecasting model for each of them. Each component is forecast by applying a different method, and the final forecast is then assembled from the forecast components by employing a regression-based machine learning algorithm. In more than 1300 hours of experiments benchmarking 15 competing methods (including approaches from Uber and Facebook) on 400 time series, Telescope outperformed all methods, exhibiting the best forecast accuracy coupled with a low and reliable time-to-result. Compared to the competing methods, which exhibited, on average, a forecast error (more precisely, the symmetric mean absolute forecast error) of 29\%, Telescope exhibited an error of 20\% while being 2556 times faster. In particular, the methods from Uber and Facebook exhibited errors of 48\% and 36\%, and were 7334 and 19 times slower than Telescope, respectively. 
Contribution III - Chamulteon: To enable reliable auto-scaling, we present a hybrid auto-scaler that combines proactive and reactive techniques to scale distributed cloud applications comprising multiple services in a coordinated and cost-effective manner. More precisely, proactive adaptations are planned based on forecasts of Telescope, while reactive adaptations are triggered based on actual observations of the monitored load intensity. To resolve conflicts between reactive and proactive adaptations, a conflict resolution algorithm is implemented. Moreover, when deployed in public cloud environments, Chamulteon reviews adaptations with respect to the cloud provider's pricing scheme in order to minimize the charged costs. In more than 400 hours of experiments evaluating five competing auto-scaling mechanisms in scenarios covering five different workloads, four different applications, and three different cloud environments, Chamulteon exhibited the best auto-scaling performance and reliability while at the same time reducing the charged costs. The competing methods provided insufficient resources for (on average) 31\% of the experimental time; in contrast, Chamulteon cut this time to 8\% and the SLO (service level objective) violations from 18\% to 6\% while using up to 15\% fewer resources and reducing the charged costs by up to 45\%. The contributions of this thesis can be seen as major milestones in the domain of time series forecasting and cloud resource management. (i) This thesis is the first to present a forecasting benchmark that covers a variety of different domains with a high diversity between the analyzed time series. Based on the provided data set and the automatic evaluation procedure, the proposed benchmark contributes to enhancing the comparability of forecasting methods. The benchmarking results for different forecasting methods enable the selection of the most appropriate forecasting method for a given use case. (ii) Telescope provides the first generic and fully automated time series forecasting approach that delivers both accurate and reliable forecasts while making no assumptions about the analyzed time series. Hence, it eliminates the need for expensive, time-consuming, and error-prone procedures, such as trial-and-error searches or consulting an expert. This opens up new possibilities, especially in time-critical scenarios, where Telescope can provide accurate forecasts with a short and reliable time-to-result. Although Telescope was applied in this thesis in the field of cloud computing, there is no limitation on its applicability in other domains, as demonstrated in the evaluation. Moreover, Telescope, which was made available on GitHub, is already used in a number of interdisciplinary data science projects, for instance, for predictive maintenance in an Industry 4.0 context, for heart failure prediction in medicine, or as a component of predictive models of beehive development. (iii) In the context of cloud resource management, Chamulteon is a major milestone for increasing the trust in cloud auto-scalers. The conflict resolution algorithm enables reliable and accurate scaling behavior that reduces losses caused by excessive resource allocation or SLO violations. 
In other words, Chamulteon provides reliable online adaptations minimizing charged costs while at the same time maximizing user experience.}, subject = {Zeitreihenanalyse}, language = {en} } @article{KellerLeidingerVogeletal.2014, author = {Keller, Andreas and Leidinger, Petra and Vogel, Britta and Backes, Christina and ElSharawy, Abdou and Galata, Valentina and Mueller, Sabine C. and Marquart, Sabine and Schrauder, Michael G. and Strick, Reiner and Bauer, Andrea and Wischhusen, J{\"o}rg and Beier, Markus and Kohlhaas, Jochen and Katus, Hugo A. and Hoheisel, J{\"o}rg and Franke, Andre and Meder, Benjamin and Meese, Eckart}, title = {miRNAs can be generally associated with human pathologies as exemplified for miR-144*}, series = {BMC MEDICINE}, volume = {12}, journal = {BMC MEDICINE}, issn = {1741-7015}, doi = {10.1186/s12916-014-0224-0}, url = {http://nbn-resolving.de/urn:nbn:de:bvb:20-opus-114349}, pages = {224}, year = {2014}, abstract = {Background: miRNA profiles are promising biomarker candidates for a multitude of human pathologies, opening new avenues for diagnosis and prognosis. Beyond studies that frequently describe miRNAs as markers for specific traits, we asked whether a general pattern for miRNAs across many diseases exists. Methods: We evaluated genome-wide circulating profiles of 1,049 patients suffering from 19 different cancer and non-cancer diseases as well as unaffected controls. The results were validated on 319 individuals using qRT-PCR. Results: We discovered 34 miRNAs with strong disease association. Among those, we found substantially decreased levels of hsa-miR-144* and hsa-miR-20b, with an AUC of 0.751 (95\% CI: 0.703-0.799). We also discovered a set of miRNAs, including hsa-miR-155*, as rather stable markers, offering reasonable control miRNAs for future studies. The strong downregulation of hsa-miR-144* and the less variable pattern of hsa-miR-155* have been validated in a cohort of 319 samples in three different centers. Here, breast cancer, an additional disease phenotype not included in the screening phase, was included as the 20th trait. Conclusions: Our study on 1,368 patients including 1,049 genome-wide miRNA profiles and 319 qRT-PCR validations further underscores the high potential of specific blood-borne miRNA patterns as molecular biomarkers. Importantly, we highlight 34 miRNAs that are generally dysregulated in human pathologies. Although these markers are not specific to certain diseases, they may add to the diagnosis in combination with other markers, building a specific signature. Besides these dysregulated miRNAs, we propose a set of constant miRNAs that may be used as control markers.}, language = {en} } @article{PrantlZeckBaueretal.2022, author = {Prantl, Thomas and Zeck, Timo and Bauer, Andr{\'e} and Ten, Peter and Prantl, Dominik and Yahya, Ala Eddine Ben and Ifflaender, Lukas and Dmitrienko, Alexandra and Krupitzer, Christian and Kounev, Samuel}, title = {A Survey on Secure Group Communication Schemes With Focus on IoT Communication}, series = {IEEE Access}, volume = {10}, journal = {IEEE Access}, doi = {10.1109/ACCESS.2022.3206451}, url = {http://nbn-resolving.de/urn:nbn:de:bvb:20-opus-300257}, pages = {99944 -- 99962}, year = {2022}, abstract = {A key feature of the Internet of Things (IoT) is to control what content is available to each user. To handle this access management, encryption schemes can be used. 
Due to the diverse usage of encryption schemes, there are various realizations of 1-to-1, 1-to-n, and n-to-n schemes in the literature. This multitude of encryption methods with a wide variety of properties presents developers with the challenge of selecting the optimal method for a particular use case, which is further complicated by the fact that there is no overview of existing encryption schemes. To fill this gap, we envision a cryptography encyclopedia providing such an overview of existing encryption schemes. In this survey paper, we take a first step towards such an encyclopedia by creating a sub-encyclopedia for secure group communication (SGC) schemes, which belong to the n-to-n category. We extensively surveyed the state of the art and classified 47 different schemes. More precisely, we provide (i) a comprehensive overview of the relevant security features, (ii) a set of relevant performance metrics, (iii) a classification for secure group communication schemes, and (iv) workflow descriptions of the 47 schemes. Moreover, we perform a detailed performance and security evaluation of the 47 secure group communication schemes. Based on this evaluation, we create a guideline for the selection of secure group communication schemes.}, language = {en} } @article{KoehlerBauerDietzetal.2022, author = {Koehler, Jonas and Bauer, Andr{\'e} and Dietz, Andreas J. and Kuenzer, Claudia}, title = {Towards forecasting future snow cover dynamics in the European Alps — the potential of long optical remote-sensing time series}, series = {Remote Sensing}, volume = {14}, journal = {Remote Sensing}, number = {18}, issn = {2072-4292}, doi = {10.3390/rs14184461}, url = {http://nbn-resolving.de/urn:nbn:de:bvb:20-opus-288338}, year = {2022}, abstract = {Snow is a vital environmental parameter and dynamically responsive to climate change, particularly in mountainous regions. Snow cover can be monitored at variable spatial scales using Earth Observation (EO) data. Long-lasting remote sensing missions enable the generation of multi-decadal time series and thus the detection of long-term trends. However, there have been few attempts to use these to model future snow cover dynamics. In this study, we therefore explore the potential of such time series to forecast the Snow Line Elevation (SLE) in the European Alps. We generate monthly SLE time series from the entire Landsat archive (1985-2021) in 43 Alpine catchments. Positive long-term SLE change rates are detected, with the highest rates (5-8 m/y) in the Western and Central Alps. We utilize this SLE dataset to implement and evaluate seven univariate time series modeling and forecasting approaches. The best results were achieved by Random Forests with a Nash-Sutcliffe efficiency (NSE) of 0.79 and a Mean Absolute Error (MAE) of 258 m, followed by Telescope (0.76, 268 m) and seasonal ARIMA (0.75, 270 m). Since the model performance varies strongly with the input data, we developed a combined forecast based on the best-performing methods in each catchment. This approach was then used to forecast the SLE for the years 2022-2029. In the majority of the catchments, the shift of the forecast median SLE level retained the sign of the long-term trend. In cases where a deviating SLE dynamic is forecast, a discussion based on the unique properties of the catchment and past SLE dynamics is required. In the future, we expect major improvements in our SLE forecasting efforts by including external predictor variables in a multivariate modeling approach.}, language = {en} }
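
The decomposition-based hybrid forecasting idea summarized in the Bauer2021 abstract above (extract components, forecast each with a different method, recombine) can be illustrated with a short sketch. The following Python snippet is a minimal conceptual illustration only, not Telescope's actual implementation (which is available on GitHub); the additive decomposition, the per-component methods, and all function names here are simplifying assumptions.

# Minimal sketch of decomposition-based hybrid forecasting, assuming an
# additive model y = trend + seasonality + remainder. Each component is
# forecast with a different (deliberately simple) placeholder method.
import numpy as np

def hybrid_forecast(y, season_length, horizon):
    """Forecast `horizon` future values of `y` via additive decomposition."""
    t = np.arange(len(y))
    # Trend component: least-squares line, extrapolated linearly.
    slope, intercept = np.polyfit(t, y, 1)
    trend_fc = intercept + slope * np.arange(len(y), len(y) + horizon)
    detrended = y - (intercept + slope * t)
    # Seasonal component: mean profile per phase, repeated (seasonal naive).
    season = np.array([detrended[p::season_length].mean()
                       for p in range(season_length)])
    season_fc = season[(len(y) + np.arange(horizon)) % season_length]
    # Remainder: forecast with its mean, standing in for the regression-based
    # machine learning step the abstract describes.
    remainder = detrended - season[t % season_length]
    return trend_fc + season_fc + remainder.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(120)
    y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120)
    print(hybrid_forecast(y, season_length=12, horizon=6))

In Telescope itself, per the abstract, the component methods are chosen automatically from extracted time series features and the components are recombined by a regression-based learner rather than simply summed with a mean remainder, which is what makes the approach both generic and fast.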