OPUS Würzburg

13 search hits

1 to 10

Sort by

Year
Year
Title
Title
Author
Author

Proactive Critical Event Prediction based on Monitoring Data with Focus on Technical Systems (2022)

The importance of proactive and timely prediction of critical events is steadily increasing, whether in the manufacturing industry or in private life. In the past, machines in the manufacturing industry were often maintained based on a regular schedule or threshold violations, which is no longer competitive as it causes unnecessary costs and downtime. In contrast, the predictions of critical events in everyday life are often much more concealed and hardly noticeable to the private individual, unless the critical event occurs. For instance, our electricity provider has to ensure that we, as end users, are always supplied with sufficient electricity, or our favorite streaming service has to guarantee that we can watch our favorite series without interruptions. For this purpose, they have to constantly analyze what the current situation is, how it will develop in the near future, and how they have to react in order to cope with future conditions without causing power outages or video stalling. In order to analyze the performance of a system, monitoring mechanisms are often integrated to observe characteristics that describe the workload and the state of the system and its environment. Reactive systems typically employ thresholds, utility functions, or models to determine the current state of the system. However, such reactive systems cannot proactively estimate future events, but only as they occur. In the case of critical events, reactive determination of the current system state is futile, whereas a proactive system could have predicted this event in advance and enabled timely countermeasures. To achieve proactivity, the system requires estimates of future system states. Given the gap between design time and runtime, it is typically not possible to use expert knowledge to a priori model all situations a system might encounter at runtime. Therefore, prediction methods must be integrated into the system. Depending on the available monitoring data and the complexity of the prediction task, either time series forecasting in combination with thresholding or more sophisticated machine and deep learning models have to be trained. Although numerous forecasting methods have been proposed in the literature, these methods have their advantages and disadvantages depending on the characteristics of the time series under consideration. Therefore, expert knowledge is required to decide which forecasting method to choose. However, since the time series observed at runtime cannot be known at design time, such expert knowledge cannot be implemented in the system. In addition to selecting an appropriate forecasting method, several time series preprocessing steps are required to achieve satisfactory forecasting accuracy. In the literature, this preprocessing is often done manually, which is not practical for autonomous computing systems, such as Self-Aware Computing Systems. Several approaches have also been presented in the literature for predicting critical events based on multivariate monitoring data using machine and deep learning. However, these approaches are typically highly domain-specific, such as financial failures, bearing failures, or product failures. Therefore, they require in-depth expert knowledge. For this reason, these approaches cannot be fully automated and are not transferable to other use cases. Thus, the literature lacks generalizable end-to-end workflows for modeling, detecting, and predicting failures that require only little expert knowledge. To overcome these shortcomings, this thesis presents a system model for meta-self-aware prediction of critical events based on the LRA-M loop of Self-Aware Computing Systems. Building upon this system model, this thesis provides six further contributions to critical event prediction. While the first two contributions address critical event prediction based on univariate data via time series forecasting, the three subsequent contributions address critical event prediction for multivariate monitoring data using machine and deep learning algorithms. Finally, the last contribution addresses the update procedure of the system model. Specifically, the seven main contributions of this thesis can be summarized as follows: First, we present a system model for meta self-aware prediction of critical events. To handle both univariate and multivariate monitoring data, it offers univariate time series forecasting for use cases where a single observed variable is representative of the state of the system, and machine learning algorithms combined with various preprocessing techniques for use cases where a large number of variables are observed to characterize the system’s state. However, the two different modeling alternatives are not disjoint, as univariate time series forecasts can also be included to estimate future monitoring data as additional input to the machine learning models. Finally, a feedback loop is incorporated to monitor the achieved prediction quality and trigger model updates. We propose a novel hybrid time series forecasting method for univariate, seasonal time series, called Telescope. To this end, Telescope automatically preprocesses the time series, performs a kind of divide-and-conquer technique to split the time series into multiple components, and derives additional categorical information. It then forecasts the components and categorical information separately using a specific state-of-the-art method for each component. Finally, Telescope recombines the individual predictions. As Telescope performs both preprocessing and forecasting automatically, it represents a complete end-to-end approach to univariate seasonal time series forecasting. Experimental results show that Telescope achieves enhanced forecast accuracy, more reliable forecasts, and a substantial speedup. Furthermore, we apply Telescope to the scenario of predicting critical events for virtual machine auto-scaling. Here, results show that Telescope considerably reduces the average response time and significantly reduces the number of service level objective violations. For the automatic selection of a suitable forecasting method, we introduce two frameworks for recommending forecasting methods. The first framework extracts various time series characteristics to learn the relationship between them and forecast accuracy. In contrast, the other framework divides the historical observations into internal training and validation parts to estimate the most appropriate forecasting method. Moreover, this framework also includes time series preprocessing steps. Comparisons between the proposed forecasting method recommendation frameworks and the individual state-of-the-art forecasting methods and the state-of-the-art forecasting method recommendation approach show that the proposed frameworks considerably improve the forecast accuracy. With regard to multivariate monitoring data, we first present an end-to-end workflow to detect critical events in technical systems in the form of anomalous machine states. The end-to-end design includes raw data processing, phase segmentation, data resampling, feature extraction, and machine tool anomaly detection. In addition, the workflow does not rely on profound domain knowledge or specific monitoring variables, but merely assumes standard machine monitoring data. We evaluate the end-to-end workflow using data from a real CNC machine. The results indicate that conventional frequency analysis does not detect the critical machine conditions well, while our workflow detects the critical events very well with an F1-score of almost 91%. To predict critical events rather than merely detecting them, we compare different modeling alternatives for critical event prediction in the use case of time-to-failure prediction of hard disk drives. Given that failure records are typically significantly less frequent than instances representing the normal state, we employ different oversampling strategies. Next, we compare the prediction quality of binary class modeling with downscaled multi-class modeling. Furthermore, we integrate univariate time series forecasting into the feature generation process to estimate future monitoring data. Finally, we model the time-to-failure using not only classification models but also regression models. The results suggest that multi-class modeling provides the overall best prediction quality with respect to practical requirements. In addition, we prove that forecasting the features of the prediction model significantly improves the critical event prediction quality. We propose an end-to-end workflow for predicting critical events of industrial machines. Again, this approach does not rely on expert knowledge except for the definition of monitoring data, and therefore represents a generalizable workflow for predicting critical events of industrial machines. The workflow includes feature extraction, feature handling, target class mapping, and model learning with integrated hyperparameter tuning via a grid-search technique. Drawing on the result of the previous contribution, the workflow models the time-to-failure prediction in terms of multiple classes, where we compare different labeling strategies for multi-class classification. The evaluation using real-world production data of an industrial press demonstrates that the workflow is capable of predicting six different time-to-failure windows with a macro F1-score of 90%. When scaling the time-to-failure classes down to a binary prediction of critical events, the F1-score increases to above 98%. Finally, we present four update triggers to assess when critical event prediction models should be re-trained during on-line application. Such re-training is required, for instance, due to concept drift. The update triggers introduced in this thesis take into account the elapsed time since the last update, the prediction quality achieved on the current test data, and the prediction quality achieved on the preceding test data. We compare the different update strategies with each other and with the static baseline model. The results demonstrate the necessity of model updates during on-line application and suggest that the update triggers that consider both the prediction quality of the current and preceding test data achieve the best trade-off between prediction quality and number of updates required. We are convinced that the contributions of this thesis constitute significant impulses for the academic research community as well as for practitioners. First of all, to the best of our knowledge, we are the first to propose a fully automated, end-to-end, hybrid, component-based forecasting method for seasonal time series that also includes time series preprocessing. Due to the combination of reliably high forecast accuracy and reliably low time-to-result, it offers many new opportunities in applications requiring accurate forecasts within a fixed time period in order to take timely countermeasures. In addition, the promising results of the forecasting method recommendation systems provide new opportunities to enhance forecasting performance for all types of time series, not just seasonal ones. Furthermore, we are the first to expose the deficiencies of the prior state-of-the-art forecasting method recommendation system. Concerning the contributions to critical event prediction based on multivariate monitoring data, we have already collaborated closely with industrial partners, which supports the practical relevance of the contributions of this thesis. The automated end-to-end design of the proposed workflows that do not demand profound domain or expert knowledge represents a milestone in bridging the gap between academic theory and industrial application. Finally, the workflow for predicting critical events in industrial machines is currently being operationalized in a real production system, underscoring the practical impact of this thesis.

Temporal Confounding Effects in Virtual and Extended Reality Systems (2022)

Stauffert, Jan-Philipp

Latency is an inherent problem of computing systems. Each computation takes time until the result is available. Virtual reality systems use elaborated computer resources to create virtual experiences. The latency of those systems is often ignored or assumed as small enough to provide a good experience. This cumulative thesis is comprised of published peer reviewed research papers exploring the behaviour and effects of latency. Contrary to the common description of time invariant latency, latency is shown to fluctuate. Few other researchers have looked into this time variant behaviour. This thesis explores time variant latency with a focus on randomly occurring latency spikes. Latency spikes are observed both for small algorithms and as end to end latency in complete virtual reality systems. Most latency measurements gather close to the mean latency with potentially multiple smaller clusters of larger latency values and rare extreme outliers. The latency behaviour differs for different implementations of an algorithm. Operating system schedulers and programming language environments such as garbage collectors contribute to the overall latency behaviour. The thesis demonstrates these influences on the example of different implementations of message passing. The plethora of latency sources result in an unpredictable latency behaviour. Measuring and reporting it in scientific experiments is important. This thesis describes established approaches to measuring latency and proposes an enhanced setup to gather detailed information. The thesis proposes to dissect the measured data with a stacked z-outlier-test to separate the clusters of latency measurements for better reporting. Latency in virtual reality applications can degrade the experience in multiple ways. The thesis focuses on cybersickness as a major detrimental effect. An approach to simulate time variant latency is proposed to make latency available as an independent variable in experiments to understand latency's effects. An experiment with modified latency shows that latency spikes can contribute to cybersickness. A review of related research shows that different time invariant latency behaviour also contributes to cybersickness.

Measurement, Modeling, and Emulation of Power Consumption of Distributed Systems (2022)

Schmitt, Norbert

Today’s cloud data centers consume an enormous amount of energy, and energy consumption will rise in the future. An estimate from 2012 found that data centers consume about 30 billion watts of power, resulting in about 263TWh of energy usage per year. The energy consumption will rise to 1929TWh until 2030. This projected rise in energy demand is fueled by a growing number of services deployed in the cloud. 50% of enterprise workloads have been migrated to the cloud in the last decade so far. Additionally, an increasing number of devices are using the cloud to provide functionalities and enable data centers to grow. Estimates say more than 75 billion IoT devices will be in use by 2025. The growing energy demand also increases the amount of CO2 emissions. Assuming a CO2-intensity of 200g CO2 per kWh will get us close to 227 billion tons of CO2. This emission is more than the emissions of all energy-producing power plants in Germany in 2020. However, data centers consume energy because they respond to service requests that are fulfilled through computing resources. Hence, it is not the users and devices that consume the energy in the data center but the software that controls the hardware. While the hardware is physically consuming energy, it is not always responsible for wasting energy. The software itself plays a vital role in reducing the energy consumption and CO2 emissions of data centers. The scenario of our thesis is, therefore, focused on software development. Nevertheless, we must first show developers that software contributes to energy consumption by providing evidence of its influence. The second step is to provide methods to assess an application’s power consumption during different phases of the development process and to allow modern DevOps and agile development methods. We, therefore, need to have an automatic selection of system-level energy-consumption models that can accommodate rapid changes in the source code and application-level models allowing developers to locate power-consuming software parts for constant improvements. Afterward, we need emulation to assess the energy efficiency before the actual deployment.

Detecting Anomalies in Transaction Data (2022)

Schlör, Daniel

Detecting anomalies in transaction data is an important task with a high potential to avoid financial loss due to irregularities deliberately or inadvertently carried out, such as credit card fraud, occupational fraud in companies or ordering and accounting errors. With ongoing digitization of our world, data-driven approaches, including machine learning, can draw benefit from data with less manual effort and feature engineering. A large variety of machine learning-based anomaly detection methods approach this by learning a precise model of normality from which anomalies can be distinguished. Modeling normality in transactional data, however, requires to capture distributions and dependencies within the data precisely with special attention to numerical dependencies such as quantities, prices or amounts. To implicitly model numerical dependencies, Neural Arithmetic Logic Units have been proposed as neural architecture. In practice, however, these have stability and precision issues. Therefore, we first develop an improved neural network architecture, iNALU, which is designed to better model numerical dependencies as found in transaction data. We compare this architecture to the previous approach and show in several experiments of varying complexity that our novel architecture provides better precision and stability. We integrate this architecture into two generative neural network models adapted for transaction data and investigate how well normal behavior is modeled. We show that both architectures can successfully model normal transaction data, with our neural architecture improving generative performance for one model. Since categorical and numerical variables are common in transaction data, but many machine learning methods only process numerical representations, we explore different representation learning techniques to transform categorical transaction data into dense numerical vectors. We extend this approach by proposing an outlier-aware discretization, thus incorporating numerical attributes into the computation of categorical embeddings, and investigate latent spaces, as well as quantitative performance for anomaly detection. Next, we evaluate different scenarios for anomaly detection on transaction data. We extend our iNALU architecture to a neural layer that can model both numerical and non-numerical dependencies and evaluate it in a supervised and one-class setting. We investigate the stability and generalizability of our approach and show that it outperforms a variety of models in the balanced supervised setting and performs comparably in the one-class setting. Finally, we evaluate three approaches to using a generative model as an anomaly detector and compare the anomaly detection performance.

Network Coding for Reliable Data Dissemination in Wireless Sensor Networks (2022)

Runge, Isabel Madeleine

The application of Wireless Sensor Networks (WSNs) with a large number of tiny, cost-efficient, battery-powered sensor nodes that are able to communicate directly with each other poses many challenges. Due to the large number of communicating objects and despite a used CSMA/CA MAC protocol, there may be many signal collisions. In addition, WSNs frequently operate under harsh conditions and nodes are often prone to failure, for example, due to a depleted battery or unreliable components. Thus, nodes or even large parts of the network can fail. These aspects lead to reliable data dissemination and data storage being a key issue. Therefore, these issues are addressed herein while keeping latency low, throughput high, and energy consumption reduced. Furthermore, simplicity as well as robustness to changes in conditions are essential here. In order to achieve these aims, a certain amount of redundancy has to be included. This can be realized, for example, by using network coding. Existing approaches, however, often only perform well under certain conditions or for a specific scenario, have to perform a time-consuming initialization, require complex calculations, or do not provide the possibility of early decoding. Therefore, we developed a network coding procedure called Broadcast Growth Codes (BCGC) for reliable data dissemination, which performs well under a broad range of diverse conditions. These can be a high probability of signal collisions, any degree of nodes' mobility, a large number of nodes, or occurring node failures, for example. BCGC do not require complex initialization and only use simple XOR operations for encoding and decoding. Furthermore, decoding can be started as soon as a first packet/codeword has been received. Evaluations by using an in-house implemented network simulator as well as a real-world testbed showed that BCGC enhance reliability and enable to retrieve data dependably despite an unreliable network. In terms of latency, throughput, and energy consumption, depending on the conditions and the procedure being compared, BCGC can achieve the same performance or even outperform existing procedures significantly while being robust to changes in conditions and allowing low complexity of the nodes as well as early decoding.

Self-Aware Optimization of Cyber-Physical Systems in Intelligent Transportation and Logistics Systems (2022)

Lesch, Veronika

In today's world, circumstances, processes, and requirements for systems in general-in this thesis a special focus is given to the context of Cyber-Physical Systems (CPS)-are becoming increasingly complex and dynamic. In order to operate properly in such dynamic environments, systems must adapt to dynamic changes, which has led to the research area of Self-Adaptive Systems (SAS). These systems can deal with changes in their environment and the system itself. In our daily lives, we come into contact with many different self-adaptive systems that are designed to support and improve our way of life. In this work we focus on the two domains Intelligent Transportation Systems (ITS) and logistics as both domains provide complex and adaptable use cases to prototypical apply the contributions of this thesis. However, the contributions are not limited to these areas and can be generalized also to other domains such as the general area of CPS and Internet of Things including smart grids or even intelligent computer networks. In ITS, real-time traffic control is an example adaptive system that monitors the environment, analyzes observations, and plans and executes adaptation actions. Another example is platooning, which is the ability of vehicles to drive with close inter-vehicle distances. This technology enables an increase in road throughput and safety, which directly addresses the increased infrastructure needs due to increased traffic on the roads. In logistics, the Vehicle Routing Problem (VRP) deals with the planning of road freight transport tours. To cope with the ever-increasing transport volume due to the rise of just-in-time production and online shopping, efficient and correct route planning for transports is important. Further, warehouses play a central role in any company's supply chain and contribute to the logistical success. The processes of storage assignment and order picking are the two main tasks in mezzanine warehouses highly affected by a dynamic environment. Usually, optimization algorithms are applied to find solutions in reasonable computation time. SASes can help address these dynamics by allowing systems to deal with changing demands and constraints. For the application of SASes in the two areas ITS and logistics, the definition of adaptation planning strategies is the key success factor. A wide range of adaptation planning strategies for different domains can be found in the literature, and the operator must select the most promising strategy for the problem at hand. However, the No-Free-Lunch theorem states that the performance of one strategy is not necessarily transferable to other problems. Accordingly, the algorithm selection problem, first defined in 1976, aims to find the best performing algorithm for the current problem. Since then, this problem has been explored more and more, and the machine learning community, for example, considers it a learning problem. The ideas surrounding the algorithm selection problem have been applied in various use cases, but little research has been done to generalize the approaches. Moreover, especially in the field of SASes, the selection of the most appropriate strategy depends on the current situation of the system. Techniques for identifying the situation of a system can be found in the literature, such as the use of rules or clustering techniques. This knowledge can then be used to improve the algorithm selection, or in the scope of this thesis, to improve the selection of adaptation planning strategies. In addition, knowledge about the current situation and the performance of strategies in similar previously observed situations provides another opportunity for improvements. This ongoing learning and reasoning about the system and its environment is found in the research area Self-Aware Computing (SeAC). In this thesis, we explore common characteristics of adaptation planning strategies in the domain of ITS and logistics presenting a self-aware optimization framework for adaptation planning strategies. We consider platooning coordination strategies from ITS and optimization techniques from logistics as adaptation planning strategies that can be exchanged during operation to better reflect the current situation. Further, we propose to integrate fairness and uncertainty handling mechanisms directly into the adaptation planning strategies. We then examine the complex structure of the logistics use cases VRP and mezzanine warehouses and identify their systems-of-systems structure. We propose a two-stage approach for vertical or nested systems and propose to consider the impact of intertwining horizontal or coexisting systems. More specifically, we summarize the six main contributions of this thesis as follows: First, we analyze specific characteristics of adaptation planning strategies with a particular focus on ITS and logistics. We use platooning and route planning in highly dynamic environments as representatives of ITS and we use the rich Vehicle Routing Problem (rVRP) and mezzanine warehouses as representatives of the logistics domain. Using these case studies, we derive the need for situation-aware optimization of adaptation planning strategies and argue that fairness is an important consideration when applying these strategies in ITS. In logistics, we discuss that these complex systems can be considered as systems-of-systems and this structure affects each subsystem. Hence, we argue that the consideration of these characteristics is a crucial factor for the success of the system. Second, we design a self-aware optimization framework for adaptation planning strategies. The optimization framework is abstracted into a third layer above the application and its adaptation planning system, which allows the concept to be applied to a diverse set of use cases. Further, the Domain Data Model (DDM) used to configure the framework enables the operator to easily apply it by defining the available adaptation planning strategies, parameters to be optimized, and performance measures. The framework consists of four components: (i) Coordination, (ii) Situation Detection, (iii) Strategy Selection, and (iv) Parameter Optimization. While the coordination component receives observations and triggers the other components, the situation detection applies rules or clustering techniques to identify the current situation. The strategy selection uses this knowledge to select the most promising strategy for the current situation, and the parameter optimization applies optimization algorithms to tune the parameters of the strategy. Moreover, we apply the concepts of the SeAC domain and integrate learning and reasoning processes to enable ongoing advancement of the framework. We evaluate our framework using the platooning use case and consider platooning coordination strategies as the adaptation planning strategies to be selected and optimized. Our evaluation shows that the framework is able to select the most appropriate adaptation strategy and learn the situational behavior of the system. Third, we argue that fairness aspects, previously identified as an important characteristic of adaptation planning strategies, are best addressed directly as part of the strategies. Hence, focusing on platooning as an example use case, we propose a set of fairness mechanisms to balance positive and negative effects of platooning among all participants in a platoon. We design six vehicle sequence rotation mechanisms that continuously change the leader position among all participants, as this is the position with the least positive effects. We analyze these strategies on roads of different sizes and with different traffic volumes, and show that these mechanisms should also be chosen wisely. Fourth, we address the uncertainty characteristic of adaptation planning strategies and propose a methodology to account for uncertainty and also address it directly as part of the adaptation planning strategies. We address the use case of fueling planning along a route associated with highly dynamic fuel prices and develop six utility functions that account for different aspects of route planning. Further, we incorporate uncertainty measures for dynamic fuel prices by adding penalties for longer travel times or greater distance to the next gas station. Through this approach, we are able to reduce the uncertainty at planning time and obtain a more robust route planning. Fifth, we analyze optimization of nested systems-of-systems for the use case rVRP. Before proposing an approach to deal with the complex structure of the problem, we analyze important constraints and objectives that need to be considered when formulating a real-world rVRP. Then, we propose a two-stage workflow to optimize both systems individually, flexibly, and interchangeably. We apply Genetic Algorithms and Ant Colony Optimization (ACO) to both nested systems and compare the performance of our workflow with state-of-the-art optimization algorithms for this use case. In our evaluation, we show that the proposed two-stage workflow is able to handle the complex structure of the problem and consider all real-world constraints and objectives. Finally, we study coexisting systems-of-systems by optimizing typical processes in mezzanine warehouses. We first define which ergonomic and economic constraints and objectives must be considered when addressing a real-world problem. Then, we analyze the interrelatedness of the storage assignment and order picking problems; we identify opportunities to design optimization approaches that optimize all objectives and aim for a good overall system performance, taking into account the interdependence of both systems. We use the NSGA-II for storage assignment and Ant Colony Optimization (ACO) for order picking and adapt them to the specific requirements of horizontal systems-of-systems. In our evaluation, we compare our approaches to state-of-the-art approaches in mezzanine warehouses and show that our proposed approaches increase the system performance. Our proposed approaches provide important contributions to both academic research and practical applications. To the best of our knowledge, we are the first to design a self-aware optimization framework for adaptation planning strategies that integrates situation-awareness, algorithm selection, parameter tuning, as well as learning and reasoning. Our evaluation of platooning coordination shows promising results for the application of the framework. Moreover, our proposed strategies to compensate for negative effects of platooning represent an important milestone, which could lead to higher acceptance of this technology in society and support its future adoption in the real world. The proposed methodology and utility functions that address uncertainty are an important step to improving the capabilities of SAS in an increasingly turbulent environment. Similarly, our contributions to systems-of-systems optimization are major contributions to the state of logistics and systems-of-systems research. Finally, we select real-world use cases for the application of our approaches and cooperate with industrial partners, which highlights the practical relevance of our contributions. The reduction of manual effort and required expert knowledge in our self-aware optimization framework is a milestone in bridging the gap between academia and practice. One of our partners integrated the two-stage approach to tackling the rVRP into its software system, improving both time to solution and solution quality. In conclusion, the contributions of this thesis have spawned several research projects such as a long-term industrial project on optimizing tours and routes in parcel delivery funded by Bayerisches Verbundforschungsprogramm (BayVFP) – Digitalisierung and further collaborations, opening up many promising avenues for future research.

Optimizing Crossings in Circular-Arc Drawings and Circular Layouts (2022)

Kryven, Myroslav

A graph is an abstract network that represents a set of objects, called vertices, and relations between these objects, called edges. Graphs can model various networks. For example, a social network where the vertices correspond to users of the network and the edges represent relations between the users. To better see the structure of a graph it is helpful to visualize it. A standard visualization is a node-link diagram in the Euclidean plane. In such a representation the vertices are drawn as points in the plane and edges are drawn as Jordan curves between every two vertices connected by an edge. Edge crossings decrease the readability of a drawing, therefore, Crossing Optimization is a fundamental problem in Computer Science. This book explores the research frontiers and introduces novel approaches in Crossing Optimization.

Model Learning for Performance Prediction of Cloud-native Microservice Applications (2022)

Grohmann, Johannes Sebastian

One consequence of the recent coronavirus pandemic is increased demand and use of online services around the globe. At the same time, performance requirements for modern technologies are becoming more stringent as users become accustomed to higher standards. These increased performance and availability requirements, coupled with the unpredictable usage growth, are driving an increasing proportion of applications to run on public cloud platforms as they promise better scalability and reliability. With data centers already responsible for about one percent of the world's power consumption, optimizing resource usage is of paramount importance. Simultaneously, meeting the increasing and changing resource and performance requirements is only possible by optimizing resource management without introducing additional overhead. This requires the research and development of new modeling approaches to understand the behavior of running applications with minimal information. However, the emergence of modern software paradigms makes it increasingly difficult to derive such models and renders previous performance modeling techniques infeasible. Modern cloud applications are often deployed as a collection of fine-grained and interconnected components called microservices. Microservice architectures offer massive benefits but also have broad implications for the performance characteristics of the respective systems. In addition, the microservices paradigm is typically paired with a DevOps culture, resulting in frequent application and deployment changes. Such applications are often referred to as cloud-native applications. In summary, the increasing use of ever-changing cloud-hosted microservice applications introduces a number of unique challenges for modeling the performance of modern applications. These include the amount, type, and structure of monitoring data, frequent behavioral changes, or infrastructure variabilities. This violates common assumptions of the state of the art and opens a research gap for our work. In this thesis, we present five techniques for automated learning of performance models for cloud-native software systems. We achieve this by combining machine learning with traditional performance modeling techniques. Unlike previous work, our focus is on cloud-hosted and continuously evolving microservice architectures, so-called cloud-native applications. Therefore, our contributions aim to solve the above challenges to deliver automated performance models with minimal computational overhead and no manual intervention. Depending on the cloud computing model, privacy agreements, or monitoring capabilities of each platform, we identify different scenarios where performance modeling, prediction, and optimization techniques can provide great benefits. Specifically, the contributions of this thesis are as follows: Monitorless: Application-agnostic prediction of performance degradations. To manage application performance with only platform-level monitoring, we propose Monitorless, the first truly application-independent approach to detecting performance degradation. We use machine learning to bridge the gap between platform-level monitoring and application-specific measurements, eliminating the need for application-level monitoring. Monitorless creates a single and holistic resource saturation model that can be used for heterogeneous and untrained applications. Results show that Monitorless infers resource-based performance degradation with 97% accuracy. Moreover, it can achieve similar performance to typical autoscaling solutions, despite using less monitoring information. SuanMing: Predicting performance degradation using tracing. We introduce SuanMing to mitigate performance issues before they impact the user experience. This contribution is applied in scenarios where tracing tools enable application-level monitoring. SuanMing predicts explainable causes of expected performance degradations and prevents performance degradations before they occur. Evaluation results show that SuanMing can predict and pinpoint future performance degradations with an accuracy of over 90%. SARDE: Continuous and autonomous estimation of resource demands. We present SARDE to learn application models for highly variable application deployments. This contribution focuses on the continuous estimation of application resource demands, a key parameter of performance models. SARDE represents an autonomous ensemble estimation technique. It dynamically and continuously optimizes, selects, and executes an ensemble of approaches to estimate resource demands in response to changes in the application or its environment. Through continuous online adaptation, SARDE efficiently achieves an average resource demand estimation error of 15.96% in our evaluation. DepIC: Learning parametric dependencies from monitoring data. DepIC utilizes feature selection techniques in combination with an ensemble regression approach to automatically identify and characterize parametric dependencies. Although parametric dependencies can massively improve the accuracy of performance models, DepIC is the first approach to automatically learn such parametric dependencies from passive monitoring data streams. Our evaluation shows that DepIC achieves 91.7% precision in identifying dependencies and reduces the characterization prediction error by 30% compared to the best individual approach. Baloo: Modeling the configuration space of databases. To study the impact of different configurations within distributed DBMSs, we introduce Baloo. Our last contribution models the configuration space of databases considering measurement variabilities in the cloud. More specifically, Baloo dynamically estimates the required benchmarking measurements and automatically builds a configuration space model of a given DBMS. Our evaluation of Baloo on a dataset consisting of 900 configuration points shows that the framework achieves a prediction error of less than 11% while saving up to 80% of the measurement effort. Although the contributions themselves are orthogonally aligned, taken together they provide a holistic approach to performance management of modern cloud-native microservice applications. Our contributions are a significant step forward as they specifically target novel and cloud-native software development and operation paradigms, surpassing the capabilities and limitations of previous approaches. In addition, the research presented in this paper also has a significant impact on the industry, as the contributions were developed in collaboration with research teams from Nokia Bell Labs, Huawei, and Google. Overall, our solutions open up new possibilities for managing and optimizing cloud applications and improve cost and energy efficiency.

Performance Evaluation of Next-Generation Data Plane Architectures and their Components (2022)

Geißler, Stefan

In this doctoral thesis we cover the performance evaluation of next generation data plane architectures, comprised of complex software as well as programmable hardware components that allow fine granular configuration. In the scope of the thesis we propose mechanisms to monitor the performance of singular components and model key performance indicators of software based packet processing solutions. We present novel approaches towards network abstraction that allow the integration of heterogeneous data plane technologies into a singular network while maintaining total transparency between control and data plane. Finally, we investigate a full, complex system consisting of multiple software-based solutions and perform a detailed performance analysis. We employ simulative approaches to investigate overload control mechanisms that allow efficient operation under adversary conditions. The contributions of this work build the foundation for future research in the areas of network softwarization and network function virtualization.

Increasing the effectiveness of human-computer interfaces for mental health interventions (2022)

Gall, Dominik

Human-computer interfaces have the potential to support mental health practitioners in alleviating mental distress. Adaption of this technology in practice is, however, slow. We provide means to extend the design space of human-computer interfaces for mitigating mental distress. To this end, we suggest three complementary approaches: using presentation technology, using virtual environments, and using communication technology to facilitate social interaction. We provide new evidence that elementary aspects of presentation technology affect the emotional processing of virtual stimuli, that perception of our environment affects the way we assess our environment, and that communication technologies affect social bonding between users. By showing how interfaces modify emotional reactions and facilitate social interaction, we provide converging evidence that human-computer interfaces can help alleviate mental distress. These findings may advance the goal of adapting technological means to the requirements of mental health practitioners.

1 to 10

Refine

Has Fulltext

Is part of the Bibliography

Year of publication

Document Type

Language

Keywords

Author

Institute

13 search hits