Refine
Has Fulltext
- yes (14)
Is part of the Bibliography
- yes (14)
Document Type
- Doctoral Thesis (14)
Language
- English (14)
Keywords
- Cloud Computing (6)
- Leistungsbewertung (4)
- Benchmarking (3)
- Auto-Scaling (2)
- Energieeffizienz (2)
- Energy Efficiency (2)
- Forecasting (2)
- Metrics (2)
- Modellierung (2)
- Prognose (2)
Institute
Over the last decades, cybersecurity has become an increasingly important issue. Between 2019 and 2011 alone, the losses from cyberattacks in the United States grew by 6217%. At the same time, attacks became not only more intensive but also more and more versatile and diverse. Cybersecurity has become everyone’s concern. Today, service providers require sophisticated and extensive security infrastructures comprising many security functions dedicated to various cyberattacks. Still, attacks become more violent to a level where infrastructures can no longer keep up. Simply scaling up is no longer sufficient. To address this challenge, in a whitepaper, the Cloud Security Alliance (CSA) proposed multiple work packages for security infrastructure, leveraging the possibilities of Software-defined Networking (SDN) and Network Function Virtualization (NFV).
Security functions require a more sophisticated modeling approach than regular network functions. Notably, the property to drop packets deemed malicious has a significant impact on Security Service Function Chains (SSFCs)—service chains consisting of multiple security functions to protect against multiple at- tack vectors. Under attack, the order of these chains influences the end-to-end system performance depending on the attack type. Unfortunately, it is hard to predict the attack composition at system design time. Thus, we make a case for dynamic attack-aware SSFC reordering. Also, we tackle the issues of the lack of integration between security functions and the surrounding network infrastructure, the insufficient use of short term CPU frequency boosting, and the lack of Intrusion Detection and Prevention Systems (IDPS) against database ransomware attacks.
Current works focus on characterizing the performance of security functions and their behavior under overload without considering the surrounding infrastructure. Other works aim at replacing security functions using network infrastructure features but do not consider integrating security functions within the network. Further publications deal with using SDN for security or how to deal with new vulnerabilities introduced through SDN. However, they do not take security function performance into account. NFV is a popular field for research dealing with frameworks, benchmarking methods, the combination with SDN, and implementing security functions as Virtualized Network
Functions (VNFs). Research in this area brought forth the concept of Service Function Chains (SFCs) that chain multiple network functions after one another. Nevertheless, they still do not consider the specifics of security functions. The mentioned CSA whitepaper proposes many valuable ideas but leaves their realization open to others.
This thesis presents solutions to increase the performance of single security functions using SDN, performance modeling, a framework for attack-aware SSFC reordering, a solution to make better use of CPU frequency boosting, and an IDPS against database ransomware.
Specifically, the primary contributions of this work are:
• We present approaches to dynamically bypass Intrusion Detection Systems (IDS) in order to increase their performance without reducing the security level. To this end, we develop and implement three SDN-based approaches (two dynamic and one static).
We evaluate the proposed approaches regarding security and performance and show that they significantly increase the performance com- pared to an inline IDS without significant security deficits. We show that using software switches can further increase the performance of the dynamic approaches up to a point where they can eliminate any throughput drawbacks when using the IDS.
• We design a DDoS Protection System (DPS) against TCP SYN flood at tacks in the form of a VNF that works inside an SDN-enabled network. This solution eliminates known scalability and performance drawbacks of existing solutions for this attack type.
Then, we evaluate this solution showing that it correctly handles the connection establishment and present solutions for an observed issue. Next, we evaluate the performance showing that our solution increases performance up to three times. Parallelization and parameter tuning yields another 76% performance boost. Based on these findings, we discuss optimal deployment strategies.
• We introduce the idea of attack-aware SSFC reordering and explain its impact in a theoretical scenario. Then, we discuss the required information to perform this process.
We validate our claim of the importance of the SSFC order by analyzing the behavior of single security functions and SSFCs. Based on the results, we conclude that there is a massive impact on the performance up to three orders of magnitude, and we find contradicting optimal orders
for different workloads. Thus, we demonstrate the need for dynamic reordering.
Last, we develop a model for SSFC regarding traffic composition and resource demands. We classify the traffic into multiple classes and model the effect of single security functions on the traffic and their generated resource demands as functions of the incoming network traffic. Based on our model, we propose three approaches to determine optimal orders for reordering.
• We implement a framework for attack-aware SSFC reordering based on this knowledge. The framework places all security functions inside an SDN-enabled network and reorders them using SDN flows.
Our evaluation shows that the framework can enforce all routes as desired. It correctly adapts to all attacks and returns to the original state after the attacks cease. We find possible security issues at the moment of reordering and present solutions to eliminate them.
• Next, we design and implement an approach to load balance servers while taking into account their ability to go into a state of Central Processing Unit (CPU) frequency boost. To this end, the approach collects temperature information from available hosts and places services on the host that can attain the boosted mode the longest.
We evaluate this approach and show its effectiveness. For high load scenarios, the approach increases the overall performance and the performance per watt. Even better results show up for low load workloads, where not only all performance metrics improve but also the temperatures and total power consumption decrease.
• Last, we design an IDPS protecting against database ransomware attacks that comprise multiple queries to attain their goal. Our solution models these attacks using a Colored Petri Net (CPN).
A proof-of-concept implementation shows that our approach is capable of detecting attacks without creating false positives for benign scenarios. Furthermore, our solution creates only a small performance impact.
Our contributions can help to improve the performance of security infrastructures. We see multiple application areas from data center operators over software and hardware developers to security and performance researchers. Most of the above-listed contributions found use in several research publications.
Regarding future work, we see the need to better integrate SDN-enabled security functions and SSFC reordering in data center networks. Future SSFC should discriminate between different traffic types, and security frameworks should support automatically learning models for security functions. We see the need to consider energy efficiency when regarding SSFCs and take CPU boosting technologies into account when designing performance models as well as placement, scaling, and deployment strategies. Last, for a faster adaptation against recent ransomware attacks, we propose machine-assisted learning for database IDPS signatures.
These days, we are living in a digitalized world. Both our professional and private lives are pervaded by various IT services, which are typically operated using distributed computing systems (e.g., cloud environments). Due to the high level of digitalization, the operators of such systems are confronted with fast-paced and changing requirements. In particular, cloud environments have to cope with load fluctuations and respective rapid and unexpected changes in the computing resource demands. To face this challenge, so-called auto-scalers, such as the threshold-based mechanism in Amazon Web Services EC2, can be employed to enable elastic scaling of the computing resources. However, despite this opportunity, business-critical applications are still run with highly overprovisioned resources to guarantee a stable and reliable service operation. This strategy is pursued due to the lack of trust in auto-scalers and the concern that inaccurate or delayed adaptations may result in financial losses.
To adapt the resource capacity in time, the future resource demands must be "foreseen", as reacting to changes once they are observed introduces an inherent delay. In other words, accurate forecasting methods are required to adapt systems proactively. A powerful approach in this context is time series forecasting, which is also applied in many other domains. The core idea is to examine past values and predict how these values will evolve as time progresses. According to the "No-Free-Lunch Theorem", there is no algorithm that performs best for all scenarios. Therefore, selecting a suitable forecasting method for a given use case is a crucial task. Simply put, each method has its benefits and drawbacks, depending on the specific use case. The choice of the forecasting method is usually based on expert knowledge, which cannot be fully automated, or on trial-and-error. In both cases, this is expensive and prone to error.
Although auto-scaling and time series forecasting are established research fields, existing approaches cannot fully address the mentioned challenges: (i) In our survey on time series forecasting, we found that publications on time series forecasting typically consider only a small set of (mostly related) methods and evaluate their performance on a small number of time series with only a few error measures while providing no information on the execution time of the studied methods. Therefore, such articles cannot be used to guide the choice of an appropriate method for a particular use case; (ii) Existing open-source hybrid forecasting methods that take advantage of at least two methods to tackle the "No-Free-Lunch Theorem" are computationally intensive, poorly automated, designed for a particular data set, or they lack a predictable time-to-result. Methods exhibiting a high variance in the time-to-result cannot be applied for time-critical scenarios (e.g., auto-scaling), while methods tailored to a specific data set introduce restrictions on the possible use cases (e.g., forecasting only annual time series); (iii) Auto-scalers typically scale an application either proactively or reactively. Even though some hybrid auto-scalers exist, they lack sophisticated solutions to combine reactive and proactive scaling. For instance, resources are only released proactively while resource allocation is entirely done in a reactive manner (inherently delayed); (iv) The majority of existing mechanisms do not take the provider's pricing scheme into account while scaling an application in a public cloud environment, which often results in excessive charged costs. Even though some cost-aware auto-scalers have been proposed, they only consider the current resource demands, neglecting their development over time. For example, resources are often shut down prematurely, even though they might be required again soon.
To address the mentioned challenges and the shortcomings of existing work, this thesis presents three contributions: (i) The first contribution-a forecasting benchmark-addresses the problem of limited comparability between existing forecasting methods; (ii) The second contribution-Telescope-provides an automated hybrid time series forecasting method addressing the challenge posed by the "No-Free-Lunch Theorem"; (iii) The third contribution-Chamulteon-provides a novel hybrid auto-scaler for coordinated scaling of applications comprising multiple services, leveraging Telescope to forecast the workload intensity as a basis for proactive resource provisioning. In the following, the three contributions of the thesis are summarized:
Contribution I - Forecasting Benchmark
To establish a level playing field for evaluating the performance of forecasting methods in a broad setting, we propose a novel benchmark that automatically evaluates and ranks forecasting methods based on their performance in a diverse set of evaluation scenarios. The benchmark comprises four different use cases, each covering 100 heterogeneous time series taken from different domains. The data set was assembled from publicly available time series and was designed to exhibit much higher diversity than existing forecasting competitions. Besides proposing a new data set, we introduce two new measures that describe different aspects of a forecast. We applied the developed benchmark to evaluate Telescope.
Contribution II - Telescope
To provide a generic forecasting method, we introduce a novel machine learning-based forecasting approach that automatically retrieves relevant information from a given time series. More precisely, Telescope automatically extracts intrinsic time series features and then decomposes the time series into components, building a forecasting model for each of them. Each component is forecast by applying a different method and then the final forecast is assembled from the forecast components by employing a regression-based machine learning algorithm. In more than 1300 hours of experiments benchmarking 15 competing methods (including approaches from Uber and Facebook) on 400 time series, Telescope outperformed all methods, exhibiting the best forecast accuracy coupled with a low and reliable time-to-result. Compared to the competing methods that exhibited, on average, a forecast error (more precisely, the symmetric mean absolute forecast error) of 29%, Telescope exhibited an error of 20% while being 2556 times faster. In particular, the methods from Uber and Facebook exhibited an error of 48% and 36%, and were 7334 and 19 times slower than Telescope, respectively.
Contribution III - Chamulteon
To enable reliable auto-scaling, we present a hybrid auto-scaler that combines proactive and reactive techniques to scale distributed cloud applications comprising multiple services in a coordinated and cost-effective manner. More precisely, proactive adaptations are planned based on forecasts of Telescope, while reactive adaptations are triggered based on actual observations of the monitored load intensity. To solve occurring conflicts between reactive and proactive adaptations, a complex conflict resolution algorithm is implemented. Moreover, when deployed in public cloud environments, Chamulteon reviews adaptations with respect to the cloud provider's pricing scheme in order to minimize the charged costs. In more than 400 hours of experiments evaluating five competing auto-scaling mechanisms in scenarios covering five different workloads, four different applications, and three different cloud environments, Chamulteon exhibited the best auto-scaling performance and reliability while at the same time reducing the charged costs. The competing methods provided insufficient resources for (on average) 31% of the experimental time; in contrast, Chamulteon cut this time to 8% and the SLO (service level objective) violations from 18% to 6% while using up to 15% less resources and reducing the charged costs by up to 45%.
The contributions of this thesis can be seen as major milestones in the domain of time series forecasting and cloud resource management. (i) This thesis is the first to present a forecasting benchmark that covers a variety of different domains with a high diversity between the analyzed time series. Based on the provided data set and the automatic evaluation procedure, the proposed benchmark contributes to enhance the comparability of forecasting methods. The benchmarking results for different forecasting methods enable the selection of the most appropriate forecasting method for a given use case. (ii) Telescope provides the first generic and fully automated time series forecasting approach that delivers both accurate and reliable forecasts while making no assumptions about the analyzed time series. Hence, it eliminates the need for expensive, time-consuming, and error-prone procedures, such as trial-and-error searches or consulting an expert. This opens up new possibilities especially in time-critical scenarios, where Telescope can provide accurate forecasts with a short and reliable time-to-result.
Although Telescope was applied for this thesis in the field of cloud computing, there is absolutely no limitation regarding the applicability of Telescope in other domains, as demonstrated in the evaluation. Moreover, Telescope, which was made available on GitHub, is already used in a number of interdisciplinary data science projects, for instance, predictive maintenance in an Industry 4.0 context, heart failure prediction in medicine, or as a component of predictive models of beehive development. (iii) In the context of cloud resource management, Chamulteon is a major milestone for increasing the trust in cloud auto-scalers. The complex resolution algorithm enables reliable and accurate scaling behavior that reduces losses caused by excessive resource allocation or SLO violations. In other words, Chamulteon provides reliable online adaptations minimizing charged costs while at the same time maximizing user experience.
Automation in Software Performance Engineering Based on a Declarative Specification of Concerns
(2019)
Software performance is of particular relevance to software system design, operation, and evolution because it has a significant impact on key business indicators. During the life-cycle of a software system, its implementation, configuration, and deployment are subject to multiple changes that may affect the end-to-end performance characteristics. Consequently, performance analysts continually need to provide answers to and act based on performance-relevant concerns. To ensure a desired level of performance, software performance engineering provides a plethora of methods, techniques, and tools for measuring, modeling, and evaluating performance properties of software systems. However, the answering of performance concerns is subject to a significant semantic gap between the level on which performance concerns are formulated and the technical level on which performance evaluations are actually conducted. Performance evaluation approaches come with different strengths and limitations concerning, for example, accuracy, time-to-result, or system overhead. For the involved stakeholders, it can be an elaborate process to reasonably select, parameterize and correctly apply performance evaluation approaches, and to filter and interpret the obtained results. An additional challenge is that available performance evaluation artifacts may change over time, which requires to switch between different measurement-based and model-based performance evaluation approaches during the system evolution. At model-based analysis, the effort involved in creating performance models can also outweigh their benefits.
To overcome the deficiencies and enable an automatic and holistic evaluation of performance throughout the software engineering life-cycle requires an approach that: (i) integrates multiple types of performance concerns and evaluation approaches, (ii) automates performance model creation, and (iii) automatically selects an evaluation methodology tailored to a specific scenario. This thesis presents a declarative approach —called Declarative Performance Engineering (DPE)— to automate performance evaluation based on a humanreadable specification of performance-related concerns. To this end, we separate the definition of performance concerns from their solution. The primary scientific contributions presented in this thesis are:
A declarative language to express performance-related concerns and a corresponding processing framework:
We provide a language to specify performance concerns independent of a concrete performance evaluation approach. Besides the specification of functional aspects, the language allows to include non-functional tradeoffs optionally. To answer these concerns, we provide a framework architecture and a corresponding reference implementation to process performance concerns automatically. It allows to integrate arbitrary performance evaluation approaches and is accompanied by reference implementations for model-based and measurement-based performance evaluation.
Automated creation of architectural performance models from execution traces:
The creation of performance models can be subject to significant efforts outweighing the benefits of model-based performance evaluation. We provide a model extraction framework that creates architectural performance models based on execution traces, provided by monitoring tools.The framework separates the derivation of generic information from model creation routines. To derive generic information, the framework combines state-of-the-art extraction and estimation techniques. We isolate object creation routines specified in a generic model builder interface based on concepts present in multiple performance-annotated architectural modeling formalisms. To create model extraction for a novel performance modeling formalism, developers only need to write object creation routines instead of creating model extraction software from scratch when reusing the generic framework.
Automated and extensible decision support for performance evaluation approaches:
We present a methodology and tooling for the automated selection of a performance evaluation approach tailored to the user concerns and application scenario. To this end, we propose to decouple the complexity of selecting a performance evaluation approach for a given scenario by providing solution approach capability models and a generic decision engine. The proposed capability meta-model enables to describe functional and non-functional capabilities of performance evaluation approaches and tools at different granularities. In contrast to existing tree-based decision support mechanisms, the decoupling approach allows to easily update characteristics of solution approaches as well as appending new rating criteria and thereby stay abreast of evolution in performance evaluation tooling and system technologies.
Time-to-result estimation for model-based performance prediction:
The time required to execute a model-based analysis plays an important role in different decision processes. For example, evaluation scenarios might require the prediction results to be available in a limited period of time such that the system can be adapted in time to ensure the desired quality of service. We propose a method to estimate the time-to-result for modelbased performance prediction based on model characteristics and analysis parametrization. We learn a prediction model using performancerelevant features thatwe determined using statistical tests. We implement the approach and demonstrate its practicability by applying it to analyze a simulation-based multi-step performance evaluation approach for a representative architectural performance modeling formalism.
We validate each of the contributions based on representative case studies. The evaluation of automatic performance model extraction for two case study systems shows that the resulting models can accurately predict the performance behavior. Prediction accuracy errors are below 3% for resource utilization and mostly less than 20% for service response time. The separate evaluation of the reusability shows that the presented approach lowers the implementation efforts for automated model extraction tools by up to 91%. Based on two case studies applying measurement-based and model-based performance evaluation techniques, we demonstrate the suitability of the declarative performance engineering framework to answer multiple kinds of performance concerns customized to non-functional goals. Subsequently, we discuss reduced efforts in applying performance analyses using the integrated and automated declarative approach. Also, the evaluation of the declarative framework reviews benefits and savings integrating performance evaluation approaches into the declarative performance engineering framework. We demonstrate the applicability of the decision framework for performance evaluation approaches by applying it to depict existing decision trees. Then, we show how we can quickly adapt to the evolution of performance evaluation methods which is challenging for static tree-based decision support systems. At this, we show how to cope with the evolution of functional and non-functional capabilities of performance evaluation software and explain how to integrate new approaches. Finally, we evaluate the accuracy of the time-to-result estimation for a set of machinelearning algorithms and different training datasets. The predictions exhibit a mean percentage error below 20%, which can be further improved by including performance evaluations of the considered model into the training data. The presented contributions represent a significant step towards an integrated performance engineering process that combines the strengths of model-based and measurement-based performance evaluation. The proposed performance concern language in conjunction with the processing framework significantly reduces the complexity of applying performance evaluations for all stakeholders. Thereby it enables performance awareness throughout the software engineering life-cycle. The proposed performance concern language removes the semantic gap between the level on which performance concerns are formulated and the technical level on which performance evaluations are actually conducted by the user.
Virtualization allows the creation of virtual instances of physical devices, such as network and processing units. In a virtualized system, governed by a hypervisor, resources are shared among virtual machines (VMs). Virtualization has been receiving increasing interest as away to reduce costs through server consolidation and to enhance the flexibility of physical infrastructures. Although virtualization provides many benefits, it introduces new security challenges; that is, the introduction of a hypervisor introduces threats since hypervisors expose new attack surfaces.
Intrusion detection is a common cyber security mechanism whose task is to detect malicious activities in host and/or network environments. This enables timely reaction in order to stop an on-going attack, or to mitigate the impact of a security breach. The wide adoption of virtualization has resulted in the increasingly common practice of deploying conventional intrusion detection systems (IDSs), for example, hardware IDS appliances or common software-based IDSs, in designated VMs as virtual network functions (VNFs). In addition, the research and industrial communities have developed IDSs specifically designed to operate in virtualized environments (i.e., hypervisorbased IDSs), with components both inside the hypervisor and in a designated VM. The latter are becoming increasingly common with the growing proliferation of virtualized data centers and the adoption of the cloud computing paradigm, for which virtualization is as a key enabling technology.
To minimize the risk of security breaches, methods and techniques for evaluating IDSs in an accurate manner are essential. For instance, one may compare different IDSs in terms of their attack detection accuracy in order to identify and deploy the IDS that operates optimally in a given environment, thereby reducing the risks of a security breach. However, methods and techniques for realistic and accurate evaluation of the attack detection accuracy of IDSs in virtualized environments (i.e., IDSs deployed as VNFs or hypervisor-based IDSs) are lacking. That is, workloads that exercise the sensors of an evaluated IDS and contain attacks targeting hypervisors are needed. Attacks targeting hypervisors are of high severity since they may result in, for example, altering the hypervisors’s memory and thus enabling the execution of malicious code with hypervisor privileges. In addition, there are no metrics and measurement methodologies
for accurately quantifying the attack detection accuracy of IDSs in virtualized environments with elastic resource provisioning (i.e., on-demand allocation or deallocation of virtualized hardware resources to VMs). Modern hypervisors allow for hotplugging virtual CPUs and memory on the designated VM where the intrusion detection engine of hypervisor-based IDSs, as well as of IDSs deployed as VNFs, typically operates. Resource hotplugging may have a significant impact on the attack detection accuracy of an evaluated IDS, which is not taken into account by existing metrics for quantifying IDS attack detection accuracy. This may lead to inaccurate measurements, which, in turn, may result in the deployment of misconfigured or ill-performing IDSs, increasing
the risk of security breaches.
This thesis presents contributions that span the standard components of any system
evaluation scenario: workloads, metrics, and measurement methodologies. The scientific contributions of this thesis are:
A comprehensive systematization of the common practices and the state-of-theart on IDS evaluation. This includes: (i) a definition of an IDS evaluation design space allowing to put existing practical and theoretical work into a common context in a systematic manner; (ii) an overview of common practices in IDS evaluation reviewing evaluation approaches and methods related to each part of the design space; (iii) and a set of case studies demonstrating how different IDS evaluation approaches are applied in practice. Given the significant amount of existing practical and theoretical work related to IDS evaluation, the presented systematization is beneficial for improving the general understanding of the topic by providing an overview of the current state of the field. In addition, it is beneficial for identifying and contrasting advantages and disadvantages of different IDS evaluation methods and practices, while also helping to identify specific requirements and best practices for evaluating current and future IDSs.
An in-depth analysis of common vulnerabilities of modern hypervisors as well as a set of attack models capturing the activities of attackers triggering these vulnerabilities. The analysis includes 35 representative vulnerabilities of hypercall handlers (i.e., hypercall vulnerabilities). Hypercalls are software traps from a kernel of a VM to the hypervisor. The hypercall interface of hypervisors, among device drivers and VM exit events, is one of the attack surfaces that hypervisors expose. Triggering a hypercall vulnerability may lead to a crash of the hypervisor or to altering the hypervisor’s memory. We analyze the origins
of the considered hypercall vulnerabilities, demonstrate and analyze possible attacks that trigger them (i.e., hypercall attacks), develop hypercall attack models(i.e., systematized activities of attackers targeting the hypercall interface), and discuss future research directions focusing on approaches for securing hypercall interfaces.
A novel approach for evaluating IDSs enabling the generation of workloads that contain attacks targeting hypervisors, that is, hypercall attacks. We propose an approach for evaluating IDSs using attack injection (i.e., controlled execution of attacks during regular operation of the environment where an IDS under test is deployed). The injection of attacks is performed based on attack models that capture realistic attack scenarios. We use the hypercall attack models developed as part of this thesis for injecting hypercall attacks.
A novel metric and measurement methodology for quantifying the attack detection accuracy of IDSs in virtualized environments that feature elastic resource provisioning. We demonstrate how the elasticity of resource allocations in such environments may impact the IDS attack detection accuracy and show that using existing metrics in such environments may lead to practically challenging and inaccurate measurements. We also demonstrate the practical use of the metric we propose through a set of case studies, where we evaluate common conventional IDSs deployed as VNFs.
In summary, this thesis presents the first systematization of the state-of-the-art on IDS evaluation, considering workloads, metrics and measurement methodologies as integral parts of every IDS evaluation approach. In addition, we are the first to examine the hypercall attack surface of hypervisors in detail and to propose an approach using attack injection for evaluating IDSs in virtualized environments. Finally, this thesis presents the first metric and measurement methodology for quantifying the attack detection accuracy of IDSs in virtualized environments that feature elastic resource provisioning.
From a technical perspective, as part of the proposed approach for evaluating IDSsthis thesis presents hInjector, a tool for injecting hypercall attacks. We designed hInjector to enable the rigorous, representative, and practically feasible evaluation of IDSs using attack injection. We demonstrate the application and practical usefulness of hInjector, as well as of the proposed approach, by evaluating a representative hypervisor-based IDS designed to detect hypercall attacks. While we focus on evaluating the capabilities of IDSs to detect hypercall attacks, the proposed IDS evaluation approach can be generalized and applied in a broader context. For example, it may be directly used to also evaluate security mechanisms of hypervisors, such as hypercall access control (AC) mechanisms. It may also be applied to evaluate the capabilities
of IDSs to detect attacks involving operations that are functionally similar to hypercalls,
for example, the input/output control (ioctl) calls that the Kernel-based Virtual Machine (KVM) hypervisor supports. For IDSs in virtualized environments featuring elastic resource provisioning, our approach for injecting hypercall attacks can be applied in combination with the attack detection accuracy metric and measurement methodology we propose. Our approach for injecting hypercall attacks, and our metric and measurement methodology, can also be applied independently beyond the scenarios considered in this thesis. The wide spectrum of security mechanisms in virtualized environments whose evaluation can directly benefit from the contributions of this thesis (e.g., hypervisor-based IDSs, IDSs deployed as VNFs, and AC mechanisms) reflects the practical implication of the thesis.
Nowadays, data centers are becoming increasingly dynamic due to the common adoption of virtualization technologies. Systems can scale their capacity on demand by growing and shrinking their resources dynamically based on the current load. However, the complexity and performance of modern data centers is influenced not only by the software architecture, middleware, and computing resources, but also by network virtualization, network protocols, network services, and configuration. The field of network virtualization is not as mature as server virtualization and there are multiple competing approaches and technologies. Performance modeling and prediction techniques provide a powerful tool to analyze the performance of modern data centers. However, given the wide variety of network virtualization approaches, no common approach exists for modeling and evaluating the performance of virtualized networks.
The performance community has proposed multiple formalisms and models for evaluating the performance of infrastructures based on different network virtualization technologies. The existing performance models can be divided into two main categories: coarse-grained analytical models and highly-detailed simulation models. Analytical performance models are normally defined at a high level of abstraction and thus they abstract many details of the real network and therefore have limited predictive power. On the other hand, simulation models are normally focused on a selected networking technology and take into account many specific performance influencing factors, resulting in detailed models that are tightly bound to a given technology, infrastructure setup, or to a given protocol stack.
Existing models are inflexible, that means, they provide a single solution method without providing means for the user to influence the solution accuracy and solution overhead. To allow for flexibility in the performance prediction, the user is required to build multiple different performance models obtaining multiple performance predictions. Each performance prediction may then have different focus, different performance metrics, prediction accuracy, and solving time.
The goal of this thesis is to develop a modeling approach that does not require the user to have experience in any of the applied performance modeling formalisms. The approach offers the flexibility in the modeling and analysis by balancing between: (a) generic character and low overhead of coarse-grained analytical models, and (b) the more detailed simulation models with higher prediction accuracy.
The contributions of this thesis intersect with technologies and research areas, such as: software engineering, model-driven software development, domain-specific modeling, performance modeling and prediction, networking and data center networks, network virtualization, Software-Defined Networking (SDN), Network Function Virtualization (NFV). The main contributions of this thesis compose the Descartes Network Infrastructure (DNI) approach and include:
• Novel modeling abstractions for virtualized network infrastructures. This includes two meta-models that define modeling languages for modeling data center network performance. The DNI and miniDNI meta-models provide means for representing network infrastructures at two different abstraction levels. Regardless of which variant of the DNI meta-model is used, the modeling language provides generic modeling elements allowing to describe the majority of existing and future network technologies, while at the same time abstracting factors that have low influence on the overall performance. I focus on SDN and NFV as examples of modern virtualization technologies.
• Network deployment meta-model—an interface between DNI and other meta- models that allows to define mapping between DNI and other descriptive models. The integration with other domain-specific models allows capturing behaviors that are not reflected in the DNI model, for example, software bottlenecks, server virtualization, and middleware overheads.
• Flexible model solving with model transformations. The transformations enable solving a DNI model by transforming it into a predictive model. The model transformations vary in size and complexity depending on the amount of data abstracted in the transformation process and provided to the solver. In this thesis, I contribute six transformations that transform DNI models into various predictive models based on the following modeling formalisms: (a) OMNeT++ simulation, (b) Queueing Petri Nets (QPNs), (c) Layered Queueing Networks (LQNs). For each of these formalisms, multiple predictive models are generated (e.g., models with different level of detail): (a) two for OMNeT++, (b) two for QPNs, (c) two for LQNs. Some predictive models can be solved using multiple alternative solvers resulting in up to ten different automated solving methods for a single DNI model.
• A model extraction method that supports the modeler in the modeling process by automatically prefilling the DNI model with the network traffic data. The contributed traffic profile abstraction and optimization method provides a trade-off by balancing between the size and the level of detail of the extracted profiles.
• A method for selecting feasible solving methods for a DNI model. The method proposes a set of solvers based on trade-off analysis characterizing each transformation with respect to various parameters such as its specific limitations, expected prediction accuracy, expected run-time, required resources in terms of CPU and memory consumption, and scalability.
• An evaluation of the approach in the context of two realistic systems. I evaluate the approach with focus on such factors like: prediction of network capacity and interface throughput, applicability, flexibility in trading-off between prediction accuracy and solving time. Despite not focusing on the maximization of the prediction accuracy, I demonstrate that in the majority of cases, the prediction error is low—up to 20% for uncalibrated models and up to 10% for calibrated models depending on the solving technique.
In summary, this thesis presents the first approach to flexible run-time performance prediction in data center networks, including network based on SDN. It provides ability to flexibly balance between performance prediction accuracy and solving overhead. The approach provides the following key benefits:
• It is possible to predict the impact of changes in the data center network on the performance. The changes include: changes in network topology, hardware configuration, traffic load, and applications deployment.
• DNI can successfully model and predict the performance of multiple different of network infrastructures including proactive SDN scenarios.
• The prediction process is flexible, that is, it provides balance between the granularity of the predictive models and the solving time. The decreased prediction accuracy is usually rewarded with savings of the solving time and consumption of resources required for solving.
• The users are enabled to conduct performance analysis using multiple different prediction methods without requiring the expertise and experience in each of the modeling formalisms.
The components of the DNI approach can be also applied to scenarios that are not considered in this thesis. The approach is generalizable and applicable for the following examples: (a) networks outside of data centers may be analyzed with DNI as long as the background traffic profile is known; (b) uncalibrated DNI models may serve as a basis for design-time performance analysis; (c) the method for extracting and compacting of traffic profiles may be used for other, non-network workloads as well.
Today’s cloud data centers consume an enormous amount of energy, and energy consumption will rise in the future. An estimate from 2012 found that data centers consume about 30 billion watts of power, resulting in about 263TWh of energy usage per year. The energy consumption will rise to 1929TWh until 2030. This projected rise in energy demand is fueled by a growing number of services deployed in the cloud. 50% of enterprise workloads have been migrated to the cloud in the last decade so far. Additionally, an increasing number of devices are using the cloud to provide functionalities and enable data centers to grow. Estimates say more than 75 billion IoT devices will be in use by 2025.
The growing energy demand also increases the amount of CO2 emissions. Assuming a CO2-intensity of 200g CO2 per kWh will get us close to 227 billion tons of CO2. This emission is more than the emissions of all energy-producing power plants in Germany in 2020.
However, data centers consume energy because they respond to service requests that are fulfilled through computing resources. Hence, it is not the users and devices that consume the energy in the data center but the software that controls the hardware. While the hardware is physically consuming energy, it is not always responsible for wasting energy. The software itself plays a vital role in reducing the energy consumption and CO2 emissions of data centers. The scenario of our thesis is, therefore, focused on software development.
Nevertheless, we must first show developers that software contributes to energy consumption by providing evidence of its influence. The second step is to provide methods to assess an application’s power consumption during different phases of the development process and to allow modern DevOps and agile development methods. We, therefore, need to have an automatic selection of system-level energy-consumption models that can accommodate rapid changes in the source code and application-level models allowing developers to locate power-consuming software parts for constant improvements. Afterward, we need emulation to assess the energy efficiency before the actual deployment.
Energy efficiency of computing systems has become an increasingly important issue over the last decades. In 2015, data centers were responsible for 2% of the world's greenhouse gas emissions, which is roughly the same as the amount produced by air travel.
In addition to these environmental concerns, power consumption of servers in data centers results in significant operating costs, which increase by at least 10% each year.
To address this challenge, the U.S. EPA and other government agencies are considering the use of novel measurement methods in order to label the energy efficiency of servers.
The energy efficiency and power consumption of a server is subject to a great number of factors, including, but not limited to, hardware, software stack, workload, and load level.
This huge number of influencing factors makes measuring and rating of energy efficiency challenging. It also makes it difficult to find an energy-efficient server for a specific use-case. Among others, server provisioners, operators, and regulators would profit from information on the servers in question and on the factors that affect those servers' power consumption and efficiency. However, we see a lack of measurement methods and metrics for energy efficiency of the systems under consideration.
Even assuming that a measurement methodology existed, making decisions based on its results would be challenging. Power prediction methods that make use of these results would aid in decision making. They would enable potential server customers to make better purchasing decisions and help operators predict the effects of potential reconfigurations.
Existing energy efficiency benchmarks cannot fully address these challenges, as they only measure single applications at limited sets of load levels. In addition, existing efficiency metrics are not helpful in this context, as they are usually a variation of the simple performance per power ratio, which is only applicable to single workloads at a single load level. Existing data center efficiency metrics, on the other hand, express the efficiency of the data center space and power infrastructure, not focusing on the efficiency of the servers themselves. Power prediction methods for not-yet-available systems that could make use of the results provided by a comprehensive power rating methodology are also lacking. Existing power prediction models for hardware designers have a very fine level of granularity and detail that would not be useful for data center operators.
This thesis presents a measurement and rating methodology for energy efficiency of servers and an energy efficiency metric to be applied to the results of this methodology. We also design workloads, load intensity and distribution models, and mechanisms that can be used for energy efficiency testing. Based on this, we present power prediction mechanisms and models that utilize our measurement methodology and its results for power prediction.
Specifically, the six major contributions of this thesis are:
We present a measurement methodology and metrics for energy efficiency rating of servers that use multiple, specifically chosen workloads at different load levels for a full system characterization.
We evaluate the methodology and metric with regard to their reproducibility, fairness, and relevance. We investigate the power and performance variations of test results and show fairness of the metric through a mathematical proof and a correlation analysis on a set of 385 servers. We evaluate the metric's relevance by showing the relationships that can be established between metric results and third-party applications.
We create models and extraction mechanisms for load profiles that vary over time, as well as load distribution mechanisms and policies. The models are designed to be used to define arbitrary dynamic load intensity profiles that can be leveraged for benchmarking purposes. The load distribution mechanisms place workloads on computing resources in a hierarchical manner.
Our load intensity models can be extracted in less than 0.2 seconds and our resulting models feature a median modeling error of 12.7% on average. In addition, our new load distribution strategy can save up to 10.7% of power consumption on a single server node.
We introduce an approach to create small-scale workloads that emulate the power consumption-relevant behavior of large-scale workloads by approximating their CPU performance counter profile, and we introduce TeaStore, a distributed, micro-service-based reference application. TeaStore can be used to evaluate power and performance model accuracy, elasticity of cloud auto-scalers, and the effectiveness of power saving mechanisms for distributed systems.
We show that we are capable of emulating the power consumption behavior of realistic workloads with a mean deviation less than 10% and down to 0.2 watts (1%). We demonstrate the use of TeaStore in the context of performance model extraction and cloud auto-scaling also showing that it may generate workloads with different effects on the power consumption of the system under consideration.
We present a method for automated selection of interpolation strategies for performance and power characterization. We also introduce a configuration approach for polynomial interpolation functions of varying degrees that improves prediction accuracy for system power consumption for a given system utilization.
We show that, in comparison to regression, our automated interpolation method selection and configuration approach improves modeling accuracy by 43.6% if additional reference data is available and by 31.4% if it is not.
We present an approach for explicit modeling of the impact a virtualized environment has on power consumption and a method to predict the power consumption of a software application. Both methods use results produced by our measurement methodology to predict the respective power consumption for servers that are otherwise not available to the person making the prediction.
Our methods are able to predict power consumption reliably for multiple hypervisor configurations and for the target application workloads. Application workload power prediction features a mean average absolute percentage error of 9.5%.
Finally, we propose an end-to-end modeling approach for predicting the power consumption of component placements at run-time. The model can also be used to predict the power consumption at load levels that have not yet been observed on the running system.
We show that we can predict the power consumption of two different distributed web applications with a mean absolute percentage error of 2.2%. In addition, we can predict the power consumption of a system at a previously unobserved load level and component distribution with an error of 1.2%.
The contributions of this thesis already show a significant impact in science and industry. The presented efficiency rating methodology, including its metric, have been adopted by the U.S. EPA in the latest version of the ENERGY STAR Computer Server program. They are also being considered by additional regulatory agencies, including the EU Commission and the China National Institute of Standardization. In addition, the methodology's implementation and the underlying methodology itself have already found use in several research publications.
Regarding future work, we see a need for new workloads targeting specialized server hardware. At the moment, we are witnessing a shift in execution hardware to specialized machine learning chips, general purpose GPU computing, FPGAs being embedded into compute servers, etc. To ensure that our measurement methodology remains relevant, workloads covering these areas are required. Similarly, power prediction models must be extended to cover these new scenarios.
A key functionality of cloud systems are automated resource management mechanisms at the infrastructure level. As part of this, elastic scaling of allocated resources is realized by so-called auto-scalers that are supposed to match the current demand in a way that the performance remains stable while resources are efficiently used.
The process of rating cloud infrastructure offerings in terms of the quality of their achieved elastic scaling remains undefined. Clear guidance for the selection and configuration of an auto-scaler for a given context is not available. Thus, existing operating solutions are optimized in a highly application specific way and usually kept undisclosed.
The common state of practice is the use of simplistic threshold-based approaches. Due to their reactive nature they incur performance degradation during the minutes of provisioning delays. In the literature, a high-number of auto-scalers has been proposed trying to overcome the limitations of reactive mechanisms by employing proactive prediction methods.
In this thesis, we identify potentials in automated cloud system resource management and its evaluation methodology. Specifically, we make the following contributions:
We propose a descriptive load profile modeling framework together with automated model extraction from recorded traces to enable reproducible workload generation with realistic load intensity variations. The proposed Descartes Load Intensity Model (DLIM) with its Limbo framework provides key functionality to stress and benchmark resource management approaches in a representative and fair manner.
We propose a set of intuitive metrics for quantifying timing, stability and accuracy aspects of elasticity. Based on these metrics, we propose a novel approach for benchmarking the elasticity of Infrastructure-as-a-Service (IaaS) cloud platforms independent of the performance exhibited by the provisioned underlying resources.
We tackle the challenge of reducing the risk of relying on a single proactive auto-scaler by proposing a new self-aware auto-scaling mechanism, called Chameleon, combining multiple different proactive methods coupled with a reactive fallback mechanism.
Chameleon employs on-demand, automated time series-based forecasting methods to predict the arriving load intensity in combination with run-time service demand estimation techniques to calculate the required resource consumption per work unit without the need for a detailed application instrumentation. It can also leverage application knowledge by solving product-form queueing networks used to derive optimized scaling actions. The Chameleon approach is first in resolving conflicts between reactive and proactive scaling decisions in an intelligent way.
We are confident that the contributions of this thesis will have a long-term impact on the way cloud resource management approaches are assessed. While this could result in an improved quality of autonomic management algorithms, we see and discuss arising challenges for future research in cloud resource management and its assessment methods: The adoption of containerization on top of virtual machine instances introduces another level of indirection. As a result, the nesting of virtual resources increases resource fragmentation and causes unreliable provisioning delays. Furthermore, virtualized compute resources tend to become more and more inhomogeneous associated with various priorities and trade-offs. Due to DevOps practices, cloud hosted service updates are released with a higher frequency which impacts the dynamics in user behavior.
One consequence of the recent coronavirus pandemic is increased demand and use of online services around the globe. At the same time, performance requirements for modern technologies are becoming more stringent as users become accustomed to higher standards. These increased performance and availability requirements, coupled with the unpredictable usage growth, are driving an increasing proportion of applications to run on public cloud platforms as they promise better scalability and reliability.
With data centers already responsible for about one percent of the world's power consumption, optimizing resource usage is of paramount importance. Simultaneously, meeting the increasing and changing resource and performance requirements is only possible by optimizing resource management without introducing additional overhead. This requires the research and development of new modeling approaches to understand the behavior of running applications with minimal information.
However, the emergence of modern software paradigms makes it increasingly difficult to derive such models and renders previous performance modeling techniques infeasible. Modern cloud applications are often deployed as a collection of fine-grained and interconnected components called microservices. Microservice architectures offer massive benefits but also have broad implications for the performance characteristics of the respective systems. In addition, the microservices paradigm is typically paired with a DevOps culture, resulting in frequent application and deployment changes. Such applications are often referred to as cloud-native applications. In summary, the increasing use of ever-changing cloud-hosted microservice applications introduces a number of unique challenges for modeling the performance of modern applications. These include the amount, type, and structure of monitoring data, frequent behavioral changes, or infrastructure variabilities. This violates common assumptions of the state of the art and opens a research gap for our work.
In this thesis, we present five techniques for automated learning of performance models for cloud-native software systems. We achieve this by combining machine learning with traditional performance modeling techniques. Unlike previous work, our focus is on cloud-hosted and continuously evolving microservice architectures, so-called cloud-native applications. Therefore, our contributions aim to solve the above challenges to deliver automated performance models with minimal computational overhead and no manual intervention. Depending on the cloud computing model, privacy agreements, or monitoring capabilities of each platform, we identify different scenarios where performance modeling, prediction, and optimization techniques can provide great benefits. Specifically, the contributions of this thesis are as follows:
Monitorless: Application-agnostic prediction of performance degradations.
To manage application performance with only platform-level monitoring, we propose Monitorless, the first truly application-independent approach to detecting performance degradation. We use machine learning to bridge the gap between platform-level monitoring and application-specific measurements, eliminating the need for application-level monitoring. Monitorless creates a single and holistic resource saturation model that can be used for heterogeneous and untrained applications. Results show that Monitorless infers resource-based performance degradation with 97% accuracy. Moreover, it can achieve similar performance to typical autoscaling solutions, despite using less monitoring information.
SuanMing: Predicting performance degradation using tracing.
We introduce SuanMing to mitigate performance issues before they impact the user experience. This contribution is applied in scenarios where tracing tools enable application-level monitoring. SuanMing predicts explainable causes of expected performance degradations and prevents performance degradations before they occur. Evaluation results show that SuanMing can predict and pinpoint future performance degradations with an accuracy of over 90%.
SARDE: Continuous and autonomous estimation of resource demands.
We present SARDE to learn application models for highly variable application deployments. This contribution focuses on the continuous estimation of application resource demands, a key parameter of performance models. SARDE represents an autonomous ensemble estimation technique. It dynamically and continuously optimizes, selects, and executes an ensemble of approaches to estimate resource demands in response to changes in the application or its environment. Through continuous online adaptation, SARDE efficiently achieves an average resource demand estimation error of 15.96% in our evaluation.
DepIC: Learning parametric dependencies from monitoring data.
DepIC utilizes feature selection techniques in combination with an ensemble regression approach to automatically identify and characterize parametric dependencies. Although parametric dependencies can massively improve the accuracy of performance models, DepIC is the first approach to automatically learn such parametric dependencies from passive monitoring data streams. Our evaluation shows that DepIC achieves 91.7% precision in identifying dependencies and reduces the characterization prediction error by 30% compared to the best individual approach.
Baloo: Modeling the configuration space of databases.
To study the impact of different configurations within distributed DBMSs, we introduce Baloo. Our last contribution models the configuration space of databases considering measurement variabilities in the cloud. More specifically, Baloo dynamically estimates the required benchmarking measurements and automatically builds a configuration space model of a given DBMS. Our evaluation of Baloo on a dataset consisting of 900 configuration points shows that the framework achieves a prediction error of less than 11% while saving up to 80% of the measurement effort.
Although the contributions themselves are orthogonally aligned, taken together they provide a holistic approach to performance management of modern cloud-native microservice applications.
Our contributions are a significant step forward as they specifically target novel and cloud-native software development and operation paradigms, surpassing the capabilities and limitations of previous approaches.
In addition, the research presented in this paper also has a significant impact on the industry, as the contributions were developed in collaboration with research teams from Nokia Bell Labs, Huawei, and Google.
Overall, our solutions open up new possibilities for managing and optimizing cloud applications and improve cost and energy efficiency.
Serverless computing is an emerging cloud computing paradigm that offers a highlevel
application programming model with utilization-based billing. It enables the
deployment of cloud applications without managing the underlying resources or
worrying about other operational aspects. Function-as-a-Service (FaaS) platforms
implement serverless computing by allowing developers to execute code on-demand
in response to events with continuous scaling while having to pay only for the
time used with sub-second metering. Cloud providers have further introduced
many fully managed services for databases, messaging buses, and storage that also
implement a serverless computing model. Applications composed of these fully
managed services and FaaS functions are quickly gaining popularity in both industry
and in academia.
However, due to this rapid adoption, much information surrounding serverless
computing is inconsistent and often outdated as the serverless paradigm evolves.
This makes the performance engineering of serverless applications and platforms
challenging, as there are many open questions, such as: What types of applications
is serverless computing well suited for, and what are its limitations? How should
serverless applications be designed, configured, and implemented? Which design
decisions impact the performance properties of serverless platforms and how can
they be optimized? These and many other open questions can be traced back to an
inconsistent understanding of serverless applications and platforms, which could
present a major roadblock in the adoption of serverless computing.
In this thesis, we address the lack of performance knowledge surrounding serverless
applications and platforms from multiple angles: we conduct empirical studies
to further the understanding of serverless applications and platforms, we introduce
automated optimization methods that simplify the operation of serverless applications,
and we enable the analysis of design tradeoffs of serverless platforms by
extending white-box performance modeling.