TY  - THES
A1  - Rygielski, Piotr
T1  - Flexible Modeling of Data Center Networks for Capacity Management
T1  - Elastische Modellierung von Rechenzentren-Netzen zwecks Kapazitätsverwaltung
N2  - Nowadays, data centers are becoming increasingly dynamic due to the common adoption of virtualization technologies. Systems can scale their capacity on demand by growing and shrinking their resources dynamically based on the current load. However, the complexity and performance of modern data centers is influenced not only by the software architecture, middleware, and computing resources, but also by network virtualization, network protocols, network services, and configuration. The field of network virtualization is not as mature as server virtualization and there are multiple competing approaches and technologies. Performance modeling and prediction techniques provide a powerful tool to analyze the performance of modern data centers. However, given the wide variety of network virtualization approaches, no common approach exists for modeling and evaluating the performance of virtualized networks.
The performance community has proposed multiple formalisms and models for evaluating the performance of infrastructures based on different network virtualization technologies. The existing performance models can be divided into two main categories: coarse-grained analytical models and highly-detailed simulation models. Analytical performance models are normally defined at a high level of abstraction and thus they abstract many details of the real network and therefore have limited predictive power. On the other hand, simulation models are normally focused on a selected networking technology and take into account many specific performance influencing factors, resulting in detailed models that are tightly bound to a given technology, infrastructure setup, or to a given protocol stack.
Existing models are inflexible, that means, they provide a single solution method without providing means for the user to influence the solution accuracy and solution overhead. To allow for flexibility in the performance prediction, the user is required to build multiple different performance models obtaining multiple performance predictions. Each performance prediction may then have different focus, different performance metrics, prediction accuracy, and solving time.
The goal of this thesis is to develop a modeling approach that does not require the user to have experience in any of the applied performance modeling formalisms. The approach offers the flexibility in the modeling and analysis by balancing between: (a) generic character and low overhead of coarse-grained analytical models, and (b) the more detailed simulation models with higher prediction accuracy.

The contributions of this thesis intersect with technologies and research areas, such as: software engineering, model-driven software development, domain-specific modeling, performance modeling and prediction, networking and data center networks, network virtualization, Software-Defined Networking (SDN), Network Function Virtualization (NFV). The main contributions of this thesis compose the Descartes Network Infrastructure (DNI) approach and include:

• Novel modeling abstractions for virtualized network infrastructures. This includes two meta-models that define modeling languages for modeling data center network performance. The DNI and miniDNI meta-models provide means for representing network infrastructures at two different abstraction levels. Regardless of which variant of the DNI meta-model is used, the modeling language provides generic modeling elements allowing to describe the majority of existing and future network technologies, while at the same time abstracting factors that have low influence on the overall performance. I focus on SDN and NFV as examples of modern virtualization technologies.
• Network deployment meta-model—an interface between DNI and other meta- models that allows to define mapping between DNI and other descriptive models. The integration with other domain-specific models allows capturing behaviors that are not reflected in the DNI model, for example, software bottlenecks, server virtualization, and middleware overheads.
• Flexible model solving with model transformations. The transformations enable solving a DNI model by transforming it into a predictive model. The model transformations vary in size and complexity depending on the amount of data abstracted in the transformation process and provided to the solver. In this thesis, I contribute six transformations that transform DNI models into various predictive models based on the following modeling formalisms: (a) OMNeT++ simulation, (b) Queueing Petri Nets (QPNs), (c) Layered Queueing Networks (LQNs). For each of these formalisms, multiple predictive models are generated (e.g., models with different level of detail): (a) two for OMNeT++, (b) two for QPNs, (c) two for LQNs. Some predictive models can be solved using multiple alternative solvers resulting in up to ten different automated solving methods for a single DNI model.
• A model extraction method that supports the modeler in the modeling process by automatically prefilling the DNI model with the network traffic data. The contributed traffic profile abstraction and optimization method provides a trade-off by balancing between the size and the level of detail of the extracted profiles.
• A method for selecting feasible solving methods for a DNI model. The method proposes a set of solvers based on trade-off analysis characterizing each transformation with respect to various parameters such as its specific limitations, expected prediction accuracy, expected run-time, required resources in terms of CPU and memory consumption, and scalability.
• An evaluation of the approach in the context of two realistic systems. I evaluate the approach with focus on such factors like: prediction of network capacity and interface throughput, applicability, flexibility in trading-off between prediction accuracy and solving time. Despite not focusing on the maximization of the prediction accuracy, I demonstrate that in the majority of cases, the prediction error is low—up to 20% for uncalibrated models and up to 10% for calibrated models depending on the solving technique.
In summary, this thesis presents the first approach to flexible run-time performance prediction in data center networks, including network based on SDN. It provides ability to flexibly balance between performance prediction accuracy and solving overhead. The approach provides the following key benefits:
• It is possible to predict the impact of changes in the data center network on the performance. The changes include: changes in network topology, hardware configuration, traffic load, and applications deployment.
• DNI can successfully model and predict the performance of multiple different of network infrastructures including proactive SDN scenarios.
• The prediction process is flexible, that is, it provides balance between the granularity of the predictive models and the solving time. The decreased prediction accuracy is usually rewarded with savings of the solving time and consumption of resources required for solving.
• The users are enabled to conduct performance analysis using multiple different prediction methods without requiring the expertise and experience in each of the modeling formalisms.
The components of the DNI approach can be also applied to scenarios that are not considered in this thesis. The approach is generalizable and applicable for the following examples: (a) networks outside of data centers may be analyzed with DNI as long as the background traffic profile is known; (b) uncalibrated DNI models may serve as a basis for design-time performance analysis; (c) the method for extracting and compacting of traffic profiles may be used for other, non-network workloads as well.
N2  - Durch Virtualisierung werden moderne Rechenzentren immer dynamischer. Systeme sind in der Lage ihre Kapazität hoch und runter zu skalieren , um die ankommende Last zu bedienen. Die Komplexität der modernen Systeme in Rechenzentren wird nicht nur von der Softwarearchitektur, Middleware und Rechenressourcen sondern auch von der Netzwerkvirtualisierung beeinflusst. Netzwerkvirtualisierung ist noch  nicht  so ausgereift  wie  die Virtualisierung  von  Rechenressourcen und es existieren derzeit unterschiedliche Netzwerkvirtualisierungstechnologien. Man kann aber keine der Technologien als Standardvirtualisierung für Netzwerke bezeichnen. Die Auswahl von Ansätzen durch Performanzanalyse von Netzwerken stellt eine Herausforderung dar, weil existierende Ansätze sich mehrheitlich auf einzelne Virtualisierungstechniken fokussieren und es keinen universellen Ansatz für Performanzanalyse gibt, der alle Techniken in Betracht nimmt.
Die Forschungsgemeinschaft bietet verschiedene Performanzmodelle und Formalismen für Evaluierung der Performanz von virtualisierten Netzwerken an. Die bekannten Ansätze können in zwei Gruppen aufgegliedert werden: Grobetaillierte analytische Modelle und feindetaillierte Simulationsmodelle. Die analytischen Performanzmodelle abstrahieren viele Details und liefern daher nur beschränkt nutzbare Performanzvorhersagen. Auf der anderen Seite fokussiert sich die Gruppe der simulationsbasierenden Modelle auf bestimmte Teile des Systems (z.B. Protokoll, Typ von Switches) und ignoriert dadurch das große Bild der Systemlandschaft. ...
KW  - Modellierung
KW  - Leistungsbewertung
KW  - Netzwerk
KW  - Meta-modeling
KW  - Model transformation
KW  - Performance analysis
KW  - Simulation
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-146235
ER  - 
TY  - THES
A1  - Walter, Jürgen Christian
T1  - Automation in Software Performance Engineering Based on a Declarative Specification of Concerns
T1  - Automatisierung im Software-Performance-Engineering basierend auf einer deklarativen Beschreibung von Performance-Anliegen
N2  - Software performance is of particular relevance to software system design, operation, and evolution because it has a significant impact on key business indicators. During the life-cycle of a software system, its implementation, configuration, and deployment are subject to multiple changes that may affect the end-to-end performance characteristics. Consequently, performance analysts continually need to provide answers to and act based on performance-relevant concerns. To ensure a desired level of performance, software performance engineering provides a plethora of methods, techniques, and tools for measuring, modeling, and evaluating performance properties of software systems. However, the answering of performance concerns is subject to a significant semantic gap between the level on which performance concerns are formulated and the technical level on which performance evaluations are actually conducted. Performance evaluation approaches come with different strengths and limitations concerning, for example, accuracy, time-to-result, or system overhead. For the involved stakeholders, it can be an elaborate process to reasonably select, parameterize and correctly apply performance evaluation approaches, and to filter and interpret the obtained results. An additional challenge is that available performance evaluation artifacts may change over time, which requires to switch between different measurement-based and model-based performance evaluation approaches during the system evolution. At model-based analysis, the effort involved in creating performance models can also outweigh their benefits. 
To overcome the deficiencies and enable an automatic and holistic evaluation of performance throughout the software engineering life-cycle requires an approach that: (i) integrates multiple types of performance concerns and evaluation approaches, (ii) automates performance model creation, and (iii) automatically selects an evaluation methodology tailored to a specific scenario. This thesis presents a declarative approach —called Declarative Performance Engineering (DPE)— to automate performance evaluation based on a humanreadable specification of performance-related concerns. To this end, we separate the definition of performance concerns from their solution. The primary scientific contributions presented in this thesis are:

A declarative language to express performance-related concerns and a corresponding processing framework: 
We provide a language to specify performance concerns independent of a concrete performance evaluation approach. Besides the specification of functional aspects, the language allows to include non-functional tradeoffs optionally. To answer these concerns, we provide a framework architecture and a corresponding reference implementation to process performance concerns automatically. It allows to integrate arbitrary performance evaluation approaches and is accompanied by reference implementations for model-based and measurement-based performance evaluation.

Automated creation of architectural performance models from execution traces:
The creation of performance models can be subject to significant efforts outweighing the benefits of model-based performance evaluation. We provide a model extraction framework that creates architectural performance models based on execution traces, provided by monitoring tools.The framework separates the derivation of generic information from model creation routines. To derive generic information, the framework combines state-of-the-art extraction and estimation techniques. We isolate object creation routines specified in a generic model builder interface based on concepts present in multiple performance-annotated architectural modeling formalisms. To create model extraction for a novel performance modeling formalism, developers only need to write object creation routines instead of creating model extraction software from scratch when reusing the generic framework.

Automated and extensible decision support for performance evaluation approaches:
We present a methodology and tooling for the automated selection of a performance evaluation approach tailored to the user concerns and application scenario. To this end, we propose to decouple the complexity of selecting a performance evaluation approach for a given scenario by providing solution approach capability models and a generic decision engine. The proposed capability meta-model enables to describe functional and non-functional capabilities of performance evaluation approaches and tools at different granularities. In contrast to existing tree-based decision support mechanisms, the decoupling approach allows to easily update characteristics of solution approaches as well as appending new rating criteria and thereby stay abreast of evolution in performance evaluation tooling and system technologies. 

Time-to-result estimation for model-based performance prediction: 
The time required to execute a model-based analysis plays an important role in different decision processes. For example, evaluation scenarios might require the prediction results to be available in a limited period of time such that the system can be adapted in time to ensure the desired quality of service. We propose a method to estimate the time-to-result for modelbased performance prediction based on model characteristics and analysis parametrization. We learn a prediction model using performancerelevant features thatwe determined using statistical tests. We implement the approach and demonstrate its practicability by applying it to analyze a simulation-based multi-step performance evaluation approach for a representative architectural performance modeling formalism.

We validate each of the contributions based on representative case studies. The evaluation of automatic performance model extraction for two case study systems shows that the resulting models can accurately predict the performance behavior. Prediction accuracy errors are below 3% for resource utilization and mostly less than 20% for service response time. The separate evaluation of the reusability shows that the presented approach lowers the implementation efforts for automated model extraction tools by up to 91%. Based on two case studies applying measurement-based and model-based performance evaluation techniques, we demonstrate the suitability of the declarative performance engineering framework to answer multiple kinds of performance concerns customized to non-functional goals. Subsequently, we discuss reduced efforts in applying performance analyses using the integrated and automated declarative approach. Also, the evaluation of the declarative framework reviews benefits and savings integrating performance evaluation approaches into the declarative performance engineering framework. We demonstrate the applicability of the decision framework for performance evaluation approaches by applying it to depict existing decision trees. Then, we show how we can quickly adapt to the evolution of performance evaluation methods which is challenging for static tree-based decision support systems. At this, we show how to cope with the evolution of functional and non-functional capabilities of performance evaluation software and explain how to integrate new approaches. Finally, we evaluate the accuracy of the time-to-result estimation for a set of machinelearning algorithms and different training datasets. The predictions exhibit a mean percentage error below 20%, which can be further improved by including performance evaluations of the considered model into the training data. The presented contributions represent a significant step towards an integrated performance engineering process that combines the strengths of model-based and measurement-based performance evaluation. The proposed performance concern language in conjunction with the processing framework significantly reduces the complexity of applying performance evaluations for all stakeholders. Thereby it enables performance awareness throughout the software engineering life-cycle. The proposed performance concern language removes the semantic gap between the level on which performance concerns are formulated and the technical level on which performance evaluations are actually conducted by the user.
N2  - Die Performanz von Software ist von herausgehobener Relevanz für das Design, den Betrieb und die Evolution von Softwaresystemen, da sie den Geschäftserfolg stark beinflusst. Während des Softwarelebenszyklus ändern sich die Implementierung und die Art der Bereitstellung mehrfach, was jeweils das Ende-zu-Ende Verhalten bezüglich der Performanz beeinflussen kann. Folglich muss sich kontinuierlich mit Fragestellungen der Leistungsbewertung beschäftigt werden. Um performantes Verhalten sicherzustellen gibt es im “Software Performance Engineering” bereits eine Vielzahl an Methoden, Techniken und Werkzeugen um Performanzeigenschaften von Softwaresystemen zu messen, zu modellieren und zu evaluieren. Jedoch unterliegt die Beantwortung von konkreten Fragestellungen einem Missverhältnis zwischen dem einfachen Formulieren von Fragestellungen und dem sehr technischen Level auf dem die Fragen beantwortet werden. Verfahren zur Bestimmung von Performanzmetriken haben unterschiedliche Stärken und Einschränkungen, u.a. bezüglich Genauigkeit, Lösungsgeschwindigkeit oder der erzeugten Last auf dem System. Für die beteiligten Personen ist es ein nicht-trivialer Prozess ein passendes Verfahren zur Performanzevaluation auszuwählen, es sinnvoll zu parametrisieren, auszuführen, sowie die Ergebnisse zu filtern und zu interpretieren. Eine zusätzliche Herausforderung ist, dass sich die Artefakte, um die Leistung eines Systemes zu evaluieren, im zeitlichen Verlauf ändern, was einenWechsel zwischen messbasierten und modellbasierten Verfahren im Rahmen der Systemevolution nötig macht. Bei der modellbasierten Analyse kann zudem der Aufwand für die Erstellung von Performance-Modellen den Nutzen überwiegen.
Um die genannten Defizite zu überwinden und eine ganzheitliche, automatisierte Evaluierung der Leistung während des Software-Entwicklungszyklus zu erreichen ist ein Ansatz von Nöten, der: (i) unterschiedliche Arten von Performanzanliegen und Evaluationsmethoden integriert, (ii) die Erstellung von Performanzmodellen automatisiert und (iii) automatisch eine Methodik zur Evaluation zugeschnitten auf ein spezielles Analyseszenario auswählt. Diese Arbeit präsentiert einen beschreibenden Ansatz, Declarative Performance Engineering (DPE) genannt, um die Evaluation von Performanzfragestellungen basierend auf einem menschenlesbaren Spezifikation zu automatisieren.

Zu diesem Zweck trennen wir die Spezifikation von Performanzanliegen von deren Beantwortung. Die wissenschaftlichen Hauptbeiträge dieser Arbeit sind:

Eine beschreibende Sprache um performanzrelevante Fragestellungen auszudrücken und ein Framework um diese zu beantworten: 
Wir präsentieren eine Sprache, um Performanzanliegen unabhängig von der Evaluationsmethodik zu beschreiben. Neben der Spezifikation von funktionalen Aspekten können auch nicht-funktionale Abwägungsentscheidungen beschrieben werden. Um die spezifizierten Anliegen zu beantworten präsentieren wir eine Frameworkarchitektur und eine entsprechende Referenzimplementierung,um Anliegen automatisch zu beantworten. Das Framework bietet die Möglichkeit beliebige Evaluationsmethodiken zu integrieren und wird ergänzt durch Referenzimplementierungen zur messbasierten und modellbasierten Performanzevaluation.

Automatische Extraktion von architekturellen Performanzemodellen aus Messdatenzur Anwendungsperformanz:
Der signifikante Aufwand zur Erstellung von Performanzmodellen kann deren Vorteile überlagern. Wir schlagen einen Framework zur automatischen Erstellung vor, welches Modelle aus Messdaten extrahiert. Das präsentierte Framework trennt das Lernen von generischen Aspekten von Modellerstellungsroutinen. Um generische Aspekte zu lernen kombiniert unser Framework modernste Extraktionsund Schätztechniken. Wir isolieren Objekterstellungsroutinen, die in einer generischen Schnittstelle zur Modellerzeugung angegeben sind, basierend auf Konzepten die in mehreren Performanz-annotierten Architekturmodellen vorhanden sind. Um eine Modellextraktion für einen neuen Formalismus zu erstellen müssen Entwickler müssen nur die Erstellung von Objekterstellungsroutinen schreiben statt eine Modell-Extraktionssoftware von Grund auf neu zu schreiben.

Automatisierte und erweiterbare Entscheidungsunterstützung für Leistungsbewertungsansätze:
Wir präsentieren eine Methodik und Werkzeuge für die automatisierte Auswahl eines auf die Belange und Anwendungenszenarien der Benutzer zugeschnittenen Leistungsbewertungsansatzes. Zu diesem Zweck schlagen wir vor, die Komplexität der Auswahl eines Leistungsbewertungsansatzes für ein gegebenes Szenario zu entkoppeln. Dies geschieht durch Bereitstellung von Fähigkeitsmodellen für die Lösungsansätze und einen generische Entscheidungsmechanismus. Das vorgeschlagene Fähigkeits-Metamodell ermöglicht es, funktionale und nichtfunktionale Fähigkeiten von Leistungsbewertungsansätzen und Werkzeugen in verschiedenen Granularitäten zu modellieren. Im Gegensatz zu bestehenden baumbasierten Entscheidungensmechanismen ermöglicht unser Ansatz die einfache Aktualisierung von Merkmalen von Lösungsansätzen sowie das Hinzufügen neuer Bewertungskriterien und kann dadurch einfach aktuell gehalten werden.

Eine Methode zur Schätzung der Analysezeit für die modellbasierte Leistungsvorhersage:
Die Zeit, die für die Durchführung einer modellbasierten Analyse benötigt wird, spielt in verschiedenen Entscheidungsprozessen eine wichtige Rolle. Beispielsweise können Auswertungsszenarien erfordern, dass die Vorhersageergebnisse in einem begrenzten Zeitraum zur Verfügung stehen, so dass das System rechtzeitig angepasst werden kann, um die Dienstgüte sicherzustellen.Wir schlagen eine Methode vor, um die Zeit bis zum Ergebnis für modellbasierte Leistungsvorhersage basierend auf Modelleigenschaften und Analyseparametrisierung zu schätzen. Wir lernen ein Vorhersagemodell anhand von leistungsrelevanten Merkmalen, die wir mittels statistischer Tests ermittelt haben. Wir implementieren den Ansatz und demonstrieren seine Praktikabilität, indem wir ihn auf einen mehrstufiger Leistungsbewertungsansatz anwenden. Wir validieren jeden der Beiträge anhand repräsentativer Fallstudien. Die Evaluierung der Leistungsmodellextraktion für mehrere Fallstudiensysteme zeigt, dass die resultierenden Modelle das Leistungsverhalten genau vorhersagen können. Fehler bei der Vorhersagegenauigkeit liegen für die Ressourcennutzung unter 3% und meist weniger als 20% für die Service-Reaktionszeit. Die getrennte Bewertung derWiederverwendbarkeit zeigt, dass der Implementierungsaufwand zur Erstellung von Modellextraktionswerkzeugen um bis zu 91% gesenkt werden kann. 

Wir zeigen die Eignung unseres Framworks zur deklarativen Leistungsbewertung basierend auf zwei Fallstudien die mess- und model-basierte Leistungsbewertungstechniken zur Beantwortung verschiedenster Performance-Anliegen zugeschnitten auf Nutzerbedürfnisse anwenden. Anschließend diskutieren wir die Einsparungen durch den integrierten und automatisierten Ansatz. Des weiteren untersuchen wir die Vorteile der Integration vonweiteren Leistungsbewertungsansätzen in den deklarativen Ansatz.Wir demonstrieren die Anwendbarkeit unseres Entscheidungsframeworks für Leistungsbewertungsansätze, indem wir den Stand der Technik für Entscheidungsunterstützung abbilden. Anschließend zeigen wir die leichte Anpassbarkeit, was für baumbasierte Entscheidungsunterstützungssysteme eine signifikante Herausforderung darstellt. Hierbei zeigen wir wie man Änderungen funktionaler und nichtfunktionaler Fähigkeiten von Leistungsbewertungssoftware sowie neue Ansätze integriert. Abschließend bewerten wir die Genauigkeit der Zeit-zu-Ergebnis-Schätzung für eine Reihe von maschinellen Lernalgorithmen und verschiedenen Trainingsdatensätzen. Unser Vorhersagen zeigen einen mittleren prozentualen Fehler von weniger als 20%, die weiter verbessert werden können durch Berücksichtigung von Leistungsbewertungen des betrachteten Modells in den Trainingsdaten.

Die vorgestellten Beiträge sind ein bedeutender Schritt hin zu einem integrierten Performance-Engineering-Prozess, der die Stärken von modellbasierter und messbasierter Leistungsbewertung kombiniert. Die vorgeschlagene Sprache um Performanzanliegen zu spezifizieren reduziert in Verbindung mit dem Beantwortungsframework die Komplexität der Anwendung von Leistungsbewertungen für alle Beteiligten deutlich und ermöglicht dadurch ein Leistungsbewusstsein im gesamten Softwarelebenszyklus. Damit entfernt die vorgeschlagene Sprache die Diskrepanz zwischen einem einfachen Fragen bezüglich der Leistung und der sehr technische Ebene auf der Leistungsbewertungen tatsächlich ausgeführt werden.
KW  - Software
KW  - Declarative Performance Engineering
KW  - Model-based Performance Prediction
KW  - Measurement-based Analysis
KW  - Decision Support
KW  - Leistungsbewertung
KW  - Software Performance Engineering
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-180904
ER  - 
TY  - THES
A1  - Eismann, Simon
T1  - Performance Engineering of Serverless Applications and Platforms
T1  - Performanz Engineering von Serverless Anwendungen und Plattformen
N2  - Serverless computing is an emerging cloud computing paradigm that offers a highlevel
application programming model with utilization-based billing. It enables the
deployment of cloud applications without managing the underlying resources or
worrying about other operational aspects. Function-as-a-Service (FaaS) platforms
implement serverless computing by allowing developers to execute code on-demand
in response to events with continuous scaling while having to pay only for the
time used with sub-second metering. Cloud providers have further introduced
many fully managed services for databases, messaging buses, and storage that also
implement a serverless computing model. Applications composed of these fully
managed services and FaaS functions are quickly gaining popularity in both industry
and in academia.
However, due to this rapid adoption, much information surrounding serverless
computing is inconsistent and often outdated as the serverless paradigm evolves.
This makes the performance engineering of serverless applications and platforms
challenging, as there are many open questions, such as: What types of applications
is serverless computing well suited for, and what are its limitations? How should
serverless applications be designed, configured, and implemented? Which design
decisions impact the performance properties of serverless platforms and how can
they be optimized? These and many other open questions can be traced back to an
inconsistent understanding of serverless applications and platforms, which could
present a major roadblock in the adoption of serverless computing.
In this thesis, we address the lack of performance knowledge surrounding serverless
applications and platforms from multiple angles: we conduct empirical studies
to further the understanding of serverless applications and platforms, we introduce
automated optimization methods that simplify the operation of serverless applications,
and we enable the analysis of design tradeoffs of serverless platforms by
extending white-box performance modeling.
N2  - Serverless Computing ist ein neues Cloud-Computing-Paradigma, das ein High-Level-Anwendungsprogrammiermodell mit nutzungsbasierter Abrechnung bietet. Es ermöglicht die Bereitstellung von Cloud-Anwendungen, ohne dass die zugrunde liegenden Ressourcen verwaltet werden müssen oder man sich um andere betriebliche Aspekte kümmern muss. FaaS-Plattformen implementieren Serverless Computing, indem sie Entwicklern die Möglichkeit geben, Code nach Bedarf als Reaktion auf Ereignisse mit kontinuierlicher Skalierung auszuführen, während sie nur für die genutzte Zeit mit sekundengenauer Abrechnung zahlen müssen. Cloud-Anbieter haben darüber hinaus viele vollständig verwaltete Dienste für Datenbanken, Messaging-Busse und Orchestrierung eingeführt, die ebenfalls ein Serverless Computing-Modell implementieren. Anwendungen, die aus diesen vollständig verwalteten Diensten und FaaS-Funktionen bestehen, werden sowohl in der Industrie als auch in der Wissenschaft immer beliebter.

Aufgrund dieser schnellen Verbreitung sind jedoch viele Informationen zum Serverless Computing inkonsistent und oft veraltet, da sich das Serverless Paradigma weiterentwickelt. Dies macht das Performanz-Engineering von Serverless Anwendungen und Plattformen zu einer Herausforderung, da es viele offene Fragen gibt, wie zum Beispiel: Für welche Arten von Anwendungen ist Serverless Computing gut geeignet und wo liegen seine Grenzen? Wie sollten Serverless Anwendungen konzipiert, konfiguriert und implementiert werden? Welche Designentscheidungen wirken sich auf die Performanzeigenschaften von Serverless Plattformen aus und wie können sie optimiert werden? Diese und viele andere offene Fragen lassen sich auf ein uneinheitliches Verständnis von Serverless Anwendungen und Plattformen zurückführen, was ein großes Hindernis für die Einführung von Serverless Computing darstellen könnte.

In dieser Arbeit adressieren wir den Mangel an Performanzwissen zu Serverless Anwendungen und Plattformen aus mehreren Blickwinkeln: Wir führen empirische Studien durch, um das Verständnis von Serverless Anwendungen und Plattformen zu fördern, wir stellen automatisierte Optimierungsmethoden vor, die das benötigte Wissen für den Betrieb von Serverless Anwendungen reduzieren, und wir erweitern die White-Box-Performanzmodellierungerung für die Analyse von Designkompromissen von Serverless Plattformen.
KW  - Leistungsbewertung
KW  - software performance
KW  - Cloud Computing
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-303134
ER  - 
TY  - THES
A1  - Spinner, Simon
T1  - Self-Aware Resource Management in Virtualized Data Centers
T1  - Selbstwahrnehmende Ressourcenverwaltung in virtualisierten Rechenzentren
N2  - Enterprise applications in virtualized data centers are often subject to time-varying workloads, i.e., the load intensity and request mix change over time, due to seasonal patterns and trends, or unpredictable bursts in user requests. Varying workloads result in frequently changing resource demands to the underlying hardware infrastructure. Virtualization technologies enable sharing and on-demand allocation of hardware resources between multiple applications. In this context, the resource allocations to virtualized applications should be continuously adapted in an elastic fashion, so that "at each point in time the available resources match the current demand as closely as possible" (Herbst el al., 2013). Autonomic approaches to resource management promise significant increases in resource efficiency while avoiding violations of performance and availability requirements during peak workloads.

Traditional approaches for autonomic resource management use threshold-based rules (e.g., Amazon EC2) that execute pre-defined reconfiguration actions when a metric reaches a certain threshold (e.g., high resource utilization or load imbalance). However, many business-critical applications are subject to Service-Level-Objectives defined on an application performance metric (e.g., response time or throughput). To determine thresholds so that the end-to-end application SLO is fulfilled poses a major challenge due to the complex relationship between the resource allocation to an application and the application performance. Furthermore, threshold-based approaches are inherently prone to an oscillating behavior resulting in unnecessary reconfigurations.

In order to overcome the deficiencies of threshold-based
approaches and enable a fully automated approach to dynamically control the resource allocations of virtualized applications, model-based approaches are required that can predict the impact of a reconfiguration on the application performance in advance. However, existing model-based approaches are severely limited in their learning capabilities. They either require complete performance models of the application as input, or use a pre-identified model structure and only learn certain model parameters from empirical data at run-time. The former requires high manual efforts and deep system knowledge to create the performance models. The latter does not provide the flexibility to capture the specifics of complex and heterogeneous system architectures.

This thesis presents a self-aware approach to the resource management in virtualized data centers. In this context, self-aware means that it automatically learns performance models of the application and the virtualized infrastructure and reasons based on these models to autonomically adapt the resource allocations in accordance with given application SLOs. Learning a performance model requires the extraction of the model structure representing the system architecture as well as the estimation of model parameters, such as resource demands. The estimation of resource demands is a key challenge as they cannot be observed directly in most systems.
The major scientific contributions of this thesis are:
- A reference architecture for online model learning in virtualized systems. Our reference architecture is based on a set of model extraction agents. Each agent focuses on specific tasks to automatically create and update model skeletons capturing its local knowledge of the system and collaborates with other agents to extract the structural parts of a global performance model of the system. We define different agent roles in the reference architecture and propose a model-based collaboration mechanism for the agents. The agents may be bundled within virtual appliances and may be tailored to include knowledge about the software stack deployed in a specific virtual appliance.
- An online method for the statistical estimation of resource demands. For a given request processed by an application, the resource time consumed for a specified resource within the system (e.g., CPU or I/O device), referred to as resource demand, is the total average time the resource is busy processing the request. A request could be any unit of work (e.g., web page request, database transaction, batch job) processed by the system. We provide a systematization of existing statistical approaches to resource demand estimation and conduct an extensive experimental comparison to evaluate the accuracy of these approaches. We propose a novel method to automatically select estimation approaches and demonstrate that it increases the robustness and accuracy of the estimated resource demands significantly.
- Model-based controllers for autonomic vertical scaling of virtualized applications. We design two controllers based on online model-based reasoning techniques in order to vertically scale applications at run-time in accordance with application SLOs. The controllers exploit the knowledge from the automatically extracted performance models when determining necessary reconfigurations. The first controller adds and removes virtual CPUs to an application depending on the current demand. It uses a layered performance model to also consider the physical resource contention when determining the required resources. The second controller adapts the resource allocations proactively to ensure the availability of the application during workload peaks and avoid reconfiguration during phases of high workload.

We demonstrate the applicability of our approach in current virtualized environments and show its effectiveness leading to significant increases in resource efficiency and improvements of the application performance and availability under time-varying workloads. The evaluation of our approach is based on two case studies representative of widely used enterprise applications in virtualized data centers. In our case studies, we were able to reduce the amount of required CPU resources by up to 23% and the number of reconfigurations by up to 95% compared to a rule-based approach while ensuring full compliance with application SLO. Furthermore, using workload forecasting techniques we were able to schedule expensive reconfigurations (e.g., changes to the memory size) during phases of load load and thus were able to reduce their impact on application availability by over 80% while significantly improving application performance compared to a reactive controller. The methods and techniques for resource demand estimation and vertical application scaling were developed and evaluated in close collaboration with VMware and Google.
N2  - Unternehmensanwendungen in virtualisierten Rechenzentren unterliegen häufig zeitabhängigen Arbeitslasten, d.h. die Lastintensität und der Anfragemix ändern sich mit der Zeit wegen saisonalen Mustern und Trends, sowie  unvorhergesehenen Lastspitzen bei den Nutzeranfragen. Variierende Arbeitslasten führen dazu, dass sich die Ressourcenanforderungen an die darunterliegende Hardware-Infrastruktur häufig ändern. Virtualisierungstechniken erlauben die gemeinsame Nutzung und bedarfsgesteuerte Zuteilung von Hardware-Ressourcen zwischen mehreren Anwendungen. In diesem Zusammenhang sollte die Zuteilung von Ressourcen an virtualisierte Anwendungen fortwährend in einer elastischen Art und Weise angepasst werden, um sicherzustellen, dass "zu jedem Zeitpunkt die verfügbaren Ressourcen dem derzeitigen Bedarf möglichst genau entsprechen" (Herbst et al., 2013). Autonome Ansätze zur Ressourcenverwaltung versprechen eine deutliche Steigerung der Ressourceneffizienz wobei Verletzungen der Anforderungen hinsichtlich Performanz und Verfügbarkeit bei Lastspitzen vermieden werden.

Herkömmliche Ansätze zur autonomen Ressourcenverwaltung nutzen feste Regeln  (z.B., Amazon EC2), die vordefinierte Rekonfigurationen durchführen sobald eine Metrik einen bestimmten Schwellwert erreicht (z.B., hohe Ressourcenauslastung oder ungleichmäßige Lastverteilung). Viele geschäftskritische Anwendungen unterliegen jedoch Zielvorgaben hinsichtlich der Dienstgüte (SLO, engl. Service Level Objectives), die auf Performanzmetriken der Anwendung definiert sind (z.B., Antwortzeit oder Durchsatz). Die Bestimmung von Schwellwerten, sodass die Ende-zu-Ende Anwendungs-SLOs erfüllt werden, stellt aufgrund des komplexen Zusammenspiels zwischen der Ressourcenzuteilung und der Performanz einer Anwendung eine bedeutende Herausforderung dar. Des Weiteren sind Ansätze basierend auf Schwellwerten inhärent anfällig für Oszillationen, die zu überflüssigen Rekonfigurationen führen können.

Um die Schwächen schwellwertbasierter Ansätze zu lösen und einen vollständig automatisierten Ansatz zur dynamischen Steuerung von Ressourcenzuteilungen virtualisierter Anwendungen zu ermöglichen, bedarf es modellbasierter Ansätze, die den Einfluss einer Rekonfiguration auf die Performanz einer Anwendung im Voraus vorhersagen können. Bestehende modellbasierte Ansätze sind jedoch stark eingeschränkt hinsichtlich ihrer Lernfähigkeiten. Sie erfordern entweder vollständige Performanzmodelle der Anwendung als Eingabe oder nutzen vorbestimmte Modellstrukturen und lernen nur bestimmte Modellparameter auf Basis von empirischen Daten zur Laufzeit. Erstere erfordern hohe manuelle Aufwände und eine tiefe Systemkenntnis um die Performanzmodelle zu erstellen. Letztere bieten nur eingeschränkte Möglichkeiten um die Besonderheiten von komplexen und heterogenen Systemarchitekturen zu erfassen.

Diese Arbeit stellt einen selbstwahrnehmenden (engl. self-aware) Ansatz zur Ressourcenverwaltung in virtualisierten Rechenzentren vor. In diesem Zusammenhang bedeutet Selbstwahrnehmung, dass der Ansatz automatisch Performanzmodelle der Anwendung und der virtualisierten Infrastruktur lernt Basierend auf diesen Modellen entscheidet er autonom wie die Ressourcenzuteilungen angepasst werden, um die Anwendungs-SLOs zu erfüllen. Das Lernen von Performanzmodellen erfordert sowohl die Extraktion der Modellstruktur, die die Systemarchitektur abbildet, als auch die Schätzung von Modellparametern, wie zum Beispiel der Ressourcenverbräuche einzelner Funktionen. Die Schätzung der Ressourcenverbräuche stellt hier eine zentrale Herausforderung dar, da diese in den meisten Systemen nicht direkt gemessen werden können. Die wissenschaftlichen Hauptbeiträge dieser Arbeit sind wie folgt:

- Eine Referenzarchitektur, die das Lernen von Modellen in virtualisierten Systemen während des Betriebs ermöglicht. Unsere Referenzarchitektur basiert auf einer Menge von Modellextraktionsagenten. Jeder Agent fokussiert sich auf bestimmte Aufgaben um automatisch ein Modellskeleton, das sein lokales Wissen über das System erfasst, zu erstellen und zu aktualisieren. Jeder Agent arbeitet mit anderen Agenten zusammen um die strukturellen Teile eines globalen Performanzmodells des Systems zu extrahieren. Die Rereferenzarchitektur definiert unterschiedliche Agentenrollen und beinhaltet einen modellbasierten Mechanismus, der die  Kooperation unterschiedlicher Agenten ermöglicht. Die Agenten können als Teil virtuellen Appliances gebündelt werden und können dabei maßgeschneidertes Wissen über die Software-Strukturen in dieser virtuellen Appliance beinhalten.	
- Eine Methode zur fortwährenden statistischen Schätzung von Ressourcenverbräuchen. Der Ressourcenverbrauch (engl. resource demand) einer Anfrage, die von einer Anwendung verarbeitet wird, entspricht der Zeit, die an einer spezifischen Ressource im System (z.B., CPU oder I/O-Gerät) verbraucht wird.	Eine Anfrage kann dabei eine beliebige Arbeitseinheit, die von einem System verarbeitet wird, darstellen (z.B. eine Webseitenanfrage, eine Datenbanktransaktion, oder ein Stapelverarbeitungsauftrag). Die vorliegende Arbeit bietet eine Systematisierung existierender Ansätze zur statistischen Schätzung von Ressourcenverbräuchen und führt einen umfangreichen, auf Experimenten aufbauenden Vergleich zur Bewertung der Genauigkeit dieser Ansätze durch. Es wird eine neuartige Methode zur automatischen Auswahl eines Schätzverfahrens vorgeschlagen und gezeigt, dass diese die Robustheit und Genauigkeit der geschätzten Ressourcenverbräuche maßgeblich verbessert.	
- Modellbasierte Regler für das autonome, vertikale Skalieren von virtualisierten Anwendungen. Es werden zwei Regler entworfen, die auf modellbasierten Entscheidungstechniken basieren, um Anwendungen zur Laufzeit vertikal in Übereinstimmung mit Anwendungs-SLOs zu skalieren. Die Regler nutzen das Wissen aus automatisch extrahierten Performanzmodellen bei der Bestimmung notwendiger Rekonfigurationen. Der erste Regler fügt virtuelle CPUs zu Anwendungen hinzu und entfernt sie wieder in Abhängigkeit vom aktuellen Bedarf. Er nutzt ein geschichtetes Performanzmodell, um bei der Bestimmung der benötigten Ressourcen die Konkurrenzsituation der physikalischen Ressourcen zu beachten. Der zweite Regler passt Ressourcenzuteilungen proaktiv an, um die Verfügbarkeit einer Anwendung während Lastspitzen sicherzustellen und Rekonfigurationen unter großer Last zu vermeiden.

Die Arbeit demonstriert die Anwendbarkeit unseres Ansatzes in aktuellen virtualisierten Umgebungen und zeigt seine Effektivität bei der Erhöhung der Ressourceneffizienz und der Verbesserung der Anwendungsperformanz und -verfügbarkeit unter zeitabhängigen Arbeitslasten.
Die Evaluation des Ansatzes basiert auf zwei Fallstudien, die repräsentativ für gängige Unternehmensanwendungen in virtualisierten Rechenzentren sind. In den Fallstudien wurde eine Reduzierung der benötigten CPU-Ressourcen von bis zu 23% und der Anzahl der Rekonfigurationen von bis zu 95% im Vergleich zu regel-basierten Ansätzen erreicht, bei gleichzeitiger Erfüllung der Anwendungs-SLOs. Mit Hilfe von Vorhersagetechniken für die Arbeitslast konnten außerdem aufwändige Rekonfigurationen (z.B., Änderungen bei der Menge an zugewiesenem Arbeitsspeicher) so geplant werden, dass sie in Phasen geringer Last durchgeführt werden. Dadurch konnten deren Auswirkungen auf die Verfügbarkeit der Anwendung um mehr als 80% verringert werden bei gleichzeitiger Verbesserung der Anwendungsperformanz verglichen mit einem reaktiven Regler. Die Methoden und Techniken zur Schätzung von Ressourcenverbräuchen und zur vertikalen Skalierung von Anwendungen wurden in enger Zusammenarbeit mit VMware und Google entwickelt und evaluiert.
KW  - Cloud Computing
KW  - Virtualisierung
KW  - Leistungsbewertung
KW  - Autonomic Computing
KW  - Self-Aware Computing
KW  - Model extraction
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-153754
ER  -