Increasing global competition forces organizations to improve their processes to gain a competitive advantage. In the manufacturing sector, this is facilitated through tremendous digital transformation. Fundamental components in such digitalized environments are process-aware information systems that record the execution of business processes, assist in process automation, and unlock the potential to analyze processes. However, most enterprise information systems focus on informational aspects, process automation, or data collection but do not tap into predictive or prescriptive analytics to foster data-driven decision-making. Therefore, this dissertation sets out to investigate the design of analytics-enabled information systems in five independent parts, which introduce analytics capabilities step by step and assess potential opportunities for process improvement in real-world scenarios.
An essential prerequisite for setting up and extending analytics-enabled information systems is identifying success factors, which we examine in the context of process mining as a descriptive analytics technique. We combine an established process mining framework and a success model to provide a structured approach for assessing success factors and identifying challenges, motivations, and perceived business value of process mining from employees across organizations as well as process mining experts and consultants. Based on the derived findings, we extend the existing success model and provide lessons for generating business value through process mining. To assist the realization of process-mining-enabled business value, we design an artifact for context-aware process mining. The artifact combines standard process logs with additional context information to assist the automated identification of process realization paths associated with specific context events. Yet, realizing business value is a challenging task, as transforming processes based on informational insights is time-consuming.
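The idea of linking process realization paths to context events can be illustrated with a minimal sketch. It is not the artifact from the dissertation: the tuple layouts for `event_log` and `context_events`, the function name `variants_by_context`, and the rule that a case belongs to a context when any of its events falls inside the context's time window are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def variants_by_context(event_log, context_events):
    """Count process variants (activity sequences) per context label.

    event_log: list of (case_id, activity, timestamp) tuples (assumed layout).
    context_events: list of (label, start, end) tuples (assumed layout).
    """
    # Rebuild one trace per case, ordered by timestamp.
    traces = defaultdict(list)
    for case_id, activity, ts in sorted(event_log, key=lambda e: e[2]):
        traces[case_id].append((activity, ts))

    result = defaultdict(Counter)
    for events in traces.values():
        variant = tuple(activity for activity, _ in events)
        # A case is linked to a context if any of its events falls in the window.
        labels = {
            label
            for label, start, end in context_events
            if any(start <= ts <= end for _, ts in events)
        } or {"no_context"}
        for label in labels:
            result[label][variant] += 1
    return result


# Hypothetical toy data: case c2 overlaps a machine failure and needs rework.
log = [
    ("c1", "order", 1), ("c1", "ship", 2),
    ("c2", "order", 10), ("c2", "rework", 11), ("c2", "ship", 12),
]
context = [("machine_failure", 9, 11)]
paths = variants_by_context(log, context)
```

Comparing the variant counts under a context label with those under `no_context` is one simple way to see which paths co-occur with a given context event.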
To overcome this, we showcase the development of a predictive process monitoring system for disruption handling in a production environment. The system leverages state-of-the-art machine learning algorithms for disruption type classification and duration prediction. It combines the algorithms with additional organizational data sources and a simple assignment procedure to assist the disruption handling process. The design of such a system and analytics models is a challenging task, which we address by engineering a five-phase method for predictive end-to-end enterprise process network monitoring leveraging multi-headed deep neural networks. The method facilitates the integration of heterogeneous data sources through dedicated neural network input heads, which are concatenated for a prediction. An evaluation based on a real-world use-case highlights the superior performance of the resulting multi-headed network.
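A multi-headed network with one input head per data source, concatenated for a prediction, can be sketched as follows. This is only a forward pass with random, untrained weights; the class name `MultiHeadedNet`, the layer sizes, the source names, and the softmax output are illustrative assumptions, not the architecture evaluated in the dissertation.

```python
import math
import random


def dense_relu(inputs, weights, biases):
    """Fully connected layer followed by ReLU; one output per weight row."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (placeholder for trained parameters)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MultiHeadedNet:
    def __init__(self, head_sizes, hidden=4, n_classes=3, seed=0):
        rng = random.Random(seed)
        # One dedicated input head per heterogeneous data source.
        self.heads = {
            name: init_layer(n_in, hidden, rng) for name, n_in in head_sizes.items()
        }
        self.out_w, self.out_b = init_layer(hidden * len(head_sizes), n_classes, rng)

    def forward(self, inputs):
        merged = []
        for name, (weights, biases) in self.heads.items():
            # Each head encodes its own source; the encodings are concatenated.
            merged.extend(dense_relu(inputs[name], weights, biases))
        scores = [
            sum(w * x for w, x in zip(row, merged)) + b
            for row, b in zip(self.out_w, self.out_b)
        ]
        return softmax(scores)


# Hypothetical sources: numeric sensor readings and a one-hot process state.
net = MultiHeadedNet({"sensor": 3, "process_state": 5})
probs = net.forward({"sensor": [0.1, 0.2, 0.3], "process_state": [1, 0, 0, 1, 0]})
```

Keeping a separate head per source lets each input be encoded at its own dimensionality before the shared output layer sees the concatenated representation.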
Even the improved model does not deliver perfect results, and thus decisions about assigning agents to solve disruptions have to be made under uncertainty. Mathematical models can assist here, but under complex real-world conditions the number of potential scenarios grows massively and limits the solvability of assignment models. To overcome this and tap into the potential of prescriptive process monitoring systems, we set out a data-driven approximate dynamic stochastic programming approach, which incorporates multiple uncertainties into an assignment decision. The resulting model achieves a significant performance improvement and ultimately highlights the particular importance of analytics-enabled information systems for organizational process improvement.
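To make the notion of an assignment decision under uncertainty concrete, the sketch below uses a much simpler stand-in than the approximate dynamic stochastic programming approach of the dissertation: it samples duration scenarios and picks the agent-to-disruption assignment with the lowest average total duration (a sample average approximation). The Gaussian noise model, the `spread` parameter, the function names, and the brute-force search over permutations are assumptions; the search is feasible only for a handful of agents, which is exactly the scalability limit the text describes.

```python
import itertools
import random


def sample_scenarios(mean_durations, n_scenarios, spread=0.3, seed=0):
    """Draw duration matrices around the predicted means (assumed noise model)."""
    rng = random.Random(seed)
    return [
        [[max(0.0, rng.gauss(m, spread * m)) for m in row] for row in mean_durations]
        for _ in range(n_scenarios)
    ]


def expected_cost(assignment, scenarios):
    """Average total duration when agent i handles disruption assignment[i]."""
    total = sum(
        scenario[agent][task]
        for scenario in scenarios
        for agent, task in enumerate(assignment)
    )
    return total / len(scenarios)


def best_assignment(scenarios):
    """Exhaustively search all one-to-one assignments (square matrices only)."""
    n = len(scenarios[0])
    return min(
        itertools.permutations(range(n)),
        key=lambda assignment: expected_cost(assignment, scenarios),
    )


# Hypothetical predicted durations in minutes: rows are agents, columns disruptions.
predicted = [[10.0, 50.0], [50.0, 10.0]]
scenarios = sample_scenarios(predicted, n_scenarios=200)
choice = best_assignment(scenarios)
```

With n agents the search covers n! assignments per evaluation, so anything beyond toy sizes needs an approximation scheme rather than enumeration.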
Artificial intelligence (AI) is increasingly penetrating sensitive areas of everyday human life. Intelligent systems no longer make only simple decisions but increasingly complex ones as well. For example, intelligent systems decide whether or not applicants should be hired by a company. The underlying decision-making is often hard to trace, so unjustified decisions can go undetected, which is why the implementation of such an AI is frequently referred to as a black box. Consequently, the threat of being disadvantaged by unfair and discriminatory AI decisions is growing. When these distortions result from human actions and thought patterns, they are referred to as cognitive distortion or cognitive bias. Because the topic is new, however, it is not yet clear which cognitive biases can occur within an AI project. The aim of this article is to provide a holistic overview by means of a structured literature review. The findings are organized and classified using the Cross-Industry Standard Process for Data Mining (CRISP-DM) model, which is widely used in practice. This analysis shows that human influence on an AI is present in every development phase of the model and that it is therefore important to explicitly examine "human-like" bias in an AI.
One consequence of the recent coronavirus pandemic is increased demand and use of online services around the globe. At the same time, performance requirements for modern technologies are becoming more stringent as users become accustomed to higher standards. These increased performance and availability requirements, coupled with the unpredictable usage growth, are driving an increasing proportion of applications to run on public cloud platforms as they promise better scalability and reliability.
With data centers already responsible for about one percent of the world's power consumption, optimizing resource usage is of paramount importance. Simultaneously, meeting the increasing and changing resource and performance requirements is only possible by optimizing resource management without introducing additional overhead. This requires the research and development of new modeling approaches to understand the behavior of running applications with minimal information.
However, the emergence of modern software paradigms makes it increasingly difficult to derive such models and renders previous performance modeling techniques infeasible. Modern cloud applications are often deployed as a collection of fine-grained and interconnected components called microservices. Microservice architectures offer massive benefits but also have broad implications for the performance characteristics of the respective systems. In addition, the microservices paradigm is typically paired with a DevOps culture, resulting in frequent application and deployment changes. Such applications are often referred to as cloud-native applications. In summary, the increasing use of ever-changing cloud-hosted microservice applications introduces a number of unique challenges for modeling the performance of modern applications. These include the amount, type, and structure of monitoring data, frequent behavioral changes, or infrastructure variabilities. This violates common assumptions of the state of the art and opens a research gap for our work.
In this thesis, we present five techniques for automated learning of performance models for cloud-native software systems. We achieve this by combining machine learning with traditional performance modeling techniques. Unlike previous work, our focus is on cloud-hosted and continuously evolving microservice architectures, so-called cloud-native applications. Therefore, our contributions aim to solve the above challenges to deliver automated performance models with minimal computational overhead and no manual intervention. Depending on the cloud computing model, privacy agreements, or monitoring capabilities of each platform, we identify different scenarios where performance modeling, prediction, and optimization techniques can provide great benefits. Specifically, the contributions of this thesis are as follows:
Monitorless: Application-agnostic prediction of performance degradations.
To manage application performance with only platform-level monitoring, we propose Monitorless, the first truly application-independent approach to detecting performance degradation. We use machine learning to bridge the gap between platform-level monitoring and application-specific measurements, eliminating the need for application-level monitoring. Monitorless creates a single and holistic resource saturation model that can be used for heterogeneous and untrained applications. Results show that Monitorless infers resource-based performance degradation with 97% accuracy. Moreover, it can achieve similar performance to typical autoscaling solutions, despite using less monitoring information.
SuanMing: Predicting performance degradation using tracing.
We introduce SuanMing to mitigate performance issues before they impact the user experience. This contribution is applied in scenarios where tracing tools enable application-level monitoring. SuanMing predicts explainable causes of expected performance degradations and prevents performance degradations before they occur. Evaluation results show that SuanMing can predict and pinpoint future performance degradations with an accuracy of over 90%.
SARDE: Continuous and autonomous estimation of resource demands.
We present SARDE to learn application models for highly variable application deployments. This contribution focuses on the continuous estimation of application resource demands, a key parameter of performance models. SARDE represents an autonomous ensemble estimation technique. It dynamically and continuously optimizes, selects, and executes an ensemble of approaches to estimate resource demands in response to changes in the application or its environment. Through continuous online adaptation, SARDE efficiently achieves an average resource demand estimation error of 15.96% in our evaluation.
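The selection step of an ensemble of resource demand estimators can be sketched as below. This is not SARDE itself: the two estimators, the `(throughput, utilization)` sample layout, and the choice to score each estimator by how well its demand reproduces utilization on a held-out window are assumptions for illustration. Both estimators rest on the utilization law U = X · D for a single resource.

```python
def service_demand_law(samples):
    """Demand as total utilization divided by total throughput (D = U / X)."""
    return sum(u for _, u in samples) / sum(x for x, _ in samples)


def least_squares(samples):
    """Slope of the line U = D * X fitted through the origin."""
    return sum(x * u for x, u in samples) / sum(x * x for x, _ in samples)


def utilization_error(demand, samples):
    """Mean absolute error between predicted and observed utilization."""
    return sum(abs(demand * x - u) for x, u in samples) / len(samples)


def select_estimator(estimators, train, validate):
    """Run every estimator on `train` and keep the one that fits `validate` best."""
    scored = []
    for name, estimate in estimators.items():
        demand = estimate(train)
        scored.append((utilization_error(demand, validate), name, demand))
    return min(scored)


# Hypothetical monitoring windows: (requests per second, CPU utilization).
train = [(10, 0.2), (20, 0.4), (30, 0.6)]
validate = [(40, 0.8)]
estimators = {"service_demand_law": service_demand_law, "least_squares": least_squares}
error, chosen, demand = select_estimator(estimators, train, validate)
```

Rerunning `select_estimator` on a sliding window of recent samples is one simple way to approximate the continuous adaptation described above.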
DepIC: Learning parametric dependencies from monitoring data.
DepIC utilizes feature selection techniques in combination with an ensemble regression approach to automatically identify and characterize parametric dependencies. Although parametric dependencies can massively improve the accuracy of performance models, DepIC is the first approach to automatically learn such parametric dependencies from passive monitoring data streams. Our evaluation shows that DepIC achieves 91.7% precision in identifying dependencies and reduces the characterization prediction error by 30% compared to the best individual approach.
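The two stages, identifying which parameters a performance property depends on and then characterizing that dependency, can be sketched with deliberately simple stand-ins. DepIC's actual feature selection techniques and ensemble regression are not reproduced here; the Pearson correlation filter, the `threshold` value, the single linear fit, and all names are assumptions.

```python
def _centered_sums(xs, ys):
    """Return covariance and variance sums plus the two means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov, var_x, var_y, mean_x, mean_y


def pearson(xs, ys):
    cov, var_x, var_y, _, _ = _centered_sums(xs, ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no dependency information
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    cov, var_x, _, mean_x, mean_y = _centered_sums(xs, ys)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def find_dependencies(parameters, target, threshold=0.8):
    """Identify correlated parameters, then characterize each with a linear fit."""
    dependencies = {}
    for name, values in parameters.items():
        if abs(pearson(values, target)) >= threshold:
            dependencies[name] = fit_line(values, target)
    return dependencies


# Hypothetical monitoring stream: resource demand grows with the cart size.
parameters = {
    "items_in_cart": [1, 2, 3, 4, 5],
    "session_id_hash": [3, 1, 4, 1, 5],
}
demand_ms = [12, 22, 32, 42, 52]
deps = find_dependencies(parameters, demand_ms)
```

A correlation filter only catches roughly linear relationships, which is one reason a real system would combine several selection and regression techniques.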
Baloo: Modeling the configuration space of databases.
To study the impact of different configurations within distributed DBMSs, we introduce Baloo. Our last contribution models the configuration space of databases considering measurement variabilities in the cloud. More specifically, Baloo dynamically estimates the required benchmarking measurements and automatically builds a configuration space model of a given DBMS. Our evaluation of Baloo on a dataset consisting of 900 configuration points shows that the framework achieves a prediction error of less than 11% while saving up to 80% of the measurement effort.
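Dynamically deciding how many benchmark repetitions a configuration point needs can be sketched with a simple stopping rule: keep measuring until the confidence interval of the mean is narrow relative to the mean itself. This is a generic rule, not Baloo's procedure; the function name, the tolerance, the run limits, and the normal-approximation factor `z = 1.96` (which is crude for very small samples) are assumptions.

```python
import random
import statistics


def measure_until_stable(run_benchmark, rel_tolerance=0.05, min_runs=3,
                         max_runs=30, z=1.96):
    """Repeat a benchmark until the mean is known to within rel_tolerance.

    run_benchmark: zero-argument callable returning one measurement.
    Returns the mean and the number of runs that were needed.
    """
    results = [run_benchmark() for _ in range(min_runs)]
    while len(results) < max_runs:
        mean = statistics.mean(results)
        half_width = z * statistics.stdev(results) / len(results) ** 0.5
        if half_width <= rel_tolerance * abs(mean):
            break  # interval is tight enough; stop spending measurements
        results.append(run_benchmark())
    return statistics.mean(results), len(results)


# Hypothetical benchmarks: a perfectly stable one and a noisy cloud-hosted one.
stable_mean, stable_runs = measure_until_stable(lambda: 100.0)

rng = random.Random(0)
noisy_mean, noisy_runs = measure_until_stable(lambda: rng.gauss(100.0, 20.0))
```

Stable configurations stop at the minimum number of runs, while noisy ones consume more of the budget, which is where the measurement savings come from.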
Although the contributions are orthogonal to one another, taken together they provide a holistic approach to performance management of modern cloud-native microservice applications.
Our contributions are a significant step forward as they specifically target novel and cloud-native software development and operation paradigms, surpassing the capabilities and limitations of previous approaches.
In addition, the research presented in this thesis also has a significant impact on the industry, as the contributions were developed in collaboration with research teams from Nokia Bell Labs, Huawei, and Google.
Overall, our solutions open up new possibilities for managing and optimizing cloud applications and improve cost and energy efficiency.