OPUS Würzburg

9 search hits

Ad Hoc Information Extraction in a Clinical Data Warehouse with Case Studies for Data Exploration and Consistency Checks (2019)

Dietrich, Georg

The importance of Clinical Data Warehouses (CDW) has increased significantly in recent years as they support or enable many applications such as clinical trials, data mining, and decision making. CDWs integrate Electronic Health Records which still contain a large amount of text data, such as discharge letters or reports on diagnostic findings in addition to structured and coded data like ICD-codes of diagnoses. Existing CDWs hardly support features to gain information covered in texts. Information extraction methods offer a solution for this problem but they require a high and lengthy development effort, which can only be carried out by computer scientists. Moreover, such systems only exist for a few medical domains. This paper presents a method empowering clinicians to extract information from texts on their own. Medical concepts can be extracted ad hoc from e.g. discharge letters, thus physicians can work promptly and autonomously. The proposed system achieves these improvements through efficient data storage, preprocessing, and powerful query features. Negations in texts are recognized and automatically excluded. The context of information is also determined, and undesired facts, such as historical events or references to other persons (family history), are filtered out. Context-sensitive queries ensure the semantic integrity of the concepts to be extracted. A new feature not available in other CDWs is to query numerical concepts in texts and even filter them (e.g. BMI > 25). The retrieved values can be extracted and exported for further analysis. This technique is implemented within the efficient architecture of the PaDaWaN CDW and evaluated with comprehensive and complex tests. The results outperform similar approaches reported in the literature. Ad hoc IE determines the results in a few (milli)seconds and a user-friendly GUI enables interactive working, allowing flexible adaptation of the extraction.
In addition, the applicability of this system is demonstrated in three real-world applications at the Würzburg University Hospital (UKW). Several drug trend studies are replicated: Findings of five studies on high blood pressure, atrial fibrillation and chronic renal failure can be partially or completely confirmed in the UKW. Another case study evaluates the prevalence of heart failure in inpatient hospitals using an algorithm that extracts information with ad hoc IE from discharge letters and echocardiogram reports (e.g. LVEF < 45) and other sources of the hospital information system. This study reveals that the use of ICD codes leads to a significant underestimation (31%) of the true prevalence of heart failure. The third case study evaluates the consistency of diagnoses by comparing structured ICD-10-coded diagnoses with the diagnoses described in the diagnostic section of the discharge letter. These diagnoses are extracted from texts with ad hoc IE, using synonyms generated with a novel method. The developed approach can extract diagnoses from the discharge letter with high accuracy and can furthermore assess the degree of consistency between the coded and reported diagnoses.
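
The numeric text queries described above (e.g. BMI > 25, LVEF < 45) can be illustrated with a minimal sketch. This is not the PaDaWaN implementation: the function name `extract_numeric`, the regular expressions, and the negation cue list are hypothetical, and real negation and context detection is far more involved than a per-sentence keyword check.

```python
import re

# Hypothetical negation cues; a real system would use a richer context model.
NEGATION = re.compile(r"\b(no|not|without|denies)\b", re.IGNORECASE)


def extract_numeric(text, concept, predicate):
    """Return values of `concept` found in `text` that satisfy `predicate`.

    Sentences containing a simple negation cue are skipped entirely.
    """
    pattern = re.compile(
        rf"\b{re.escape(concept)}\b\s*(?:of|is|:|=)?\s*(\d+(?:\.\d+)?)",
        re.IGNORECASE,
    )
    hits = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if NEGATION.search(sentence):
            continue
        for match in pattern.finditer(sentence):
            value = float(match.group(1))
            if predicate(value):
                hits.append(value)
    return hits


# Invented example text, not real patient data.
letter = "Patient presents with BMI 31.4. LVEF: 40 on echo. No BMI 22 recorded."
print(extract_numeric(letter, "BMI", lambda v: v > 25))   # [31.4]
print(extract_numeric(letter, "LVEF", lambda v: v < 45))  # [40.0]
```

The filter is passed as a predicate so that the same extraction step can serve different thresholds without changing the pattern.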

An Optimization-Based Approach for Continuous Map Generalization (2019)

Peng, Dongliang

Maps are the main tool to represent geographical information. Users often zoom in and out to access maps at different scales. Continuous map generalization tries to make the changes between different scales smooth, which is essential to provide users with a comfortable zooming experience. In order to achieve continuous map generalization with high quality, we optimize some important aspects of maps. In this book, we have used optimization in the generalization of land-cover areas, administrative boundaries, buildings, and coastlines. According to our experiments, continuous map generalization indeed benefits from optimization.

Automation in Software Performance Engineering Based on a Declarative Specification of Concerns (2019)

Walter, Jürgen Christian

Software performance is of particular relevance to software system design, operation, and evolution because it has a significant impact on key business indicators. During the life-cycle of a software system, its implementation, configuration, and deployment are subject to multiple changes that may affect the end-to-end performance characteristics. Consequently, performance analysts continually need to provide answers to and act based on performance-relevant concerns. To ensure a desired level of performance, software performance engineering provides a plethora of methods, techniques, and tools for measuring, modeling, and evaluating performance properties of software systems. However, the answering of performance concerns is subject to a significant semantic gap between the level on which performance concerns are formulated and the technical level on which performance evaluations are actually conducted. Performance evaluation approaches come with different strengths and limitations concerning, for example, accuracy, time-to-result, or system overhead. For the involved stakeholders, it can be an elaborate process to reasonably select, parameterize and correctly apply performance evaluation approaches, and to filter and interpret the obtained results. An additional challenge is that available performance evaluation artifacts may change over time, which requires switching between different measurement-based and model-based performance evaluation approaches during the system evolution. In model-based analysis, the effort involved in creating performance models can also outweigh their benefits. Overcoming these deficiencies and enabling an automatic and holistic evaluation of performance throughout the software engineering life-cycle requires an approach that: (i) integrates multiple types of performance concerns and evaluation approaches, (ii) automates performance model creation, and (iii) automatically selects an evaluation methodology tailored to a specific scenario.
This thesis presents a declarative approach, called Declarative Performance Engineering (DPE), to automate performance evaluation based on a human-readable specification of performance-related concerns. To this end, we separate the definition of performance concerns from their solution. The primary scientific contributions presented in this thesis are: A declarative language to express performance-related concerns and a corresponding processing framework: We provide a language to specify performance concerns independent of a concrete performance evaluation approach. Besides the specification of functional aspects, the language optionally allows non-functional tradeoffs to be included. To answer these concerns, we provide a framework architecture and a corresponding reference implementation to process performance concerns automatically. It allows arbitrary performance evaluation approaches to be integrated and is accompanied by reference implementations for model-based and measurement-based performance evaluation. Automated creation of architectural performance models from execution traces: The creation of performance models can be subject to significant efforts outweighing the benefits of model-based performance evaluation. We provide a model extraction framework that creates architectural performance models based on execution traces provided by monitoring tools. The framework separates the derivation of generic information from model creation routines. To derive generic information, the framework combines state-of-the-art extraction and estimation techniques. We isolate object creation routines specified in a generic model builder interface based on concepts present in multiple performance-annotated architectural modeling formalisms. To create model extraction for a novel performance modeling formalism, developers who reuse the generic framework only need to write object creation routines instead of creating model extraction software from scratch.
Automated and extensible decision support for performance evaluation approaches: We present a methodology and tooling for the automated selection of a performance evaluation approach tailored to the user concerns and application scenario. To this end, we propose to decouple the complexity of selecting a performance evaluation approach for a given scenario by providing solution approach capability models and a generic decision engine. The proposed capability meta-model makes it possible to describe functional and non-functional capabilities of performance evaluation approaches and tools at different granularities. In contrast to existing tree-based decision support mechanisms, the decoupling approach makes it easy to update characteristics of solution approaches and to append new rating criteria, and thereby to stay abreast of evolution in performance evaluation tooling and system technologies. Time-to-result estimation for model-based performance prediction: The time required to execute a model-based analysis plays an important role in different decision processes. For example, evaluation scenarios might require the prediction results to be available in a limited period of time such that the system can be adapted in time to ensure the desired quality of service. We propose a method to estimate the time-to-result for model-based performance prediction based on model characteristics and analysis parametrization. We learn a prediction model using performance-relevant features that we determined using statistical tests. We implement the approach and demonstrate its practicability by applying it to analyze a simulation-based multi-step performance evaluation approach for a representative architectural performance modeling formalism. We validate each of the contributions based on representative case studies. The evaluation of automatic performance model extraction for two case study systems shows that the resulting models can accurately predict the performance behavior.
Prediction accuracy errors are below 3% for resource utilization and mostly less than 20% for service response time. The separate evaluation of the reusability shows that the presented approach lowers the implementation efforts for automated model extraction tools by up to 91%. Based on two case studies applying measurement-based and model-based performance evaluation techniques, we demonstrate the suitability of the declarative performance engineering framework to answer multiple kinds of performance concerns customized to non-functional goals. Subsequently, we discuss reduced efforts in applying performance analyses using the integrated and automated declarative approach. Also, the evaluation of the declarative framework reviews the benefits and savings of integrating performance evaluation approaches into the declarative performance engineering framework. We demonstrate the applicability of the decision framework for performance evaluation approaches by applying it to depict existing decision trees. Then, we show how we can quickly adapt to the evolution of performance evaluation methods, which is challenging for static tree-based decision support systems. In doing so, we show how to cope with the evolution of functional and non-functional capabilities of performance evaluation software and explain how to integrate new approaches. Finally, we evaluate the accuracy of the time-to-result estimation for a set of machine learning algorithms and different training datasets. The predictions exhibit a mean percentage error below 20%, which can be further improved by including performance evaluations of the considered model into the training data. The presented contributions represent a significant step towards an integrated performance engineering process that combines the strengths of model-based and measurement-based performance evaluation.
The proposed performance concern language in conjunction with the processing framework significantly reduces the complexity of applying performance evaluations for all stakeholders. Thereby it enables performance awareness throughout the software engineering life-cycle. The proposed performance concern language removes the semantic gap between the level on which performance concerns are formulated and the technical level on which performance evaluations are actually conducted by the user.

Extracting and Learning Semantics from Social Web Data (2019)

Niebler, Thomas

Making machines understand natural language has been a dream of mankind for a very long time. Early attempts at programming machines to converse with humans in a supposedly intelligent way relied on phrase lists and simple keyword matching. However, such approaches cannot provide semantically adequate answers, as they do not consider the specific meaning of the conversation. Thus, if we want to enable machines to actually understand language, we need to be able to access semantically relevant background knowledge. For this, it is possible to query so-called ontologies, which are large networks containing knowledge about real-world entities and their semantic relations. However, creating such ontologies is a tedious task, as often extensive expert knowledge is required. Thus, we need to find ways to automatically construct and update ontologies that fit human intuition of semantics and semantic relations. More specifically, we need to determine semantic entities and find relations between them. While this is usually done on large corpora of unstructured text, previous work has shown that we can at least facilitate the first issue of extracting entities by considering special data such as tagging data or human navigational paths. Here, we do not need to detect the actual semantic entities, as they are already provided because of the way those data are collected. Thus we can mainly focus on the problem of assessing the degree of semantic relatedness between tags or web pages. However, there exist several issues which need to be overcome if we want to approximate human intuition of semantic relatedness. For this, it is necessary to represent words and concepts in a way that allows easy and highly precise semantic characterization. This also largely depends on the quality of data from which these representations are constructed.
In this thesis, we extract semantic information from both tagging data created by users of social tagging systems and human navigation data in different semantic-driven social web systems. Our main goal is to construct high-quality and robust vector representations of words which can then be used to measure the relatedness of semantic concepts. First, we show that navigation in the social media systems Wikipedia and BibSonomy is driven by a semantic component. After this, we discuss and extend methods to model the semantic information in tagging data as low-dimensional vectors. Furthermore, we show that tagging pragmatics influences different facets of tagging semantics. We then investigate the usefulness of human navigational paths in several different settings on Wikipedia and BibSonomy for measuring semantic relatedness. Finally, we propose a metric-learning based algorithm to adapt pre-trained word embeddings to datasets containing human judgment of semantic relatedness. This work contributes to the field of studying semantic relatedness between words by proposing methods to extract semantic relatedness from web navigation, to learn high-quality and low-dimensional word representations from tagging data, and to learn semantic relatedness from any kind of vector representation by exploiting human feedback. Applications first and foremost lie in ontology learning for the Semantic Web, but also in semantic search or query expansion.
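
As a rough illustration of measuring relatedness between vector representations, the sketch below computes cosine similarity between two tag vectors. Cosine similarity is a common choice and is assumed here; the abstract does not state which measure the thesis uses, and the tag vectors are invented toy values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical low-dimensional tag vectors, for illustration only.
tag_vectors = {
    "python": [0.9, 0.1, 0.0],
    "programming": [0.8, 0.2, 0.1],
    "recipe": [0.0, 0.1, 0.9],
}

related = cosine_similarity(tag_vectors["python"], tag_vectors["programming"])
unrelated = cosine_similarity(tag_vectors["python"], tag_vectors["recipe"])
print(related > unrelated)  # True for these toy vectors
```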

Measuring, Rating, and Predicting the Energy Efficiency of Servers (2019)

von Kistowski, Jóakim Gunnarsson

Energy efficiency of computing systems has become an increasingly important issue over the last decades. In 2015, data centers were responsible for 2% of the world's greenhouse gas emissions, which is roughly the same as the amount produced by air travel. In addition to these environmental concerns, power consumption of servers in data centers results in significant operating costs, which increase by at least 10% each year. To address this challenge, the U.S. EPA and other government agencies are considering the use of novel measurement methods in order to label the energy efficiency of servers. The energy efficiency and power consumption of a server is subject to a great number of factors, including, but not limited to, hardware, software stack, workload, and load level. This huge number of influencing factors makes measuring and rating of energy efficiency challenging. It also makes it difficult to find an energy-efficient server for a specific use-case. Among others, server provisioners, operators, and regulators would profit from information on the servers in question and on the factors that affect those servers' power consumption and efficiency. However, we see a lack of measurement methods and metrics for energy efficiency of the systems under consideration. Even assuming that a measurement methodology existed, making decisions based on its results would be challenging. Power prediction methods that make use of these results would aid in decision making. They would enable potential server customers to make better purchasing decisions and help operators predict the effects of potential reconfigurations. Existing energy efficiency benchmarks cannot fully address these challenges, as they only measure single applications at limited sets of load levels. 
In addition, existing efficiency metrics are not helpful in this context, as they are usually a variation of the simple performance per power ratio, which is only applicable to single workloads at a single load level. Existing data center efficiency metrics, on the other hand, express the efficiency of the data center space and power infrastructure, not focusing on the efficiency of the servers themselves. Power prediction methods for not-yet-available systems that could make use of the results provided by a comprehensive power rating methodology are also lacking. Existing power prediction models for hardware designers have a very fine level of granularity and detail that would not be useful for data center operators. This thesis presents a measurement and rating methodology for energy efficiency of servers and an energy efficiency metric to be applied to the results of this methodology. We also design workloads, load intensity and distribution models, and mechanisms that can be used for energy efficiency testing. Based on this, we present power prediction mechanisms and models that utilize our measurement methodology and its results for power prediction. Specifically, the six major contributions of this thesis are: We present a measurement methodology and metrics for energy efficiency rating of servers that use multiple, specifically chosen workloads at different load levels for a full system characterization. We evaluate the methodology and metric with regard to their reproducibility, fairness, and relevance. We investigate the power and performance variations of test results and show fairness of the metric through a mathematical proof and a correlation analysis on a set of 385 servers. We evaluate the metric's relevance by showing the relationships that can be established between metric results and third-party applications. We create models and extraction mechanisms for load profiles that vary over time, as well as load distribution mechanisms and policies. 
The models are designed to be used to define arbitrary dynamic load intensity profiles that can be leveraged for benchmarking purposes. The load distribution mechanisms place workloads on computing resources in a hierarchical manner. Our load intensity models can be extracted in less than 0.2 seconds and our resulting models feature a median modeling error of 12.7% on average. In addition, our new load distribution strategy can save up to 10.7% of power consumption on a single server node. We introduce an approach to create small-scale workloads that emulate the power consumption-relevant behavior of large-scale workloads by approximating their CPU performance counter profile, and we introduce TeaStore, a distributed, micro-service-based reference application. TeaStore can be used to evaluate power and performance model accuracy, elasticity of cloud auto-scalers, and the effectiveness of power saving mechanisms for distributed systems. We show that we are capable of emulating the power consumption behavior of realistic workloads with a mean deviation less than 10% and down to 0.2 watts (1%). We demonstrate the use of TeaStore in the context of performance model extraction and cloud auto-scaling also showing that it may generate workloads with different effects on the power consumption of the system under consideration. We present a method for automated selection of interpolation strategies for performance and power characterization. We also introduce a configuration approach for polynomial interpolation functions of varying degrees that improves prediction accuracy for system power consumption for a given system utilization. We show that, in comparison to regression, our automated interpolation method selection and configuration approach improves modeling accuracy by 43.6% if additional reference data is available and by 31.4% if it is not. 
We present an approach for explicit modeling of the impact a virtualized environment has on power consumption and a method to predict the power consumption of a software application. Both methods use results produced by our measurement methodology to predict the respective power consumption for servers that are otherwise not available to the person making the prediction. Our methods are able to predict power consumption reliably for multiple hypervisor configurations and for the target application workloads. Application workload power prediction features a mean average absolute percentage error of 9.5%. Finally, we propose an end-to-end modeling approach for predicting the power consumption of component placements at run-time. The model can also be used to predict the power consumption at load levels that have not yet been observed on the running system. We show that we can predict the power consumption of two different distributed web applications with a mean absolute percentage error of 2.2%. In addition, we can predict the power consumption of a system at a previously unobserved load level and component distribution with an error of 1.2%. The contributions of this thesis already show a significant impact in science and industry. The presented efficiency rating methodology, including its metric, have been adopted by the U.S. EPA in the latest version of the ENERGY STAR Computer Server program. They are also being considered by additional regulatory agencies, including the EU Commission and the China National Institute of Standardization. In addition, the methodology's implementation and the underlying methodology itself have already found use in several research publications. Regarding future work, we see a need for new workloads targeting specialized server hardware. At the moment, we are witnessing a shift in execution hardware to specialized machine learning chips, general purpose GPU computing, FPGAs being embedded into compute servers, etc. 
To ensure that our measurement methodology remains relevant, workloads covering these areas are required. Similarly, power prediction models must be extended to cover these new scenarios.
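
The prediction errors quoted above are mean absolute percentage errors. A minimal sketch of that metric follows, assuming the standard definition (mean of |actual - predicted| / |actual|, in percent); the power readings in the example are invented and are not results from the thesis.

```python
def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; `actual` values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical measured vs. predicted server power draw in watts.
measured_watts = [100.0, 200.0]
predicted_watts = [90.0, 210.0]
print(mean_absolute_percentage_error(measured_watts, predicted_watts))
```

For these two readings the individual errors are 10% and 5%, so the result is about 7.5%.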

Magnetic Attitude Control of Miniature Satellites and its Extension towards Orbit Control using an Electric Propulsion System (2019)

Bangert, Philip

The attitude and orbit control system of pico- and nano-satellites to date is one of the bottlenecks for future scientific and commercial applications. A performance increase while keeping with the satellites’ restrictions will enable new space missions especially for the smallest of the CubeSat classes. This work addresses methods to measure and improve the satellite’s attitude pointing and orbit control performance based on advanced sensor data analysis and optimized on-board software concepts. These methods are applied to spaceborne satellites and future CubeSat missions to demonstrate their validity. An in-orbit calibration procedure for a typical CubeSat attitude sensor suite is developed and applied to the UWE-3 satellite in space. Subsequently, a method to estimate the attitude determination accuracy without the help of an external reference sensor is developed. Using this method, it is shown that the UWE-3 satellite achieves an in-orbit attitude determination accuracy of about 2°. An advanced data analysis of the attitude motion of a miniature satellite is used in order to estimate the main attitude disturbance torque in orbit. It is shown that the magnetic disturbance is by far the most significant contribution for miniature satellites, and a method to estimate the residual magnetic dipole moment of a satellite is developed. Its application to three CubeSats currently in orbit reveals that magnetic disturbances are a common issue for this class of satellites. The dipole moments measured are between 23.1 mAm² and 137.2 mAm². In order to autonomously estimate and counteract this disturbance in future missions, an on-board magnetic dipole estimation algorithm is developed. The autonomous neutralization of such disturbance torques together with the simplification of attitude control for the satellite operator is the focus of a novel on-board attitude control software architecture.
It incorporates disturbance torques acting on the satellite and automatically optimizes the control output. Its application is demonstrated in space on board the UWE-3 satellite through various attitude control experiments, the results of which are presented here. The integration of a miniaturized electric propulsion system will enable CubeSats to perform orbit control and, thus, open up new application scenarios. The in-orbit characterization, however, poses the problem of precisely measuring very low thrust levels in the order of µN. A method to measure this thrust based on the attitude dynamics of the satellite is developed and evaluated in simulation. It is shown that the demonstrator mission UWE-4 will be able to measure these thrust levels with a high accuracy of 1% for thrust levels higher than 1 µN. The orbit control capabilities of UWE-4 using its electric propulsion system are evaluated and a hybrid attitude control system making use of the satellite’s magnetorquers and the electric propulsion system is developed. It is based on the flexible attitude control architecture mentioned before and thrust vector pointing accuracies of better than 2° can be achieved. This results in a thrust delivery of more than 99% of the desired acceleration in the target direction.
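
The link between a 2° pointing accuracy and more than 99% delivered acceleration can be checked with a short calculation. The sketch assumes that the useful acceleration is the projection of the thrust vector onto the target direction, i.e. the cosine of the pointing error; the abstract does not spell out this derivation.

```python
import math


def delivered_fraction(pointing_error_deg):
    """Fraction of thrust acting along the target direction for a given pointing error."""
    return math.cos(math.radians(pointing_error_deg))


fraction = delivered_fraction(2.0)
print(f"{fraction:.4f}")  # about 0.9994
```

Under this assumption a 2° error costs well under 0.1% of the thrust, which is consistent with the stated figure of more than 99%.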

Resilience, Availability, and Serviceability Evaluation in Software-defined Networks (2019)

Metter, Christopher Valentin

With the introduction of Software-defined Networking (SDN) in the late 2000s, not only was a new research field created, but a paradigm shift was also initiated in the broad field of networking. The programmable network control by SDN is a big step, but also a stumbling block for many of the established network operators and vendors. As with any new technology, the question of its maturity and production readiness arises. Therefore, this thesis picks specific features of SDN and analyzes their performance, reliability, and availability in scenarios that can be expected in production deployments. The first SDN topic is the performance impact of application traffic in the data plane on the control plane. Second, reliability and availability concerns of SDN deployments are analyzed by way of example by evaluating the detection performance of a common SDN controller. Third, the performance of P4, a technology that enhances SDN, is evaluated, or more precisely the impact of certain control operations on its processing performance.

Intelligent analysis of medical data in a generic telemedicine infrastructure (2019)

Albert, Michael

Telemedicine uses telecommunication and information technology to provide health care services over spatial distances. In the upcoming demographic changes towards an older average population age, especially rural areas suffer from a decreasing doctor to patient ratio as well as a limited amount of available medical specialists in acceptable distance. These areas could benefit the most from telemedicine applications as they are known to improve access to medical services, medical expertise and can also help to mitigate critical or emergency situations. Although the possibilities of telemedicine applications exist in the entire range of healthcare, current systems focus on one specific disease while using dedicated hardware to connect the patient with the supervising telemedicine center. This thesis describes the development of a telemedical system which follows a new generic design approach. This bridges the gap of existing approaches that only tackle one specific application. The proposed system on the contrary aims at supporting as many diseases and use cases as possible by taking all the stakeholders into account at the same time. To address the usability and acceptance of the system it is designed to use standardized hardware like commercial medical sensors and smartphones for collecting medical data of the patients and transmitting them to the telemedical center. The smartphone can also act as interface to the patient for health questionnaires or feedback. The system can handle the collection and transport of medical data, analysis and visualization of the data as well as providing a real time communication with video and audio between the users. On top of the generic telemedical framework the issue of scalability is addressed by integrating a rule-based analysis tool for the medical data. Rules can be easily created by medical personnel via a visual editor and can be personalized for each patient. 
The rule-based analysis tool is extended by multiple options for visualization of the data, mechanisms to handle complex rules and options for performing actions like raising alarms or sending automated messages. It is sometimes hard for the medical experts to formulate their knowledge into rules and there may be information in the medical data that is not yet known. This is why a machine learning module was integrated into the system. It uses the incoming medical data of the patients to learn new rules that are then presented to the medical personnel for inspection. This is in line with European legislation where the human still needs to be in charge of such decisions. Overall, we were able to show the benefit of the generic approach by evaluating it in three completely different medical use cases derived from specific application needs: monitoring of COPD (chronic obstructive pulmonary disease) patients, support of patients performing dialysis at home and councils of intensive-care experts. In addition the system was used for a non-medical use case: monitoring and optimization of industrial machines and robots. In all of the mentioned cases, we were able to prove the robustness of the generic approach with real users of the corresponding domain. This is why we can propose this approach for future development of telemedical systems.

Optimization of Controller Placement and Information Flow in Softwarized Networks (2019)

Lange, Stanislav

The Software Defined Networking (SDN) paradigm offers network operators numerous improvements in terms of flexibility, scalability, as well as cost efficiency and vendor independence. However, in order to maximize the benefit from these features, several new challenges in areas such as management and orchestration need to be addressed. This dissertation makes contributions towards three key topics from these areas. Firstly, we design, implement, and evaluate two multi-objective heuristics for the SDN controller placement problem. Secondly, we develop and apply mechanisms for automated decision making based on the Pareto frontiers that are returned by the multi-objective optimizers. Finally, we investigate and quantify the performance benefits for the SDN control plane that can be achieved by integrating information from external entities such as Network Management Systems (NMSs) into the control loop. Our evaluation results demonstrate the impact of optimizing various parameters of softwarized networks at different levels and are used to derive guidelines for an efficient operation.
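
The Pareto frontiers returned by multi-objective optimizers can be illustrated with a small filter that keeps only non-dominated solutions. This is a generic sketch, not the heuristics from the dissertation: the function name `pareto_front` and the two objectives (a latency value and a load-imbalance value, both minimized) are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every objective is minimized."""

    def dominates(p, q):
        # p dominates q if it is no worse in all objectives and better in at least one.
        no_worse = all(a <= b for a, b in zip(p, q))
        better = any(a < b for a, b in zip(p, q))
        return no_worse and better

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate placements as (latency, imbalance) pairs.
candidates = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
print(pareto_front(candidates))  # [(10, 5), (12, 4), (8, 6)]
```

An automated decision step would then pick one point from this front, for example by weighting the objectives.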
