Refine
Has Fulltext
- yes (381)
Year of publication
Document Type
- Doctoral Thesis (165)
- Journal article (147)
- Working Paper (40)
- Conference Proceeding (11)
- Report (7)
- Master Thesis (6)
- Bachelor Thesis (3)
- Book (1)
- Study Thesis (term paper) (1)
Language
- English (342)
- German (38)
- Multiple languages (1)
Keywords
- Leistungsbewertung (29)
- virtual reality (19)
- Datennetz (14)
- Quality of Experience (12)
- Netzwerk (10)
- Robotik (10)
- machine learning (9)
- Kleinsatellit (8)
- Modellierung (8)
- Simulation (8)
Institute
- Institut für Informatik (381) (remove)
Schriftenreihe
Sonstige beteiligte Institutionen
- Cologne Game Lab (3)
- Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Raumfahrtsysteme (2)
- Open University of the Netherlands (2)
- Siemens AG (2)
- Zentrum für Telematik e.V. (2)
- Airbus Defence and Space GmbH (1)
- Beuth Hochschule für Technik Berlin (1)
- Birmingham City University (1)
- California Institute of Technology (1)
- DLR (1)
In recent years, great progress has been made in the area of Artificial Intelligence (AI) due to the possibilities of Deep Learning which steadily yielded new state-of-the-art results especially in many image recognition tasks.
Currently, in some areas, human performance is achieved or already exceeded.
This great development already had an impact on the area of Optical Music Recognition (OMR) as several novel methods relying on Deep Learning succeeded in specific tasks.
Musicologists are interested in large-scale musical analysis and in publishing digital transcriptions in a collection enabling to develop tools for searching and data retrieving.
The application of OMR promises to simplify and thus speed-up the transcription process by either providing fully-automatic or semi-automatic approaches.
This thesis focuses on the automatic transcription of Medieval music with a focus on square notation which poses a challenging task due to complex layouts, highly varying handwritten notations, and degradation.
However, since handwritten music notations are quite complex to read, even for an experienced musicologist, it is to be expected that even with new techniques of OMR manual corrections are required to obtain the transcriptions.
This thesis presents several new approaches and open source software solutions for layout analysis and Automatic Text Recognition (ATR) for early documents and for OMR of Medieval manuscripts providing state-of-the-art technology.
Fully Convolutional Networks (FCN) are applied for the segmentation of historical manuscripts and early printed books, to detect staff lines, and to recognize neume notations.
The ATR engine Calamari is presented which allows for ATR of early prints and also the recognition of lyrics.
Configurable CNN/LSTM-network architectures which are trained with the segmentation-free CTC-loss are applied to the sequential recognition of text but also monophonic music.
Finally, a syllable-to-neume assignment algorithm is presented which represents the final step to obtain a complete transcription of the music.
The evaluations show that the performances of any algorithm is highly depending on the material at hand and the number of training instances.
The presented staff line detection correctly identifies staff lines and staves with an $F_1$-score of above $99.5\%$.
The symbol recognition yields a diplomatic Symbol Accuracy Rate (dSAR) of above $90\%$ by counting the number of correct predictions in the symbols sequence normalized by its length.
The ATR of lyrics achieved a Character Error Rate (CAR) (equivalently the number of correct predictions normalized by the sentence length) of above $93\%$ trained on 771 lyric lines of Medieval manuscripts and of 99.89\% when training on around 3.5 million lines of contemporary printed fonts.
The assignment of syllables and their corresponding neumes reached $F_1$-scores of up to $99.2\%$.
A direct comparison to previously published performances is difficult due to different materials and metrics.
However, estimations show that the reported values of this thesis exceed the state-of-the-art in the area of square notation.
A further goal of this thesis is to enable musicologists without technical background to apply the developed algorithms in a complete workflow by providing a user-friendly and comfortable Graphical User Interface (GUI) encapsulating the technical details.
For this purpose, this thesis presents the web-application OMMR4all.
Its fully-functional workflow includes the proposed state-of-the-art machine-learning algorithms and optionally allows for a manual intervention at any stage to correct the output preventing error propagation.
To simplify the manual (post-) correction, OMMR4all provides an overlay-editor that superimposes the annotations with a scan of the original manuscripts so that errors can easily be spotted.
The workflow is designed to be iteratively improvable by training better models as soon as new Ground Truth (GT) is available.
Failure prediction is an important aspect of self-aware computing systems. Therefore, a multitude of different approaches has been proposed in the literature over the past few years. In this work, we propose a taxonomy for organizing works focusing on the prediction of Service Level Objective (SLO) failures. Our taxonomy classifies related work along the dimensions of the prediction target (e.g., anomaly detection, performance prediction, or failure prediction), the time horizon (e.g., detection or prediction, online or offline application), and the applied modeling type (e.g., time series forecasting, machine learning, or queueing theory). The classification is derived based on a systematic mapping of relevant papers in the area. Additionally, we give an overview of different techniques in each sub-group and address remaining challenges in order to guide future research.
In the present day, unmanned aerial vehicles become seemingly more popular every year, but, without regulation of the increasing number of these vehicles, the air space could become chaotic and uncontrollable. In this work, a framework is proposed to combine self-aware computing with multirotor formations to address this problem. The self-awareness is envisioned to improve the dynamic behavior of multirotors. The formation scheme that is implemented is called platooning, which arranges vehicles in a string behind the lead vehicle and is proposed to bring order into chaotic air space. Since multirotors define a general category of unmanned aerial vehicles, the focus of this thesis are quadcopters, platforms with four rotors. A modification for the LRA-M self-awareness loop is proposed and named Platooning Awareness. The implemented framework is able to offer two flight modes that enable waypoint following and the self-awareness module to find a path through scenarios, where obstacles are present on the way, onto a goal position. The evaluation of this work shows that the proposed framework is able to use self-awareness to learn about its environment, avoid obstacles, and can successfully move a platoon of drones through multiple scenarios.
An Intelligent Semi-Automatic Workflow for Optical Character Recognition of Historical Printings
(2020)
Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years great progress has been made in the area of historical OCR resulting in several powerful open-source tools for preprocessing, layout analysis and segmentation, Automatic Text Recognition (ATR) and postcorrection. Their major drawback is that they only offer limited applicability by non-technical users like humanist scholars, in particular when it comes to the combined use of several tools in a workflow. Furthermore, depending on the material, these tools are usually not able to fully automatically achieve sufficiently low error rates, let alone perfect results, creating a demand for an interactive postcorrection functionality which, however, is generally not incorporated.
This thesis addresses these issues by presenting an open-source OCR software called OCR4all which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. While a variety of materials can already be processed fully automatically, books with more complex layouts require manual intervention by the users. This is mostly due to the fact that the required Ground Truth (GT) for training stronger mixed models (for segmentation as well as text recognition) is not available, yet, neither in the desired quantity nor quality.
To deal with this issue in the short run, OCR4all offers better recognition capabilities in combination with a very comfortable Graphical User Interface (GUI) that allows error corrections not only in the final output, but already in early stages to minimize error propagation. In the long run this constant manual correction produces large quantities of valuable, high quality training material which can be used to improve fully automatic approaches. Further on, extensive configuration capabilities are provided to set the degree of automation of the workflow and to make adaptations to the carefully selected default parameters for specific printings, if necessary. The architecture of OCR4all allows for an easy integration (or substitution) of newly developed tools for its main components by supporting standardized interfaces like PageXML, thus aiming at continual higher automation for historical printings.
In addition to OCR4all, several methodical extensions in the form of accuracy improving techniques for training and recognition are presented. Most notably an effective, sophisticated, and adaptable voting methodology using a single ATR engine, a pretraining procedure, and an Active Learning (AL) component are proposed. Experiments showed that combining pretraining and voting significantly improves the effectiveness of book-specific training, reducing the obtained Character Error Rates (CERs) by more than 50%.
The proposed extensions were further evaluated during two real world case studies: First, the voting and pretraining techniques are transferred to the task of constructing so-called mixed models which are trained on a variety of different fonts. This was done by using 19th century Fraktur script as an example, resulting in a considerable improvement over a variety of existing open-source and commercial engines and models. Second, the extension from ATR on raw text to the adjacent topic of typography recognition was successfully addressed by thoroughly indexing a historical lexicon that heavily relies on different font types in order to encode its complex semantic structure.
During the main experiments on very complex early printed books even users with minimal or no experience were able to not only comfortably deal with the challenges presented by the complex layout, but also to recognize the text with manageable effort and great quality, achieving excellent CERs below 0.5%. Furthermore, the fully automated application on 19th century novels showed that OCR4all (average CER of 0.85%) can considerably outperform the commercial state-of-the-art tool ABBYY Finereader (5.3%) on moderate layouts if suitably pretrained mixed ATR models are available.
Recent advances in Natural Language Preprocessing (NLP) allow for a fully automatic extraction of character networks for an incoming text. These networks serve as a compact and easy to grasp representation of literary fiction. They offer an aggregated view of the text, which can be used during distant reading approaches for the analysis of literary hypotheses. In their core, the networks consist of nodes, which represent literary characters, and edges, which represent relations between characters. For an automatic extraction of such a network, the first step is the detection of the references of all fictional entities that are of importance for a text. References to the fictional entities appear in the form of names, noun phrases and pronouns and prior to this work, no components capable of automatic detection of character references were available. Existing tools are only capable of detecting proper nouns, a subset of all character references. When evaluated on the task of detecting proper nouns in the domain of literary fiction, they still underperform at an F1-score of just about 50%. This thesis uses techniques from the field of semi-supervised learning, such as Distant supervision and Generalized Expectations, and improves the results of an existing tool to about 82%, when evaluated on all three categories in literary fiction, but without the need for annotated data in the target domain. However, since this quality is still not sufficient, the decision to annotate DROC, a corpus comprising 90 fragments of German novels was made. This resulted in a new general purpose annotation environment titled as ATHEN, as well as annotated data that spans about 500.000 tokens in total. Using this data, the combination of supervised algorithms and a tailored rule based algorithm, which in combination are able to exploit both - local consistencies as well as global consistencies - yield an algorithm with an F1-score of about 93%. This component is referred to as the Kallimachos tagger.
A character network can not directly display references however, instead they need to be clustered so that all references that belong to a real world or fictional entity are grouped together. This process widely known as coreference resolution is a hard problem in the focus of research for more than half a century. This work experimented with adaptations of classical feature based machine learning, with a dedicated rule based algorithm and with modern techniques of Deep Learning, but no approach can surpass 55% B-Cubed F1, when evaluated on DROC. Due to this barrier, many researchers do not use a fully-fledged coreference resolution when they extract character networks, but only focus on a more forgiving subset- the names. For novels such as Alice's Adventures in Wonderland by Lewis Caroll, this would however only result in a network in which many important characters are missing. In order to integrate important characters into the network that are not named by the author, this work makes use of automatic detection of speaker and addressees for direct speech utterances (all entities involved in a dialog are considered to be of importance). This problem is by itself not an easy task, however the most successful system analysed in this thesis is able to correctly determine the speaker to about 85% of the utterances as well as about 65% of the addressees. This speaker information can not only help to identify the most dominant characters, but also serves as a way to model the relations between entities.
During the span of this work, components have been developed to model relations between characters using speaker attribution, using co-occurrences as well as by the usage of true interactions, for which yet again a dataset was annotated using ATHEN. Furthermore, since relations between characters are usually typed, a component for the extraction of a typed relation was developed. Similar to the experiments for the character reference detection, a combination of a rule based and a Maximum Entropy classifier yielded the best overall results, with the extraction of family relations showing a score of about 80% and the quality of love relations with a score of about 50%. For family relations, a kernel for a Support Vector Machine was developed that even exceeded the scores of the combined approach but is behind on the other labels.
In addition, this work presents new ways to evaluate automatically extracted networks without the need of domain experts, instead it relies on the usage of expert summaries. It also refrains from the uses of social network analysis for the evaluation, but instead presents ranked evaluations using Precision@k and the Spearman Rank correlation coefficient for the evaluation of the nodes and edges of the network. An analysis using these metrics showed, that the central characters of a novel are contained with high probability but the quality drops rather fast if more than five entities are analyzed. The quality of the edges is mainly dominated by the quality of the coreference resolution and the correlation coefficient between gold edges and system edges therefore varies between 30 and 60%.
All developed components are aggregated alongside a large set of other preprocessing modules in the Kallimachos pipeline and can be reused without any restrictions.
The DFG project “SDN-enabled Application-aware Network Control Architectures and their Performance Assessment” (DFG SDN-App) focused in phase 1 (Jan 2017 – Dec 2019) on software defined networking (SDN). Being a fundamental paradigm shift, SDN enables a remote control of networking devices made by different vendors from a logically centralized controller. In principle, this enables a more dynamic and flexible management of network resources compared to the traditional legacy networks. Phase 1 focused on multimedia applications and their users’ Quality of Experience (QoE).
This documents reports the achievements of the first phase (Jan 2017 – Dec 2019), which is jointly carried out by the Technical University of Munich, Technical University of Berlin, and University of Würzburg. The project started at the institutions in Munich and Würzburg in January 2017 and lasted until December 2019.
In Phase 1, the project targeted the development of fundamental control mechanisms for network-aware application control and application-aware network control in Software Defined Networks (SDN) so to enhance the user perceived quality (QoE). The idea is to leverage the QoE from multiple applications as control input parameter for application-and network control mechanisms. These mechanisms are implemented by an Application Control Plane (ACP) and a Network Control Plane (NCP). In order to obtain a global view of the current system state, applications and network parameters are monitored and communicated to the respective control plane interface. Network and application information and their demands are exchanged between the control planes so to derive appropriate control actions. To this end, a methodology is developed to assess the application performance and in particular the QoE. This requires an appropriate QoE modeling of the applications considered in the project as well as metrics like QoE fairness to be utilized within QoE management.
In summary, the application-network interaction can improve the QoE for multi-application scenarios. This is ensured by utilizing information from the application layer, which are mapped by appropriate QoS-QoE models to QoE within a network control plane. On the other hand, network information is monitored and communicated to the application control plane. Network and application information and their demands are exchanged between the control planes so to derive appropriate control actions.
Von technischen Systemen wird in der heutigen Zeit erwartet, dass diese stets fehlerfrei funktionieren, um einen reibungslosen Ablauf des Alltags zu gewährleisten. Technische Systeme jedoch können Defekte aufweisen, die deren Funktionsweise einschränken oder zu deren Totalausfall führen können. Grundsätzlich zeigen sich Defekte durch eine Veränderung im Verhalten von einzelnen Komponenten. Diese Abweichungen vom Nominalverhalten nehmen dabei an Intensität zu, je näher die entsprechende Komponente an einem Totalausfall ist. Aus diesem Grund sollte das Fehlverhalten von Komponenten rechtzeitig erkannt werden, um permanenten Schaden zu verhindern. Von besonderer Bedeutung ist dies für die Luft- und Raumfahrt. Bei einem Satelliten kann keine Wartung seiner Komponenten durchgeführt werden, wenn er sich bereits im Orbit befindet. Der Defekt einer einzelnen Komponente, wie der Batterie der Energieversorgung, kann hierbei den Verlust der gesamten Mission bedeuten. Grundsätzlich lässt sich Fehlererkennung manuell durchführen, wie es im Satellitenbetrieb oft üblich ist. Hierfür muss ein menschlicher Experte, ein sogenannter Operator, das System überwachen. Diese Form der Überwachung ist allerdings stark von der Zeit, Verfügbarkeit und Expertise des Operators, der die Überwachung durchführt, abhängig. Ein anderer Ansatz ist die Verwendung eines dedizierten Diagnosesystems. Dieses kann das technische System permanent überwachen und selbstständig Diagnosen berechnen. Die Diagnosen können dann durch einen Experten eingesehen werden, der auf ihrer Basis Aktionen durchführen kann. Das in dieser Arbeit vorgestellte modellbasierte Diagnosesystem verwendet ein quantitatives Modell eines technischen Systems, das dessen Nominalverhalten beschreibt. Das beobachtete Verhalten des technischen Systems, gegeben durch Messwerte, wird mit seinem erwarteten Verhalten, gegeben durch simulierte Werte des Modells, verglichen und Diskrepanzen bestimmt. Jede Diskrepanz ist dabei ein Symptom. Diagnosen werden dadurch berechnet, dass zunächst zu jedem Symptom eine sogenannte Konfliktmenge berechnet wird. Dies ist eine Menge von Komponenten, sodass der Defekt einer dieser Komponenten das entsprechende Symptom erklären könnte. Mithilfe dieser Konfliktmengen werden sogenannte Treffermengen berechnet. Eine Treffermenge ist eine Menge von Komponenten, sodass der gleichzeitige Defekt aller Komponenten dieser Menge alle beobachteten Symptome erklären könnte. Jede minimale Treffermenge entspricht dabei einer Diagnose. Zur Berechnung dieser Mengen nutzt das Diagnosesystem ein Verfahren, bei dem zunächst abhängige Komponenten bestimmt werden und diese von symptombehafteten Komponenten belastet und von korrekt funktionierenden Komponenten entlastet werden. Für die einzelnen Komponenten werden Bewertungen auf Basis dieser Be- und Entlastungen berechnet und mit ihnen Diagnosen gestellt. Da das Diagnosesystem auf ausreichend genaue Modelle angewiesen ist und die manuelle Kalibrierung dieser Modelle mit erheblichem Aufwand verbunden ist, wurde ein Verfahren zur automatischen Kalibrierung entwickelt. Dieses verwendet einen Zyklischen Genetischen Algorithmus, um mithilfe von aufgezeichneten Werten der realen Komponenten Modellparameter zu bestimmen, sodass die Modelle die aufgezeichneten Daten möglichst gut reproduzieren können. Zur Evaluation der automatischen Kalibrierung wurden ein Testaufbau und verschiedene dynamische und manuell schwierig zu kalibrierende Komponenten des Qualifikationsmodells eines realen Nanosatelliten, dem SONATE-Nanosatelliten modelliert und kalibriert. Der Testaufbau bestand dabei aus einem Batteriepack, einem Laderegler, einem Tiefentladeschutz, einem Entladeregler, einem Stepper Motor HAT und einem Motor. Er wurde zusätzlich zur automatischen Kalibrierung unabhängig manuell kalibriert. Die automatisch kalibrierten Satellitenkomponenten waren ein Reaktionsrad, ein Entladeregler, Magnetspulen, bestehend aus einer Ferritkernspule und zwei Luftspulen, eine Abschlussleiterplatine und eine Batterie. Zur Evaluation des Diagnosesystems wurde die Energieversorgung des Qualifikationsmodells des SONATE-Nanosatelliten modelliert. Für die Batterien, die Entladeregler, die Magnetspulen und die Reaktionsräder wurden die vorher automatisch kalibrierten Modelle genutzt. Für das Modell der Energieversorgung wurden Fehler simuliert und diese diagnostiziert. Die Ergebnisse der Evaluation der automatischen Kalibrierung waren, dass die automatische Kalibrierung eine mit der manuellen Kalibrierung vergleichbare Genauigkeit für den Testaufbau lieferte und diese sogar leicht übertraf und dass die automatisch kalibrierten Satellitenkomponenten eine durchweg hohe Genauigkeit aufwiesen und damit für den Einsatz im Diagnosesystem geeignet waren. Die Ergebnisse der Evaluation des Diagnosesystems waren, dass die simulierten Fehler zuverlässig gefunden wurden und dass das Diagnosesystem in der Lage war die plausiblen Ursachen dieser Fehler zu diagnostizieren.
Semantic Fusion for Natural Multimodal Interfaces using Concurrent Augmented Transition Networks
(2018)
Semantic fusion is a central requirement of many multimodal interfaces. Procedural methods like finite-state transducers and augmented transition networks have proven to be beneficial to implement semantic fusion. They are compliant with rapid development cycles that are common for the development of user interfaces, in contrast to machine-learning approaches that require time-costly training and optimization. We identify seven fundamental requirements for the implementation of semantic fusion: Action derivation, continuous feedback, context-sensitivity, temporal relation support, access to the interaction context, as well as the support of chronologically unsorted and probabilistic input. A subsequent analysis reveals, however, that there is currently no solution for fulfilling the latter two requirements. As the main contribution of this article, we thus present the Concurrent Cursor concept to compensate these shortcomings. In addition, we showcase a reference implementation, the Concurrent Augmented Transition Network (cATN), that validates the concept’s feasibility in a series of proof of concept demonstrations as well as through a comparative benchmark. The cATN fulfills all identified requirements and fills the lack amongst previous solutions. It supports the rapid prototyping of multimodal interfaces by means of five concrete traits: Its declarative nature, the recursiveness of the underlying transition network, the network abstraction constructs of its description language, the utilized semantic queries, and an abstraction layer for lexical information. Our reference implementation was and is used in various student projects, theses, as well as master-level courses. It is openly available and showcases that non-experts can effectively implement multimodal interfaces, even for non-trivial applications in mixed and virtual reality.
Asynchronous Traffic Shaping enabled bounded latency with low complexity for time sensitive networking without the need for time synchronization. However, its main focus is the guaranteed maximum delay. Jitter-sensitive applications may still be forced towards synchronization. This work proposes traffic damping to reduce end-to-end delay jitter. It discusses its application and shows that both the prerequisites and the guaranteed delay of traffic damping and ATS are very similar. Finally, it presents a brief evaluation of delay jitter in an example topology by means of a simulation and worst case estimation.
Background
Medication trend studies show the changes of medication over the years and may be replicated using a clinical Data Warehouse (CDW). Even nowadays, a lot of the patient information, like medication data, in the EHR is stored in the format of free text. As the conventional approach of information extraction (IE) demands a high developmental effort, we used ad hoc IE instead. This technique queries information and extracts it on the fly from texts contained in the CDW.
Methods
We present a generalizable approach of ad hoc IE for pharmacotherapy (medications and their daily dosage) presented in hospital discharge letters. We added import and query features to the CDW system, like error tolerant queries to deal with misspellings and proximity search for the extraction of the daily dosage. During the data integration process in the CDW, negated, historical and non-patient context data are filtered. For the replication studies, we used a drug list grouped by ATC (Anatomical Therapeutic Chemical Classification System) codes as input for queries to the CDW.
Results
We achieve an F1 score of 0.983 (precision 0.997, recall 0.970) for extracting medication from discharge letters and an F1 score of 0.974 (precision 0.977, recall 0.972) for extracting the dosage. We replicated three published medical trend studies for hypertension, atrial fibrillation and chronic kidney disease. Overall, 93% of the main findings could be replicated, 68% of sub-findings, and 75% of all findings. One study could be completely replicated with all main and sub-findings.
Conclusion
A novel approach for ad hoc IE is presented. It is very suitable for basic medical texts like discharge letters and finding reports. Ad hoc IE is by definition more limited than conventional IE and does not claim to replace it, but it substantially exceeds the search capabilities of many CDWs and it is convenient to conduct replication studies fast and with high quality.
This short letter proposes more consolidated explicit solutions for the forces and torques acting on typical rover wheels, that can be used as a method to determine their average mobility characteristics in planetary soils. The closed loop solutions stand in one of the verified methods, but at difference of the previous, observables are decoupled requiring a less amount of physical parameters to measure. As a result, we show that with knowledge of terrain properties, wheel driving performance rely in a single observable only. Because of their generality, the formulated equations established here can have further implications in autonomy and control of rovers or planetary soil characterization.
Knowledge encoding in game mechanics: transfer-oriented knowledge learning in desktop-3D and VR
(2019)
Affine Transformations (ATs) are a complex and abstract learning content. Encoding the AT knowledge in Game Mechanics (GMs) achieves a repetitive knowledge application and audiovisual demonstration. Playing a serious game providing these GMs leads to motivating and effective knowledge learning. Using immersive Virtual Reality (VR) has the potential to even further increase the serious game’s learning outcome and learning quality. This paper compares the effectiveness and efficiency of desktop-3D and VR in respect to the achieved learning outcome. Also, the present study analyzes the effectiveness of an enhanced audiovisual knowledge encoding and the provision of a debriefing system. The results validate the effectiveness of the knowledge encoding in GMs to achieve knowledge learning. The study also indicates that VR is beneficial for the overall learning quality and that an enhanced audiovisual encoding has only a limited effect on the learning outcome.
In recent years several community testbeds as well as participatory sensing platforms have successfully established themselves to provide open data to everyone interested. Each of them with a specific goal in mind, ranging from collecting radio coverage data up to environmental and radiation data. Such data can be used by the community in their decision making, whether to subscribe to a specific mobile phone service that provides good coverage in an area or in finding a sunny and warm region for the summer holidays.
However, the existing platforms are usually limiting themselves to directly measurable network QoS. If such a crowdsourced data set provides more in-depth derived measures, this would enable an even better decision making. A community-driven crowdsensing platform that derives spatial application-layer user experience from resource-friendly bandwidth estimates would be such a case, video streaming services come to mind as a prime example. In this paper we present a concept for such a system based on an initial prototype that eases the collection of data necessary to determine mobile-specific QoE at large scale. In addition we reason why the simple quality metric proposed here can hold its own.
White Paper on Crowdsourced Network and QoE Measurements – Definitions, Use Cases and Challenges
(2020)
The goal of the white paper at hand is as follows. The definitions of the terms build a framework for discussions around the hype topic ‘crowdsourcing’. This serves as a basis for differentiation and a consistent view from different perspectives on crowdsourced network measurements, with the goal to provide a commonly accepted definition in the community. The focus is on the context of mobile and fixed network operators, but also on measurements of different layers (network, application, user layer). In addition, the white paper shows the value of crowdsourcing for selected use cases, e.g., to improve QoE or regulatory issues. Finally, the major challenges and issues for researchers and practitioners are highlighted.
This white paper is the outcome of the Würzburg seminar on “Crowdsourced Network and QoE Measurements” which took place from 25-26 September 2019 in Würzburg, Germany. International experts were invited from industry and academia. They are well known in their communities, having different backgrounds in crowdsourcing, mobile networks, network measurements, network performance, Quality of Service (QoS), and Quality of Experience (QoE). The discussions in the seminar focused on how crowdsourcing will support vendors, operators, and regulators to determine the Quality of Experience in new 5G networks that enable various new applications and network architectures. As a result of the discussions, the need for a white paper manifested, with the goal of providing a scientific discussion of the terms “crowdsourced network measurements” and “crowdsourced QoE measurements”, describing relevant use cases for such crowdsourced data, and its underlying challenges. During the seminar, those main topics were identified, intensively discussed in break-out groups, and brought back into the plenum several times. The outcome of the seminar is this white paper at hand which is – to our knowledge – the first one covering the topic of crowdsourced network and QoE measurements.
OCR4all—An open-source tool providing a (semi-)automatic OCR workflow for historical printings
(2019)
Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years, great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout analysis and segmentation, character recognition, and post-processing. The drawback of these tools often is their limited applicability by non-technical users like humanist scholars and in particular the combined use of several tools in a workflow. In this paper, we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. While a variety of materials can already be processed fully automatically, books with more complex layouts require manual intervention by the users. This is mostly due to the fact that the required ground truth for training stronger mixed models (for segmentation, as well as text recognition) is not available, yet, neither in the desired quantity nor quality. To deal with this issue in the short run, OCR4all offers a comfortable GUI that allows error corrections not only in the final output, but already in early stages to minimize error propagations. In the long run, this constant manual correction produces large quantities of valuable, high quality training material, which can be used to improve fully automatic approaches. Further on, extensive configuration capabilities are provided to set the degree of automation of the workflow and to make adaptations to the carefully selected default parameters for specific printings, if necessary. During experiments, the fully automated application on 19th Century novels showed that OCR4all can considerably outperform the commercial state-of-the-art tool ABBYY Finereader on moderate layouts if suitably pretrained mixed OCR models are available. Furthermore, on very complex early printed books, even users with minimal or no experience were able to capture the text with manageable effort and great quality, achieving excellent Character Error Rates (CERs) below 0.5%. The architecture of OCR4all allows the easy integration (or substitution) of newly developed tools for its main components by standardized interfaces like PageXML, thus aiming at continual higher automation for historical printings.
The correct behavior of spacecraft components is the foundation of unhindered mission operation. However, no technical system is free of wear and degradation. A malfunction of one single component might significantly alter the behavior of the whole spacecraft and may even lead to a complete mission failure. Therefore, abnormal component behavior must be detected early in order to be able to perform counter measures. A dedicated fault detection system can be employed, as opposed to classical health monitoring, performed by human operators, to decrease the response time to a malfunction. In this paper, we present a generic model-based diagnosis system, which detects faults by analyzing the spacecraft’s housekeeping data. The observed behavior of the spacecraft components, given by the housekeeping data is compared to their expected behavior, obtained through simulation. Each discrepancy between the observed and the expected behavior of a component generates a so-called symptom. Given the symptoms, the diagnoses are derived by computing sets of components whose malfunction might cause the observed discrepancies. We demonstrate the applicability of the diagnosis system by using modified housekeeping data of the qualification model of an actual spacecraft and outline the advantages and drawbacks of our approach.
Background: Natural language processing (NLP) is a powerful tool supporting the generation of Real-World Evidence (RWE). There is no NLP system that enables the extensive querying of parameters specific to multiple myeloma (MM) out of unstructured medical reports. We therefore created a MM-specific ontology to accelerate the information extraction (IE) out of unstructured text. Methods: Our MM ontology consists of extensive MM-specific and hierarchically structured attributes and values. We implemented “A Rule-based Information Extraction System” (ARIES) that uses this ontology. We evaluated ARIES on 200 randomly selected medical reports of patients diagnosed with MM. Results: Our system achieved a high F1-Score of 0.92 on the evaluation dataset with a precision of 0.87 and recall of 0.98. Conclusions: Our rule-based IE system enables the comprehensive querying of medical reports. The IE accelerates the extraction of data and enables clinicians to faster generate RWE on hematological issues. RWE helps clinicians to make decisions in an evidence-based manner. Our tool easily accelerates the integration of research evidence into everyday clinical practice.
Das Thema dieser Dissertation lautet „Konzeption und Evaluation eines webbasierten Patienteninformationsprogrammes zur Überprüfung internistischer Verdachtsdiagnosen“. Zusammen mit dem Institut für Informatik wurde das wissensbasierte second-opinion-System SymptomCheck entwickelt. Das Programm dient zur Überprüfung von Verdachtsdiagnosen. Es wurden Wissensbasen erstellt, in denen Symptome, Befunde und Untersuchungen nach einem Bewertungsschema beurteilt werden. Folgend wurde eine online erreichbare Startseite erstellt, auf der Nutzer vornehmlich internistische Verdachtsdiagnosen überprüfen können. Das Programm wurde in zwei Studien bezüglich seiner Sensitivität und Spezifität sowie der Benutzerfreundlichkeit getestet. In der ersten Studie wurden die Verdachtsdiagnosen ambulanter Patienten mit den ärztlich gestellten Diagnosen verglichen, eine zweite an die Allgemeinbevölkerung gerichtete Onlinestudie galt vor allem der Bewertung der Benutzerfreundlichkeit. Soweit bekannt ist dies die erste Studie in der ein selbst entwickeltes Programm selbstständig an echten Patienten getestet wurde.
Virtual reality and related media and communication technologies have a growing
impact on professional application fields and our daily life. Virtual environments
have the potential to change the way we perceive ourselves and how we interact
with others. In comparison to other technologies, virtual reality allows for the
convincing display of a virtual self-representation, an avatar, to oneself and also to
others. This is referred to as user embodiment. Avatars can be of varying realism
and abstraction in their appearance and in the behaviors they convey. Such userembodying
interfaces, in turn, can impact the perception of the self as well as
the perception of interactions. For researchers, designers, and developers it is of
particular interest to understand these perceptual impacts, to apply them to therapy,
assistive applications, social platforms, or games, for example. The present thesis
investigates and relates these impacts with regard to three areas: intrapersonal
effects, interpersonal effects, and effects of social augmentations provided by the
simulation.
With regard to intrapersonal effects, we specifically explore which simulation
properties impact the illusion of owning and controlling a virtual body, as well
as a perceived change in body schema. Our studies lead to the construction of
an instrument to measure these dimensions and our results indicate that these
dimensions are especially affected by the level of immersion, the simulation latency,
as well as the level of personalization of the avatar.
With regard to interpersonal effects we compare physical and user-embodied social
interactions, as well as different degrees of freedom in the replication of nonverbal
behavior. Our results suggest that functional levels of interaction are maintained,
whereas aspects of presence can be affected by avatar-mediated interactions, and
collaborative motor coordination can be disturbed by immersive simulations.
Social interaction is composed of many unknown symbols and harmonic patterns
that define our understanding and interpersonal rapport. For successful virtual
social interactions, a mere replication of physical world behaviors to virtual environments
may seem feasible. However, the potential of mediated social interactions
goes beyond this mere replication. In a third vein of research, we propose and
evaluate alternative concepts on how computers can be used to actively engage in
mediating social interactions, namely hybrid avatar-agent technologies. Specifically,
we investigated the possibilities to augment social behaviors by modifying and
transforming user input according to social phenomena and behavior, such as nonverbal
mimicry, directed gaze, joint attention, and grouping. Based on our results
we argue that such technologies could be beneficial for computer-mediated social
interactions such as to compensate for lacking sensory input and disturbances in
data transmission or to increase aspects of social presence by visual substitution or
amplification of social behaviors.
Based on related work and presented findings, the present thesis proposes the
perspective of considering computers as social mediators. Concluding from prototypes
and empirical studies, the potential of technology to be an active mediator of social
perception with regard to the perception of the self, as well as the perception of
social interactions may benefit our society by enabling further methods for diagnosis,
treatment, and training, as well as the inclusion of individuals with social disorders.
To this regard, we discuss implications for our society and ethical aspects. This
thesis extends previous empirical work and further presents novel instruments,
concepts, and implications to open up new perspectives for the development of
virtual reality, mixed reality, and augmented reality applications.
Bridge-local latency computation is often regarded with caution, as historic efforts with the Credit-Based Shaper (CBS) showed that CBS requires network wide information for tight bounds. Recently, new shaping mechanisms and timed gates were applied to achieve such guarantees nonetheless, but they require support for these new mechanisms in the forwarding devices.
This document presents a per-hop latency bound for individual streams in a class-based network that applies the IEEE 802.1Q strict priority transmission selection algorithm. It is based on self-pacing talkers and uses the accumulated latency fields during the reservation process to provide upper bounds with bridge-local information. The presented delay bound is proven mathematically and then evaluated with respect to its accuracy. It indicates the required information that must be provided for admission control, e.g., implemented by a resource reservation protocol such as IEEE 802.1Qdd. Further, it hints at potential improvements regarding new mechanisms and higher accuracy given more information.
This paper proposes an attitude determination system for small Unmanned Aerial Vehicles (UAV) with a weight limit of 5 kg and a small footprint of 0.5m x 0.5 m. The system is realized by coupling single-frequency Global Positioning System (GPS) code and carrier-phase measurements with the data acquired from a Micro-Electro-Mechanical System (MEMS) Inertial Measurement Unit (IMU) using consumer-grade Components-Off-The-Shelf (COTS) only. The sensor fusion is accomplished using two Extended Kalman Filters (EKF) that are coupled by exchanging information about the currently estimated baseline. With a baseline of 48 cm, the static heading accuracy of the proposed system is comparable to the one of a commercial single-frequency GPS heading system with an accuracy of approximately 0.25°/m. Flight testing shows that the proposed system is able to obtain a reliable and stable GPS heading estimation without an aiding magnetometer.
The importance of Clinical Data Warehouses (CDW) has increased significantly in recent years as they support or enable many applications such as clinical trials, data mining, and decision making.
CDWs integrate Electronic Health Records which still contain a large amount of text data, such as discharge letters or reports on diagnostic findings in addition to structured and coded data like ICD-codes of diagnoses.
Existing CDWs hardly support features to gain information covered in texts.
Information extraction methods offer a solution for this problem but they have a high and long development effort, which can only be carried out by computer scientists.
Moreover, such systems only exist for a few medical domains.
This paper presents a method empowering clinicians to extract information from texts on their own. Medical concepts can be extracted ad hoc from e.g. discharge letters, thus physicians can work promptly and autonomously. The proposed system achieves these improvements by efficient data storage, preprocessing, and with powerful query features. Negations in texts are recognized and automatically excluded, as well as the context of information is determined and undesired facts are filtered, such as historical events or references to other persons (family history).
Context-sensitive queries ensure the semantic integrity of the concepts to be extracted.
A new feature not available in other CDWs is to query numerical concepts in texts and even filter them (e.g. BMI > 25).
The retrieved values can be extracted and exported for further analysis.
This technique is implemented within the efficient architecture of the PaDaWaN CDW and evaluated with comprehensive and complex tests.
The results outperform similar approaches reported in the literature.
Ad hoc IE determines the results in a few (milli-) seconds and a user friendly GUI enables interactive working, allowing flexible adaptation of the extraction.
In addition, the applicability of this system is demonstrated in three real-world applications at the Würzburg University Hospital (UKW).
Several drug trend studies are replicated: Findings of five studies on high blood pressure, atrial fibrillation and chronic renal failure can be partially or completely confirmed in the UKW. Another case study evaluates the prevalence of heart failure in inpatient hospitals using an algorithm that extracts information with ad hoc IE from discharge letters and echocardiogram report (e.g. LVEF < 45 ) and other sources of the hospital information system.
This study reveals that the use of ICD codes leads to a significant underestimation (31%) of the true prevalence of heart failure.
The third case study evaluates the consistency of diagnoses by comparing structured ICD-10-coded diagnoses with the diagnoses described in the diagnostic section of the discharge letter.
These diagnoses are extracted from texts with ad hoc IE, using synonyms generated with a novel method.
The developed approach can extract diagnoses from the discharge letter with a high accuracy and furthermore it can prove the degree of consistency between the coded and reported diagnoses.
Maps are the main tool to represent geographical information. Users often zoom in and out to access maps at different scales. Continuous map generalization tries to make the changes between different scales smooth, which is essential to provide users with comfortable zooming experience.
In order to achieve continuous map generalization with high quality, we optimize some important aspects of maps. In this book, we have used optimization in the generalization of land-cover areas, administrative boundaries, buildings, and coastlines. According to our experiments, continuous map generalization indeed benefits from optimization.
Automation in Software Performance Engineering Based on a Declarative Specification of Concerns
(2019)
Software performance is of particular relevance to software system design, operation, and evolution because it has a significant impact on key business indicators. During the life-cycle of a software system, its implementation, configuration, and deployment are subject to multiple changes that may affect the end-to-end performance characteristics. Consequently, performance analysts continually need to provide answers to and act based on performance-relevant concerns. To ensure a desired level of performance, software performance engineering provides a plethora of methods, techniques, and tools for measuring, modeling, and evaluating performance properties of software systems. However, the answering of performance concerns is subject to a significant semantic gap between the level on which performance concerns are formulated and the technical level on which performance evaluations are actually conducted. Performance evaluation approaches come with different strengths and limitations concerning, for example, accuracy, time-to-result, or system overhead. For the involved stakeholders, it can be an elaborate process to reasonably select, parameterize and correctly apply performance evaluation approaches, and to filter and interpret the obtained results. An additional challenge is that available performance evaluation artifacts may change over time, which requires to switch between different measurement-based and model-based performance evaluation approaches during the system evolution. At model-based analysis, the effort involved in creating performance models can also outweigh their benefits.
To overcome the deficiencies and enable an automatic and holistic evaluation of performance throughout the software engineering life-cycle requires an approach that: (i) integrates multiple types of performance concerns and evaluation approaches, (ii) automates performance model creation, and (iii) automatically selects an evaluation methodology tailored to a specific scenario. This thesis presents a declarative approach —called Declarative Performance Engineering (DPE)— to automate performance evaluation based on a humanreadable specification of performance-related concerns. To this end, we separate the definition of performance concerns from their solution. The primary scientific contributions presented in this thesis are:
A declarative language to express performance-related concerns and a corresponding processing framework:
We provide a language to specify performance concerns independent of a concrete performance evaluation approach. Besides the specification of functional aspects, the language allows to include non-functional tradeoffs optionally. To answer these concerns, we provide a framework architecture and a corresponding reference implementation to process performance concerns automatically. It allows to integrate arbitrary performance evaluation approaches and is accompanied by reference implementations for model-based and measurement-based performance evaluation.
Automated creation of architectural performance models from execution traces:
The creation of performance models can be subject to significant efforts outweighing the benefits of model-based performance evaluation. We provide a model extraction framework that creates architectural performance models based on execution traces, provided by monitoring tools.The framework separates the derivation of generic information from model creation routines. To derive generic information, the framework combines state-of-the-art extraction and estimation techniques. We isolate object creation routines specified in a generic model builder interface based on concepts present in multiple performance-annotated architectural modeling formalisms. To create model extraction for a novel performance modeling formalism, developers only need to write object creation routines instead of creating model extraction software from scratch when reusing the generic framework.
Automated and extensible decision support for performance evaluation approaches:
We present a methodology and tooling for the automated selection of a performance evaluation approach tailored to the user concerns and application scenario. To this end, we propose to decouple the complexity of selecting a performance evaluation approach for a given scenario by providing solution approach capability models and a generic decision engine. The proposed capability meta-model enables to describe functional and non-functional capabilities of performance evaluation approaches and tools at different granularities. In contrast to existing tree-based decision support mechanisms, the decoupling approach allows to easily update characteristics of solution approaches as well as appending new rating criteria and thereby stay abreast of evolution in performance evaluation tooling and system technologies.
Time-to-result estimation for model-based performance prediction:
The time required to execute a model-based analysis plays an important role in different decision processes. For example, evaluation scenarios might require the prediction results to be available in a limited period of time such that the system can be adapted in time to ensure the desired quality of service. We propose a method to estimate the time-to-result for modelbased performance prediction based on model characteristics and analysis parametrization. We learn a prediction model using performancerelevant features thatwe determined using statistical tests. We implement the approach and demonstrate its practicability by applying it to analyze a simulation-based multi-step performance evaluation approach for a representative architectural performance modeling formalism.
We validate each of the contributions based on representative case studies. The evaluation of automatic performance model extraction for two case study systems shows that the resulting models can accurately predict the performance behavior. Prediction accuracy errors are below 3% for resource utilization and mostly less than 20% for service response time. The separate evaluation of the reusability shows that the presented approach lowers the implementation efforts for automated model extraction tools by up to 91%. Based on two case studies applying measurement-based and model-based performance evaluation techniques, we demonstrate the suitability of the declarative performance engineering framework to answer multiple kinds of performance concerns customized to non-functional goals. Subsequently, we discuss reduced efforts in applying performance analyses using the integrated and automated declarative approach. Also, the evaluation of the declarative framework reviews benefits and savings integrating performance evaluation approaches into the declarative performance engineering framework. We demonstrate the applicability of the decision framework for performance evaluation approaches by applying it to depict existing decision trees. Then, we show how we can quickly adapt to the evolution of performance evaluation methods which is challenging for static tree-based decision support systems. At this, we show how to cope with the evolution of functional and non-functional capabilities of performance evaluation software and explain how to integrate new approaches. Finally, we evaluate the accuracy of the time-to-result estimation for a set of machinelearning algorithms and different training datasets. The predictions exhibit a mean percentage error below 20%, which can be further improved by including performance evaluations of the considered model into the training data. The presented contributions represent a significant step towards an integrated performance engineering process that combines the strengths of model-based and measurement-based performance evaluation. The proposed performance concern language in conjunction with the processing framework significantly reduces the complexity of applying performance evaluations for all stakeholders. Thereby it enables performance awareness throughout the software engineering life-cycle. The proposed performance concern language removes the semantic gap between the level on which performance concerns are formulated and the technical level on which performance evaluations are actually conducted by the user.
This paper proposes a 3-D local pose estimation system for a small Unmanned Aerial Vehicle (UAV) with a weight limit of 200 g and a very small footprint of 10 cm×10cm. The system is realized by fusing 3-D position estimations from an Ultra-Wide Band (UWB) transceiver network with Inertial Measurement Unit (IMU) sensor data and data from a barometric pressure sensor. The 3-D position from the UWB network is estimated using Multi-Dimensional Scaling (MDS) and range measurements between the transceivers. The range measurements are obtained using Double-Sided Two-Way Ranging (DS-TWR), thus eliminating the need for an additional clock synchronization mechanism. The sensor fusion is accomplished using a loosely coupled Extended Kalman Filter (EKF) architecture. Extensive evaluation of the proposed system shows that a position accuracy with a Root-Mean-Square Error (RMSE) of 0.20cm can be obtained. The orientation angle can be estimated with an RMSE of 1.93°.
Making machines understand natural language is a dream of mankind that existed
since a very long time. Early attempts at programming machines to converse with
humans in a supposedly intelligent way with humans relied on phrase lists and simple
keyword matching. However, such approaches cannot provide semantically adequate
answers, as they do not consider the specific meaning of the conversation. Thus, if we
want to enable machines to actually understand language, we need to be able to access
semantically relevant background knowledge. For this, it is possible to query so-called
ontologies, which are large networks containing knowledge about real-world entities
and their semantic relations. However, creating such ontologies is a tedious task, as often
extensive expert knowledge is required. Thus, we need to find ways to automatically
construct and update ontologies that fit human intuition of semantics and semantic
relations. More specifically, we need to determine semantic entities and find relations
between them. While this is usually done on large corpora of unstructured text, previous
work has shown that we can at least facilitate the first issue of extracting entities by
considering special data such as tagging data or human navigational paths. Here, we do
not need to detect the actual semantic entities, as they are already provided because of
the way those data are collected. Thus we can mainly focus on the problem of assessing
the degree of semantic relatedness between tags or web pages. However, there exist
several issues which need to be overcome, if we want to approximate human intuition of
semantic relatedness. For this, it is necessary to represent words and concepts in a way
that allows easy and highly precise semantic characterization. This also largely depends
on the quality of data from which these representations are constructed.
In this thesis, we extract semantic information from both tagging data created by users
of social tagging systems and human navigation data in different semantic-driven social
web systems. Our main goal is to construct high quality and robust vector representations
of words which can the be used to measure the relatedness of semantic concepts.
First, we show that navigation in the social media systems Wikipedia and BibSonomy is
driven by a semantic component. After this, we discuss and extend methods to model
the semantic information in tagging data as low-dimensional vectors. Furthermore, we
show that tagging pragmatics influences different facets of tagging semantics. We then
investigate the usefulness of human navigational paths in several different settings on
Wikipedia and BibSonomy for measuring semantic relatedness. Finally, we propose
a metric-learning based algorithm in adapt pre-trained word embeddings to datasets
containing human judgment of semantic relatedness.
This work contributes to the field of studying semantic relatedness between words
by proposing methods to extract semantic relatedness from web navigation, learn highquality
and low-dimensional word representations from tagging data, and to learn
semantic relatedness from any kind of vector representation by exploiting human
feedback. Applications first and foremest lie in ontology learning for the Semantic Web,
but also semantic search or query expansion.
Einleitung:
Multiple-Choice-Klausuren spielen immer noch eine herausragende Rolle für fakultätsinterne medizinische Prüfungen. Neben inhaltlichen Arbeiten stellt sich die Frage, wie die technische Abwicklung optimiert werden kann. Für Dozenten in der Medizin gibt es zunehmend drei Optionen zur Durchführung von MC-Klausuren: Papierklausuren mit oder ohne Computerunterstützung oder vollständig elektronische Klausuren. Kritische Faktoren sind der Aufwand für die Formatierung der Klausur, der logistische Aufwand bei der Klausurdurchführung, die Qualität, Schnelligkeit und der Aufwand der Klausurkorrektur, die Bereitstellung der Dokumente für die Einsichtnahme, und die statistische Analyse der Klausurergebnisse.
Methoden:
An der Universität Würzburg wird seit drei Semestern ein Computerprogramm zur Eingabe und Formatierung der MC-Fragen in medizinischen und anderen Papierklausuren verwendet und optimiert, mit dem im Wintersemester (WS) 2009/2010 elf, im Sommersemester (SS) 2010 zwölf und im WS 2010/11 dreizehn medizinische Klausuren erstellt und anschließend die eingescannten Antwortblätter automatisch ausgewertet wurden. In den letzten beiden Semestern wurden die Aufwände protokolliert.
Ergebnisse:
Der Aufwand der Formatierung und der Auswertung einschl. nachträglicher Anpassung der Auswertung einer Durchschnittsklausur mit ca. 140 Teilnehmern und ca. 35 Fragen ist von 5-7 Stunden für Klausuren ohne Komplikation im WS 2009/2010 über ca. 2 Stunden im SS 2010 auf ca. 1,5 Stunden im WS 2010/11 gefallen. Einschließlich der Klausuren mit Komplikationen bei der Auswertung betrug die durchschnittliche Zeit im SS 2010 ca. 3 Stunden und im WS 10/11 ca. 2,67 Stunden pro Klausur.
Diskussion:
Für konventionelle Multiple-Choice-Klausuren bietet die computergestützte Formatierung und Auswertung von Papierklausuren einen beträchtlichen Zeitvorteil für die Dozenten im Vergleich zur manuellen Korrektur von Papierklausuren und benötigt im Vergleich zu rein elektronischen Klausuren eine deutlich einfachere technische Infrastruktur und weniger Personal bei der Klausurdurchführung.
Energy efficiency of computing systems has become an increasingly important issue over the last decades. In 2015, data centers were responsible for 2% of the world's greenhouse gas emissions, which is roughly the same as the amount produced by air travel.
In addition to these environmental concerns, power consumption of servers in data centers results in significant operating costs, which increase by at least 10% each year.
To address this challenge, the U.S. EPA and other government agencies are considering the use of novel measurement methods in order to label the energy efficiency of servers.
The energy efficiency and power consumption of a server is subject to a great number of factors, including, but not limited to, hardware, software stack, workload, and load level.
This huge number of influencing factors makes measuring and rating of energy efficiency challenging. It also makes it difficult to find an energy-efficient server for a specific use-case. Among others, server provisioners, operators, and regulators would profit from information on the servers in question and on the factors that affect those servers' power consumption and efficiency. However, we see a lack of measurement methods and metrics for energy efficiency of the systems under consideration.
Even assuming that a measurement methodology existed, making decisions based on its results would be challenging. Power prediction methods that make use of these results would aid in decision making. They would enable potential server customers to make better purchasing decisions and help operators predict the effects of potential reconfigurations.
Existing energy efficiency benchmarks cannot fully address these challenges, as they only measure single applications at limited sets of load levels. In addition, existing efficiency metrics are not helpful in this context, as they are usually a variation of the simple performance per power ratio, which is only applicable to single workloads at a single load level. Existing data center efficiency metrics, on the other hand, express the efficiency of the data center space and power infrastructure, not focusing on the efficiency of the servers themselves. Power prediction methods for not-yet-available systems that could make use of the results provided by a comprehensive power rating methodology are also lacking. Existing power prediction models for hardware designers have a very fine level of granularity and detail that would not be useful for data center operators.
This thesis presents a measurement and rating methodology for energy efficiency of servers and an energy efficiency metric to be applied to the results of this methodology. We also design workloads, load intensity and distribution models, and mechanisms that can be used for energy efficiency testing. Based on this, we present power prediction mechanisms and models that utilize our measurement methodology and its results for power prediction.
Specifically, the six major contributions of this thesis are:
We present a measurement methodology and metrics for energy efficiency rating of servers that use multiple, specifically chosen workloads at different load levels for a full system characterization.
We evaluate the methodology and metric with regard to their reproducibility, fairness, and relevance. We investigate the power and performance variations of test results and show fairness of the metric through a mathematical proof and a correlation analysis on a set of 385 servers. We evaluate the metric's relevance by showing the relationships that can be established between metric results and third-party applications.
We create models and extraction mechanisms for load profiles that vary over time, as well as load distribution mechanisms and policies. The models are designed to be used to define arbitrary dynamic load intensity profiles that can be leveraged for benchmarking purposes. The load distribution mechanisms place workloads on computing resources in a hierarchical manner.
Our load intensity models can be extracted in less than 0.2 seconds and our resulting models feature a median modeling error of 12.7% on average. In addition, our new load distribution strategy can save up to 10.7% of power consumption on a single server node.
We introduce an approach to create small-scale workloads that emulate the power consumption-relevant behavior of large-scale workloads by approximating their CPU performance counter profile, and we introduce TeaStore, a distributed, micro-service-based reference application. TeaStore can be used to evaluate power and performance model accuracy, elasticity of cloud auto-scalers, and the effectiveness of power saving mechanisms for distributed systems.
We show that we are capable of emulating the power consumption behavior of realistic workloads with a mean deviation less than 10% and down to 0.2 watts (1%). We demonstrate the use of TeaStore in the context of performance model extraction and cloud auto-scaling also showing that it may generate workloads with different effects on the power consumption of the system under consideration.
We present a method for automated selection of interpolation strategies for performance and power characterization. We also introduce a configuration approach for polynomial interpolation functions of varying degrees that improves prediction accuracy for system power consumption for a given system utilization.
We show that, in comparison to regression, our automated interpolation method selection and configuration approach improves modeling accuracy by 43.6% if additional reference data is available and by 31.4% if it is not.
We present an approach for explicit modeling of the impact a virtualized environment has on power consumption and a method to predict the power consumption of a software application. Both methods use results produced by our measurement methodology to predict the respective power consumption for servers that are otherwise not available to the person making the prediction.
Our methods are able to predict power consumption reliably for multiple hypervisor configurations and for the target application workloads. Application workload power prediction features a mean average absolute percentage error of 9.5%.
Finally, we propose an end-to-end modeling approach for predicting the power consumption of component placements at run-time. The model can also be used to predict the power consumption at load levels that have not yet been observed on the running system.
We show that we can predict the power consumption of two different distributed web applications with a mean absolute percentage error of 2.2%. In addition, we can predict the power consumption of a system at a previously unobserved load level and component distribution with an error of 1.2%.
The contributions of this thesis already show a significant impact in science and industry. The presented efficiency rating methodology, including its metric, have been adopted by the U.S. EPA in the latest version of the ENERGY STAR Computer Server program. They are also being considered by additional regulatory agencies, including the EU Commission and the China National Institute of Standardization. In addition, the methodology's implementation and the underlying methodology itself have already found use in several research publications.
Regarding future work, we see a need for new workloads targeting specialized server hardware. At the moment, we are witnessing a shift in execution hardware to specialized machine learning chips, general purpose GPU computing, FPGAs being embedded into compute servers, etc. To ensure that our measurement methodology remains relevant, workloads covering these areas are required. Similarly, power prediction models must be extended to cover these new scenarios.
The attitude and orbit control system of pico- and nano-satellites to date is one of the bottle necks for future scientific and commercial applications. A performance increase while keeping with the satellites’ restrictions will enable new space missions especially for the smallest of the CubeSat classes. This work addresses methods to measure and improve the satellite’s attitude pointing and orbit control performance based on advanced sensor data analysis and optimized on-board software concepts. These methods are applied to spaceborne satellites and future CubeSat missions to demonstrate their validity. An in-orbit calibration procedure for a typical CubeSat attitude sensor suite is developed and applied to the UWE-3 satellite in space. Subsequently, a method to estimate the attitude determination accuracy without the help of an external reference sensor is developed. Using this method, it is shown that the UWE-3 satellite achieves an in-orbit attitude determination accuracy of about 2°.
An advanced data analysis of the attitude motion of a miniature satellite is used in order to estimate the main attitude disturbance torque in orbit. It is shown, that the magnetic disturbance is by far the most significant contribution for miniature satellites and a method to estimate the residual magnetic dipole moment of a satellite is developed. Its application to three CubeSats currently in orbit reveals that magnetic disturbances are a common issue for this class of satellites. The dipole moments measured are between 23.1mAm² and 137.2mAm². In order to autonomously estimate and counteract this disturbance in future missions an on-board magnetic dipole estimation algorithm is developed.
The autonomous neutralization of such disturbance torques together with the simplification of attitude control for the satellite operator is the focus of a novel on-board attitude control software architecture. It incorporates disturbance torques acting on the satellite and automatically optimizes the control output. Its application is demonstrated in space on board of the UWE-3 satellite through various attitude control experiments of which the results are presented here.
The integration of a miniaturized electric propulsion system will enable CubeSats to perform orbit control and, thus, open up new application scenarios. The in-orbit characterization, however, poses the problem of precisely measuring very low thrust levels in the order of µN. A method to measure this thrust based on the attitude dynamics of the satellite is developed and evaluated in simulation. It is shown, that the demonstrator mission UWE-4 will be able to measure these thrust levels with a high accuracy of 1% for thrust levels higher than 1µN.
The orbit control capabilities of UWE-4 using its electric propulsion system are evaluated and a hybrid attitude control system making use of the satellite’s magnetorquers and the electric propulsion system is developed. It is based on the flexible attitude control architecture mentioned before and thrust vector pointing accuracies of better than 2° can be achieved. This results in a thrust delivery of more than 99% of the desired acceleration in the target direction.
Einleitung: Medizinische Trainingsfälle sind in der studentischen Ausbildung inzwischen weit verbreitet. In den meisten Publikationen wird über die Entwicklung und die Erfahrungen in einem Kurs mit Trainingsfällen berichtet. In diesem Beitrag vergleichen wir die Akzeptanz von verschiedenen Trainingsfallkursen, die als Ergänzung zu zahlreichen Vorlesungen der Medizinischen Fakultät der Universität Würzburg mit sehr unterschiedlichen Nutzungsraten eingesetzt wurden, über einen Zeitraum von drei Semestern.
Methoden: Die Trainingsfälle wurden mit dem Autoren- und Ablaufsystem CaseTrain erstellt und über die Moodle-basierte Würzburger Lernplattform WueCampus den Studierenden verfügbar gemacht. Dabei wurden umfangreiche Daten über die Nutzung und Akzeptanz erhoben.
Ergebnisse: Im Zeitraum vom WS 08/09 bis zum WS 09/10 waren 19 Kurse mit insgesamt ca. 200 Fällen für die Studierenden verfügbar, die pro Semester von ca. 550 verschiedenen Medizinstudenten der Universität Würzburg und weiteren 50 Studierenden anderer bayerischer Universitäten genutzt wurden. Insgesamt wurden pro Semester ca. 12000 Mal Trainingsfälle vollständig durchgespielt zu denen ca. 2000 Evaluationen von den Studierenden ausgefüllt wurden. In den verschiedenen Kursen variiert die Nutzung zwischen unter 50 Bearbeitungen in wenig frequentierten Fallsammlungen und über 5000 Bearbeitungen in stark frequentierten Fallsammlungen.
Diskussion: Auch wenn Studierende wünschen, dass zu allen Vorlesungen Trainingsfälle angeboten werden, zeigen die Daten, dass der Umfang der Nutzung nicht primär von der Qualität der verfügbaren Trainingsfälle abhängt. Dagegen werden die Trainingsfälle in fast allen Fallsammlungen kurz vor den Klausuren extrem häufig bearbeitet. Dies zeigt, dass die Nutzung von Trainingsfällen im Wesentlichen von der wahrgenommenen Klausurrelevanz der Fälle abhängt.
This paper describes the estimation of the body weight of a person in front of an RGB-D camera. A survey of different methods for body weight estimation based on depth sensors is given. First, an estimation of people standing in front of a camera is presented. Second, an approach based on a stream of depth images is used to obtain the body weight of a person walking towards a sensor. The algorithm first extracts features from a point cloud and forwards them to an artificial neural network (ANN) to obtain an estimation of body weight. Besides the algorithm for the estimation, this paper further presents an open-access dataset based on measurements from a trauma room in a hospital as well as data from visitors of a public event. In total, the dataset contains 439 measurements. The article illustrates the efficiency of the approach with experiments with persons lying down in a hospital, standing persons, and walking persons. Applicable scenarios for the presented algorithm are body weight-related dosing of emergency patients.
With the introduction of Software-defined Networking (SDN) in the late 2000s, not only a new research field has been created, but a paradigm shift was initiated in the broad field of networking. The programmable network control by SDN is a big step, but also a stumbling block for many of the established network operators and vendors. As with any new technology the question about the maturity and the productionreadiness of it arises. Therefore, this thesis picks specific features of SDN and analyzes its performance, reliability, and availability in scenarios that can be expected in production deployments.
The first SDN topic is the performance impact of application traffic in the data plane on the control plane. Second, reliability and availability concerns of SDN deployments are exemplary analyzed by evaluating the detection performance of a common SDN controller. Thirdly, the performance of P4, a technology that enhances SDN, or better its impact of certain control operations on the processing performance is evaluated.
Background:
The availability of fully sequenced genomes and the implementation of transcriptome technologies have increased the studies investigating the expression profiles for a variety of tissues, conditions, and species. In this study, using RNA-seq data for three distinct tissues (brain, liver, and muscle), we investigate how base composition affects mammalian gene expression, an issue of prime practical and evolutionary interest.
Results:
We present the transcriptome map of the mouse isochores (DNA segments with a fairly homogeneous base composition) for the three different tissues and the effects of isochores' base composition on their expression activity. Our analyses also cover the relations between the genes' expression activity and their localization in the isochore families.
Conclusions:
This study is the first where next-generation sequencing data are used to associate the effects of both genomic and genic compositional properties to their corresponding expression activity. Our findings confirm previous results, and further support the existence of a relationship between isochores and gene expression. This relationship corroborates that isochores are primarily a product of evolutionary adaptation rather than a simple by-product of neutral evolutionary processes.
Telemedicine uses telecommunication and information technology to provide health care services over spatial distances. In the upcoming demographic changes towards an older average population age, especially rural areas suffer from a decreasing doctor to patient ratio as well as a limited amount of available medical specialists in acceptable distance. These areas could benefit the most from telemedicine applications as they are known to improve access to medical services, medical expertise and can also help to mitigate critical or emergency situations. Although the possibilities of telemedicine applications exist in the entire range of healthcare, current systems focus on one specific disease while using dedicated hardware to connect the patient with the supervising telemedicine center.
This thesis describes the development of a telemedical system which follows a new generic design approach. This bridges the gap of existing approaches that only tackle one specific application. The proposed system on the contrary aims at supporting as many diseases and use cases as possible by taking all the stakeholders into account at the same time. To address the usability and acceptance of the system it is designed to use standardized hardware like commercial medical sensors and smartphones for collecting medical data of the patients and transmitting them to the telemedical center. The smartphone can also act as interface to the patient for health questionnaires or feedback.
The system can handle the collection and transport of medical data, analysis and visualization of the data as well as providing a real time communication with video and audio between the users.
On top of the generic telemedical framework the issue of scalability is addressed by integrating a rule-based analysis tool for the medical data. Rules can be easily created by medical personnel via a visual editor and can be personalized for each patient. The rule-based analysis tool is extended by multiple options for visualization of the data, mechanisms to handle complex rules and options for performing actions like raising alarms or sending automated messages.
It is sometimes hard for the medical experts to formulate their knowledge into rules and there may be information in the medical data that is not yet known. This is why a machine learning module was integrated into the system. It uses the incoming medical data of the patients to learn new rules that are then presented to the medical personnel for inspection. This is in line with European legislation where the human still needs to be in charge of such decisions.
Overall, we were able to show the benefit of the generic approach by evaluating it in three completely different medical use cases derived from specific application needs: monitoring of COPD (chronic obstructive pulmonary disease) patients, support of patients performing dialysis at home and councils of intensive-care experts. In addition the system was used for a non-medical use case: monitoring and optimization of industrial machines and robots. In all of the mentioned cases, we were able to prove the robustness of the generic approach with real users of the corresponding domain. This is why we can propose this approach for future development of telemedical systems.
The Software Defined Networking (SDN) paradigm offers network operators numerous improvements in terms of flexibility, scalability, as well as cost efficiency and vendor independence. However, in order to maximize the benefit from these features, several new challenges in areas such as management and orchestration need to be addressed. This dissertation makes contributions towards three key topics from these areas.
Firstly, we design, implement, and evaluate two multi-objective heuristics for the SDN controller placement problem. Secondly, we develop and apply mechanisms for automated decision making based on the Pareto frontiers that are returned by the multi-objective optimizers. Finally, we investigate and quantify the performance benefits for the SDN control plane that can be achieved by integrating information from external entities such as Network Management Systems (NMSs) into the control loop. Our evaluation results demonstrate the impact of optimizing various parameters of softwarized networks at different levels and are used to derive guidelines for an efficient operation.
Historical maps are fascinating documents and a valuable source of information for scientists of various disciplines. Many of these maps are available as scanned bitmap images, but in order to make them searchable in useful ways, a structured representation of the contained information is desirable.
This book deals with the extraction of spatial information from historical maps. This cannot be expected to be solved fully automatically (since it involves difficult semantics), but is also too tedious to be done manually at scale.
The methodology used in this book combines the strengths of both computers and humans: it describes efficient algorithms to largely automate information extraction tasks and pairs these algorithms with smart user interactions to handle what is not understood by the algorithm. The effectiveness of this approach is shown for various kinds of spatial documents from the 16th to the early 20th century.
The success of semantic systems has been proven over the last years.
Nowadays, Linked Data is the driver for the rapid development of ever new intelligent systems.
Especially in enterprise environments semantic systems successfully support more and more business processes.
This is especially true for after sales service in the mechanical engineering domain.
Here, service technicians need effective access to relevant technical documentation in order to diagnose and solve problems and defects.
Therefore, the usage of semantic information retrieval systems has become the new system metaphor.
Unlike classical retrieval software Linked Enterprise Data graphs are exploited to grant targeted and problem-oriented access to relevant documents.
However, huge parts of legacy technical documents have not yet been integrated into Linked Enterprise Data graphs.
Additionally, a plethora of information models for the semantic representation of technical information exists.
The semantic maturity of these information models can hardly be measured.
This thesis motivates that there is an inherent need for a self-contained semantification approach for technical documents.
This work introduces a maturity model that allows to quickly assess existing documentation.
Additionally, the approach comprises an abstracting semantic representation for technical documents that is aligned to all major standard information models.
The semantic representation combines structural and rhetorical aspects to provide access to so called Core Documentation Entities.
A novel and holistic semantification process describes how technical documents in different legacy formats can be transformed to a semantic and linked representation.
The practical significance of the semantification approach depends on tools supporting its application.
This work presents an accompanying tool chain of semantification applications, especially the semantification framework CAPLAN that is a highly integrated development and runtime environment for semantification processes.
The complete semantification approach is evaluated in four real-life projects: in a spare part augmentation project, semantification projects for earth moving technology and harvesting technology, as well as an ontology population project for special purpose vehicles.
Three additional case studies underline the broad applicability of the presented ideas.
Given points in the plane, connect them using minimum ink. Though the task seems simple, it turns out to be very time consuming. In fact, scientists believe that computers cannot efficiently solve it. So, do we have to resign? This book examines such NP-hard network-design problems, from connectivity problems in graphs to polygonal drawing problems on the plane. First, we observe why it is so hard to optimally solve these problems. Then, we go over to attack them anyway. We develop fast algorithms that find approximate solutions that are very close to the optimal ones. Hence, connecting points with slightly more ink is not hard.
This dissertation focuses on the performance evaluation of all components of Software Defined Networking (SDN) networks and covers whole their architecture. First, the isolation between virtual networks sharing the same physical resources is investigated with SDN switches of several vendors. Then, influence factors on the isolation are identified and evaluated. Second, the impact of control mechanisms on the performance of the data plane is examined through the flow rule installation time of SDN switches with different controllers. It is shown that both hardware-specific and controller instance have a specific influence on the installation time. Finally, several traffic flow monitoring methods of an SDN controller are investigated and a new monitoring approach is developed and evaluated. It is confirmed that the proposed method allows monitoring of particular flows as well as consumes fewer resources than the standard approach. Based on findings in this thesis, on the one hand, controller developers can refer to the work related to the control plane, such as flow monitoring or flow rule installation, to improve the performance of their applications. On the other hand, network administrators can apply the presented methods to select a suitable combination of controller and switches in their SDN networks, based on their performance requirements
Athletes adapt their training daily to optimize performance, as well as avoid fatigue, overtraining and other undesirable effects on their health. To optimize training load, each athlete must take his/her own personal objective and subjective characteristics into consideration and an increasing number of wearable technologies (wearables) provide convenient monitoring of various parameters. Accordingly, it is important to help athletes decide which parameters are of primary interest and which wearables can monitor these parameters most effectively. Here, we discuss the wearable technologies available for non-invasive monitoring of various parameters concerning an athlete's training and health. On the basis of these considerations, we suggest directions for future development. Furthermore, we propose that a combination of several wearables is most effective for accessing all relevant parameters, disturbing the athlete as little as possible, and optimizing performance and promoting health.
In this thesis various aspects of Quality of Experience (QoE) research are examined. The work is divided into three major blocks: QoE Assessment, QoE Monitoring, and VNF Performance Evaluation. First, prominent cloud applications such as Google Docs and a cloud-based photo album are explored. The QoE is characterized and the influence of packet loss and delay is studied. Afterwards, objective QoE monitoring for HTTP Adaptive Video Streaming (HAS) in the cloud is investigated. Additionally, by using a Virtual Network Function (VNF) for QoE monitoring in the cloud, the feasibility of an interworking of Network Function Virtualization (NFV) and cloud paradigm is evaluated. To this end, a VNF that exploits deep packet inspection technique was used to parse the video traffic. An algorithm is then designed accordingly to estimate video quality and QoE based on network and application layer parameters. To assess the accuracy of the estimation, the VNF is measured in different scenarios under different network QoS and the virtual environment of the cloud architecture. The insights show that the different geographical deployments of the VNF influence the accuracy of the video quality and QoE estimation. Various Service Function Chain (SFC) placement algorithms have been proposed and compared in the context of edge cloud networks. On the one hand, this research is aimed at cloud service providers by providing methods for evaluating QoE for cloud applications. On the other hand, network operators can learn the pitfalls and disadvantages of using the NFV paradigm for such a QoE monitoring mechanism.
A key functionality of cloud systems are automated resource management mechanisms at the infrastructure level. As part of this, elastic scaling of allocated resources is realized by so-called auto-scalers that are supposed to match the current demand in a way that the performance remains stable while resources are efficiently used.
The process of rating cloud infrastructure offerings in terms of the quality of their achieved elastic scaling remains undefined. Clear guidance for the selection and configuration of an auto-scaler for a given context is not available. Thus, existing operating solutions are optimized in a highly application specific way and usually kept undisclosed.
The common state of practice is the use of simplistic threshold-based approaches. Due to their reactive nature they incur performance degradation during the minutes of provisioning delays. In the literature, a high-number of auto-scalers has been proposed trying to overcome the limitations of reactive mechanisms by employing proactive prediction methods.
In this thesis, we identify potentials in automated cloud system resource management and its evaluation methodology. Specifically, we make the following contributions:
We propose a descriptive load profile modeling framework together with automated model extraction from recorded traces to enable reproducible workload generation with realistic load intensity variations. The proposed Descartes Load Intensity Model (DLIM) with its Limbo framework provides key functionality to stress and benchmark resource management approaches in a representative and fair manner.
We propose a set of intuitive metrics for quantifying timing, stability and accuracy aspects of elasticity. Based on these metrics, we propose a novel approach for benchmarking the elasticity of Infrastructure-as-a-Service (IaaS) cloud platforms independent of the performance exhibited by the provisioned underlying resources.
We tackle the challenge of reducing the risk of relying on a single proactive auto-scaler by proposing a new self-aware auto-scaling mechanism, called Chameleon, combining multiple different proactive methods coupled with a reactive fallback mechanism.
Chameleon employs on-demand, automated time series-based forecasting methods to predict the arriving load intensity in combination with run-time service demand estimation techniques to calculate the required resource consumption per work unit without the need for a detailed application instrumentation. It can also leverage application knowledge by solving product-form queueing networks used to derive optimized scaling actions. The Chameleon approach is first in resolving conflicts between reactive and proactive scaling decisions in an intelligent way.
We are confident that the contributions of this thesis will have a long-term impact on the way cloud resource management approaches are assessed. While this could result in an improved quality of autonomic management algorithms, we see and discuss arising challenges for future research in cloud resource management and its assessment methods: The adoption of containerization on top of virtual machine instances introduces another level of indirection. As a result, the nesting of virtual resources increases resource fragmentation and causes unreliable provisioning delays. Furthermore, virtualized compute resources tend to become more and more inhomogeneous associated with various priorities and trade-offs. Due to DevOps practices, cloud hosted service updates are released with a higher frequency which impacts the dynamics in user behavior.
Almost once a week broadcasts about earthquakes, hurricanes, tsunamis, or forest fires are filling the news. While oneself feels it is hard to watch such news, it is even harder for rescue troops to enter such areas. They need some skills to get a quick overview of the devastated area and find victims. Time is ticking, since the chance for survival shrinks the longer it takes till help is available. To coordinate the teams efficiently, all information needs to be collected at the command center. Therefore, teams investigate the destroyed houses and hollow spaces for victims. Doing so, they never can be sure that the building will not fully collapse while they
are inside. Here, rescue robots are welcome helpers, as they are replaceable and make work more secure. Unfortunately, rescue robots are not usable off-the-shelf, yet.
There is no doubt, that such a robot has to fulfil essential requirements to successfully accomplish a rescue mission. Apart from the mechanical requirements it has to be able to build
a 3D map of the environment. This is essential to navigate through rough terrain and fulfil manipulation tasks (e.g. open doors). To build a map and gather environmental information, robots are equipped with multiple sensors. Since laser scanners produce precise measurements and support a wide scanning range, they are common visual sensors utilized for mapping.
Unfortunately, they produce erroneous measurements when scanning transparent (e.g. glass, transparent plastic) or specular reflective objects (e.g. mirror, shiny metal). It is understood that such objects can be everywhere and a pre-manipulation to prevent their influences is impossible. Using additional sensors also bear risks.
The problem is that these objects are occasionally visible, based on the incident angle of the laser beam, the surface, and the type of object. Hence, for transparent objects, measurements might result from the object surface or objects behind it. For specular reflective objects, measurements might result from the object surface or a mirrored object. These mirrored objects are illustrated behind the surface which is wrong. To obtain a precise map, the surfaces need to
be recognised and mapped reliably. Otherwise, the robot navigates into it and crashes. Further, points behind the surface should be identified and treated based on the object type. Points behind a transparent surface should remain as they represent real objects. In contrast, Points behind a specular reflective surface should be erased. To do so, the object type needs to be classified. Unfortunately, none of the current approaches is capable to fulfil these requirements.
Therefore, the following thesis addresses this problem to detect transparent and specular reflective objects and to identify their influences. To give the reader a start up, the first chapters
describe: the theoretical background concerning propagation of light; sensor systems applied for range measurements; mapping approaches used in this work; and the state-of-the-art concerning detection and identification of transparent and specular reflective objects. Afterwards, the Reflection-Identification-Approach, which is the core of subject thesis is presented. It describes 2D and a 3D implementation to detect and classify such objects. Both are available as ROS-nodes. In the next chapter, various experiments demonstrate the applicability and reliability of these nodes. It proves that transparent and specular reflective objects can be detected and classified. Therefore, a Pre- and Post-Filter module is required in 2D. In 3D, classification is possible solely with the Pre-Filter. This is due to the higher amount of measurements. An
example shows that an updatable mapping module allows the robot navigation to rely on refined maps. Otherwise, two individual maps are build which require a fusion afterwards. Finally, the
last chapter summarizes the results and proposes suggestions for future work.
Understanding human navigation behavior has implications for a wide range of application scenarios. For example, insights into geo-spatial navigation in urban areas can impact city planning or public transport. Similarly, knowledge about navigation on the web can help to improve web site structures or service experience.
In this work, we focus on a hypothesis-driven approach to address the task of understanding human navigation: We aim to formulate and compare ideas — for example stemming from existing theory, literature, intuition, or previous experiments — based on a given set of navigational observations. For example, we may compare whether tourists exploring a city walk “short distances” before taking their next photo vs. they tend to "travel long distances between points of interest", or whether users browsing Wikipedia "navigate semantically" vs. "click randomly".
For this, the Bayesian method HypTrails has recently been proposed. However, while HypTrails is a straightforward and flexible approach, several major challenges remain:
i) HypTrails does not account for heterogeneity (e.g., incorporating differently behaving user groups such as tourists and locals is not possible), ii) HypTrails does not support the user in conceiving novel hypotheses when confronted with a large set of possibly relevant background information or influence factors, e.g., points of interest, popularity of locations, time of the day, or user properties, and finally iii) formulating hypotheses can be technically challenging depending on the application scenario (e.g., due to continuous observations or temporal constraints). In this thesis, we address these limitations by introducing various novel methods and tools and explore a wide range of case studies.
In particular, our main contributions are the methods MixedTrails and SubTrails which specifically address the first two limitations: MixedTrails is an approach for hypothesis comparison that extends the previously proposed HypTrails method to allow formulating and comparing heterogeneous hypotheses (e.g., incorporating differently behaving user groups). SubTrails is a method that supports hypothesis conception by automatically discovering interpretable subgroups with exceptional navigation behavior. In addition, our methodological contributions also include several tools consisting of a distributed implementation of HypTrails, a web application for visualizing geo-spatial human navigation in the context of background information, as well as a system for collecting, analyzing, and visualizing mobile participatory sensing data.
Furthermore, we conduct case studies in many application domains, which encompass — among others — geo-spatial navigation based on photos from the photo-sharing platform Flickr, browsing behavior on the social tagging system BibSonomy, and task choosing behavior on a commercial crowdsourcing platform. In the process, we develop approaches to cope with application specific subtleties (like continuous observations and temporal constraints). The corresponding studies illustrate the variety of domains and facets in which navigation behavior can be studied and, thus, showcase the expressiveness, applicability, and flexibility of our methods. Using these methods, we present new aspects of navigational phenomena which ultimately help to better understand the multi-faceted characteristics of human navigation behavior.
Der Betrieb von Satelliten wird sich in Zukunft gravierend ändern. Die bisher ausgeübte konventionelle Vorgehensweise, bei der die Planung der vom Satelliten auszuführenden Aktivitäten sowie die Kontrolle hierüber ausschließlich vom Boden aus erfolgen, stößt bei heutigen Anwendungen an ihre Grenzen. Im schlimmsten Fall verhindert dieser Umstand sogar die Erschließung bisher ungenutzter Möglichkeiten. Der Gewinn eines Satelliten, sei es in Form wissenschaftlicher Daten oder der Vermarktung satellitengestützter Dienste, wird daher nicht optimal ausgeschöpft.
Die Ursache für dieses Problem lässt sich im Grunde auf eine ausschlaggebende Tatsache zurückführen: Konventionelle Satelliten können ihr Verhalten, d.h. die Folge ihrer Tätigkeiten, nicht eigenständig anpassen. Stattdessen erstellt das Bedienpersonal am Boden - vor allem die Operatoren - mit Hilfe von Planungssoftware feste Ablaufpläne, die dann in Form von Kommandosequenzen von den Bodenstationen aus an die jeweiligen Satelliten hochgeladen werden. Dort werden die Befehle lediglich überprüft, interpretiert und strikt ausgeführt. Die Abarbeitung erfolgt linear. Situationsbedingte Änderungen, wie sie vergleichsweise bei der Codeausführung von Softwareprogrammen durch Kontrollkonstrukte, zum Beispiel Schleifen und Verzweigungen, üblich sind, sind typischerweise nicht vorgesehen. Der Operator ist daher die einzige Instanz, die das Verhalten des Satelliten mittels Kommandierung, per Upload, beeinflussen kann, und auch nur dann, wenn ein direkter Funkkontakt zwischen Satellit und Bodenstation besteht. Die dadurch möglichen Reaktionszeiten des Satelliten liegen bestenfalls bei einigen Sekunden, falls er sich im Wirkungsbereich der Bodenstation befindet. Außerhalb des Kontaktfensters kann sich die Zeitschranke, gegeben durch den Orbit und die aktuelle Position des Satelliten, von einigen Minuten bis hin zu einigen Stunden erstrecken. Die Signallaufzeiten der Funkübertragung verlängern die Reaktionszeiten um weitere Sekunden im erdnahen Bereich. Im interplanetaren Raum erstrecken sich die Zeitspannen aufgrund der immensen Entfernungen sogar auf mehrere Minuten. Dadurch bedingt liegt die derzeit technologisch mögliche, bodengestützte, Reaktionszeit von Satelliten bestenfalls im Bereich von einigen Sekunden.
Diese Einschränkung stellt ein schweres Hindernis für neuartige Satellitenmissionen, bei denen insbesondere nichtdeterministische und kurzzeitige Phänomene (z.B. Blitze und Meteoreintritte in die Erdatmosphäre) Gegenstand der Beobachtungen sind, dar. Die langen Reaktionszeiten des konventionellen Satellitenbetriebs verhindern die Realisierung solcher Missionen, da die verzögerte Reaktion erst erfolgt, nachdem das zu beobachtende Ereignis bereits abgeschlossen ist.
Die vorliegende Dissertation zeigt eine Möglichkeit, das durch die langen Reaktionszeiten entstandene Problem zu lösen, auf. Im Zentrum des Lösungsansatzes steht dabei die Autonomie. Im Wesentlichen geht es dabei darum, den Satelliten mit der Fähigkeit auszustatten, sein Verhalten, d.h. die Folge seiner Tätigkeiten, eigenständig zu bestimmen bzw. zu ändern. Dadurch wird die direkte Abhängigkeit des Satelliten vom Operator bei Reaktionen aufgehoben. Im Grunde wird der Satellit in die Lage versetzt, sich selbst zu kommandieren.
Die Idee der Autonomie wurde im Rahmen der zugrunde liegenden Forschungsarbeiten umgesetzt. Das Ergebnis ist ein autonomes Planungssystem. Dabei handelt es sich um ein Softwaresystem, mit dem sich autonomes Verhalten im Satelliten realisieren lässt. Es kann an unterschiedliche Satellitenmissionen angepasst werden. Ferner deckt es verschiedene Aspekte des autonomen Satellitenbetriebs, angefangen bei der generellen Entscheidungsfindung der Tätigkeiten, über die zeitliche Ablaufplanung unter Einbeziehung von Randbedingungen (z.B. Ressourcen) bis hin zur eigentlichen Ausführung, d.h. Kommandierung, ab. Das Planungssystem kommt als Anwendung in ASAP, einer autonomen Sensorplattform, zum Einsatz. Es ist ein optisches System und dient der Detektion von kurzzeitigen Phänomenen und Ereignissen in der Erdatmosphäre.
Die Forschungsarbeiten an dem autonomen Planungssystem, an ASAP sowie an anderen zu diesen in Bezug stehenden Systemen wurden an der Professur für Raumfahrttechnik des Lehrstuhls Informatik VIII der Julius-Maximilians-Universität Würzburg durchgeführt.
A complete simulation system is proposed that can be used as an educational tool by physicians in training basic skills of Minimally Invasive Vascular Interventions. In the first part, a surface model is developed to assemble arteries having a planar segmentation. It is based on Sweep Surfaces and can be extended to T- and Y-like bifurcations. A continuous force vector field is described, representing the interaction between the catheter and the surface. The computation time of the force field is almost unaffected when the resolution of the artery is increased.
The mechanical properties of arteries play an essential role in the study of the circulatory system dynamics, which has been becoming increasingly important in the treatment of cardiovascular diseases. In Virtual Reality Simulators, it is crucial to have a tissue model that responds in real time. In this work, the arteries are discretized by a two dimensional mesh and the nodes are connected by three kinds of linear springs. Three tissue layers (Intima, Media, Adventitia) are considered and, starting from the stretch-energy density, some of the elasticity tensor components are calculated. The physical model linearizes and homogenizes the material response, but it still contemplates the geometric nonlinearity. In general, if the arterial stretch varies by 1% or less, then the agreement between the linear and nonlinear models is trustworthy.
In the last part, the physical model of the wire proposed by Konings is improved. As a result, a simpler and more stable method is obtained to calculate the equilibrium configuration of the wire. In addition, a geometrical method is developed to perform relaxations. It is particularly useful when the wire is hindered in the physical method because of the boundary conditions. The physical and the geometrical methods are merged, resulting in efficient relaxations. Tests show that the shape of the virtual wire agrees with the experiment. The proposed algorithm allows real-time executions and the hardware to assemble the simulator has a low cost.
Beyond maximum independent set: an extended integer programming formulation for point labeling
(2017)
Map labeling is a classical problem of cartography that has frequently been approached by combinatorial optimization. Given a set of features in a map and for each feature a set of label candidates, a common problem is to select an independent set of labels (that is, a labeling without label–label intersections) that contains as many labels as possible and at most one label for each feature. To obtain solutions of high cartographic quality, the labels can be weighted and one can maximize the total weight (rather than the number) of the selected labels. We argue, however, that when maximizing the weight of the labeling, the influences of labels on other labels are insufficiently addressed. Furthermore, in a maximum-weight labeling, the labels tend to be densely packed and thus the map background can be occluded too much. We propose extensions of an existing model to overcome these limitations. Since even without our extensions the problem is NP-hard, we cannot hope for an efficient exact algorithm for the problem. Therefore, we present a formalization of our model as an integer linear program (ILP). This allows us to compute optimal solutions in reasonable time, which we demonstrate both for randomly generated point sets and an existing data set of cities. Moreover, a relaxation of our ILP allows for a simple and efficient heuristic, which yielded near-optimal solutions for our instances.
Background
Chronic kidney disease (CKD) is a common comorbid condition in coronary heart disease (CHD). CKD predisposes the patient to acute kidney injury (AKI) during hospitalization. Data on awareness of kidney dysfunction among CHD patients and their treating physicians are lacking. In the current cross-sectional analysis of the German EUROASPIRE IV sample we aimed to investigate the physician’s awareness of kidney disease of patients hospitalized for CHD and also the patient’s awareness of CKD in a study visit following hospital discharge.
Methods
All serum creatinine (SCr) values measured during the hospital stay were used to describe impaired kidney function (eGFR\(_{CKD-EPI}\) < 60 ml/min/1.73m2) at admission, discharge and episodes of AKI (KDIGO definition). Information extracted from hospital discharge letters and correct ICD coding for kidney disease was studied as a surrogate of physician’s awareness of kidney disease. All patients were interrogated 0.5 to 3 years after hospital discharge, whether they had ever been told about kidney disease by a physician.
Results
Of the 536 patients, 32% had evidence for acute or chronic kidney disease during the index hospital stay. Either condition was mentioned in the discharge letter in 22%, and 72% were correctly coded according to ICD-10. At the study visit in the outpatient setting 35% had impaired kidney function. Of 158 patients with kidney disease, 54 (34%) were aware of CKD. Determinants of patient’s awareness were severity of CKD (OR\(_{eGFR}\) 0.94; 95%CI 0.92–0.96), obesity (OR 1.97; 1.07–3.64), history of heart failure (OR 1.99; 1.00–3.97), and mentioning of kidney disease in the index event’s hospital discharge letter (OR 5.51; 2.35–12.9).
Conclusions
Although CKD is frequent in CHD, only one third of patients is aware of this condition. Patient’s awareness was associated with kidney disease being mentioned in the hospital discharge letter. Future studies should examine how raising physician’s awareness for kidney dysfunction may improve patient’s awareness of CKD.
Biologically inspired self-organization methods can help to manage the access control to the shared communication medium of Wireless Sensor Networks. One lightweight approach is the primitive of desynchronization, which relies on the periodic transmission of short control messages – similar to the periodical pulses of oscillators. This primitive of desynchronization has already been successfully implemented as MAC protocol for single-hop topologies. Moreover, there are also some concepts of such a protocol formulti-hop topologies available. However, the existing implementations may handle just a certain class of multi-hop topologies or are not robust against topology dynamics. In addition to the sophisticated access control of the sensor nodes of a Wireless Sensor Network in arbitrary multi-hop topologies, the communication protocol has to be lightweight, applicable, and scalable. These characteristics are of particular interest for distributed and randomly deployed networks (e.g., by dropping nodes off an airplane).
In this work we present the development of a self-organizing MAC protocol for dynamic multi-hop topologies. This implies the evaluation of related work, the conception of our new communication protocol based on the primitive of desynchronization as well as its implementation for sensor nodes. As a matter of course, we also analyze our realization with
regard to our specific requirements. This analysis is based on several (simulative as well as real-world) scenarios. Since we are mainly interested in the convergence behavior of our
protocol, we do not focus on the "classical" network issues, like routing behavior or data rate, within this work. Nevertheless, for this purpose we make use of several real-world testbeds, but also of our self-developed simulation framework.
According to the results of our evaluation phase, our self-organizing MAC protocol for WSNs, which is based on the primitive of desynchronization, meets all our demands. In fact, our communication protocol operates in arbitrary multi-hop topologies and copes well with topology dynamics. In this regard, our protocol is the first and only MAC protocol to the best of our knowledge. Moreover, due to its periodic transmission scheme, it may be an appropriate starting base for additional network services, like time synchronization or routing.
Imagine a technology that automatically creates a full 3D thermal model of an environment and detects temperature peaks in it. For better orientation in the model it is enhanced with color information. The current state of the art for analyzing temperature related issues is thermal imaging. It is relevant for energy efficiency but also for securing important infrastructure such as power supplies and temperature regulation systems. Monitoring and analysis of the data for a large building is tedious as stable conditions need to be guaranteed for several hours and detailed notes about the pose and the environment conditions for each image must be taken. For some applications repeated measurements are necessary to monitor changes over time. The analysis of the scene is only possible through expertise and experience.
This thesis proposes a robotic system that creates a full 3D model of the environment with color and thermal information by combining thermal imaging with the technology of terrestrial laser scanning. The addition of a color camera facilitates the interpretation of the data and allows for other application areas. The data from all sensors collected at different positions is joined in one common reference frame using calibration and scan matching. The first part of the thesis deals with 3D point cloud processing with the emphasis on accessing point cloud data efficiently, detecting planar structures in the data and registering multiple point clouds into one common coordinate system. The second part covers the autonomous exploration and data acquisition with a mobile robot with the objective to minimize the unseen area in 3D space. Furthermore, the combination of different modalities, color images, thermal images and point cloud data through calibration is elaborated. The last part presents applications for the the collected data. Among these are methods to detect the structure of building interiors for reconstruction purposes and subsequent detection and classification of windows. A system to project the gathered thermal information back into the scene is presented as well as methods to improve the color information and to join separately acquired point clouds and photo series.
A full multi-modal 3D model contains all the relevant geometric information about the recorded scene and enables an expert to fully analyze it off-site. The technology clears the path for automatically detecting points of interest thereby helping the expert to analyze the heat flow as well as localize and identify heat leaks. The concept is modular and neither limited to achieving energy efficiency nor restricted to the use in combination with a mobile platform. It also finds its application in fields such as archaeology and geology and can be extended by further sensors.
The design and implementation of a satellite mission
is divided into several different phases. Parallel to these phases an evolution of requirements will take place. Because so many people in different locations and from different background have to work in different subsystems concurrently the ideas and concepts of different subsystems and different locations will diverge. We have to bring them together again. To do this we introduce synchronization points. We bring representatives from all subsystems and all location in a Concurrent Engineering Facility (CEF) room together. Between CEF sessions the different subsystems will diverge again, but each time the
diversion will be smaller. Our subjective experience from test projects says this CEF sessions are most effective in the first phases of the development, from Requirements engineering until first coarse design. After Design and the concepts are fix, the developers are going to implementation and the concept divergences will be much smaller, therefore the CEF sessions are not a very big help any more.
The thesis focuses on Quality of Experience (QoE) of HTTP adaptive video streaming (HAS) and traffic management in access networks to improve the QoE of HAS. First, the QoE impact of adaptation parameters and time on layer was investigated with subjective crowdsourcing studies. The results were used to compute a QoE-optimal adaptation strategy for given video and network conditions. This allows video service providers to develop and benchmark improved adaptation logics for HAS. Furthermore, the thesis investigated concepts to monitor video QoE on application and network layer, which can be used by network providers in the QoE-aware traffic management cycle. Moreover, an analytic and simulative performance evaluation of QoE-aware traffic management on a bottleneck link was conducted. Finally, the thesis investigated socially-aware traffic management for HAS via Wi-Fi offloading of mobile HAS flows. A model for the distribution of public Wi-Fi hotspots and a platform for socially-aware traffic management on private home routers was presented. A simulative performance evaluation investigated the impact of Wi-Fi offloading on the QoE and energy consumption of mobile HAS.
The field of genetics faces a lot of challenges and opportunities in both research and diagnostics due to the rise of next generation sequencing (NGS), a technology that allows to sequence DNA increasingly fast and cheap.
NGS is not only used to analyze DNA, but also RNA, which is a very similar molecule also present in the cell, in both cases producing large amounts of data.
The big amount of data raises both infrastructure and usability problems, as powerful computing infrastructures are required and there are many manual steps in the data analysis which are complicated to execute.
Both of those problems limit the use of NGS in the clinic and research, by producing a bottleneck both computationally and in terms of manpower, as for many analyses geneticists lack the required computing skills.
Over the course of this thesis we investigated how computer science can help to improve this situation to reduce the complexity of this type of analysis.
We looked at how to make the analysis more accessible to increase the number of people that can perform OMICS data analysis (OMICS groups various genomics data-sources).
To approach this problem, we developed a graphical NGS data analysis pipeline aimed at a diagnostics environment while still being useful in research in close collaboration with the Human Genetics Department at the University of Würzburg.
The pipeline has been used in various research papers on covering subjects, including works with direct author participation in genomics, transcriptomics as well as epigenomics.
To further validate the graphical pipeline, a user survey was carried out which confirmed that it lowers the complexity of OMICS data analysis.
We also studied how the data analysis can be improved in terms of computing infrastructure by improving the performance of certain analysis steps.
We did this both in terms of speed improvements on a single computer (with notably variant calling being faster by up to 18 times), as well as with distributed computing to better use an existing infrastructure.
The improvements were integrated into the previously described graphical pipeline, which itself also was focused on low resource usage.
As a major contribution and to help with future development of parallel and distributed applications, for the usage in genetics or otherwise, we also looked at how to make it easier to develop such applications.
Based on the parallel object programming model (POP), we created a Java language extension called POP-Java, which allows for easy and transparent distribution of objects.
Through this development, we brought the POP model to the cloud, Hadoop clusters and present a new collaborative distributed computing model called FriendComputing.
The advances made in the different domains of this thesis have been published in various works specified in this document.
The issue of sustainability is at the top of the political and societal agenda, being considered of extreme importance and urgency. Human individual action impacts the environment both locally (e.g., local air/water quality, noise disturbance) and globally (e.g., climate change, resource use). Urban environments represent a crucial example, with an increasing realization that the most effective way of producing a change is involving the citizens themselves in monitoring campaigns (a citizen science bottom-up approach). This is possible by developing novel technologies and IT infrastructures enabling large citizen participation. Here, in the wider framework of one of the first such projects, we show results from an international competition where citizens were involved in mobile air pollution monitoring using low cost sensing devices, combined with a web-based game to monitor perceived levels of pollution. Measures of shift in perceptions over the course of the campaign are provided, together with insights into participatory patterns emerging from this study. Interesting effects related to inertia and to direct involvement in measurement activities rather than indirect information exposure are also highlighted, indicating that direct involvement can enhance learning and environmental awareness. In the future, this could result in better adoption of policies towards decreasing pollution.
Enterprise applications in virtualized data centers are often subject to time-varying workloads, i.e., the load intensity and request mix change over time, due to seasonal patterns and trends, or unpredictable bursts in user requests. Varying workloads result in frequently changing resource demands to the underlying hardware infrastructure. Virtualization technologies enable sharing and on-demand allocation of hardware resources between multiple applications. In this context, the resource allocations to virtualized applications should be continuously adapted in an elastic fashion, so that "at each point in time the available resources match the current demand as closely as possible" (Herbst el al., 2013). Autonomic approaches to resource management promise significant increases in resource efficiency while avoiding violations of performance and availability requirements during peak workloads.
Traditional approaches for autonomic resource management use threshold-based rules (e.g., Amazon EC2) that execute pre-defined reconfiguration actions when a metric reaches a certain threshold (e.g., high resource utilization or load imbalance). However, many business-critical applications are subject to Service-Level-Objectives defined on an application performance metric (e.g., response time or throughput). To determine thresholds so that the end-to-end application SLO is fulfilled poses a major challenge due to the complex relationship between the resource allocation to an application and the application performance. Furthermore, threshold-based approaches are inherently prone to an oscillating behavior resulting in unnecessary reconfigurations.
In order to overcome the deficiencies of threshold-based
approaches and enable a fully automated approach to dynamically control the resource allocations of virtualized applications, model-based approaches are required that can predict the impact of a reconfiguration on the application performance in advance. However, existing model-based approaches are severely limited in their learning capabilities. They either require complete performance models of the application as input, or use a pre-identified model structure and only learn certain model parameters from empirical data at run-time. The former requires high manual efforts and deep system knowledge to create the performance models. The latter does not provide the flexibility to capture the specifics of complex and heterogeneous system architectures.
This thesis presents a self-aware approach to the resource management in virtualized data centers. In this context, self-aware means that it automatically learns performance models of the application and the virtualized infrastructure and reasons based on these models to autonomically adapt the resource allocations in accordance with given application SLOs. Learning a performance model requires the extraction of the model structure representing the system architecture as well as the estimation of model parameters, such as resource demands. The estimation of resource demands is a key challenge as they cannot be observed directly in most systems.
The major scientific contributions of this thesis are:
- A reference architecture for online model learning in virtualized systems. Our reference architecture is based on a set of model extraction agents. Each agent focuses on specific tasks to automatically create and update model skeletons capturing its local knowledge of the system and collaborates with other agents to extract the structural parts of a global performance model of the system. We define different agent roles in the reference architecture and propose a model-based collaboration mechanism for the agents. The agents may be bundled within virtual appliances and may be tailored to include knowledge about the software stack deployed in a specific virtual appliance.
- An online method for the statistical estimation of resource demands. For a given request processed by an application, the resource time consumed for a specified resource within the system (e.g., CPU or I/O device), referred to as resource demand, is the total average time the resource is busy processing the request. A request could be any unit of work (e.g., web page request, database transaction, batch job) processed by the system. We provide a systematization of existing statistical approaches to resource demand estimation and conduct an extensive experimental comparison to evaluate the accuracy of these approaches. We propose a novel method to automatically select estimation approaches and demonstrate that it increases the robustness and accuracy of the estimated resource demands significantly.
- Model-based controllers for autonomic vertical scaling of virtualized applications. We design two controllers based on online model-based reasoning techniques in order to vertically scale applications at run-time in accordance with application SLOs. The controllers exploit the knowledge from the automatically extracted performance models when determining necessary reconfigurations. The first controller adds and removes virtual CPUs to an application depending on the current demand. It uses a layered performance model to also consider the physical resource contention when determining the required resources. The second controller adapts the resource allocations proactively to ensure the availability of the application during workload peaks and avoid reconfiguration during phases of high workload.
We demonstrate the applicability of our approach in current virtualized environments and show its effectiveness leading to significant increases in resource efficiency and improvements of the application performance and availability under time-varying workloads. The evaluation of our approach is based on two case studies representative of widely used enterprise applications in virtualized data centers. In our case studies, we were able to reduce the amount of required CPU resources by up to 23% and the number of reconfigurations by up to 95% compared to a rule-based approach while ensuring full compliance with application SLO. Furthermore, using workload forecasting techniques we were able to schedule expensive reconfigurations (e.g., changes to the memory size) during phases of load load and thus were able to reduce their impact on application availability by over 80% while significantly improving application performance compared to a reactive controller. The methods and techniques for resource demand estimation and vertical application scaling were developed and evaluated in close collaboration with VMware and Google.
The progress which has been made in semiconductor chip production in recent years enables a multitude of cores on a single die. However, due to further decreasing structure sizes, fault tolerance and energy consumption will represent key challenges. Furthermore, an efficient communication infrastructure is indispensable due to the high parallelism at those systems. The predominant communication system at such highly parallel systems is a Network on Chip (NoC). The focus of this thesis is on NoCs which are based on deflection routing. In this context, contributions are made to two domains, fault tolerance and dimensioning of the optimal link width. Both aspects are essential for the application of reliable, energy efficient, and deflection routing based NoCs.
It is expected that future semiconductor systems have to cope with high fault probabilities. The inherently given high connectivity of most NoC topologies can be exploited to tolerate the breakdown of links and other components. In this thesis, a fault-tolerant router architecture has been developed, which stands out for the deployed interconnection architecture and the method to overcome complex fault situations. The presented simulation results show, all data packets arrive at their destination, even at high fault probabilities. In contrast to routing table based architectures, the hardware costs of the herein presented architecture are lower and, in particular, independent of the number of components in the network.
Besides fault tolerance, hardware costs and energy efficiency are of great importance. The utilized link width has a decisive influence on these aspects. In particular, at deflection routing based NoCs, over- and under-sizing of the link width leads to unnecessary high hardware costs and bad performance, respectively. In the second part of this thesis, the optimal link width at deflection routing based NoCs is investigated. Additionally, a method to reduce the link width is introduced. Simulation and synthesis results show, the herein presented method allows a significant reduction of hardware costs at comparable performance.
Content Delivery Networks (CDNs) are networks that distribute content in the Internet. CDNs are increasingly responsible for the largest share of traffic in the Internet. CDNs distribute popular content to caches in many geographical areas to save bandwidth by avoiding unnecessary multihop retransmission. By bringing the content geographically closer to the user, CDNs also reduce the latency of the services.
Besides end users and content providers, which require high availability of high quality content, CDN providers and Internet Service Providers (ISPs) are interested in an efficient operation of CDNs. In order to ensure an efficient replication of the content, CDN providers have a network of (globally) distributed interconnected datacenters at different points of presence (PoPs). ISPs aim to provide reliable and high speed Internet access. They try to keep the load on the network low and to reduce cost for connectivity with other ISPs.
The increasing number of mobile devices such as smart phones and tablets, high definition video content and high resolution displays result in a continuous growth in mobile traffic. This growth in mobile traffic is further accelerated by newly emerging services, such as mobile live streaming and broadcasting services. The steep increase in mobile traffic is expected to reach by 2018 roughly 60% of total network traffic, the majority of which will be video. To handle the growth in mobile networks, the next generation of 5G mobile networks is designed to have higher access rates and an increased densification of the network infrastructure. With the explosion of access rates and number of base stations the backhaul of wireless networks will become congested.
To reduce the load on the backhaul, the research community suggests installing local caches in gateway routers between the wireless network and the Internet, in base stations of different sizes, and in end-user devices. The local deployment of caches allows keeping the traffic within the ISPs network. The caches are organized in a hierarchy, where caches in the lowest tier are requested first. The request is forwarded to the next tier, if the requested object is not found. Appropriate evaluation methods are required to optimally dimension the caches dependent on the traffic characteristics and the available resources. Additionally methods are necessary that allow performance evaluation of backhaul bandwidth aggregation systems, which further reduce the load on the backhaul.
This thesis analyses CDNs utilizing locally available resources and develops the following evaluations and optimization approaches: Characterization of CDNs and distribution of resources in the Internet, analysis and optimization of hierarchical caching systems with bandwidth constraints and performance evaluation of bandwidth aggregation systems.
Multimodal interfaces (MMIs) are a promising human-computer interaction paradigm.
They are feasible for a wide rang of environments, yet they are especially suited if interactions are spatially and temporally grounded with an environment in which the user is (physically) situated.
Real-time interactive systems (RISs) are technical realizations for situated interaction environments, originating from application areas like virtual reality, mixed reality, human-robot interaction, and computer games.
RISs include various dedicated processing-, simulation-, and rendering subsystems which collectively maintain a real-time simulation of a coherent application state.
They thus fulfil the complex functional requirements of their application areas. Two contradicting principles determine the architecture of RISs: coupling and cohesion.
On the one hand, RIS subsystems commonly use specific data structures for multiple purposes to guarantee performance and rely on close semantic and temporal coupling between each other to maintain consistency.
This coupling is exacerbated if the integration of artificial intelligence (AI) methods is necessary, such as for realizing MMIs.
On the other hand, software qualities like reusability and modifiability call for a decoupling of subsystems and architectural elements with single well-defined purposes, i.e., high cohesion.
Systems predominantly favour performance and consistency over reusability and modifiability to handle this contradiction.
They thus accept low maintainability in general and hindered scientific progress in the long-term.
This thesis presents six semantics-based techniques that extend the established entity-component system (ECS) pattern and pose a solution to this contradiction without sacrificing maintainability: semantic grounding, a semantic entity-component state, grounded actions, semantic queries, code from semantics, and decoupling by semantics.
The extension solves the ECS pattern's runtime type deficit, improves component granularity, facilitates access to entity properties outside a subsystem's component association, incorporates a concept to semantically describe behavior as complement to the state representation, and enables compatibility even between RISs.
The presented reference implementation Simulator X validates the feasibility of the six techniques and may be (re)used by other researchers due to its availability under an open-source licence.
It includes a repertoire of common multimodal input processing steps that showcase the particular adequacy of the six techniques for such processing.
The repertoire adds up to the integrated multimodal processing framework miPro, making Simulator X a RIS platform with explicit MMI support.
The six semantics-based techniques as well as the reference implementation are validated by four expert reviews, multiple proof of concept prototypes, and two explorative studies.
Informal insights gathered throughout the design and development supplement this assessment in the form of lessons learned meant to aid future development in the area.
While teleoperation of technical highly sophisticated systems has already been a wide field of research, especially for space and robotics applications, the automation industry has not yet benefited from its results. Besides the established fields of application, also production lines with industrial robots and the surrounding plant components are in need of being remotely accessible. This is especially critical for maintenance or if an unexpected problem cannot be solved by the local specialists.
Special machine manufacturers, especially robotics companies, sell their technology worldwide. Some factories, for example in emerging economies, lack qualified personnel for repair and maintenance tasks. When a severe failure occurs, an expert of the manufacturer needs to fly there, which leads to long down times of the machine or even the whole production line. With the development of data networks, a huge part of those travels can be omitted, if appropriate teleoperation equipment is provided.
This thesis describes the development of a telemaintenance system, which was established in an active production line for research purposes. The customer production site of Braun in Marktheidenfeld, a factory which belongs to Procter & Gamble, consists of a six-axis cartesian industrial robot by KUKA Industries, a two-component injection molding system and an assembly unit. The plant produces plastic parts for electric toothbrushes.
In the research projects "MainTelRob" and "Bayern.digital", during which this plant was utilised, the Zentrum für Telematik e.V. (ZfT) and its project partners develop novel technical approaches and procedures for modern telemaintenance. The term "telemaintenance" hereby refers to the integration of computer science and communication technologies into the maintenance strategy. It is particularly interesting for high-grade capital-intensive goods like industrial robots. Typical telemaintenance tasks are for example the analysis of a robot failure or difficult repair operations. The service department of KUKA Industries is responsible for the worldwide distributed customers who own more than one robot. Currently such tasks are offered via phone support and service staff which travels abroad. They want to expand their service activities on telemaintenance and struggle with the high demands of teleoperation especially regarding security infrastructure. In addition, the facility in Marktheidenfeld has to keep up with the high international standards of Procter & Gamble and wants to minimize machine downtimes. Like 71.6 % of all German companies, P&G sees a huge potential for early information on their production system, but complains about the insufficient quality and the lack of currentness of data.
The main research focus of this work lies on the human machine interface for all human tasks in a telemaintenance setup. This thesis provides own work in the use of a mobile device in context of maintenance, describes new tools on asynchronous remote analysis and puts all parts together in an integrated telemaintenance infrastructure. With the help of Augmented Reality, the user performance and satisfaction could be raised. A special regard is put upon the situation awareness of the remote expert realized by different camera viewpoints. In detail the work consists of:
- Support of maintenance tasks with a mobile device
- Development and evaluation of a context-aware inspection tool
- Comparison of a new touch-based mobile robot programming device to the former teach pendant
- Study on Augmented Reality support for repair tasks with a mobile device
- Condition monitoring for a specific plant with industrial robot
- Human computer interaction for remote analysis of a single plant cycle
- A big data analysis tool for a multitude of cycles and similar plants
- 3D process visualization for a specific plant cycle with additional virtual information
- Network architecture in hardware, software and network infrastructure
- Mobile device computer supported collaborative work for telemaintenance
- Motor exchange telemaintenance example in running production environment
- Augmented reality supported remote plant visualization for better situation awareness
Modern software is often realized as a modular combination of subsystems for, e. g.,
knowledge management, visualization, verification, or the interaction with users. As
a result, software libraries from possibly different programming languages have to
work together. Even more complex the case is if different programming paradigms
have to be combined. This type of diversification of programming languages and
paradigms in just one software application can only be mastered by mechanisms
for a seamless integration of the involved programming languages. However, the
integration of the common logic programming language Prolog and the popular
object-oriented programming language Java is complicated by various interoperability
problems which stem on the one hand from the paradigmatic gap between the
programming languages, and on the other hand, from the diversity of the available
Prolog systems.
The subject of the thesis is the investigation of novel mechanisms for the integration
of logic programming in Prolog and object–oriented programming in Java. We are
particularly interested in an object–oriented, uniform approach which is not specific
to just one Prolog system. Therefore, we have first identified several important
criteria for the seamless integration of Prolog and Java from the object–oriented
perspective. The main contribution of the thesis is a novel integration framework
called the Connector Architecture for Prolog and Java (CAPJa). The framework is
completely implemented in Java and imposes no modifications to the Java Virtual
Machine or Prolog. CAPJa provides a semi–automated mechanism for the integration
of Prolog predicates into Java. For compact, readable, and object–oriented
queries to Prolog, CAPJa exploits lambda expressions with conditional and relational
operators in Java. The communication between Java and Prolog is based
on a fully automated mapping of Java objects to Prolog terms, and vice versa. In
Java, an extensible system of gateways provides connectivity with various Prolog
system and, moreover, makes any connected Prolog system easily interchangeable,
without major adaption in Java.
This thesis contributes to several issues in the context of SDN and NFV, with an emphasis on performance and management.
The main contributions are guide lines for operators migrating to software-based networks, as well as an analytical model for the packet processing in a Linux system using the Kernel NAPI.
This article presents an immersive virtual reality (VR) system for training classroom management skills, with a specific focus on learning to manage disruptive student behavior in face-to-face, one-to-many teaching scenarios. The core of the system is a real-time 3D virtual simulation of a classroom populated by twenty-four semi-autonomous virtual students. The system has been designed as a companion tool for classroom management seminars in a syllabus for primary and secondary school teachers. This will allow lecturers to link theory with practice using the medium of VR. The system is therefore designed for two users: a trainee teacher and an instructor supervising the training session. The teacher is immersed in a real-time 3D simulation of a classroom by means of a head-mounted display and headphone. The instructor operates a graphical desktop console, which renders a view of the class and the teacher whose avatar movements are captured by a marker less tracking system. This console includes a 2D graphics menu with convenient behavior and feedback control mechanisms to provide human-guided training sessions. The system is built using low-cost consumer hardware and software. Its architecture and technical design are described in detail. A first evaluation confirms its conformance to critical usability requirements (i.e., safety and comfort, believability, simplicity, acceptability, extensibility, affordability, and mobility). Our initial results are promising and constitute the necessary first step toward a possible investigation of the efficiency and effectiveness of such a system in terms of learning outcomes and experience.
Im Rahmen dieser Arbeit werden die Nebenläufigkeit, Konsistenz und Latenz in asynchronen
Interaktiven Echtzeitsystemen durch die Techniken des Profilings und des Model
Checkings untersucht. Zu Beginn wird erläutert, warum das asynchrone Modell das vielversprechendste
für die Nebenläufigkeit in einem Interaktiven Echtzeitsystem ist. Hierzu
wird ein Vergleich zu anderen Modellen gezogen. Darüber hinaus wird ein detaillierter
Vergleich von Synchronisationstechnologien, welche die Grundlage für Konsistenz
schaffen, durchgeführt. Auf der Grundlage dieser beiden Vergleiche und der Betrachtung
anderer Systeme wird ein Synchronisationskonzept entwickelt.
Auf dieser Basis wird die Nebenläufigkeit, Konsistenz und Latenz mit zwei Verfahren
untersucht. Die erste Technik ist das Profiling, wobei einige neue Darstellungsformen von
gemessenen Daten entwickelt werden. Diese neu entwickelten Darstellungsformen werden
in der Implementierung eines Profilers verwendet. Als zweite Technik wird das Model
Checking analysiert, welches bisher noch nicht im Kontext von Interaktiven Echtzeitsystemen
verwendet wurde. Model Checking dient dazu, die Verhaltensweise eines Interaktiven
Echtzeitsystems vorherzusagen. Diese Vorhersagen werden mit den Messungen aus
dem Profiler verglichen.
An innovative framework has been developed for teamwork of two quadcopter formations, each having its specified formation geometry, assigned task, and matching control scheme. Position control for quadcopters in one of the formations has been implemented through a Linear Quadratic Regulator Proportional Integral (LQR PI) control scheme based on explicit model following scheme. Quadcopters in the other formation are controlled through LQR PI servomechanism control scheme. These two control schemes are compared in terms of their performance and control effort. Both formations are commanded by respective ground stations through virtual leaders. Quadcopters in formations are able to track desired trajectories as well as hovering at desired points for selected time duration. In case of communication loss between ground station and any of the quadcopters, the neighboring quadcopter provides the command data, received from the ground station, to the affected unit. Proposed control schemes have been validated through extensive simulations using MATLAB®/Simulink® that provided favorable results.
To protect the health of human and environment, the European Union implemented the REACH regulation for chemical substances. REACH is an acronym for Registration, Evaluation, Authorization, and Restriction of Chemicals. Under REACH, the authorities have the task of assessing chemical substances, especially those that might pose a risk to human health or environment. The work under REACH is scientifically, technically and procedurally a complex and knowledge-intensive task that is jointly performed by the European Chemicals Agency and member state authorities in Europe. The assessment of substances under REACH conducted in the German Environment Agency is supported by the knowledge-based system KnowSEC, which is used for the screening, documentation, and decision support when working on chemical substances. The software KnowSEC integrates advanced semantic technologies and strong problem solving methods. It allows for the collaborative work on substances in the context of the European REACH regulation. We discuss the applied methods and process models and we report on experiences with the implementation and use of the system.
A centralized heterogeneous formation flight position control scheme has been formulated using an explicit model following design, based on a Linear Quadratic Regulator Proportional Integral (LQR PI) controller. The leader quadcopter is a stable reference model with desired dynamics whose output is perfectly tracked by the two wingmen quadcopters. The leader itself is controlled through the pole placement control method with desired stability characteristics, while the two followers are controlled through a robust and adaptive LQR PI control method. Selected 3-D formation geometry and static stability are maintained under a number of possible perturbations. With this control scheme, formation geometry may also be switched to any arbitrary shape during flight, provided a suitable collision avoidance mechanism is incorporated. In case of communication loss between the leader and any of the followers, the other follower provides the data, received from the leader, to the affected follower. The stability of the closed-loop system has been analyzed using singular values. The proposed approach for the tightly coupled formation flight of mini unmanned aerial vehicles has been validated with the help of extensive simulations using MATLAB/Simulink, which provided promising results.
Nowadays, data centers are becoming increasingly dynamic due to the common adoption of virtualization technologies. Systems can scale their capacity on demand by growing and shrinking their resources dynamically based on the current load. However, the complexity and performance of modern data centers is influenced not only by the software architecture, middleware, and computing resources, but also by network virtualization, network protocols, network services, and configuration. The field of network virtualization is not as mature as server virtualization and there are multiple competing approaches and technologies. Performance modeling and prediction techniques provide a powerful tool to analyze the performance of modern data centers. However, given the wide variety of network virtualization approaches, no common approach exists for modeling and evaluating the performance of virtualized networks.
The performance community has proposed multiple formalisms and models for evaluating the performance of infrastructures based on different network virtualization technologies. The existing performance models can be divided into two main categories: coarse-grained analytical models and highly-detailed simulation models. Analytical performance models are normally defined at a high level of abstraction and thus they abstract many details of the real network and therefore have limited predictive power. On the other hand, simulation models are normally focused on a selected networking technology and take into account many specific performance influencing factors, resulting in detailed models that are tightly bound to a given technology, infrastructure setup, or to a given protocol stack.
Existing models are inflexible, that means, they provide a single solution method without providing means for the user to influence the solution accuracy and solution overhead. To allow for flexibility in the performance prediction, the user is required to build multiple different performance models obtaining multiple performance predictions. Each performance prediction may then have different focus, different performance metrics, prediction accuracy, and solving time.
The goal of this thesis is to develop a modeling approach that does not require the user to have experience in any of the applied performance modeling formalisms. The approach offers the flexibility in the modeling and analysis by balancing between: (a) generic character and low overhead of coarse-grained analytical models, and (b) the more detailed simulation models with higher prediction accuracy.
The contributions of this thesis intersect with technologies and research areas, such as: software engineering, model-driven software development, domain-specific modeling, performance modeling and prediction, networking and data center networks, network virtualization, Software-Defined Networking (SDN), Network Function Virtualization (NFV). The main contributions of this thesis compose the Descartes Network Infrastructure (DNI) approach and include:
• Novel modeling abstractions for virtualized network infrastructures. This includes two meta-models that define modeling languages for modeling data center network performance. The DNI and miniDNI meta-models provide means for representing network infrastructures at two different abstraction levels. Regardless of which variant of the DNI meta-model is used, the modeling language provides generic modeling elements allowing to describe the majority of existing and future network technologies, while at the same time abstracting factors that have low influence on the overall performance. I focus on SDN and NFV as examples of modern virtualization technologies.
• Network deployment meta-model—an interface between DNI and other meta- models that allows to define mapping between DNI and other descriptive models. The integration with other domain-specific models allows capturing behaviors that are not reflected in the DNI model, for example, software bottlenecks, server virtualization, and middleware overheads.
• Flexible model solving with model transformations. The transformations enable solving a DNI model by transforming it into a predictive model. The model transformations vary in size and complexity depending on the amount of data abstracted in the transformation process and provided to the solver. In this thesis, I contribute six transformations that transform DNI models into various predictive models based on the following modeling formalisms: (a) OMNeT++ simulation, (b) Queueing Petri Nets (QPNs), (c) Layered Queueing Networks (LQNs). For each of these formalisms, multiple predictive models are generated (e.g., models with different level of detail): (a) two for OMNeT++, (b) two for QPNs, (c) two for LQNs. Some predictive models can be solved using multiple alternative solvers resulting in up to ten different automated solving methods for a single DNI model.
• A model extraction method that supports the modeler in the modeling process by automatically prefilling the DNI model with the network traffic data. The contributed traffic profile abstraction and optimization method provides a trade-off by balancing between the size and the level of detail of the extracted profiles.
• A method for selecting feasible solving methods for a DNI model. The method proposes a set of solvers based on trade-off analysis characterizing each transformation with respect to various parameters such as its specific limitations, expected prediction accuracy, expected run-time, required resources in terms of CPU and memory consumption, and scalability.
• An evaluation of the approach in the context of two realistic systems. I evaluate the approach with focus on such factors like: prediction of network capacity and interface throughput, applicability, flexibility in trading-off between prediction accuracy and solving time. Despite not focusing on the maximization of the prediction accuracy, I demonstrate that in the majority of cases, the prediction error is low—up to 20% for uncalibrated models and up to 10% for calibrated models depending on the solving technique.
In summary, this thesis presents the first approach to flexible run-time performance prediction in data center networks, including network based on SDN. It provides ability to flexibly balance between performance prediction accuracy and solving overhead. The approach provides the following key benefits:
• It is possible to predict the impact of changes in the data center network on the performance. The changes include: changes in network topology, hardware configuration, traffic load, and applications deployment.
• DNI can successfully model and predict the performance of multiple different of network infrastructures including proactive SDN scenarios.
• The prediction process is flexible, that is, it provides balance between the granularity of the predictive models and the solving time. The decreased prediction accuracy is usually rewarded with savings of the solving time and consumption of resources required for solving.
• The users are enabled to conduct performance analysis using multiple different prediction methods without requiring the expertise and experience in each of the modeling formalisms.
The components of the DNI approach can be also applied to scenarios that are not considered in this thesis. The approach is generalizable and applicable for the following examples: (a) networks outside of data centers may be analyzed with DNI as long as the background traffic profile is known; (b) uncalibrated DNI models may serve as a basis for design-time performance analysis; (c) the method for extracting and compacting of traffic profiles may be used for other, non-network workloads as well.
In the present work, a simulation system is proposed that can be used as an educational tool by physicians in training basic skills of minimally invasive vascular interventions. In order to accomplish this objective, initially the physical model of the wire proposed by Konings has been improved. As a result, a simpler and more stable method was obtained to calculate the equilibrium configuration of the wire. In addition, a geometrical method is developed to perform relaxations. It is particularly useful when the wire is hindered in the physical method because of the boundary conditions. Then a recipe is given to merge the physical and the geometrical methods, resulting in efficient relaxations. Moreover, tests have shown that the shape of the virtual wire agrees with the experiment. The proposed algorithm allows real-time executions, and furthermore, the hardware to assemble the simulator has a low cost.
3D point clouds are a de facto standard for 3D documentation and modelling. The advances in laser scanning technology broadens the usability and access to 3D measurement systems. 3D point clouds are used in many disciplines such as robotics, 3D modelling, archeology and surveying. Scanners are able to acquire up to a million of points per second to represent the environment with a dense point cloud. This represents the captured environment with a very high degree of detail. The combination of laser scanning technology with photography adds color information to the point clouds. Thus the environment is represented more realistically. Full 3D models of environments, without any occlusion, require multiple scans. Merging point clouds is a challenging process. This thesis presents methods for point cloud registration based on the panorama images generated from the scans. Image representation of point clouds introduces 2D image processing methods to 3D point clouds. Several projection methods for the generation of panorama maps of point clouds are presented in this thesis. Additionally, methods for point cloud reduction and compression based on the panorama maps are proposed. Due to the large amounts of data generated from the 3D measurement systems these methods are necessary to improve the point cloud processing, transmission and archiving. This thesis introduces point cloud processing methods as a novel framework for the digitisation of archeological excavations. The framework replaces the conventional documentation methods for excavation sites. It employs point clouds for the generation of the digital documentation of an excavation with the help of an archeologist on-site. The 3D point cloud is used not only for data representation but also for analysis and knowledge generation. Finally, this thesis presents an autonomous indoor mobile mapping system. The mapping system focuses on the sensor placement planning method. Capturing a complete environment requires several scans. The sensor placement planning method solves for the minimum required scans to digitise large environments. Combining this method with a navigation system on a mobile robot platform enables it to acquire data fully autonomously. This thesis introduces a novel hole detection method for point clouds to detect obscured parts of a captured environment. The sensor placement planning method selects the next scan position with the most coverage of the obscured environment. This reduces the required number of scans. The navigation system on the robot platform consist of path planning, path following and obstacle avoidance. This guarantees the safe navigation of the mobile robot platform between the scan positions. The sensor placement planning method is designed as a stand alone process that could be used with a mobile robot platform for autonomous mapping of an environment or as an assistant tool for the surveyor on scanning projects.
This paper demonstrates an innovative and simple solution for obstacle detection and collision avoidance of unmanned aerial vehicles (UAVs) optimized for and evaluated with quadrotors. The sensors exploited in this paper are low-cost ultrasonic and infrared range finders, which are much cheaper though noisier than more expensive sensors such as laser scanners. This needs to be taken into consideration for the design, implementation, and parametrization of the signal processing and control algorithm for such a system, which is the topic of this paper. For improved data fusion, inertial and optical flow sensors are used as a distance derivative for reference. As a result, a UAV is capable of distance controlled collision avoidance, which is more complex and powerful than comparable simple solutions. At the same time, the solution remains simple with a low computational burden. Thus, memory and time-consuming simultaneous localization and mapping is not required for collision avoidance.
Diese Forschungsarbeit beschreibt alle Aspekte der Entwicklung eines neuartigen, autonomen Quadrokopters, genannt AQopterI8, zur Innenraumerkundung. Dank seiner einzigartigen modularen Komposition von Soft- und Hardware ist der AQopterI8 in der Lage auch unter widrigen Umweltbedingungen autonom zu agieren und unterschiedliche Anforderungen zu erfüllen. Die Arbeit behandelt sowohl theoretische Fragestellungen unter dem Schwerpunkt der einfachen Realisierbarkeit als auch Aspekte der praktischen Umsetzung, womit sie Themen aus den Gebieten Signalverarbeitung, Regelungstechnik, Elektrotechnik, Modellbau, Robotik und Informatik behandelt. Kernaspekt der Arbeit sind Lösungen zur Autonomie, Hinderniserkennung und Kollisionsvermeidung.
Das System verwendet IMUs (Inertial Measurement Unit, inertiale Messeinheit) zur Orientierungsbestimmung und Lageregelung und kann unterschiedliche Sensormodelle automatisch detektieren. Ultraschall-, Infrarot- und Luftdrucksensoren in Kombination mit der IMU werden zur Höhenbestimmung und Höhenregelung eingesetzt. Darüber hinaus werden bildgebende Sensoren (Videokamera, PMD), ein Laser-Scanner sowie Ultraschall- und Infrarotsensoren zur Hindernis-erkennung und Kollisionsvermeidung (Abstandsregelung) verwendet. Mit Hilfe optischer Sensoren kann der Quadrokopter basierend auf Prinzipien der Bildverarbeitung Objekte erkennen sowie seine Position im Raum bestimmen. Die genannten Subsysteme im Zusammenspiel erlauben es dem AQopterI8 ein Objekt in einem unbekannten Raum autonom, d.h. völlig ohne jedes externe Hilfsmittel, zu suchen und dessen Position auf einer Karte anzugeben. Das System kann Kollisionen mit Wänden vermeiden und Personen autonom ausweichen. Dabei verwendet der AQopterI8 Hardware, die deutlich günstiger und Dank der Redundanz gleichzeitig erheblich verlässlicher ist als vergleichbare Mono-Sensor-Systeme (z.B. Kamera- oder Laser-Scanner-basierte Systeme).
Neben dem Zweck als Forschungsarbeit (Dissertation) dient die vorliegende Arbeit auch als Dokumentation des Gesamtprojektes AQopterI8, dessen Ziel die Erforschung und Entwicklung neuartiger autonomer Quadrokopter zur Innenraumerkundung ist. Darüber hinaus wird das System zum Zweck der Lehre und Forschung an der Universität Würzburg, der Fachhochschule Brandenburg sowie der Fachhochschule Würzburg-Schweinfurt eingesetzt. Darunter fallen Laborübungen und 31 vom Autor dieser Arbeit betreute studentische Bachelor- und Masterarbeiten.
Das Projekt wurde ausgezeichnet vom Universitätsbund und der IHK Würzburg-Mainfranken mit dem Universitätsförderpreis der Mainfränkischen Wirtschaft und wird gefördert unter den Bezeichnungen „Lebensretter mit Propellern“ und „Rettungshelfer mit Propellern“. Außerdem wurde die Arbeit für den Gips-Schüle-Preis nominiert. Absicht dieser Projekte ist die Entwicklung einer Rettungsdrohne. In den Medien Zeitung, Fernsehen und Radio wurde über den AQopterI8 schon mehrfach berichtet.
Die Evaluierung zeigt, dass das System in der Lage ist, voll autonom in Innenräumen zu fliegen, Kollisionen mit Objekten zu vermeiden (Abstandsregelung), eine Suche durchzuführen, Objekte zu erkennen, zu lokalisieren und zu zählen. Da nur wenige Forschungsarbeiten diesen Grad an Autonomie erreichen, gleichzeitig aber keine Arbeit die gestellten Anforderungen vergleichbar erfüllt, erweitert die Arbeit den Stand der Forschung.
Eine wichtige Grundlage für die quantitative Analyse von Erzähltexten, etwa eine Netzwerkanalyse der Figurenkonstellation, ist die automatische Erkennung von Referenzen auf Figuren in Erzähltexten, ein Sonderfall des generischen NLP-Problems der Named Entity Recognition. Bestehende, auf Zeitungstexten trainierte Modelle sind für literarische Texte nur eingeschränkt brauchbar, da die Einbeziehung von Appellativen in die Named Entity-Definition und deren häufige Verwendung in Romantexten zu einem schlechten Ergebnis führt. Dieses Paper stellt eine anhand eines manuell annotierten Korpus auf deutschsprachige Romane des 19. Jahrhunderts angepasste NER-Komponente vor.
Virtualization allows the creation of virtual instances of physical devices, such as network and processing units. In a virtualized system, governed by a hypervisor, resources are shared among virtual machines (VMs). Virtualization has been receiving increasing interest as away to reduce costs through server consolidation and to enhance the flexibility of physical infrastructures. Although virtualization provides many benefits, it introduces new security challenges; that is, the introduction of a hypervisor introduces threats since hypervisors expose new attack surfaces.
Intrusion detection is a common cyber security mechanism whose task is to detect malicious activities in host and/or network environments. This enables timely reaction in order to stop an on-going attack, or to mitigate the impact of a security breach. The wide adoption of virtualization has resulted in the increasingly common practice of deploying conventional intrusion detection systems (IDSs), for example, hardware IDS appliances or common software-based IDSs, in designated VMs as virtual network functions (VNFs). In addition, the research and industrial communities have developed IDSs specifically designed to operate in virtualized environments (i.e., hypervisorbased IDSs), with components both inside the hypervisor and in a designated VM. The latter are becoming increasingly common with the growing proliferation of virtualized data centers and the adoption of the cloud computing paradigm, for which virtualization is as a key enabling technology.
To minimize the risk of security breaches, methods and techniques for evaluating IDSs in an accurate manner are essential. For instance, one may compare different IDSs in terms of their attack detection accuracy in order to identify and deploy the IDS that operates optimally in a given environment, thereby reducing the risks of a security breach. However, methods and techniques for realistic and accurate evaluation of the attack detection accuracy of IDSs in virtualized environments (i.e., IDSs deployed as VNFs or hypervisor-based IDSs) are lacking. That is, workloads that exercise the sensors of an evaluated IDS and contain attacks targeting hypervisors are needed. Attacks targeting hypervisors are of high severity since they may result in, for example, altering the hypervisors’s memory and thus enabling the execution of malicious code with hypervisor privileges. In addition, there are no metrics and measurement methodologies
for accurately quantifying the attack detection accuracy of IDSs in virtualized environments with elastic resource provisioning (i.e., on-demand allocation or deallocation of virtualized hardware resources to VMs). Modern hypervisors allow for hotplugging virtual CPUs and memory on the designated VM where the intrusion detection engine of hypervisor-based IDSs, as well as of IDSs deployed as VNFs, typically operates. Resource hotplugging may have a significant impact on the attack detection accuracy of an evaluated IDS, which is not taken into account by existing metrics for quantifying IDS attack detection accuracy. This may lead to inaccurate measurements, which, in turn, may result in the deployment of misconfigured or ill-performing IDSs, increasing
the risk of security breaches.
This thesis presents contributions that span the standard components of any system
evaluation scenario: workloads, metrics, and measurement methodologies. The scientific contributions of this thesis are:
A comprehensive systematization of the common practices and the state-of-theart on IDS evaluation. This includes: (i) a definition of an IDS evaluation design space allowing to put existing practical and theoretical work into a common context in a systematic manner; (ii) an overview of common practices in IDS evaluation reviewing evaluation approaches and methods related to each part of the design space; (iii) and a set of case studies demonstrating how different IDS evaluation approaches are applied in practice. Given the significant amount of existing practical and theoretical work related to IDS evaluation, the presented systematization is beneficial for improving the general understanding of the topic by providing an overview of the current state of the field. In addition, it is beneficial for identifying and contrasting advantages and disadvantages of different IDS evaluation methods and practices, while also helping to identify specific requirements and best practices for evaluating current and future IDSs.
An in-depth analysis of common vulnerabilities of modern hypervisors as well as a set of attack models capturing the activities of attackers triggering these vulnerabilities. The analysis includes 35 representative vulnerabilities of hypercall handlers (i.e., hypercall vulnerabilities). Hypercalls are software traps from a kernel of a VM to the hypervisor. The hypercall interface of hypervisors, among device drivers and VM exit events, is one of the attack surfaces that hypervisors expose. Triggering a hypercall vulnerability may lead to a crash of the hypervisor or to altering the hypervisor’s memory. We analyze the origins
of the considered hypercall vulnerabilities, demonstrate and analyze possible attacks that trigger them (i.e., hypercall attacks), develop hypercall attack models(i.e., systematized activities of attackers targeting the hypercall interface), and discuss future research directions focusing on approaches for securing hypercall interfaces.
A novel approach for evaluating IDSs enabling the generation of workloads that contain attacks targeting hypervisors, that is, hypercall attacks. We propose an approach for evaluating IDSs using attack injection (i.e., controlled execution of attacks during regular operation of the environment where an IDS under test is deployed). The injection of attacks is performed based on attack models that capture realistic attack scenarios. We use the hypercall attack models developed as part of this thesis for injecting hypercall attacks.
A novel metric and measurement methodology for quantifying the attack detection accuracy of IDSs in virtualized environments that feature elastic resource provisioning. We demonstrate how the elasticity of resource allocations in such environments may impact the IDS attack detection accuracy and show that using existing metrics in such environments may lead to practically challenging and inaccurate measurements. We also demonstrate the practical use of the metric we propose through a set of case studies, where we evaluate common conventional IDSs deployed as VNFs.
In summary, this thesis presents the first systematization of the state-of-the-art on IDS evaluation, considering workloads, metrics and measurement methodologies as integral parts of every IDS evaluation approach. In addition, we are the first to examine the hypercall attack surface of hypervisors in detail and to propose an approach using attack injection for evaluating IDSs in virtualized environments. Finally, this thesis presents the first metric and measurement methodology for quantifying the attack detection accuracy of IDSs in virtualized environments that feature elastic resource provisioning.
From a technical perspective, as part of the proposed approach for evaluating IDSsthis thesis presents hInjector, a tool for injecting hypercall attacks. We designed hInjector to enable the rigorous, representative, and practically feasible evaluation of IDSs using attack injection. We demonstrate the application and practical usefulness of hInjector, as well as of the proposed approach, by evaluating a representative hypervisor-based IDS designed to detect hypercall attacks. While we focus on evaluating the capabilities of IDSs to detect hypercall attacks, the proposed IDS evaluation approach can be generalized and applied in a broader context. For example, it may be directly used to also evaluate security mechanisms of hypervisors, such as hypercall access control (AC) mechanisms. It may also be applied to evaluate the capabilities
of IDSs to detect attacks involving operations that are functionally similar to hypercalls,
for example, the input/output control (ioctl) calls that the Kernel-based Virtual Machine (KVM) hypervisor supports. For IDSs in virtualized environments featuring elastic resource provisioning, our approach for injecting hypercall attacks can be applied in combination with the attack detection accuracy metric and measurement methodology we propose. Our approach for injecting hypercall attacks, and our metric and measurement methodology, can also be applied independently beyond the scenarios considered in this thesis. The wide spectrum of security mechanisms in virtualized environments whose evaluation can directly benefit from the contributions of this thesis (e.g., hypervisor-based IDSs, IDSs deployed as VNFs, and AC mechanisms) reflects the practical implication of the thesis.
Operators of Higher Order
(1998)
Motivated by results on interactive proof systems we investigate the computational power of quantifiers applied to well-known complexity classes.
In special, we are interested in existential, universal and probabilistic bounded error quantifiers ranging over words and sets of words, i.e. oracles if we think in a Turing machine model.
In addition to the standard oracle access mechanism, we also consider quantifiers ranging over oracles to which access is restricted in a certain way.
Computer systems have replaced human work-force in many parts of everyday life, but there still exists a large number of tasks that cannot be automated, yet. This also includes tasks, which we consider to be rather simple like the categorization of image content or subjective ratings. Traditionally, these tasks have been completed by designated employees or outsourced to specialized companies. However, recently the crowdsourcing paradigm is more and more applied to complete such human-labor intensive tasks. Crowdsourcing aims at leveraging the huge number of Internet users all around the globe, which form a potentially highly available, low-cost, and easy accessible work-force.
To enable the distribution of work on a global scale, new web-based services emerged, so called crowdsourcing platforms, that act as mediator between employers posting tasks and workers completing tasks. However, the crowdsourcing approach, especially the large anonymous worker crowd, results in two types of challenges. On the one hand, there are technical challenges like the dimensioning of crowdsourcing platform infrastructure or the interconnection of crowdsourcing platforms and machine clouds to build hybrid services. On the other hand, there are conceptual challenges like identifying reliable workers or migrating traditional off-line work to the crowdsourcing environment. To tackle these challenges, this monograph analyzes and models current crowdsourcing systems to optimize crowdsourcing workflows and the underlying infrastructure. First, a categorization of crowdsourcing tasks and platforms is developed to derive generalizable properties. Based on this categorization and an exemplary analysis of a commercial crowdsourcing platform, models for different aspects of crowdsourcing platforms and crowdsourcing mechanisms are developed. A special focus is put on quality assurance mechanisms for crowdsourcing tasks, where the models are used to assess the suitability and costs of existing approaches for different types of tasks. Further, a novel quality assurance mechanism solely based on user-interactions is proposed and its feasibility is shown. The findings from the analysis of existing platforms, the derived models, and the developed quality assurance mechanisms are finally used to derive best practices for two crowdsourcing use-cases, crowdsourcing-based network measurements and crowdsourcing-based subjective user studies. These two exemplary use-cases cover aspects typical for a large range of crowdsourcing tasks and illustrated the potential benefits, but also resulting challenges when using crowdsourcing.
With the ongoing digitalization and globalization of the labor markets, the crowdsourcing paradigm is expected to gain even more importance in the next years. This is already evident in the currently new emerging fields of crowdsourcing, like enterprise crowdsourcing or mobile crowdsourcing. The models developed in the monograph enable platform providers to optimize their current systems and employers to optimize their workflows to increase their commercial success. Moreover, the results help to improve the general understanding of crowdsourcing systems, a key for identifying necessary adaptions and future improvements.
Software frameworks for Realtime Interactive Systems (RIS), e.g., in the areas of Virtual, Augmented, and Mixed Reality (VR, AR, and MR) or computer games, facilitate a multitude of functionalities by coupling diverse software modules. In this context, no uniform methodology for coupling these modules does exist; instead various purpose-built solutions have been proposed. As a consequence, important software qualities, such as maintainability, reusability, and adaptability, are impeded.
Many modern systems provide additional support for the integration of Artificial Intelligence (AI) methods to create so called intelligent virtual environments. These methods exacerbate the above-mentioned problem of coupling software modules in the thus created Intelligent Realtime Interactive Systems (IRIS) even more. This, on the one hand, is due to the commonly applied specialized data structures and asynchronous execution schemes, and the requirement for high consistency regarding content-wise coupled but functionally decoupled forms of data representation on the other.
This work proposes an approach to decoupling software modules in IRIS, which is based on the abstraction of architecture elements using a semantic Knowledge Representation Layer (KRL). The layer facilitates decoupling the required modules, provides a means for ensuring interface compatibility and consistency, and in the end constitutes an interface for symbolic AI methods.
Small satellites contribute significantly in the rapidly evolving innovation in space engineering, in particular in distributed space systems for global Earth observation and communication services. Significant mass reduction by miniaturization, increased utilization of commercial high-tech components, and in particular standardization are the key drivers for modern miniature space technology.
This thesis addresses key fields in research and development on miniature satellite technology regarding efficiency, flexibility, and robustness. Here, these challenges are addressed by the University of Wuerzburg’s advanced pico-satellite bus, realizing a generic modular satellite architecture and standardized interfaces for all subsystems. The modular platform ensures reusability, scalability, and increased testability due to its flexible subsystem interface which allows efficient and compact integration of the entire satellite in a plug-and-play manner.
Beside systematic design for testability, a high degree of operational robustness is achieved by the consequent implementation of redundancy of crucial subsystems. This is combined with efficient fault detection, isolation and recovery mechanisms. Thus, the UWE-3 platform, and in particular the on-board data handling system and the electrical power system, offers one of the most efficient pico-satellite architectures launched in recent years and provides a solid basis for future extensions.
The in-orbit performance results of the pico-satellite UWE-3 are presented and summarize successful operations since its launch in 2013. Several software extensions and adaptations have been uploaded to UWE-3 increasing its capabilities. Thus, a very flexible platform for in-orbit software experiments and for evaluations of innovative concepts was provided and tested.
In unserem Alltag kommen wir heute ständig mit Systemen der Informations- und Kommunikationstechnik in Kontakt. Diese bestehen häufig aus mehreren interagierenden und kommunizierenden Komponenten, wie zum Beispiel nebenläufige Software zur effizienten Nutzung von Mehrkernprozessoren oder Sensornetzwerke. Systeme, die aus mehreren interagierenden und kommunizierenden Komponenten bestehen sind häufig komplex und dadurch sehr fehleranfällig. Daher ist es wichtig zuverlässige Methoden, die helfen die korrekte Funktionsweise solcher Systeme sicherzustellen, zu besitzen.
Im Rahmen dieser Doktorarbeit wurden neue Methoden zur Verbesserung der Verifizierbarkeit von asynchronen nebenläufigen Systemen durch Anwendung der symbolischen Modellprüfung mit binären Entscheidungsdiagrammen (BDDs) entwickelt. Ein asynchrones nebenläufiges System besteht aus mehreren Komponenten, von denen zu einem Zeitpunkt jeweils nur eine Komponente Transitionen ausführen kann. Die Modellprüfung ist eine Technik zur formalen Verifikation, bei der die Gültigkeit einer Menge von zu prüfenden Eigenschaften für eine gegebene Systembeschreibung automatisch durch Softwarewerkzeuge, die Modellprüfer genannt werden, entschieden wird. Das Hauptproblem der symbolischen Modellprüfung ist das Problem der Zustandsraumexplosion und es sind weitere Verbesserungen notwendig, um die symbolische Modellprüfung häufiger erfolgreich durchführen zu können.
Bei der BDD-basierten symbolischen Modellprüfung werden Mengen von Systemzuständen und Mengen von Transitionen jeweils durch BDDs repräsentiert. Zentrale Operationen bei ihr sind die Berechnung von Nachfolger- und Vorgängerzuständen von gegebenen Zustandsmengen, welche Bildberechnungen genannt werden. Um die Gültigkeit von Eigenschaften für eine gegebene Systembeschreibung zu überprüfen, werden wiederholt Bildberechnungen durchgeführt. Daher ist ihre effiziente Berechnung entscheidend für eine geringe Laufzeit und einen niedrigen Speicherbedarf der Modellprüfung. In einer Bildberechnung werden ein BDD zur Repräsentation einer Menge von Transitionen und ein BDD für eine Menge von Zuständen kombiniert, um eine Menge von Nachfolger- oder Vorgängerzuständen zu berechnen. Oft ist auch die Größe von BDDs zur Repräsentation der Transitionsrelation von Systemen entscheidend für die erfolgreiche Anwendbarkeit der Modellprüfung.
In der vorliegenden Arbeit werden neue Datenstrukturen zur Repräsentation der Transitionsrelation von asynchronen nebenläufigen Systemen bei der BDD-basierten symbolischen Modellprüfung vorgestellt. Zusätzlich werden neue Algorithmen zur Durchführung von Bildberechnungen präsentiert. Beides kann zu großen Reduktionen der Laufzeit und des Speicherbedarfs führen. Asynchrone nebenläufige Systeme besitzen häufig Symmetrien. Eine Technik zur Reduktion des Problems der Zustandsraumexplosion ist die Symmetriereduktion. In dieser Arbeit wird ebenfalls ein neuer effizienter Algorithmus zur Symmetriereduktion bei der symbolischen Modellprüfung mit BDDs aufgeführt.
BACKGROUND:
Although the repair of ventral abdominal wall hernias is one of the most commonly performed operations, many aspects of their treatment are still under debate or poorly studied. In addition, there is a lack of good definitions and classifications that make the evaluation of studies and meta-analyses in this field of surgery difficult.
MATERIALS AND METHODS:
Under the auspices of the board of the European Hernia Society and following the previously published classifications on inguinal and on ventral hernias, a working group was formed to create an online platform for registration and outcome measurement of operations for ventral abdominal wall hernias. Development of such a registry involved reaching agreement about clear definitions and classifications on patient variables, surgical procedures and mesh materials used, as well as outcome parameters. The EuraHS working group (European registry for abdominal wall hernias) comprised of a multinational European expert panel with specific interest in abdominal wall hernias. Over five working group meetings, consensus was reached on definitions for the data to be recorded in the registry.
RESULTS:
A set of well-described definitions was made. The previously reported EHS classifications of hernias will be used. Risk factors for recurrences and co-morbidities of patients were listed. A new severity of comorbidity score was defined. Post-operative complications were classified according to existing classifications as described for other fields of surgery. A new 3-dimensional numerical quality-of-life score, EuraHS-QoL score, was defined. An online platform is created based on the definitions and classifications, which can be used by individual surgeons, surgical teams or for multicentre studies. A EuraHS website is constructed with easy access to all the definitions, classifications and results from the database.
CONCLUSION:
An online platform for registration and outcome measurement of abdominal wall hernia repairs with clear definitions and classifications is offered to the surgical community. It is hoped that this registry could lead to better evidence-based guidelines for treatment of abdominal wall hernias based on hernia variables, patient variables, available hernia repair materials and techniques.
Mobile laser scanning puts high requirements on the accuracy of the positioning systems and the calibration of the measurement system. We present a novel algorithmic approach for calibration with the goal of improving the measurement accuracy of mobile laser scanners. We describe a general framework for calibrating mobile sensor platforms that estimates all configuration parameters for any arrangement of positioning sensors, including odometry. In addition, we present a novel semi-rigid Simultaneous Localization and Mapping (SLAM) algorithm that corrects the vehicle position at every point in time along its trajectory, while simultaneously improving the quality and precision of the entire acquired point cloud. Using this algorithm, the temporary failure of accurate external positioning systems or the lack thereof can be compensated for. We demonstrate the capabilities of the two newly proposed algorithms on a wide variety of datasets.
Today's Internet is no longer only controlled by a single stakeholder, e.g. a standard body or a telecommunications company.
Rather, the interests of a multitude of stakeholders, e.g. application developers, hardware vendors, cloud operators, and network operators, collide during the development and operation of applications in the Internet.
Each of these stakeholders considers different KPIs to be important and attempts to optimise scenarios in its favour.
This results in different, often opposing views and can cause problems for the complete network ecosystem.
One example of such a scenario are Signalling Storms in the mobile Internet, with one of the largest occurring in Japan in 2012 due to the release and high popularity of a free instant messaging application.
The network traffic generated by the application caused a high number of connections to the Internet being established and terminated.
This resulted in a similarly high number of signalling messages in the mobile network, causing overload and a loss of service for 2.5 million users over 4 hours.
While the network operator suffers the largest impact of this signalling overload, it does not control the application.
Thus, the network operator can not change the application traffic characteristics to generate less network signalling traffic.
The stakeholders who could prevent, or at least reduce, such behaviour, i.e. application developers or hardware vendors, have no direct benefit from modifying their products in such a way.
This results in a clash of interests which negatively impacts the network performance for all participants.
The goal of this monograph is to provide an overview over the complex structures of stakeholder relationships in today's Internet applications in mobile networks.
To this end, we study different scenarios where such interests clash and suggest methods where tradeoffs can be optimised for all participants.
If such an optimisation is not possible or attempts at it might lead to adverse effects, we discuss the reasons.
Graphs are a frequently used tool to model relationships among entities. A graph is a binary relation between objects, that is, it consists of a set of objects (vertices) and a set of pairs of objects (edges).
Networks are common examples of modeling data as a graph. For example, relationships between persons in a social network, or network links between computers in a telecommunication network can be represented by a graph.
The clearest way to illustrate the modeled data is to visualize the graphs. The field of Graph Drawing deals with the problem of finding algorithms to automatically generate graph visualizations. The task is to find a "good" drawing, which can be measured by different criteria such as number of crossings between edges or the used area. In this thesis, we study Angular Schematization in Graph Drawing. By this, we mean drawings
with large angles (for example, between the edges at common vertices or at crossing points).
The thesis consists of three parts. First, we deal with the placement of boxes. Boxes are axis-parallel rectangles that can, for example, contain text.
They can be placed on a map to label important sites, or can be used to describe semantic relationships between words in a word network. In the second part of the thesis, we consider graph drawings visually guide the
viewer. These drawings generally induce large angles between edges that meet at a vertex. Furthermore, the edges are drawn crossing-free and in a way that
makes them easy to follow for the human eye. The third and final part is devoted to crossings with large angles. In drawings with crossings, it is important to have large angles between edges at their crossing point, preferably right angles.
The development of ICT infrastructures has facilitated the emergence of new paradigms for looking at society and the environment over the last few years. Participatory environmental sensing, i.e. directly involving citizens in environmental monitoring, is one example, which is hoped to encourage learning and enhance awareness of environmental issues. In this paper, an analysis of the behaviour of individuals involved in noise sensing is presented. Citizens have been involved in noise measuring activities through the WideNoise smartphone application. This application has been designed to record both objective (noise samples) and subjective (opinions, feelings) data. The application has been open to be used freely by anyone and has been widely employed worldwide. In addition, several test cases have been organised in European countries. Based on the information submitted by users, an analysis of emerging awareness and learning is performed. The data show that changes in the way the environment is perceived after repeated usage of the application do appear. Specifically, users learn how to recognise different noise levels they are exposed to. Additionally, the subjective data collected indicate an increased user involvement in time and a categorisation effect between pleasant and less pleasant environments.
Recently, several backpack-mounted systems, also known as personal laser scanning systems, have been developed. They consist of laser scanners or cameras that are carried by a human operator to acquire measurements of the environment while walking. These systems were first designed to overcome the challenges of mapping indoor environments with doors and stairs. While the human operator inherently has the ability to open doors and to climb stairs, the flexible movements introduce irregularities of the trajectory to the system. To compete with other mapping systems, the accuracy of these systems has to be evaluated. In this paper, we present an extensive evaluation of our backpack mobile mapping system in indoor environments. It is shown that the system can deal with the normal human walking motion, but has problems with irregular jittering. Moreover, we demonstrate the applicability of the backpack in a suitable urban scenario.
Background
All international guidelines recommend perioperative antibiotic prophylaxis (PAB) should be routinely administered to patients undergoing cardiac surgery. However, the duration of PAB is heterogeneous and controversial.
Methods
Between 01.01.2011 and 31.12.2011, 1096 consecutive cardiac surgery patients were assigned to one of two groups receiving PAB with a second-generation cephalosporin for either 56 h (group I) or 32 h (group II). Patients’ characteristics, intraoperative data, and the in-hospital follow-up were analysed. Primary endpoint was the incidence of surgical site infection (deep and superficial sternal wound-, and vein harvesting site infection; DSWI/SSWI/VHSI). Secondary endpoints were the incidence of respiratory-, and urinary tract infection, as well as the mortality rate.
Results
615/1096 patients (56,1%) were enrolled (group I: n = 283 versus group II: n = 332). There were no significant differences with regard to patient characteristics, comorbidities, and procedure-related variables. No statistically significant differences were demonstrated concerning primary and secondary endpoints. The incidence of DSWI/SSWI/VHSI were 4/283 (1,4%), 5/283 (1,7%), and 1/283 (0,3%) in group I versus 6/332 (1,8%), 9/332 (2,7%), and 3/332 (0,9%) in group II (p = 0,76/0,59/0,63). In univariate analyses female gender, age, peripheral arterial obstructive disease, operating-time, ICU-duration, transfusion, and respiratory insufficiency were determinants for nosocomial infections (all ≤ 0,05). Subgroup analyses of these high-risk patients did not show any differences between the two regimes (all ≥ 0,05).
Conclusions
Reducing the duration of PAB from 56 h to 32 h in adult cardiac surgery patients was not associated with an increase of nosocomial infection rate, but contributes to reduce antibiotic resistance and health care costs.
Background
Information extraction techniques that get structured representations out of unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide this process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite of their importance for clinical trials. This work therefore aimed at development and evaluation of an information extraction component with a fine-grained terminology that enables to recognize almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg.
Methods
A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute value pairs. The final component has been mapped to the central elements of a standardized terminology, and it has been evaluated according to documents with different layouts.
Results
The final system achieved state-of-the-art precision (micro average.996) and recall (micro average.961) on 100 test documents that represent more than 90 % of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with f 1=.989 (micro average) and f 1=.963 (macro average). As a result of keyword matching and restraint concept extraction, the system obtained high precision also on unstructured or exceptionally short documents, and documents with uncommon layout.
Conclusions
The developed terminology and the proposed information extraction system allow to extract fine-grained information from German semi-structured transthoracic echocardiography reports with very high precision and high recall on the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse which supports clinical research.
Background
Although the repair of ventral abdominal wall hernias is one of the most commonly performed operations, many aspects of their treatment are still under debate or poorly studied. In addition, there is a lack of good definitions and classifications that make the evaluation of studies and meta-analyses in this field of surgery difficult.
Materials and methods
Under the auspices of the board of the European Hernia Society and following the previously published classifications on inguinal and on ventral hernias, a working group was formed to create an online platform for registration and outcome measurement of operations for ventral abdominal wall hernias. Development of such a registry involved reaching agreement about clear definitions and classifications on patient variables, surgical procedures and mesh materials used, as well as outcome parameters. The EuraHS working group (European registry for abdominal wall hernias) comprised of a multinational European expert panel with specific interest in abdominal wall hernias. Over five working group meetings, consensus was reached on definitions for the data to be recorded in the registry.
Results
A set of well-described definitions was made. The previously reported EHS classifications of hernias will be used. Risk factors for recurrences and co-morbidities of patients were listed. A new severity of comorbidity score was defined. Post-operative complications were classified according to existing classifications as described for other fields of surgery. A new 3-dimensional numerical quality-of-life score, EuraHS-QoL score, was defined. An online platform is created based on the definitions and classifications, which can be used by individual surgeons, surgical teams or for multicentre studies. A EuraHS website is constructed with easy access to all the definitions, classifications and results from the database.
Conclusion
An online platform for registration and outcome measurement of abdominal wall hernia repairs with clear definitions and classifications is offered to the surgical community. It is hoped that this registry could lead to better evidence-based guidelines for treatment of abdominal wall hernias based on hernia variables, patient variables, available hernia repair materials and techniques.
A binary tanglegram is a drawing of a pair of rooted binary trees whose leaf sets are in one-to-one correspondence; matching leaves are connected by inter-tree edges. For applications, for example, in phylogenetics, it is essential that both trees are drawn without edge crossings and that the inter-tree edges have as few crossings as possible. It is known that finding a tanglegram with the minimum number of crossings is NP-hard and that the problem is fixed-parameter tractable with respect to that number.
We prove that under the Unique Games Conjecture there is no constant-factor approximation for binary trees. We show that the problem is NP-hard even if both trees are complete binary trees. For this case we give an O(n 3)-time 2-approximation and a new, simple fixed-parameter algorithm. We show that the maximization version of the dual problem for binary trees can be reduced to a version of MaxCut for which the algorithm of Goemans and Williamson yields a 0.878-approximation.
Social interactions as introduced by Web 2.0 applications during the last decade have changed the way the Internet is used. Today, it is part of our daily lives to maintain contacts through social networks, to comment on the latest developments in microblogging services or to save and share information snippets such as photos or bookmarks online.
Social bookmarking systems are part of this development. Users can share links to interesting web pages by publishing bookmarks and providing descriptive keywords for them. The structure which evolves from the collection of annotated bookmarks is called a folksonomy. The sharing of interesting and relevant posts enables new ways of retrieving information from the Web. Users
can search or browse the folksonomy looking at resources related to specific tags or users. Ranking methods known from search engines have been adjusted to facilitate retrieval in social bookmarking systems. Hence, social bookmarking systems have become an alternative or addendum to search engines.
In order to better understand the commonalities and differences of social bookmarking systems and search engines, this thesis compares several aspects of the two systems' structure, usage behaviour and content. This includes the use of tags and query terms, the composition of the document collections and the rankings of bookmarks and search engine URLs. Searchers (recorded via session ids), their search terms and the clicked on URLs can be extracted from a search
engine query logfile. They form similar links as can be found in folksonomies where a user annotates a resource with tags. We use this analogy to build a tripartite hypergraph from query logfiles (a logsonomy), and compare structural and semantic properties of log- and folksonomies. Overall, we have found similar behavioural, structural and semantic characteristics in both systems. Driven by this insight, we investigate, if folksonomy data can be of use in web
information retrieval in a similar way to query log data: we construct training data from query logs and a folksonomy to build models for a learning-to-rank algorithm. First experiments show a positive correlation of ranking results generated from the ranking models of both systems. The research is based on various data collections from the social bookmarking systems BibSonomy and Delicious, Microsoft's search engine MSN (now Bing) and Google data.
To maintain social bookmarking systems as a good source for information retrieval, providers need to fight spam. This thesis introduces and analyses different features derived from the specific characteristics of social bookmarking systems to be used in spam detection classification algorithms. Best results can be derived from a combination of profile, activity, semantic and location-based features. Based on the experiments, a spam detection framework which identifies and eliminates spam activities for the social bookmarking system BibSonomy has been developed.
The storing and publication of user-related bookmarks and profile information raises questions about user data privacy. What kinds of personal information is collected and how do systems handle user-related items? In order to answer these questions, the thesis looks into the handling of data privacy in the social bookmarking system BibSonomy. Legal guidelines about how to deal with the private data collected and processed in social bookmarking systems are also presented. Experiments will show that the consideration of user data privacy in the process
of feature design can be a first step towards strengthening data privacy.
Der Einsatz von Multicore-Prozessoren in der industriellen Steuerungstechnik birgt sowohl Chancen als auch Risiken. Die vorliegende Dissertation entwickelt und bewertet aus diesem Grund generische Strategien zur Nutzung dieser Prozessorarchitektur unter Berücksichtigung der spezifischen Rahmenbedingungen und Anforderungen dieser Domäne.
Multicore-Prozessoren bieten die Chance zur Konsolidierung derzeit auf dedizierter Hardware ausgeführter heterogener Steuerungssubsysteme unter einer bisher nicht erreichbaren temporalen Isolation. In diesem Kontext definiert die vorliegende Dissertation die spezifischen Anforderungen, die eine integrierte Ausführung in der Domäne der industriellen Automatisierung erfüllen muss. Eine Vorbedingung für ein derartiges Szenario stellt allerdings der Einsatz einer geeigneten Konsolidierungslösung dar. Mit einem virtualisierten und einem hybriden Konsolidierungsansatz werden deshalb zwei repräsentative Lösungen für die Domäne eingebetteter Systeme vorgestellt, die schließlich hinsichtlich der zuvor definierten Kriterien evaluiert werden.
Da die Taktraten von Prozessoren physikalische Grenzen erreicht haben, werden sich in der Steuerungstechnik signifikante Performanzsteigerungen zukünftig nur durch den Einsatz von Multicore-Prozessoren erzielen lassen. Dies hat zur Vorbedingung, dass die Firmware die Parallelität dieser Prozessorarchitektur in geeigneter Weise zu nutzen vermag. Leider entstehen bei der Parallelisierung eines komplexen Systems wie einer Automatisierungs-Firmware im Allgemeinen signifikante Aufwände. Infolgedessen sollten diesbezügliche Entscheidungen nur auf Basis einer objektiven Abwägung potentieller Alternativen getroffen werden. Allerdings macht die Systemkomplexität eine Abschätzung der durch eine spezifische parallele Firmware-Architektur zu erwartenden Performanz zu einer anspruchsvollen Aufgabe. Dies gilt vor allem, da eine Parallelisierung gefordert wird, die für eine Vielzahl von Lastszenarien in Form gesteuerter Maschinen geeignet ist. Aus diesem Grund spezifiziert die vorliegende Dissertation eine anwendungsorientierte Methode zur Unterstützung von Entwurfsentscheidungen, die bei der Migration einer bestehenden Singlecore-Firmware auf eine homogene Multicore-Architektur zu treffen sind. Dies wird erreicht, indem in automatisierter Weise geeignete Firmware-Modelle auf Basis von dynamischem Profiling der Firmware unter mehreren repräsentativen Lastszenarien erstellt werden. Im Anschluss daran werden diese Modelle um das Expertenwissen von Firmware-Entwicklern erweitert, bevor mittels multikriterieller genetischer Algorithmen der Entwurfsraum der Parallelisierungsalternativen exploriert wird. Schließlich kann eine spezifische Lösung der auf diese Weise hergeleiteten Pareto-Front auf Basis ihrer Bewertungsmetriken zur Implementierung durch einen Entwickler ausgewählt werden. Die vorliegende Arbeit schließt mit einer Fallstudie, welche die zuvor beschriebene Methode auf eine numerische Steuerungs-Firmware anwendet und dabei deren Potential für eine umfassende Unterstützung einer Firmware-Parallelisierung aufzeigt.
Within this thesis a new philosophy in monitoring spacecrafts is presented: the
unification of the various kinds of monitoring techniques used during the
different lifecylce phases of a spacecraft.
The challenging requirements being set for this monitoring framework are:
- "separation of concerns" as a design principle (dividing the steps of logging
from registered sources, sending to connected sinks and displaying of
information),
- usage during all mission phases,
- usage by all actors (EGSE engineers, groundstation operators, etc.),
- configurable at runtime, especially regarding the level of detail of logging
information, and
- very low resource consumption.
First a prototype of the monitoring framework was developed as a support library
for the real-time operating system
RODOS. This prototype was tested on dedicated hardware platforms relevant for
space, and also on a satellite demonstrator used for educational purposes.
As a second step, the results and lessons learned from the development and usage
of this prototype were transfered to a real space mission: the first satellite
of the DLR compact satellite series - a space based platform for DLR's own
research activities. Within this project, the software of the avionic subsystem
was supplemented by a powerful logging component, which enhances the traditional
housekeeping capabilities and offers extensive filtering and debugging
techniques for monitoring and FDIR needs. This logging component is the major
part of the flight version of the monitoring framework. It is completed by
counterparts running on the development computers and as well as the EGSE
hardware in the integration room, making it most valuable already in the
earliest stages of traditional spacecraft development.
Future plans in terms of adding support from the groundstation as well will lead
to a seamless integration of the monitoring framework not only into to the
spacecraft itself, but into the whole space system.
The general map-labeling problem is as follows: given a set of geometric objects to be labeled, or features, in the plane, and for each feature a set of label positions, maximize the number of placed labels such that there is at most one label per feature and no two labels overlap. There are three types of features in a map: point, line, and area features. Unfortunately, one cannot expect to find efficient algorithms that solve the labeling problem optimally.
Interactive maps are digital maps that only show a small part of the entire map whereas the user can manipulate the shown part, the view, by continuously panning, zooming, rotating, and tilting (that is, changing the perspective between a top and a bird view). An example for the application of interactive maps is in navigational devices. Interactive maps are challenging in that the labeling must be updated whenever labels leave the view and, while zooming, the label size must be constant on the screen (which either makes space for further labels or makes labels overlap when zooming in or out, respectively). These updates must be computed in real time, that is, the computation must be so fast that the user does not notice that we spend time on the computation. Additionally, labels must not jump or flicker, that is, labels must not suddenly change their positions or, while zooming out, a vanished label must not appear again.
In this thesis, we present efficient algorithms that dynamically label point and line features in interactive maps. We try to label as many features as possible while we prohibit labels that overlap, jump, and flicker. We have implemented all our approaches and tested them on real-world data. We conclude that our algorithms are indeed real-time capable.
Knowledge-based systems (KBS) face an ever-increasing interest in various disciplines and contexts. Yet, the former aim to construct the ’perfect intelligent software’ continuously shifts to user-centered, participative solutions. Such systems enable users to contribute their personal knowledge to the problem solving process for increased efficiency and an ameliorated user experience. More precisely, we define non-functional key requirements of participative KBS as: Transparency (encompassing KBS status mediation), configurability (user adaptability, degree of user control/exploration), quality of the KB and UI, and evolvability (enabling the KBS to grow mature with their users). Many of those requirements depend on the respective target users, thus calling for a more user-centered development. Often, also highly expertise domains are targeted — inducing highly complex KBs — which requires a more careful and considerate UI/interaction design. Still, current KBS engineering (KBSE) approaches mostly focus on knowledge acquisition (KA) This often leads to non-optimal, little reusable, and non/little evaluated KBS front-end solutions.
In this thesis we propose a more encompassing KBSE approach. Due to the strong mutual influences between KB and UI, we suggest a novel form of intertwined UI and KB development. We base the approach on three core components for encompassing KBSE:
(1) Extensible prototyping, a tailored form of evolutionary prototyping; this builds on mature UI prototypes and offers two extension steps for the anytime creation of core KBS prototypes (KB + core UI) and fully productive KBS (core KBS prototype + common framing functionality). (2) KBS UI patterns, that define reusable solutions for the core KBS UI/interaction; we provide a basic collection of such patterns in this work. (3) Suitable usability instruments for the assessment of the KBS artifacts. Therewith, we do not strive for ’yet another’ self-contained KBS engineering methodology. Rather, we motivate to extend existing approaches by the proposed key components. We demonstrate this based on an agile KBSE model.
For practical support, we introduce the tailored KBSE tool ProKEt. ProKEt offers a basic selection of KBS core UI patterns and corresponding configuration options out of the box; their further adaption/extension is possible on various levels of expertise. For practical usability support, ProKEt offers facilities for quantitative and qualitative data collection. ProKEt explicitly fosters the suggested, intertwined development of UI and KB. For seamlessly integrating KA activities, it provides extension points for two selected external KA tools: For KnowOF, a standard office based KA environment. And for KnowWE, a semantic wiki for collaborative KA. Therewith, ProKEt offers powerful support for encompassing, user-centered KBSE.
Finally, based on the approach and the tool, we also developed a novel KBS type: Clarification KBS as a mashup of consultation and justification KBS modules. Those denote a specifically suitable realization for participative KBS in highly expertise contexts and consequently require a specific design. In this thesis, apart from more common UI solutions, we particularly also introduce KBS UI patterns especially tailored towards Clarification KBS.
The present paper describes an improved 4 DOF (x/y/z/yaw) vision based positioning solution for fully 6 DOF autonomous UAVs, optimised in terms of computation and development costs as well as robustness and performance. The positioning system combines Fourier transform-based image registration (Fourier Tracking) and differential optical flow computation to overcome the drawbacks of a single approach. The first method is capable of recognizing movement in four degree of freedom under variable lighting conditions, but suffers from low sample rate and high computational costs. Differential optical flow computation, on the other hand, enables a very high sample rate to gain control robustness. This method, however, is limited to translational movement only and performs poor in bad lighting conditions. A reliable positioning system for autonomous flights with free heading is obtained by fusing both techniques. Although the vision system can measure the variable altitude during flight, infrared and ultrasonic sensors are used for robustness. This work is part of the AQopterI8 project, which aims to develop an autonomous flying quadrocopter for indoor application and makes autonomous directed flight possible.
A number of public codes exist for GPS positioning and baseline determination in off-line mode. However, no software code exists for DGPS exploiting correction factors at base stations, without relying on double difference information. In order to accomplish it, a methodology is introduced in MATLAB environment for DGPS using C/A pseudoranges on single frequency L1 only to make it feasible for low-cost GPS receivers. Our base station is at accurately surveyed reference point. Pseudoranges and geometric ranges are compared at base station to compute the correction factors. These correction factors are then handed over to rover for all valid satellites observed during an epoch. The rover takes it into account for its own true position determination for corresponding epoch. In order to validate the proposed algorithm, our rover is also placed at a pre-determined location. The proposed code is an appropriate and simple to use tool for post-processing of GPS raw data for accurate position determination of a rover e.g. Unmanned Aerial Vehicle during post-mission analysis.
At the center of the Internet’s protocol stack stands the Internet Protocol (IP) as a common denominator that enables all communication. To make routing efficient, resilient, and scalable, several aspects must be considered. Care must be taken that traffic is well balanced to make efficient use of the existing network resources, both in failure free operation and in failure scenarios.
Finding the optimal routing in a network is an NP-complete problem. Therefore, routing optimization is usually performed using heuristics. This dissertation shows that a routing optimized with one objective function is often not good when looking at other objective functions. It can even be worse than unoptimized routing with respect to that objective function. After looking at failure-free routing and traffic distribution in different failure scenarios, the analysis is extended to include the loop-free alternate (LFA) IP fast reroute mechanism. Different application scenarios of LFAs are examined and a special focus is set on the fact that LFAs usually cannot protect all traffic in a network even against single link failures. Thus, the routing optimization for LFAs is targeted on both link utilization and failure coverage. Finally, the pre-congestion notification mechanism PCN for network admission control and overload protection is analyzed and optimized. Different design options for implementing the protocol are compared, before algorithms are developed for the calculation and optimization of protocol parameters and PCN-based routing.
The second part of the thesis tackles a routing problem that can only be resolved on a global scale. The scalability of the Internet is at risk since a major and intensifying growth of the interdomain routing tables has been observed. Several protocols and architectures are analyzed that can be used to make interdomain routing more scalable. The most promising approach is the locator/identifier (Loc/ID) split architecture which separates routing from host identification. This way, changes in connectivity, mobility of end hosts, or traffic-engineering activities are hidden from the routing in the core of the Internet and the routing tables can be kept much smaller. All of the currently proposed Loc/ID split approaches have their downsides. In particular, the fact that most architectures use the ID for routing outside the Internet’s core is a poor design, which inhibits many of the possible features of a new routing architecture. To better understand the problems and to provide a solution for a scalable routing design that implements a true Loc/ID split, the new GLI-Split protocol is developed in this thesis, which provides separation of global and local routing and uses an ID that is independent from any routing decisions.
Besides GLI-Split, several other new routing architectures implementing Loc/ID split have been proposed for the Internet. Most of them assume that a mapping system is queried for EID-to-RLOC mappings by an intermediate node at the border of an edge network. When the mapping system is queried by an intermediate node, packets are already on their way towards their destination, and therefore, the mapping system must be fast, scalable, secure, resilient, and should be able to relay packets without locators to nodes that can forward them to the correct destination. The dissertation develops a classification for all proposed mapping system architectures and shows their similarities and differences. Finally, the fast two-level mapping system FIRMS is developed. It includes security and resilience features as well as a relay service for initial packets of a flow when intermediate nodes encounter a cache miss for the EID-to-RLOC mapping.
A simple test setup has been developed at Institute of Aerospace Information Technology, University of Würzburg, Germany to realize basic functionalities for formation flight of quadrocopters. The test environment is planned to be utilized for developing and validating the algorithms for formation flying capability in real environment as well as for education purpose. An already existing test bed for single quadrocopter was extended with necessary inter-communication and distributed control mechanism to test the algorithms for formation flights in 2 degrees of freedom (roll / pitch). This study encompasses the domain of communication, control engineering and embedded systems programming. Bluetooth protocol has been used for inter-communication between two quadrocopters. A simple approach of PID control in combination with Kalman filter has been exploited. MATLAB Instrument Control Toolbox has been used for data display, plotting and analysis. Plots can be drawn in real-time and received information can also be stored in the form of files for later use and analysis. The test setup has been developed indigenously and at considerably low cost. Emphasis has been placed on simplicity to facilitate students learning process. Several lessons have been learnt during the course of development of this setup. Proposed setup is quite flexible that can be modified as per changing requirements.
Mini Unmanned Aerial Vehicles (MUAVs) are becoming popular research platform and drawing considerable attention, particularly during the last decade due to their multi-dimensional applications in almost every walk of life. MUAVs range from simple toys found at electronic supermarkets for entertainment purpose to highly sophisticated commercial platforms performing novel assignments like offshore wind power station inspection and 3D modelling of buildings. This paper presents an overview of the main aspects in the domain of distributed control of cooperating MUAVs to facilitate the potential users in this fascinating field. Furthermore it gives an overview on state of the art in MUAV technologies e.g. Photonic Mixer Devices (PMD) camera, distributed control methods and on-going work and challenges, which is the motivation for many researchers all over the world to work in this field.
Large volumes of data are collected today in many domains. Often, there is so much data available, that it is difficult to identify the relevant pieces of information. Knowledge discovery seeks to obtain novel, interesting and useful information from large datasets.
One key technique for that purpose is subgroup discovery. It aims at identifying descriptions for subsets of the data, which have an interesting distribution with respect to a predefined target concept. This work improves the efficiency and effectiveness of subgroup discovery in different directions.
For efficient exhaustive subgroup discovery, algorithmic improvements are proposed for three important variations of the standard setting: First, novel optimistic estimate bounds are derived for subgroup discovery with numeric target concepts. These allow for skipping the evaluation of large parts of the search space without influencing the results. Additionally, necessary adaptations to data structures for this setting are discussed. Second, for exceptional model mining, that is, subgroup discovery with a model over multiple attributes as target concept, a generic extension of the well-known FP-tree data structure is introduced. The modified data structure stores intermediate condensed data representations, which depend on the chosen model class, in the nodes of the trees. This allows the application for many popular model classes. Third, subgroup discovery with generalization-aware measures is investigated.
These interestingness measures compare the target share or mean value in the subgroup with the respective maximum value in all its generalizations. For this setting, a novel method for deriving optimistic estimates is proposed. In contrast to previous approaches, the novel measures are not exclusively based on the anti-monotonicity of instance coverage, but also takes the difference of coverage between the subgroup and its generalizations into account. In all three areas, the advances lead to runtime improvements of more than an order of magnitude.
The second part of the contributions focuses on the \emph{effectiveness} of subgroup discovery. These improvements aim to identify more interesting subgroups in practical applications. For that purpose, the concept of expectation-driven subgroup discovery is introduced as a new family of interestingness measures. It computes the score of a subgroup based on the difference between the actual target share and the target share that could be expected given the statistics for the separate influence factors that are combined to describe the subgroup.
In doing so, previously undetected interesting subgroups are discovered, while other, partially redundant findings are suppressed.
Furthermore, this work also approaches practical issues of subgroup discovery: In that direction, the VIKAMINE II tool is presented, which extends its predecessor with a rebuild user interface, novel algorithms for automatic discovery, new interactive mining techniques, as well novel options for result presentation and introspection. Finally, some real-world applications are described that utilized the presented techniques. These include the identification of influence factors on the success and satisfaction of university students and the description of locations using tagging data of geo-referenced images.