In recent years, great progress has been made in the area of Artificial Intelligence (AI) due to the possibilities of Deep Learning which steadily yielded new state-of-the-art results especially in many image recognition tasks.
Currently, in some areas, human performance is achieved or already exceeded.
This great development already had an impact on the area of Optical Music Recognition (OMR) as several novel methods relying on Deep Learning succeeded in specific tasks.
Musicologists are interested in large-scale musical analysis and in publishing digital transcriptions in collections that enable the development of tools for search and data retrieval.
The application of OMR promises to simplify and thus speed up the transcription process by providing either fully automatic or semi-automatic approaches.
This thesis focuses on the automatic transcription of Medieval music, in particular square notation, which poses a challenging task due to complex layouts, highly varying handwritten notations, and degradation.
However, since handwritten music notations are quite complex to read, even for an experienced musicologist, it is to be expected that even with new techniques of OMR manual corrections are required to obtain the transcriptions.
This thesis presents several new approaches and open source software solutions for layout analysis and Automatic Text Recognition (ATR) for early documents and for OMR of Medieval manuscripts providing state-of-the-art technology.
Fully Convolutional Networks (FCN) are applied for the segmentation of historical manuscripts and early printed books, to detect staff lines, and to recognize neume notations.
The ATR engine Calamari is presented which allows for ATR of early prints and also the recognition of lyrics.
Configurable CNN/LSTM-network architectures which are trained with the segmentation-free CTC-loss are applied to the sequential recognition of text but also monophonic music.
Finally, a syllable-to-neume assignment algorithm is presented which represents the final step to obtain a complete transcription of the music.
The evaluations show that the performance of any algorithm depends heavily on the material at hand and the number of training instances.
The presented staff line detection correctly identifies staff lines and staves with an $F_1$-score of above $99.5\%$.
The symbol recognition yields a diplomatic Symbol Accuracy Rate (dSAR) of above $90\%$, computed as the number of correct predictions in the symbol sequence normalized by its length.
The ATR of lyrics achieved a Character Accuracy Rate (CAR) (equivalently, the number of correct predictions normalized by the sentence length) of above $93\%$ when trained on 771 lyric lines of Medieval manuscripts and of $99.89\%$ when trained on around 3.5 million lines of contemporary printed fonts.
The assignment of syllables and their corresponding neumes reached $F_1$-scores of up to $99.2\%$.
A direct comparison to previously published performances is difficult due to different materials and metrics.
However, estimations show that the reported values of this thesis exceed the state-of-the-art in the area of square notation.
A further goal of this thesis is to enable musicologists without technical background to apply the developed algorithms in a complete workflow by providing a user-friendly and comfortable Graphical User Interface (GUI) encapsulating the technical details.
For this purpose, this thesis presents the web-application OMMR4all.
Its fully functional workflow includes the proposed state-of-the-art machine-learning algorithms and optionally allows for manual intervention at any stage to correct the output, preventing error propagation.
To simplify the manual (post-)correction, OMMR4all provides an overlay editor that superimposes the annotations on a scan of the original manuscript so that errors can easily be spotted.
The workflow is designed to be iteratively improvable by training better models as soon as new Ground Truth (GT) is available.
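The accuracy rates quoted in this abstract normalize the number of correct predictions by the sequence length. The following is a minimal sketch of one common way to compute such a rate, assuming it is derived from the Levenshtein edit distance between the prediction and the Ground Truth. The thesis's exact counting scheme may differ, and the function names are illustrative only.

```python
def edit_distance(pred, truth):
    """Levenshtein distance between two sequences (strings or symbol lists)."""
    prev = list(range(len(truth) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j] + 1,          # drop an element of pred
                            curr[j - 1] + 1,      # add a missing element
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy_rate(pred, truth):
    """1 - (edit distance / ground-truth length), clipped to [0, 1]."""
    if len(truth) == 0:
        return 1.0 if len(pred) == 0 else 0.0
    return max(0.0, 1.0 - edit_distance(pred, truth) / len(truth))
```

The same function works for characters (CAR) and for symbol sequences (dSAR) if the symbols are passed as a list.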
Failure prediction is an important aspect of self-aware computing systems. Therefore, a multitude of different approaches has been proposed in the literature over the past few years. In this work, we propose a taxonomy for organizing works focusing on the prediction of Service Level Objective (SLO) failures. Our taxonomy classifies related work along the dimensions of the prediction target (e.g., anomaly detection, performance prediction, or failure prediction), the time horizon (e.g., detection or prediction, online or offline application), and the applied modeling type (e.g., time series forecasting, machine learning, or queueing theory). The classification is derived based on a systematic mapping of relevant papers in the area. Additionally, we give an overview of different techniques in each sub-group and address remaining challenges in order to guide future research.
Unmanned aerial vehicles are becoming more popular every year, but without regulation of the increasing number of these vehicles, the air space could become chaotic and uncontrollable. In this work, a framework is proposed that combines self-aware computing with multirotor formations to address this problem. The self-awareness is envisioned to improve the dynamic behavior of multirotors. The implemented formation scheme is called platooning, which arranges vehicles in a string behind the lead vehicle and is proposed to bring order into chaotic air space. Since multirotors define a general category of unmanned aerial vehicles, this thesis focuses on quadcopters, platforms with four rotors. A modification of the LRA-M self-awareness loop is proposed and named Platooning Awareness. The implemented framework offers two flight modes that enable waypoint following and allow the self-awareness module to find a path to a goal position through scenarios in which obstacles are present on the way. The evaluation shows that the proposed framework is able to use self-awareness to learn about its environment, avoid obstacles, and successfully move a platoon of drones through multiple scenarios.
An Intelligent Semi-Automatic Workflow for Optical Character Recognition of Historical Printings
(2020)
Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years great progress has been made in the area of historical OCR resulting in several powerful open-source tools for preprocessing, layout analysis and segmentation, Automatic Text Recognition (ATR) and postcorrection. Their major drawback is that they only offer limited applicability by non-technical users like humanist scholars, in particular when it comes to the combined use of several tools in a workflow. Furthermore, depending on the material, these tools are usually not able to fully automatically achieve sufficiently low error rates, let alone perfect results, creating a demand for an interactive postcorrection functionality which, however, is generally not incorporated.
This thesis addresses these issues by presenting an open-source OCR software called OCR4all which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. While a variety of materials can already be processed fully automatically, books with more complex layouts require manual intervention by the users. This is mostly due to the fact that the Ground Truth (GT) required for training stronger mixed models (for segmentation as well as text recognition) is not yet available in either the desired quantity or quality.
To deal with this issue in the short run, OCR4all offers better recognition capabilities in combination with a very comfortable Graphical User Interface (GUI) that allows error corrections not only in the final output, but already in early stages to minimize error propagation. In the long run, this constant manual correction produces large quantities of valuable, high-quality training material which can be used to improve fully automatic approaches. Furthermore, extensive configuration capabilities are provided to set the degree of automation of the workflow and to adapt the carefully selected default parameters to specific printings, if necessary. The architecture of OCR4all allows for an easy integration (or substitution) of newly developed tools for its main components by supporting standardized interfaces like PageXML, thus aiming at continually higher automation for historical printings.
In addition to OCR4all, several methodical extensions in the form of accuracy improving techniques for training and recognition are presented. Most notably an effective, sophisticated, and adaptable voting methodology using a single ATR engine, a pretraining procedure, and an Active Learning (AL) component are proposed. Experiments showed that combining pretraining and voting significantly improves the effectiveness of book-specific training, reducing the obtained Character Error Rates (CERs) by more than 50%.
The proposed extensions were further evaluated during two real world case studies: First, the voting and pretraining techniques are transferred to the task of constructing so-called mixed models which are trained on a variety of different fonts. This was done by using 19th century Fraktur script as an example, resulting in a considerable improvement over a variety of existing open-source and commercial engines and models. Second, the extension from ATR on raw text to the adjacent topic of typography recognition was successfully addressed by thoroughly indexing a historical lexicon that heavily relies on different font types in order to encode its complex semantic structure.
During the main experiments on very complex early printed books, even users with minimal or no experience were able not only to deal comfortably with the challenges presented by the complex layout, but also to recognize the text with manageable effort and great quality, achieving excellent CERs below 0.5%. Furthermore, the fully automated application on 19th century novels showed that OCR4all (average CER of 0.85%) can considerably outperform the commercial state-of-the-art tool ABBYY FineReader (5.3%) on moderate layouts if suitably pretrained mixed ATR models are available.
Recent advances in Natural Language Processing (NLP) allow for a fully automatic extraction of character networks from an incoming text. These networks serve as a compact and easy-to-grasp representation of literary fiction. They offer an aggregated view of the text, which can be used in distant reading approaches for the analysis of literary hypotheses. At their core, the networks consist of nodes, which represent literary characters, and edges, which represent relations between characters. For an automatic extraction of such a network, the first step is the detection of the references to all fictional entities that are of importance for a text. References to the fictional entities appear in the form of names, noun phrases, and pronouns, and prior to this work, no components capable of automatically detecting character references were available. Existing tools are only capable of detecting proper nouns, a subset of all character references. When evaluated on the task of detecting proper nouns in the domain of literary fiction, they still underperform, with an F1-score of just about 50%. This thesis uses techniques from the field of semi-supervised learning, such as Distant Supervision and Generalized Expectations, and improves the results of an existing tool to about 82% when evaluated on all three categories in literary fiction, without the need for annotated data in the target domain. However, since this quality is still not sufficient, the decision was made to annotate DROC, a corpus comprising 90 fragments of German novels. This resulted in a new general-purpose annotation environment called ATHEN, as well as annotated data spanning about 500,000 tokens in total. Using this data, the combination of supervised algorithms and a tailored rule-based algorithm, which together exploit both local and global consistencies, yields an algorithm with an F1-score of about 93%.
This component is referred to as the Kallimachos tagger.
A character network cannot display references directly, however; instead, they need to be clustered so that all references that belong to the same real-world or fictional entity are grouped together. This process, widely known as coreference resolution, is a hard problem that has been in the focus of research for more than half a century. This work experimented with adaptations of classical feature-based machine learning, with a dedicated rule-based algorithm, and with modern Deep Learning techniques, but no approach surpassed 55% B-Cubed F1 when evaluated on DROC. Due to this barrier, many researchers do not use fully-fledged coreference resolution when they extract character networks, but focus only on a more forgiving subset: the names. For novels such as Alice's Adventures in Wonderland by Lewis Carroll, however, this would result in a network in which many important characters are missing. In order to integrate important characters that are not named by the author into the network, this work makes use of automatic detection of speakers and addressees for direct speech utterances (all entities involved in a dialog are considered to be of importance). This problem is by itself not an easy task; however, the most successful system analysed in this thesis is able to correctly determine the speaker for about 85% of the utterances and the addressee for about 65%. This speaker information not only helps to identify the most dominant characters, but also serves as a way to model the relations between entities.
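For readers unfamiliar with the B-Cubed F1 measure mentioned in this paragraph, the following is a minimal sketch of its standard definition. It assumes that gold and system annotations cover exactly the same mentions, which real evaluations on DROC may not guarantee; the function name and data layout are illustrative.

```python
def b_cubed(gold, system):
    """B-Cubed precision, recall and F1.

    gold, system: dicts mapping each mention id to a cluster id.
    Assumption: both dicts contain exactly the same mention ids.
    """
    if not gold or set(gold) != set(system):
        raise ValueError("gold and system must cover the same, non-empty mention set")

    def clusters(assignment):
        grouped = {}
        for mention, cluster in assignment.items():
            grouped.setdefault(cluster, set()).add(mention)
        return grouped

    gold_clusters = clusters(gold)
    sys_clusters = clusters(system)

    precision = recall = 0.0
    for mention in gold:
        g = gold_clusters[gold[mention]]      # true entity of this mention
        s = sys_clusters[system[mention]]     # predicted entity of this mention
        overlap = len(g & s)
        precision += overlap / len(s)
        recall += overlap / len(g)

    n = len(gold)
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because the score is averaged per mention, a system that splits one large entity into many small clusters is penalized mainly in recall, while merging distinct entities is penalized mainly in precision.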
During the span of this work, components have been developed to model relations between characters using speaker attribution, using co-occurrences as well as by the usage of true interactions, for which yet again a dataset was annotated using ATHEN. Furthermore, since relations between characters are usually typed, a component for the extraction of a typed relation was developed. Similar to the experiments for the character reference detection, a combination of a rule based and a Maximum Entropy classifier yielded the best overall results, with the extraction of family relations showing a score of about 80% and the quality of love relations with a score of about 50%. For family relations, a kernel for a Support Vector Machine was developed that even exceeded the scores of the combined approach but is behind on the other labels.
In addition, this work presents new ways to evaluate automatically extracted networks without the need for domain experts, relying instead on expert summaries. It also refrains from using social network analysis for the evaluation and instead presents ranked evaluations using Precision@k and the Spearman rank correlation coefficient for the nodes and edges of the network. An analysis using these metrics showed that the central characters of a novel are contained with high probability, but the quality drops rather quickly if more than five entities are analyzed. The quality of the edges is mainly dominated by the quality of the coreference resolution, and the correlation coefficient between gold edges and system edges therefore varies between 30 and 60%.
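The two ranked metrics named in this paragraph can be sketched as follows. This is a minimal illustration of the textbook definitions, not the evaluation code of the thesis; the Spearman variant shown assumes there are no tied values, and all names are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that appear in the gold set."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked[:k] if item in relevant) / k


def spearman_rho(xs, ys):
    """Spearman rank correlation for two equally long, tie-free score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("this simplified formula does not handle ties")

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        result = [0] * n
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    rx, ry = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

For example, `precision_at_k(system_characters, summary_characters, 5)` would measure how many of the five highest-ranked extracted characters also occur in an expert summary.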
All developed components are aggregated alongside a large set of other preprocessing modules in the Kallimachos pipeline and can be reused without any restrictions.
The DFG project “SDN-enabled Application-aware Network Control Architectures and their Performance Assessment” (DFG SDN-App) focused in phase 1 (Jan 2017 – Dec 2019) on software defined networking (SDN). Being a fundamental paradigm shift, SDN enables a remote control of networking devices made by different vendors from a logically centralized controller. In principle, this enables a more dynamic and flexible management of network resources compared to the traditional legacy networks. Phase 1 focused on multimedia applications and their users’ Quality of Experience (QoE).
This document reports the achievements of the first phase (Jan 2017 – Dec 2019), which was jointly carried out by the Technical University of Munich, the Technical University of Berlin, and the University of Würzburg. The project started at the institutions in Munich and Würzburg in January 2017 and lasted until December 2019.
In Phase 1, the project targeted the development of fundamental control mechanisms for network-aware application control and application-aware network control in Software Defined Networks (SDN) so as to enhance the user-perceived quality (QoE). The idea is to leverage the QoE from multiple applications as a control input parameter for application and network control mechanisms. These mechanisms are implemented by an Application Control Plane (ACP) and a Network Control Plane (NCP). In order to obtain a global view of the current system state, application and network parameters are monitored and communicated to the respective control plane interface. Network and application information and their demands are exchanged between the control planes so as to derive appropriate control actions. To this end, a methodology is developed to assess the application performance and in particular the QoE. This requires appropriate QoE modeling of the applications considered in the project as well as metrics like QoE fairness to be utilized within QoE management.
In summary, the application-network interaction can improve the QoE for multi-application scenarios. This is ensured by utilizing information from the application layer, which is mapped by appropriate QoS-QoE models to QoE within a network control plane. On the other hand, network information is monitored and communicated to the application control plane. Network and application information and their demands are exchanged between the control planes so as to derive appropriate control actions.
Technical systems today are expected to function without faults at all times in order to keep everyday life running smoothly. Technical systems can, however, exhibit defects that impair their operation or lead to total failure. Defects generally manifest themselves as a change in the behavior of individual components. These deviations from nominal behavior grow in intensity the closer the affected component is to total failure. For this reason, the faulty behavior of components should be detected in time to prevent permanent damage. This is of particular importance in aerospace. A satellite's components cannot be serviced once it is in orbit. The defect of a single component, such as the battery of the power supply, can mean the loss of the entire mission. In principle, fault detection can be carried out manually, as is often common in satellite operations. This requires a human expert, a so-called operator, to monitor the system. This form of monitoring, however, depends heavily on the time, availability, and expertise of the operator performing it. Another approach is to use a dedicated diagnosis system. Such a system can monitor the technical system permanently and compute diagnoses autonomously. The diagnoses can then be reviewed by an expert, who can take action on their basis. The model-based diagnosis system presented in this thesis uses a quantitative model of a technical system that describes its nominal behavior. The observed behavior of the technical system, given by measured values, is compared with its expected behavior, given by simulated values of the model, and discrepancies are determined. Each discrepancy is a symptom.
Diagnoses are computed by first determining a so-called conflict set for each symptom. This is a set of components such that the defect of one of these components could explain the corresponding symptom. From these conflict sets, so-called hitting sets are computed. A hitting set is a set of components such that the simultaneous defect of all components in the set could explain all observed symptoms. Each minimal hitting set corresponds to a diagnosis. To compute these sets, the diagnosis system uses a procedure in which dependent components are first determined; these are then incriminated by components exhibiting symptoms and exonerated by correctly functioning components. Scores are computed for the individual components on the basis of these incriminations and exonerations and are used to make diagnoses. Since the diagnosis system relies on sufficiently accurate models and the manual calibration of these models involves considerable effort, a procedure for automatic calibration was developed. It uses a Cyclic Genetic Algorithm to determine model parameters from recorded values of the real components so that the models reproduce the recorded data as well as possible. To evaluate the automatic calibration, a test setup and various dynamic components that are difficult to calibrate manually, taken from the qualification model of a real nanosatellite, the SONATE nanosatellite, were modeled and calibrated. The test setup consisted of a battery pack, a charge controller, a deep-discharge protection, a discharge controller, a Stepper Motor HAT, and a motor. In addition to the automatic calibration, it was independently calibrated manually.
The automatically calibrated satellite components were a reaction wheel, a discharge controller, magnetic coils consisting of one ferrite-core coil and two air-core coils, a termination circuit board, and a battery. To evaluate the diagnosis system, the power supply of the qualification model of the SONATE nanosatellite was modeled. For the batteries, the discharge controllers, the magnetic coils, and the reaction wheels, the previously automatically calibrated models were used. Faults were simulated for the model of the power supply and then diagnosed. The evaluation of the automatic calibration showed that, for the test setup, it delivered an accuracy comparable to that of manual calibration and even slightly exceeded it, and that the automatically calibrated satellite components exhibited consistently high accuracy and were thus suitable for use in the diagnosis system. The evaluation of the diagnosis system showed that the simulated faults were found reliably and that the diagnosis system was able to diagnose the plausible causes of these faults.
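The relationship between conflict sets and diagnoses described in this abstract can be illustrated with a small sketch. It enumerates minimal hitting sets by brute force, which is only practical for a handful of components and is not the score-based procedure of the thesis; the component names in the usage note are invented for illustration.

```python
from itertools import combinations


def minimal_hitting_sets(conflict_sets):
    """Return all minimal sets of components that intersect every conflict set."""
    conflicts = [frozenset(c) for c in conflict_sets]
    if not conflicts:
        return [frozenset()]  # no symptoms: the empty diagnosis explains everything
    components = sorted(set().union(*conflicts))
    found = []
    # Candidates are tried by increasing size, so any hit that is not a
    # superset of an earlier hit is guaranteed to be minimal.
    for size in range(1, len(components) + 1):
        for candidate in combinations(components, size):
            cand = frozenset(candidate)
            if any(smaller <= cand for smaller in found):
                continue
            if all(cand & conflict for conflict in conflicts):
                found.append(cand)
    return found
```

With the conflict sets `{"battery", "charger"}` and `{"charger", "motor"}`, the sketch yields two candidate diagnoses: a defective charger alone, or a simultaneous defect of battery and motor.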
Semantic Fusion for Natural Multimodal Interfaces using Concurrent Augmented Transition Networks
(2018)
Semantic fusion is a central requirement of many multimodal interfaces. Procedural methods like finite-state transducers and augmented transition networks have proven to be beneficial for implementing semantic fusion. They are compliant with the rapid development cycles that are common in the development of user interfaces, in contrast to machine-learning approaches that require time-costly training and optimization. We identify seven fundamental requirements for the implementation of semantic fusion: action derivation, continuous feedback, context-sensitivity, temporal relation support, access to the interaction context, as well as the support of chronologically unsorted and probabilistic input. A subsequent analysis reveals, however, that there is currently no solution that fulfills the latter two requirements. As the main contribution of this article, we thus present the Concurrent Cursor concept to compensate for these shortcomings. In addition, we showcase a reference implementation, the Concurrent Augmented Transition Network (cATN), that validates the concept’s feasibility in a series of proof-of-concept demonstrations as well as through a comparative benchmark. The cATN fulfills all identified requirements and fills the gap left by previous solutions. It supports the rapid prototyping of multimodal interfaces by means of five concrete traits: its declarative nature, the recursiveness of the underlying transition network, the network abstraction constructs of its description language, the utilized semantic queries, and an abstraction layer for lexical information. Our reference implementation was and is used in various student projects, theses, as well as master-level courses. It is openly available and showcases that non-experts can effectively implement multimodal interfaces, even for non-trivial applications in mixed and virtual reality.
Asynchronous Traffic Shaping (ATS) enables bounded latency with low complexity for time-sensitive networking without the need for time synchronization. However, its main focus is the guaranteed maximum delay. Jitter-sensitive applications may still be forced towards synchronization. This work proposes traffic damping to reduce end-to-end delay jitter. It discusses its application and shows that both the prerequisites and the guaranteed delay of traffic damping and ATS are very similar. Finally, it presents a brief evaluation of delay jitter in an example topology by means of a simulation and worst-case estimation.
Background
Medication trend studies show how medication changes over the years and may be replicated using a clinical data warehouse (CDW). Even today, much of the patient information in the electronic health record (EHR), such as medication data, is stored as free text. As the conventional approach to information extraction (IE) demands a high development effort, we used ad hoc IE instead. This technique queries information and extracts it on the fly from texts contained in the CDW.
Methods
We present a generalizable approach of ad hoc IE for pharmacotherapy (medications and their daily dosage) presented in hospital discharge letters. We added import and query features to the CDW system, like error tolerant queries to deal with misspellings and proximity search for the extraction of the daily dosage. During the data integration process in the CDW, negated, historical and non-patient context data are filtered. For the replication studies, we used a drug list grouped by ATC (Anatomical Therapeutic Chemical Classification System) codes as input for queries to the CDW.
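The two query features named above, error-tolerant matching and proximity search, can be illustrated with a small standalone sketch. It is not the query engine of the CDW system: the similarity threshold, the token window, the dosage pattern, and the function name are all assumptions made for this example, and the similarity measure is Python's standard `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher

# Hypothetical dosage pattern: a number directly followed by a unit.
DOSE = re.compile(r"^\d+(?:[.,]\d+)?\s*(?:mg|g|ml)$", re.IGNORECASE)


def find_medication(text, drug, min_ratio=0.8, window=3):
    """Find (possibly misspelled) mentions of `drug` and a dosage close by.

    Returns a list of (matched_word, dose_or_None) tuples.
    """
    tokens = text.split()
    hits = []
    for i, token in enumerate(tokens):
        word = token.strip(".,;:()").lower()
        # Error-tolerant match: accept words that are similar enough.
        if SequenceMatcher(None, word, drug.lower()).ratio() < min_ratio:
            continue
        # Proximity search: look for a dosage within the next `window` tokens.
        nearby = (t.strip(".,;") for t in tokens[i + 1:i + 1 + window])
        dose = next((t for t in nearby if DOSE.match(t)), None)
        hits.append((word, dose))
    return hits
```

A real system would additionally have to filter negated, historical, and non-patient contexts, as described for the data integration step.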
Results
We achieve an F1 score of 0.983 (precision 0.997, recall 0.970) for extracting medication from discharge letters and an F1 score of 0.974 (precision 0.977, recall 0.972) for extracting the dosage. We replicated three published medical trend studies for hypertension, atrial fibrillation and chronic kidney disease. Overall, 93% of the main findings could be replicated, 68% of sub-findings, and 75% of all findings. One study could be completely replicated with all main and sub-findings.
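As a quick consistency check, the reported F1 scores follow from the stated precision and recall values via the harmonic mean. The helper below is only an illustration of that standard formula.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


medication_f1 = round(f1_score(0.997, 0.970), 3)  # medication extraction
dosage_f1 = round(f1_score(0.977, 0.972), 3)      # dosage extraction
```

Rounded to three decimals, both values agree with the scores given in the Results paragraph (0.983 and 0.974).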
Conclusion
A novel approach for ad hoc IE is presented. It is well suited to basic medical texts like discharge letters and finding reports. Ad hoc IE is by definition more limited than conventional IE and does not claim to replace it, but it substantially exceeds the search capabilities of many CDWs and makes it convenient to conduct replication studies quickly and with high quality.