Affordable prices for 3D laser range finders and mature software solutions for registering multiple point clouds in a common coordinate system paved the way for new areas of application for 3D point clouds. Nowadays we see 3D laser scanners being used not only by digital surveying experts but also by law enforcement officials, construction workers or archaeologists. Whether the purpose is digitizing factory production lines, preserving historic sites as digital heritage or recording environments for gaming or virtual reality applications -- it is hard to imagine a scenario in which the final point cloud must also contain the points of "moving" objects like factory workers, pedestrians, cars or flocks of birds. For most post-processing tasks, moving objects are undesirable, not least because they appear in scans multiple times or are distorted due to their motion relative to the scanner rotation.
The main contributions of this work are two post-processing steps for already registered 3D point clouds. The first is a new change detection approach based on a voxel grid, which partitions the input points into static and dynamic points using explicit change detection and subsequently removes the latter to obtain a "cleaned" point cloud. The second uses this cleaned point cloud as input for detecting collisions between points of the environment point cloud and a point cloud of a model that is moved through the scene.
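As an illustration of the general idea, the following minimal Python sketch partitions the points of a registered scan into static and dynamic points with a simple voxel-grid occupancy comparison. It is a deliberately simplified stand-in for the explicit change detection developed in this work, and all names and parameters (such as voxel_size) are hypothetical.

    import numpy as np

    def voxel_keys(points, voxel_size=0.2):
        # Map each 3D point to the integer index of its containing voxel.
        return set(map(tuple, np.floor(points / voxel_size).astype(int)))

    def split_static_dynamic(reference_scan, scan, voxel_size=0.2):
        # Points of the scan whose voxel is also occupied in the reference scan
        # are treated as static; the remaining points are dynamic candidates.
        occupied = voxel_keys(reference_scan, voxel_size)
        keys = np.floor(scan / voxel_size).astype(int)
        is_static = np.array([tuple(k) in occupied for k in keys])
        return scan[is_static], scan[~is_static]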
Our approach to explicit change detection is compared to the state of the art using multiple datasets, including the popular KITTI dataset. We show that our solution achieves similar or better F1-scores than an existing solution while at the same time being faster.
To detect collisions we do not produce a mesh but approximate the raw point cloud data by spheres or cylindrical volumes. We show how our data structures allow efficient nearest neighbor queries that make our CPU-only approach comparable to a massively parallel algorithm running on a GPU. The utilized algorithms and data structures are discussed in detail. All our software is freely available for download under the terms of the GNU General Public License. Most of the datasets used in this thesis are freely available as well. We provide shell scripts that allow one to directly reproduce the quantitative results shown in this thesis for easy verification of our findings.
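The collision test itself can be pictured with a few lines of Python. The sketch below approximates points by surrounding spheres and uses a k-d tree for the nearest neighbor queries; the data structures actually used in the thesis differ, and the radius and all other names are illustrative assumptions.

    import numpy as np
    from scipy.spatial import cKDTree

    def collides(environment, model_points, pose, radius=0.05):
        # The model collides if any of its points, transformed by the 4x4 pose,
        # lies within the given radius of an environment point (spheres intersect).
        tree = cKDTree(environment)                      # built once per environment
        homogeneous = np.c_[model_points, np.ones(len(model_points))]
        moved = (pose @ homogeneous.T).T[:, :3]          # apply rigid transformation
        distances, _ = tree.query(moved, k=1)            # nearest environment point
        return bool(np.any(distances < radius))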
Nowadays, employees have to work with applications, technical services, and systems for hours every day. Hence, performance degradation of such systems might be perceived negatively by the employees, increase frustration, and might also have a negative effect on their productivity. Assessing an application's performance in order to ensure its smooth operation is part of application management. Within this process it is not sufficient to assess the system performance solely based on technical performance parameters, e.g., response or loading times. These values have to be set in relation to the perceived performance quality on the user's side - the quality of experience (QoE).
This dissertation focuses on the monitoring and estimation of the QoE of enterprise applications. As building models to estimate the QoE requires quality ratings from the users as ground truth, one part of this work addresses methods to collect such ratings. Besides the evaluation of approaches to improve the quality of results of tasks and studies completed on crowdsourcing platforms, a general concept for monitoring and estimating QoE in enterprise environments is presented. Here, relevant design dimensions of subjective studies are identified and their impact on the QoE is evaluated and discussed. Based on these findings, a methodology for collecting quality ratings from employees during their regular work is developed. The method is realized by implementing a tool for conducting short surveys, which is deployed in a cooperating company.
As a foundation for learning QoE estimation models, this work investigates the relationship between user-provided ratings and technical performance parameters. This analysis is based on a data set collected in a user study in a cooperating company during a time span of 1.5 years. Finally, two QoE estimation models are introduced and their performance is evaluated.
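As a rough illustration of how such a relationship can be turned into an estimation model, the following Python sketch fits a logarithmic mapping from response time to a mean opinion score, a functional form commonly assumed in QoE research. The actual models and data of this dissertation are not reproduced here, and the numbers below are invented.

    import numpy as np
    from scipy.optimize import curve_fit

    def qoe_model(response_time, a, b):
        # Assumed form: the rating decreases logarithmically with the response time.
        return a - b * np.log(response_time)

    times = np.array([0.2, 0.5, 1.0, 2.0, 4.0, 8.0])    # response times in seconds
    ratings = np.array([4.8, 4.4, 3.9, 3.3, 2.6, 2.0])  # averaged user ratings (1..5)

    (a, b), _ = curve_fit(qoe_model, times, ratings)
    print(qoe_model(3.0, a, b))                          # estimated rating at 3 s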
Time-triggered communication is widely used throughout several industry domains, primarily for reliable and real-time capable data transfers. However, existing time-triggered technologies are designed for terrestrial usage and not directly applicable to space applications due to the harsh environment. Instead, specific hardware must be developed to deal with thermal, mechanical, and especially radiation effects.
SpaceWire, as an event-triggered communication technology, has been used for years in a large number of space missions. Its moderate complexity, heritage, and transmission rates of up to 400 Mbit/s are among its main advantages, and it is often without alternative for on-board computing systems of spacecraft. At present, real-time data transfers are either achieved by prioritization inside SpaceWire routers or by applying a simplified time-triggered approach. These solutions cause problems either when they are used inside distributed on-board computing systems or when networks with more than a single router are required.
This work provides a solution for the real-time problem by developing a novel clock synchronization approach. The approach focuses on being compatible with distributed system structures and allows time-triggered data transfers. A significant difference to existing technologies is the remote clock estimation by means of pulses. These are transferred over the network and remove the need for latency accumulation, which allows the incorporation of standardized SpaceWire equipment. Additionally, local clocks are controlled in a decentralized manner and provide different correction capabilities in order to handle oscillator-induced uncertainties. All these functionalities are provided by a newly developed Network Controller (NC), which is able to isolate the attached network and to control access to it.
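A loose conceptual illustration of pulse-based remote clock estimation is given below in Python: from the local receive timestamps of periodic pulses, the rate and offset of the remote clock are estimated by a linear fit, from which a local correction can be derived. This sketch does not reflect the actual NC implementation; all names and numbers are hypothetical.

    import numpy as np

    def estimate_remote_clock(pulse_indices, local_rx_times, nominal_period):
        # Fit remote_time = rate * local_time + offset from observed pulse arrivals;
        # a rate different from 1.0 indicates drift between the two oscillators.
        remote_times = pulse_indices * nominal_period
        rate, offset = np.polyfit(local_rx_times, remote_times, deg=1)
        return rate, offset

    indices = np.arange(10)                   # ten pulses ...
    rx_times = indices * 0.00100002 + 0.005   # ... received by a slightly slow local clock
    rate, offset = estimate_remote_clock(indices, rx_times, nominal_period=0.001)
    print(rate, offset)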
In recent years, great progress has been made in the area of Artificial Intelligence (AI) due to the possibilities of Deep Learning, which has steadily yielded new state-of-the-art results, especially in many image recognition tasks.
Currently, in some areas, human performance is achieved or already exceeded.
This great development already had an impact on the area of Optical Music Recognition (OMR) as several novel methods relying on Deep Learning succeeded in specific tasks.
Musicologists are interested in large-scale musical analysis and in publishing digital transcriptions in a collection that enables the development of tools for searching and data retrieval.
The application of OMR promises to simplify and thus speed-up the transcription process by either providing fully-automatic or semi-automatic approaches.
This thesis addresses the automatic transcription of Medieval music with a focus on square notation, which poses a challenging task due to complex layouts, highly varying handwritten notations, and degradation.
However, since handwritten music notations are quite complex to read, even for an experienced musicologist, it is to be expected that, even with new OMR techniques, manual corrections are required to obtain the transcriptions.
This thesis presents several new approaches and open source software solutions for layout analysis and Automatic Text Recognition (ATR) for early documents and for OMR of Medieval manuscripts providing state-of-the-art technology.
Fully Convolutional Networks (FCN) are applied for the segmentation of historical manuscripts and early printed books, to detect staff lines, and to recognize neume notations.
The ATR engine Calamari is presented, which allows for the ATR of early prints as well as the recognition of lyrics.
Configurable CNN/LSTM network architectures, which are trained with the segmentation-free CTC loss, are applied to the sequential recognition of text as well as monophonic music.
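A small PyTorch sketch of such a CNN/LSTM line recognizer trained with the CTC loss is given below; the layer sizes and the number of classes are illustrative assumptions and do not correspond to the exact architectures used in this thesis or in Calamari.

    import torch
    import torch.nn as nn

    class LineRecognizer(nn.Module):
        # Convolutional feature extractor followed by a bidirectional LSTM; the output
        # is one class distribution per horizontal position, as required by the CTC loss.
        def __init__(self, num_classes, height=48):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.lstm = nn.LSTM(64 * (height // 4), 128, bidirectional=True, batch_first=True)
            self.head = nn.Linear(256, num_classes + 1)   # +1 for the CTC blank label

        def forward(self, images):                         # images: (batch, 1, height, width)
            f = self.cnn(images)                           # (batch, 64, height/4, width/4)
            f = f.permute(0, 3, 1, 2).flatten(2)           # (batch, width/4, features)
            out, _ = self.lstm(f)
            return self.head(out).log_softmax(-1)

    model = LineRecognizer(num_classes=80)
    log_probs = model(torch.randn(2, 1, 48, 256))          # two dummy line images
    loss = nn.CTCLoss(blank=80)(
        log_probs.permute(1, 0, 2),                        # CTC expects (time, batch, classes)
        torch.randint(0, 80, (2, 20)),                     # dummy target label sequences
        torch.full((2,), log_probs.size(1), dtype=torch.long),
        torch.full((2,), 20, dtype=torch.long))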
Finally, a syllable-to-neume assignment algorithm is presented which represents the final step to obtain a complete transcription of the music.
The evaluations show that the performance of any algorithm highly depends on the material at hand and the number of training instances.
The presented staff line detection correctly identifies staff lines and staves with an $F_1$-score of above $99.5\%$.
The symbol recognition yields a diplomatic Symbol Accuracy Rate (dSAR) of above $90\%$, obtained by counting the number of correct predictions in the symbol sequence normalized by its length.
The ATR of lyrics achieved a Character Accuracy Rate (CAR) (equivalently, the number of correct predictions normalized by the sentence length) of above $93\%$ when trained on 771 lyric lines of Medieval manuscripts, and of $99.89\%$ when training on around 3.5 million lines of contemporary printed fonts.
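Both rates can be read as one minus a normalized edit distance; the short Python sketch below computes such an accuracy rate for two character or symbol sequences. It is only meant to make the metric definitions above concrete, and the example strings are invented.

    def edit_distance(pred, ref):
        # Classic Levenshtein distance between two sequences.
        dist = list(range(len(ref) + 1))
        for i, p in enumerate(pred, 1):
            prev, dist[0] = dist[0], i
            for j, r in enumerate(ref, 1):
                prev, dist[j] = dist[j], min(dist[j] + 1, dist[j - 1] + 1, prev + (p != r))
        return dist[len(ref)]

    def accuracy_rate(pred, ref):
        # Correct predictions normalized by the reference length (as for CAR and dSAR).
        return 1.0 - edit_distance(pred, ref) / len(ref)

    print(accuracy_rate("virga clivis pes", "virga clivis pes"))   # 1.0
    print(accuracy_rate("virga clivis", "virga clivis pes"))       # 0.75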
The assignment of syllables and their corresponding neumes reached $F_1$-scores of up to $99.2\%$.
A direct comparison to previously published performances is difficult due to different materials and metrics.
However, estimations show that the reported values of this thesis exceed the state of the art in the area of square notation.
A further goal of this thesis is to enable musicologists without technical background to apply the developed algorithms in a complete workflow by providing a user-friendly and comfortable Graphical User Interface (GUI) encapsulating the technical details.
For this purpose, this thesis presents the web-application OMMR4all.
Its fully functional workflow includes the proposed state-of-the-art machine-learning algorithms and optionally allows for manual intervention at any stage to correct the output, preventing error propagation.
To simplify the manual (post-)correction, OMMR4all provides an overlay editor that superimposes the annotations on a scan of the original manuscript so that errors can easily be spotted.
The workflow is designed to be iteratively improvable by training better models as soon as new Ground Truth (GT) is available.
An Intelligent Semi-Automatic Workflow for Optical Character Recognition of Historical Printings (2020)
Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout analysis and segmentation, Automatic Text Recognition (ATR) and postcorrection. Their major drawback is that they offer only limited applicability for non-technical users like humanist scholars, in particular when it comes to the combined use of several tools in a workflow. Furthermore, depending on the material, these tools are usually not able to fully automatically achieve sufficiently low error rates, let alone perfect results, creating a demand for an interactive postcorrection functionality which, however, is generally not incorporated.
This thesis addresses these issues by presenting an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. While a variety of materials can already be processed fully automatically, books with more complex layouts require manual intervention by the users. This is mostly due to the fact that the Ground Truth (GT) required for training stronger mixed models (for segmentation as well as text recognition) is not yet available, neither in the desired quantity nor quality.
To deal with this issue in the short run, OCR4all offers better recognition capabilities in combination with a very comfortable Graphical User Interface (GUI) that allows error corrections not only in the final output, but already in early stages, to minimize error propagation. In the long run this constant manual correction produces large quantities of valuable, high-quality training material which can be used to improve fully automatic approaches. Furthermore, extensive configuration capabilities are provided to set the degree of automation of the workflow and to make adaptations to the carefully selected default parameters for specific printings, if necessary. The architecture of OCR4all allows for an easy integration (or substitution) of newly developed tools for its main components by supporting standardized interfaces like PageXML, thus aiming at continually higher automation for historical printings.
In addition to OCR4all, several methodical extensions in the form of accuracy-improving techniques for training and recognition are presented. Most notably, an effective, sophisticated, and adaptable voting methodology using a single ATR engine, a pretraining procedure, and an Active Learning (AL) component are proposed. Experiments showed that combining pretraining and voting significantly improves the effectiveness of book-specific training, reducing the obtained Character Error Rates (CERs) by more than 50%.
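A drastically simplified stand-in for the voting idea is sketched below in Python: several variants of one ATR model each predict a text line, and the prediction closest to all others is chosen. The voting methodology proposed in the thesis is considerably more sophisticated and is not reproduced here; the example strings are invented.

    def levenshtein(a, b):
        # Edit distance between two strings, used as a crude disagreement measure.
        dist = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, dist[0] = dist[0], i
            for j, cb in enumerate(b, 1):
                prev, dist[j] = dist[j], min(dist[j] + 1, dist[j - 1] + 1, prev + (ca != cb))
        return dist[len(b)]

    def vote(predictions):
        # Return the prediction with minimal summed edit distance to all others.
        return min(predictions, key=lambda p: sum(levenshtein(p, q) for q in predictions))

    print(vote(["Wilcken", "Wilckem", "Wilcken", "Wjlcken", "Wilken"]))   # "Wilcken"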
The proposed extensions were further evaluated in two real-world case studies: First, the voting and pretraining techniques were transferred to the task of constructing so-called mixed models which are trained on a variety of different fonts. This was done by using 19th-century Fraktur script as an example, resulting in a considerable improvement over a variety of existing open-source and commercial engines and models. Second, the extension from ATR on raw text to the adjacent topic of typography recognition was successfully addressed by thoroughly indexing a historical lexicon that heavily relies on different font types in order to encode its complex semantic structure.
During the main experiments on very complex early printed books even users with minimal or no experience were able to not only comfortably deal with the challenges presented by the complex layout, but also to recognize the text with manageable effort and great quality, achieving excellent CERs below 0.5%. Furthermore, the fully automated application on 19th century novels showed that OCR4all (average CER of 0.85%) can considerably outperform the commercial state-of-the-art tool ABBYY Finereader (5.3%) on moderate layouts if suitably pretrained mixed ATR models are available.
Recent advances in Natural Language Processing (NLP) allow for a fully automatic extraction of character networks from an incoming text. These networks serve as a compact and easy-to-grasp representation of literary fiction. They offer an aggregated view of the text, which can be used during distant reading approaches for the analysis of literary hypotheses. At their core, the networks consist of nodes, which represent literary characters, and edges, which represent relations between characters. For an automatic extraction of such a network, the first step is the detection of the references of all fictional entities that are of importance for a text. References to the fictional entities appear in the form of names, noun phrases and pronouns, and prior to this work, no components capable of automatically detecting character references were available. Existing tools are only capable of detecting proper nouns, a subset of all character references. When evaluated on the task of detecting proper nouns in the domain of literary fiction, they still underperform, with an F1-score of just about 50%. This thesis uses techniques from the field of semi-supervised learning, such as Distant Supervision and Generalized Expectations, and improves the results of an existing tool to about 82% when evaluated on all three categories in literary fiction, without the need for annotated data in the target domain. However, since this quality is still not sufficient, the decision was made to annotate DROC, a corpus comprising 90 fragments of German novels. This resulted in a new general-purpose annotation environment titled ATHEN, as well as annotated data that spans about 500,000 tokens in total. Using this data, the combination of supervised algorithms and a tailored rule-based algorithm, which together are able to exploit both local and global consistencies, yields an algorithm with an F1-score of about 93%. This component is referred to as the Kallimachos tagger.
A character network cannot directly display references, however; instead, the references need to be clustered so that all references that belong to a real-world or fictional entity are grouped together. This process, widely known as coreference resolution, is a hard problem that has been in the focus of research for more than half a century. This work experimented with adaptations of classical feature-based machine learning, with a dedicated rule-based algorithm and with modern techniques of Deep Learning, but no approach can surpass 55% B-Cubed F1 when evaluated on DROC. Due to this barrier, many researchers do not use a fully-fledged coreference resolution when they extract character networks, but only focus on a more forgiving subset - the names. For novels such as Alice's Adventures in Wonderland by Lewis Carroll, this would however only result in a network in which many important characters are missing. In order to integrate important characters into the network that are not named by the author, this work makes use of the automatic detection of speakers and addressees of direct speech utterances (all entities involved in a dialog are considered to be of importance). This problem is by itself not an easy task; however, the most successful system analysed in this thesis is able to correctly determine the speaker for about 85% of the utterances and the addressee for about 65%. This speaker information can not only help to identify the most dominant characters, but also serves as a way to model the relations between entities.
During the span of this work, components have been developed to model relations between characters using speaker attribution, co-occurrences, as well as true interactions, for which yet again a dataset was annotated using ATHEN. Furthermore, since relations between characters are usually typed, a component for the extraction of typed relations was developed. Similar to the experiments for character reference detection, a combination of a rule-based and a Maximum Entropy classifier yielded the best overall results, with the extraction of family relations reaching a score of about 80% and love relations a score of about 50%. For family relations, a kernel for a Support Vector Machine was developed that even exceeded the scores of the combined approach but falls behind on the other labels.
In addition, this work presents new ways to evaluate automatically extracted networks without the need for domain experts; instead, it relies on expert summaries. It also refrains from using social network analysis for the evaluation and instead presents ranked evaluations using Precision@k and the Spearman rank correlation coefficient for the evaluation of the nodes and edges of the network. An analysis using these metrics showed that the central characters of a novel are contained with high probability, but the quality drops rather fast if more than five entities are analyzed. The quality of the edges is mainly dominated by the quality of the coreference resolution, and the correlation coefficient between gold edges and system edges therefore varies between 30 and 60%.
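The two ranked measures can be illustrated with a few lines of Python; the character rankings below are invented and only serve to show how Precision@k and the Spearman coefficient are obtained.

    from scipy.stats import spearmanr

    def precision_at_k(system_ranking, gold_set, k):
        # Fraction of the top-k system characters that also appear in the gold set.
        return sum(1 for c in system_ranking[:k] if c in gold_set) / k

    gold_rank = ["Alice", "Queen", "Rabbit", "Hatter", "Cat"]
    system_rank = ["Alice", "Rabbit", "Queen", "Cat", "Duchess"]

    print(precision_at_k(system_rank, set(gold_rank), k=5))       # 0.8

    shared = [c for c in gold_rank if c in system_rank]           # characters in both lists
    rho, _ = spearmanr([gold_rank.index(c) for c in shared],
                       [system_rank.index(c) for c in shared])
    print(rho)                                                    # rank correlation of the shared characters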
All developed components are aggregated alongside a large set of other preprocessing modules in the Kallimachos pipeline and can be reused without any restrictions.
Virtual reality and related media and communication technologies have a growing impact on professional application fields and our daily life. Virtual environments have the potential to change the way we perceive ourselves and how we interact with others. In comparison to other technologies, virtual reality allows for the convincing display of a virtual self-representation, an avatar, to oneself and also to others. This is referred to as user embodiment. Avatars can be of varying realism and abstraction in their appearance and in the behaviors they convey. Such user-embodying interfaces, in turn, can impact the perception of the self as well as the perception of interactions. For researchers, designers, and developers it is of particular interest to understand these perceptual impacts, to apply them to therapy, assistive applications, social platforms, or games, for example. The present thesis investigates and relates these impacts with regard to three areas: intrapersonal effects, interpersonal effects, and effects of social augmentations provided by the simulation.
With regard to intrapersonal effects, we specifically explore which simulation properties impact the illusion of owning and controlling a virtual body, as well as a perceived change in body schema. Our studies lead to the construction of an instrument to measure these dimensions and our results indicate that these dimensions are especially affected by the level of immersion, the simulation latency, as well as the level of personalization of the avatar.
With regard to interpersonal effects we compare physical and user-embodied social interactions, as well as different degrees of freedom in the replication of nonverbal behavior. Our results suggest that functional levels of interaction are maintained, whereas aspects of presence can be affected by avatar-mediated interactions, and collaborative motor coordination can be disturbed by immersive simulations.
Social interaction is composed of many unknown symbols and harmonic patterns that define our understanding and interpersonal rapport. For successful virtual social interactions, a mere replication of physical-world behaviors in virtual environments may seem feasible. However, the potential of mediated social interactions goes beyond this mere replication. In a third vein of research, we propose and evaluate alternative concepts of how computers can be used to actively engage in mediating social interactions, namely hybrid avatar-agent technologies. Specifically, we investigated the possibilities to augment social behaviors by modifying and transforming user input according to social phenomena and behavior, such as nonverbal mimicry, directed gaze, joint attention, and grouping. Based on our results we argue that such technologies could be beneficial for computer-mediated social interactions, for example to compensate for lacking sensory input and disturbances in data transmission, or to increase aspects of social presence by visual substitution or amplification of social behaviors.
Based on related work and the presented findings, the present thesis proposes the perspective of considering computers as social mediators. Concluding from prototypes and empirical studies, the potential of technology to be an active mediator of social perception, with regard to the perception of the self as well as the perception of social interactions, may benefit our society by enabling further methods for diagnosis, treatment, and training, as well as the inclusion of individuals with social disorders. In this regard, we discuss implications for our society and ethical aspects. This thesis extends previous empirical work and further presents novel instruments, concepts, and implications to open up new perspectives for the development of virtual reality, mixed reality, and augmented reality applications.