000 Computer science, information science, general works
Mini Unmanned Aerial Vehicles (MUAVs) are becoming a popular research platform and have drawn considerable attention, particularly during the last decade, due to their affordability and their multi-dimensional applications in almost every walk of life. MUAVs have obvious advantages over manned platforms, including much lower manufacturing and operational costs, no risk to human pilots, the ability to fly safely low and slow, and the realization of operations that are beyond inherent human limitations. Advances in Micro-Electro-Mechanical Systems (MEMS) technology and avionics, together with the miniaturization of sensors, have also played a significant role in the evolution of MUAVs. These vehicles range from simple toys sold in electronics superstores for entertainment to highly sophisticated commercial platforms performing novel assignments such as inspecting offshore wind power stations and 3D modelling of buildings. MUAVs are also more environmentally friendly, as they cause less air pollution and noise. Unmanned is therefore unmatched. Recent research focuses on using multiple inexpensive vehicles flying together, while maintaining the required relative separations, to carry out tasks more efficiently than a single expensive vehicle. Redundancy also eliminates the risk of losing a single vehicle on which the whole mission depends. Valuable applications in the domain of cooperative control include joint load transportation, search and rescue, mobile communication relays, pesticide spraying, and weather monitoring. Although realizing coupled multi-UAV flight is complex, its obvious advantages justify the laborious work involved...
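Flying together while keeping the required relative separations is often approached with consensus-based formation control. The sketch below is not the thesis's controller; it assumes point-mass vehicles in 2D with single-integrator dynamics, a fixed undirected communication graph, and hypothetical desired offsets. Each vehicle moves so that its estimate of a common formation reference point (its position minus its offset) agrees with those of its neighbours.

```python
import math

def formation_step(pos, offsets, neighbors, gain=1.0, dt=0.05):
    """One Euler step of consensus-based formation control (single-integrator vehicles).

    pos, offsets: lists of (x, y); offsets are each vehicle's desired position
                  relative to a common formation reference point (hypothetical values).
    neighbors:    neighbour index lists of an undirected communication graph.
    """
    # each vehicle's current estimate of the formation reference point
    ref = [(px - ox, py - oy) for (px, py), (ox, oy) in zip(pos, offsets)]
    new_pos = []
    for i, (px, py) in enumerate(pos):
        # velocity drives the reference estimates of neighbouring vehicles together
        vx = sum(gain * (ref[j][0] - ref[i][0]) for j in neighbors[i])
        vy = sum(gain * (ref[j][1] - ref[i][1]) for j in neighbors[i])
        new_pos.append((px + dt * vx, py + dt * vy))
    return new_pos

def min_separation(pos):
    """Smallest pairwise distance between any two vehicles."""
    return min(math.dist(a, b) for k, a in enumerate(pos) for b in pos[k + 1:])

offsets = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]   # hypothetical triangle formation
neighbors = [[1], [0, 2], [1]]                   # line graph 0 - 1 - 2
pos = [(0.0, 0.0), (5.0, 1.0), (-3.0, 4.0)]      # hypothetical start positions
for _ in range(2000):
    pos = formation_step(pos, offsets, neighbors)
```

After convergence the vehicles' relative positions match the offsets, so the final separation is set by the formation design. Note that this simple law says nothing about collisions during the transient; practical multi-UAV controllers add separation or collision-avoidance terms.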
The thesis focuses on the Quality of Experience (QoE) of HTTP adaptive video streaming (HAS) and on traffic management in access networks to improve the QoE of HAS. First, the QoE impact of adaptation parameters and time on layer was investigated with subjective crowdsourcing studies. The results were used to compute a QoE-optimal adaptation strategy for given video and network conditions, which allows video service providers to develop and benchmark improved adaptation logics for HAS. Furthermore, the thesis investigated concepts for monitoring video QoE at the application and network layers, which network providers can use in the QoE-aware traffic management cycle. Moreover, an analytic and simulative performance evaluation of QoE-aware traffic management on a bottleneck link was conducted. Finally, the thesis investigated socially-aware traffic management for HAS via Wi-Fi offloading of mobile HAS flows. A model for the distribution of public Wi-Fi hotspots and a platform for socially-aware traffic management on private home routers were presented. A simulative performance evaluation investigated the impact of Wi-Fi offloading on the QoE and energy consumption of mobile HAS.
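Adaptation parameters such as the number of quality switches and the time on layer (the share of playback time spent on each quality layer) can be derived from a playback trace. The following is a minimal sketch, assuming a hypothetical trace format of one `(duration_s, layer)` tuple per played segment; it is not the monitoring approach of the thesis.

```python
from collections import defaultdict

def adaptation_stats(trace):
    """Summarise a HAS playback trace.

    trace: list of (duration_s, layer) tuples, one per played video segment,
           where layer is the index of the quality layer (0 = lowest).
    Returns the share of playback time spent on each layer and the number of
    quality switches.
    """
    time_per_layer = defaultdict(float)
    switches = 0
    prev_layer = None
    for duration, layer in trace:
        time_per_layer[layer] += duration
        if prev_layer is not None and layer != prev_layer:
            switches += 1  # any change of layer between consecutive segments
        prev_layer = layer
    total = sum(time_per_layer.values())
    share = {layer: t / total for layer, t in sorted(time_per_layer.items())}
    return share, switches

# hypothetical trace: 2 s segments, switching 1 -> 2 -> 1
trace = [(2.0, 1), (2.0, 1), (2.0, 2), (2.0, 2), (2.0, 2), (2.0, 1)]
share, switches = adaptation_stats(trace)
```

Such per-session statistics are the kind of input a QoE model or an adaptation-logic benchmark would consume.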
Enterprise systems are steadily growing in importance and are at the center of attention for organizations across all types of business and industry, from extra-large public or private organizations to small and medium-sized service businesses. These systems are continuously advancing functionally and technologically and have become indispensable for enterprises seeking to maximize their productivity and integration in today's competitive national and global business environments.
Moreover, since local software solutions could not meet the functional and technical requirements of large enterprises in particular, and since major global enterprise software vendors such as SAP, Oracle, and Microsoft are rapidly improving their solutions and expanding their markets to more corners of the globe, demand for these globally branded, low-defect software solutions is rising daily. Consultancy agreements for international ERP implementation projects are therefore increasing exponentially, while research on the influencing factors and the required know-how is scattered and rare; research in this field is thus urgently needed.
The five-in-five framework finally developed in this study is the first to collect all critical success factors and project activities reported in prior work, sequencing them into five phases and categorizing them into five focus areas for international ERP implementation projects. The framework provides a bird’s-eye view and a comprehensive roadmap for such projects.
The field of genetics faces many challenges and opportunities in both research and diagnostics due to the rise of next-generation sequencing (NGS), a technology that sequences DNA ever faster and more cheaply.
NGS is used to analyze not only DNA but also RNA, a very similar molecule that is also present in the cell; in both cases it produces large amounts of data.
This large amount of data raises both infrastructure and usability problems: powerful computing infrastructures are required, and the data analysis involves many manual steps that are complicated to execute.
Both problems limit the use of NGS in the clinic and in research by creating a bottleneck, both computationally and in terms of manpower, because for many analyses geneticists lack the required computing skills.
Over the course of this thesis, we investigated how computer science can help improve this situation by reducing the complexity of this type of analysis.
We looked at how to make the analysis more accessible, so as to increase the number of people who can perform OMICS data analysis (OMICS covers various genomics data sources).
To approach this problem, we developed, in close collaboration with the Human Genetics Department at the University of Würzburg, a graphical NGS data analysis pipeline aimed at a diagnostics environment while still being useful in research.
The pipeline has been used in various research papers covering genomics, transcriptomics, and epigenomics, including works with direct author participation.
To further validate the graphical pipeline, a user survey was carried out, which confirmed that it lowers the complexity of OMICS data analysis.
We also studied how the data analysis can be improved in terms of computing infrastructure by improving the performance of certain analysis steps.
We did this both through speed improvements on a single computer (notably, variant calling became up to 18 times faster) and through distributed computing, to make better use of existing infrastructure.
The improvements were integrated into the previously described graphical pipeline, which was itself designed for low resource usage.
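A common way to spread a step such as variant calling over several cores or machines is to split the reference genome into independent regions and process them concurrently. The thesis's actual scheme is not described here; this sketch assumes hypothetical chromosome lengths and a hypothetical placeholder `call_variants_in_region` standing in for the real per-region tool.

```python
from concurrent.futures import ThreadPoolExecutor

def make_regions(chrom_lengths, chunk_size):
    """Split each chromosome into half-open [start, end) windows of at most chunk_size bases."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((chrom, start, min(start + chunk_size, length)))
    return regions

def call_variants_in_region(region):
    """Hypothetical placeholder for a per-region analysis step (e.g. invoking a variant caller)."""
    chrom, start, end = region
    return f"{chrom}:{start}-{end}"

def run_parallel(chrom_lengths, chunk_size, workers=4):
    regions = make_regions(chrom_lengths, chunk_size)
    # map() preserves input order, so per-region results can be concatenated in genomic order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_variants_in_region, regions))
```

Threads suffice when each region is handed to an external program; CPU-bound pure-Python work would need a `ProcessPoolExecutor` instead. Real pipelines also have to handle reads and variants that straddle region boundaries, which this sketch ignores.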
As a major contribution, and to help with the future development of parallel and distributed applications, whether for use in genetics or otherwise, we also looked at how to make such applications easier to develop.
Based on the parallel object programming model (POP), we created a Java language extension called POP-Java, which allows for easy and transparent distribution of objects.
Through this development, we brought the POP model to the cloud and to Hadoop clusters, and we present a new collaborative distributed computing model called FriendComputing.
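The idea of a transparently distributed parallel object can be illustrated with a proxy: the caller invokes ordinary-looking methods, while the calls actually execute elsewhere, either synchronously (the caller waits) or asynchronously (the caller continues). This is a Python sketch, not POP-Java or its API; a single worker thread per object stands in for the remote process that would host the object, and the class and parameter names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

class ParallelObject:
    """Proxy that forwards method calls to a wrapped object running on its own worker.

    One worker thread stands in for the (remote) process hosting the object, so calls
    on the same object execute one at a time, in submission order.
    """
    def __init__(self, target, sync_methods=()):
        self._target = target
        self._sync = set(sync_methods)
        self._executor = ThreadPoolExecutor(max_workers=1)

    def __getattr__(self, name):
        # only called for names not defined on the proxy itself, i.e. the target's methods
        method = getattr(self._target, name)
        def invoke(*args, **kwargs):
            future = self._executor.submit(method, *args, **kwargs)
            # synchronous methods block for the result; asynchronous ones return at once
            # (exceptions raised by asynchronous calls are silently dropped in this sketch)
            return future.result() if name in self._sync else None
        return invoke

    def close(self):
        self._executor.shutdown(wait=True)

class Counter:
    def __init__(self):
        self.value = 0
    def add(self, n):
        self.value += n
    def get(self):
        return self.value

counter = ParallelObject(Counter(), sync_methods={"get"})
for i in range(1, 11):
    counter.add(i)        # asynchronous: returns immediately
total = counter.get()     # synchronous: queued after all adds
counter.close()
```

A genuinely distributed implementation would additionally serialize the arguments and ship them over the network; the call-site code would look the same, which is the point of transparency.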
The advances made in the different domains of this thesis have been published in various works specified in this document.
Progress in semiconductor chip production in recent years enables a multitude of cores on a single die. However, as structure sizes decrease further, fault tolerance and energy consumption will become key challenges. Furthermore, the high degree of parallelism in such systems makes an efficient communication infrastructure indispensable. The predominant communication infrastructure in such highly parallel systems is the Network on Chip (NoC). This thesis focuses on NoCs based on deflection routing. In this context, it contributes to two areas: fault tolerance and dimensioning of the optimal link width. Both aspects are essential for reliable, energy-efficient NoCs based on deflection routing.
Future semiconductor systems are expected to have to cope with high fault probabilities. The inherently high connectivity of most NoC topologies can be exploited to tolerate the failure of links and other components. In this thesis, a fault-tolerant router architecture has been developed that stands out for its interconnection architecture and its method for overcoming complex fault situations. The presented simulation results show that all data packets arrive at their destinations, even at high fault probabilities. In contrast to architectures based on routing tables, the hardware costs of the architecture presented here are lower and, in particular, independent of the number of components in the network.
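In deflection ("hot-potato") routing, a router does not buffer contending flits: every flit that arrives in a cycle leaves in the same cycle, on a productive port if one is free and otherwise on any remaining port. The sketch below is not the thesis's router architecture; it assumes a 2D mesh, oldest-first priority, at most one ejection per cycle, and that faulty links are simply excluded as output ports.

```python
DIRS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def productive_ports(cur, dest):
    """Output ports that reduce the Manhattan distance to the destination."""
    (x, y), (dx, dy) = cur, dest
    ports = []
    if dx > x: ports.append("E")
    if dx < x: ports.append("W")
    if dy > y: ports.append("N")
    if dy < y: ports.append("S")
    return ports

def route_cycle(cur, flits, faulty=frozenset()):
    """Assign each flit at router `cur` an output port for this cycle.

    flits:  list of (age, dest) tuples; older flits (larger age) get priority.
    faulty: set of output ports whose links are broken.
    Assumes at most as many flits as working output ports (plus one ejection).
    Returns a dict mapping flit index -> output port, or "LOCAL" for ejection.
    """
    free = [p for p in DIRS if p not in faulty]
    assignment = {}
    ejected = False
    for i in sorted(range(len(flits)), key=lambda k: -flits[k][0]):
        age, dest = flits[i]
        if dest == cur and not ejected:
            assignment[i] = "LOCAL"  # one flit per cycle may leave the network here
            ejected = True
            continue
        wanted = [p for p in productive_ports(cur, dest) if p in free]
        port = wanted[0] if wanted else free[0]  # deflect if no productive port is free
        free.remove(port)
        assignment[i] = port
    return assignment
```

With two flits heading east, the older one takes the east port and the younger one is deflected; if the east link is faulty, both are deflected, and they reach their destination later via other paths.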
Besides fault tolerance, hardware costs and energy efficiency are of great importance, and the link width has a decisive influence on both. In deflection-routing-based NoCs in particular, oversizing the link width leads to unnecessarily high hardware costs, whereas undersizing it leads to poor performance. The second part of this thesis investigates the optimal link width in deflection-routing-based NoCs and introduces a method to reduce the link width. Simulation and synthesis results show that this method allows a significant reduction in hardware costs at comparable performance.
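The over- and under-sizing effect can be made concrete with serialization arithmetic: a packet of P bits on a link of width W bits needs ⌈P/W⌉ flits, and n·W − P wire-bits of the last flit go unused. This sketch ignores per-flit header overhead and uses a hypothetical 160-bit packet; it is not the thesis's dimensioning method.

```python
import math

def flits_per_packet(packet_bits, link_width_bits):
    """Number of flits (link transfers) needed for one packet, ignoring per-flit headers."""
    return math.ceil(packet_bits / link_width_bits)

def link_width_tradeoff(packet_bits, widths):
    """For each candidate width: (width, flits per packet, unused bits in the last flit)."""
    rows = []
    for w in widths:
        n = flits_per_packet(packet_bits, w)
        rows.append((w, n, n * w - packet_bits))
    return rows

rows = link_width_tradeoff(160, [32, 64, 128, 256])  # hypothetical packet size
```

Narrow links multiply the number of flits (in designs where flits are routed independently, more flits also means more contention and deflections), while wide links add wires, and thus area and energy, that often carry no data.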
RNA-binding proteins (RBPs) have been extensively studied in eukaryotes, where they post-transcriptionally regulate many cellular events, including RNA transport, translation, and stability. Experimental techniques such as cross-linking and co-purification followed by either mass spectrometry or RNA sequencing have enabled the identification and characterization of RBPs, their conserved RNA-binding domains (RBDs), and the regulatory roles of these proteins on a genome-wide scale. These developments in quantitative, high-resolution, and high-throughput screening techniques have greatly expanded our understanding of RBPs in human and yeast cells. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is comparatively poor, in part due to the technical challenges associated with existing global screening approaches developed in eukaryotes.
Genome- and proteome-wide screening approaches performed in silico may circumvent these technical issues and provide a broad picture of the RNA interactome of bacteria, identifying strong RBP candidates for more detailed experimental study. Here, I report APRICOT (“Analyzing Protein RNA Interaction by Combined Output Technique”), a computational pipeline for the sequence-based identification and characterization of candidate RNA-binding proteins encoded in the genomes of all domains of life using RBDs known from experimental studies. The pipeline identifies functional motifs in the protein sequences of an input proteome using position-specific scoring matrices and hidden Markov models of all conserved domains available in the databases, and then scores them statistically based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them according to functionally relevant structural properties. APRICOT outperformed other existing sequence-based prediction tools on the known RBP data sets. The applications and adaptability of the software were demonstrated on several large bacterial RBP data sets, including the complete proteome of Salmonella Typhimurium strain SL1344. APRICOT reported 1068 Salmonella proteins as RBP candidates, which were subsequently categorized using the RBDs that have been reported in both eukaryotic and bacterial proteins. A set of 131 strong RBP candidates was selected for experimental confirmation and characterization of RNA-binding activity using RNA co-immunoprecipitation followed by high-throughput sequencing (RIP-Seq) experiments. Based on the relative abundance of transcripts across the RIP-Seq libraries, a catalogue of enriched genes was established for each candidate, which shows the RNA-binding potential of 90% of these proteins. Furthermore, the direct targets of a few of these putative RBPs were validated by means of cross-linking and co-immunoprecipitation (CLIP) experiments.
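A position-specific scoring matrix (PSSM) assigns each residue at each motif position a log-odds score, log2(observed frequency / background frequency); a sequence window is scored by summing the per-position scores. The sketch below builds a toy PSSM from hypothetical aligned motif instances with pseudocounts and a uniform background, then slides it along a sequence. APRICOT relies on PSSMs and HMMs taken from domain databases; this code is only an illustration of the scoring principle.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def build_pssm(aligned_motifs, alphabet=ALPHABET, pseudocount=1.0):
    """Log-odds PSSM (bits) from equal-length aligned motif instances, uniform background."""
    background = 1.0 / len(alphabet)
    pssm = []
    for pos in range(len(aligned_motifs[0])):
        counts = {a: pseudocount for a in alphabet}
        for motif in aligned_motifs:
            counts[motif[pos]] += 1
        total = sum(counts.values())
        pssm.append({a: math.log2((counts[a] / total) / background) for a in alphabet})
    return pssm

def best_hit(sequence, pssm):
    """Slide the PSSM along the sequence; return (best score, start position)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best

pssm = build_pssm(["GKG", "GKG", "GRG"])   # hypothetical motif instances
score, start = best_hit("AAGKGAA", pssm)   # hypothetical query sequence
```

Positive scores mean the window resembles the motif more than background sequence does; a real pipeline would additionally assess the statistical significance of each hit.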
This thesis presents the computational pipeline APRICOT for the global screening of protein primary sequences for potential RBPs in bacteria using RBD information from all kingdoms of life. Furthermore, it provides the first bio-computational resource of putative RBPs in Salmonella, which could now be further studied for their biological and regulatory roles. The command line tool and its documentation are available at https://malvikasharan.github.io/APRICOT/.
Content Delivery Networks (CDNs) are networks that distribute content on the Internet, and they increasingly carry the largest share of Internet traffic. CDNs distribute popular content to caches in many geographical areas to save bandwidth by avoiding unnecessary multihop retransmission. By bringing the content geographically closer to the user, CDNs also reduce the latency of the services.
Besides end users and content providers, who require high availability of high-quality content, CDN providers and Internet Service Providers (ISPs) are interested in an efficient operation of CDNs. To ensure efficient replication of the content, CDN providers operate a network of (globally) distributed, interconnected datacenters at different points of presence (PoPs). ISPs aim to provide reliable, high-speed Internet access. They try to keep the load on the network low and to reduce the cost of connectivity with other ISPs.
The increasing number of mobile devices such as smartphones and tablets, high-definition video content, and high-resolution displays result in continuous growth in mobile traffic. This growth is further accelerated by newly emerging services, such as mobile live streaming and broadcasting services. Mobile traffic is expected to grow steeply, reaching roughly 60% of total network traffic by 2018, the majority of it video. To handle this growth, the next generation of 5G mobile networks is designed to have higher access rates and a denser network infrastructure. With the explosion of access rates and of the number of base stations, the backhaul of wireless networks will become congested.
To reduce the load on the backhaul, the research community suggests installing local caches in gateway routers between the wireless network and the Internet, in base stations of different sizes, and in end-user devices. Deploying caches locally keeps the traffic within the ISP's network. The caches are organized in a hierarchy, in which caches in the lowest tier are queried first. If the requested object is not found, the request is forwarded to the next tier. Appropriate evaluation methods are required to optimally dimension the caches depending on the traffic characteristics and the available resources. Additionally, methods are needed for evaluating the performance of backhaul bandwidth aggregation systems, which further reduce the load on the backhaul.
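The hierarchical lookup described above (lowest tier first, forward on a miss) can be sketched as follows. The sketch assumes LRU replacement in every tier, unit-size objects, and copying the object into every tier below the one that served it; these are illustrative choices, not necessarily those analysed in the thesis.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            return True
        return False

    def put(self, key):
        self.store[key] = True
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used object

def request(tiers, key):
    """Look up `key` tier by tier, lowest first; on a miss everywhere, fetch from the origin.

    Returns the index of the tier that served the request, or len(tiers) for the origin
    (i.e. the request crossed the backhaul). The object is copied into every lower tier.
    """
    for level, cache in enumerate(tiers):
        if cache.get(key):
            break
    else:
        level = len(tiers)
    for cache in tiers[:level]:
        cache.put(key)
    return level

tiers = [LRUCache(1), LRUCache(3)]  # hypothetical: small base-station cache, larger gateway cache
served = [request(tiers, k) for k in ["a", "b", "a", "a"]]
```

Counting how often `request` returns `len(tiers)` over a realistic request trace gives the backhaul load for a given set of cache sizes, which is the quantity cache dimensioning tries to minimise.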
This thesis analyses CDNs that utilize locally available resources and develops the following evaluation and optimization approaches: characterization of CDNs and of the distribution of resources in the Internet, analysis and optimization of hierarchical caching systems with bandwidth constraints, and performance evaluation of bandwidth aggregation systems.
This thesis contributes to several issues in the context of SDN and NFV, with an emphasis on performance and management.
The main contributions are guidelines for operators migrating to software-based networks, as well as an analytical model for packet processing in a Linux system using the kernel's New API (NAPI).
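NAPI combines interrupts with polling: the first received packet raises an interrupt, after which further receive interrupts are disabled and the driver is polled, handling at most a budget of packets per poll, until the queue is drained and interrupts are re-enabled. The slot-based toy simulation below illustrates that mechanism only; it is not the analytical model of the thesis, and the one-poll-per-slot timing and the budget value are hypothetical simplifications.

```python
def simulate_napi(arrivals, budget):
    """Toy slot-based model of NAPI receive processing.

    arrivals: packets arriving at the NIC in each time slot.
    budget:   maximum packets processed per poll (one poll per slot).
    Returns (interrupts raised, packets processed per slot).
    """
    queue = 0
    irq_enabled = True
    interrupts = 0
    processed = []
    for new in arrivals:
        queue += new
        if irq_enabled and queue > 0:
            interrupts += 1       # the first packet raises an IRQ ...
            irq_enabled = False   # ... which disables RX interrupts and schedules polling
        if not irq_enabled:
            done = min(queue, budget)
            queue -= done
            processed.append(done)
            if queue == 0:
                irq_enabled = True  # queue drained: leave polling mode, re-enable IRQs
        else:
            processed.append(0)
    return interrupts, processed

result = simulate_napi([3, 0, 100, 50, 0, 0], budget=64)  # hypothetical arrival pattern
```

Under this toy model, the 150-packet burst costs a single interrupt and is worked off over several polls, which is why NAPI reduces interrupt load at high packet rates at the price of some queueing delay.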
This thesis investigates concurrency, consistency, and latency in asynchronous Interactive Real-Time Systems using the techniques of profiling and model checking. It first explains why the asynchronous model is the most promising model for concurrency in an Interactive Real-Time System; to this end, it is compared with other models. In addition, a detailed comparison of synchronization technologies, which form the basis for consistency, is carried out. Based on these two comparisons and an examination of other systems, a synchronization concept is developed.
On this basis, concurrency, consistency, and latency are examined with two techniques. The first technique is profiling, for which several new ways of representing measured data are developed. These new representations are used in the implementation of a profiler. The second technique analyzed is model checking, which has not previously been used in the context of Interactive Real-Time Systems. Model checking serves to predict the behavior of an Interactive Real-Time System. These predictions are compared with the measurements from the profiler.
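Explicit-state model checking explores every reachable state of a model and checks a property in each one. The toy example below is hypothetical and not the model or tool used in the thesis: two asynchronous tasks each increment a shared counter non-atomically (read, then write), and a breadth-first search over all interleavings finds a schedule in which one update is lost, i.e. a consistency violation.

```python
from collections import deque

# State: (pc0, pc1, local0, local1, shared). Each task: pc 0 = read, pc 1 = write, pc 2 = done.
def successors(state):
    pcs, regs, shared = list(state[0:2]), list(state[2:4]), state[4]
    for t in (0, 1):
        if pcs[t] == 0:                    # read the shared value into a local register
            new_pcs, new_regs = pcs.copy(), regs.copy()
            new_pcs[t], new_regs[t] = 1, shared
            yield (*new_pcs, *new_regs, shared)
        elif pcs[t] == 1:                  # write local + 1 back to the shared counter
            new_pcs = pcs.copy()
            new_pcs[t] = 2
            yield (*new_pcs, *regs, regs[t] + 1)

def check(initial, invariant):
    """Breadth-first search over all interleavings; return a counterexample trace or None."""
    parent = {initial: None}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None

def both_updates_visible(s):
    """Property: once both tasks are done, the counter reflects both increments."""
    return not (s[0] == 2 and s[1] == 2) or s[4] == 2

counterexample = check((0, 0, 0, 0, 0), both_updates_visible)
```

Because the search is exhaustive, a `None` result would mean that no interleaving violates the property; profiling, by contrast, only observes the interleavings that happen to occur at run time.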
Nowadays, data centers are becoming increasingly dynamic due to the widespread adoption of virtualization technologies. Systems can scale their capacity on demand by growing and shrinking their resources dynamically based on the current load. However, the complexity and performance of modern data centers are influenced not only by the software architecture, middleware, and computing resources, but also by network virtualization, network protocols, network services, and configuration. Network virtualization is not as mature as server virtualization, and there are multiple competing approaches and technologies. Performance modeling and prediction techniques provide a powerful tool for analyzing the performance of modern data centers. However, given the wide variety of network virtualization approaches, no common approach exists for modeling and evaluating the performance of virtualized networks.
The performance community has proposed multiple formalisms and models for evaluating the performance of infrastructures based on different network virtualization technologies. The existing performance models can be divided into two main categories: coarse-grained analytical models and highly detailed simulation models. Analytical performance models are normally defined at a high level of abstraction; they therefore omit many details of the real network and have limited predictive power. Simulation models, on the other hand, are normally focused on a selected networking technology and take into account many specific performance-influencing factors, resulting in detailed models that are tightly bound to a given technology, infrastructure setup, or protocol stack.
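The contrast between the two model categories shows up even for a single link modelled as an M/M/1 queue (Poisson arrivals at rate λ, exponential service at rate μ). The analytical mean response time 1/(μ − λ) is obtained instantly, whereas a simulation (here via the Lindley recursion W_{n+1} = max(0, W_n + S_n − A_{n+1})) needs many samples but would carry over unchanged to service or arrival distributions for which no closed form exists. The rates are hypothetical, and neither function is part of DNI.

```python
import random

def mm1_response_time(arrival_rate, service_rate):
    """Analytical mean response time (waiting + service) of a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable (utilisation >= 1)")
    return 1.0 / (service_rate - arrival_rate)

def simulate_response_time(arrival_rate, service_rate, n=200_000, seed=1):
    """Estimate the same quantity by simulating n customers with the Lindley recursion."""
    rng = random.Random(seed)
    wait, total = 0.0, 0.0
    for _ in range(n):
        service = rng.expovariate(service_rate)
        total += wait + service                          # response time of this customer
        interarrival = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - interarrival)   # waiting time of the next customer
    return total / n

analytic = mm1_response_time(0.5, 1.0)          # hypothetical rates (packets per time unit)
simulated = simulate_response_time(0.5, 1.0)
```

The simulated estimate approaches the analytical value as n grows, illustrating the accuracy-versus-solving-time trade-off on the smallest possible scale.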
Existing models are inflexible: they provide a single solution method without giving the user any means to influence the solution accuracy and solution overhead. To obtain flexibility in performance prediction, the user has to build multiple different performance models and obtain multiple performance predictions. Each prediction may then have a different focus, different performance metrics, and a different prediction accuracy and solving time.
The goal of this thesis is to develop a modeling approach that does not require the user to have experience in any of the applied performance modeling formalisms. The approach offers flexibility in modeling and analysis by balancing between (a) the generic character and low overhead of coarse-grained analytical models and (b) the higher prediction accuracy of more detailed simulation models.
The contributions of this thesis intersect with technologies and research areas such as software engineering, model-driven software development, domain-specific modeling, performance modeling and prediction, networking and data center networks, network virtualization, Software-Defined Networking (SDN), and Network Function Virtualization (NFV). The main contributions of this thesis make up the Descartes Network Infrastructure (DNI) approach and include:
• Novel modeling abstractions for virtualized network infrastructures. These include two meta-models that define modeling languages for modeling data center network performance. The DNI and miniDNI meta-models provide means for representing network infrastructures at two different abstraction levels. Regardless of which variant of the DNI meta-model is used, the modeling language provides generic modeling elements that allow describing the majority of existing and future network technologies, while abstracting away factors that have little influence on overall performance. I focus on SDN and NFV as examples of modern virtualization technologies.
• Network deployment meta-model: an interface between DNI and other meta-models that allows defining mappings between DNI and other descriptive models. The integration with other domain-specific models allows capturing behaviors that are not reflected in the DNI model, for example, software bottlenecks, server virtualization, and middleware overheads.
• Flexible model solving with model transformations. The transformations enable solving a DNI model by transforming it into a predictive model. The model transformations vary in size and complexity depending on the amount of data abstracted in the transformation process and provided to the solver. In this thesis, I contribute six transformations that transform DNI models into various predictive models based on the following modeling formalisms: (a) OMNeT++ simulation, (b) Queueing Petri Nets (QPNs), and (c) Layered Queueing Networks (LQNs). For each of these formalisms, multiple predictive models are generated (e.g., models with different levels of detail): (a) two for OMNeT++, (b) two for QPNs, and (c) two for LQNs. Some predictive models can be solved using multiple alternative solvers, resulting in up to ten different automated solving methods for a single DNI model.
• A model extraction method that supports the modeler by automatically prefilling the DNI model with network traffic data. The contributed traffic profile abstraction and optimization method provides a trade-off between the size and the level of detail of the extracted profiles.
• A method for selecting feasible solving methods for a DNI model. The method proposes a set of solvers based on a trade-off analysis that characterizes each transformation with respect to parameters such as its specific limitations, expected prediction accuracy, expected run-time, required resources in terms of CPU and memory consumption, and scalability.
• An evaluation of the approach in the context of two realistic systems. I evaluate the approach focusing on factors such as prediction of network capacity and interface throughput, applicability, and flexibility in trading off prediction accuracy against solving time. Although the approach does not aim to maximize prediction accuracy, I demonstrate that in the majority of cases the prediction error is low: up to 20% for uncalibrated models and up to 10% for calibrated models, depending on the solving technique.
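The trade-off between the size and the level of detail of an extracted traffic profile can be illustrated by merging adjacent time bins of a measured rate series into piecewise-constant segments whenever every bin stays within a tolerance of the segment mean: a larger tolerance gives fewer segments and a coarser profile, while the total traffic volume is preserved. This greedy merging rule and the sample data are hypothetical; the sketch does not reproduce DNI's abstraction and optimization method.

```python
def compact_profile(rates, bin_s, tolerance):
    """Merge adjacent bins of a traffic-rate series into piecewise-constant segments.

    rates:     measured rate per time bin (e.g. Mbit/s)
    bin_s:     bin length in seconds
    tolerance: maximum allowed deviation of any bin from its segment's mean rate
    Returns a list of (duration_s, mean_rate) segments; the total volume is preserved.
    """
    segments = []
    current = []
    for r in rates:
        candidate = current + [r]
        mean = sum(candidate) / len(candidate)
        if current and max(abs(x - mean) for x in candidate) > tolerance:
            segments.append(current)  # close the segment before it gets too coarse
            current = [r]
        else:
            current = candidate
    if current:
        segments.append(current)
    return [(len(s) * bin_s, sum(s) / len(s)) for s in segments]

rates = [10, 11, 10, 50, 52, 51, 10]  # hypothetical per-second rates
profile = compact_profile(rates, 1.0, 2.0)
```

With a tolerance of 0 every bin becomes its own segment (full detail, largest profile); raising the tolerance shrinks the profile that a predictive model has to process.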
In summary, this thesis presents the first approach to flexible run-time performance prediction in data center networks, including networks based on SDN. It provides the ability to flexibly balance performance prediction accuracy against solving overhead. The approach provides the following key benefits:
• It is possible to predict the impact of changes in the data center network on performance. These include changes in network topology, hardware configuration, traffic load, and application deployment.
• DNI can successfully model and predict the performance of multiple different network infrastructures, including proactive SDN scenarios.
• The prediction process is flexible; that is, it balances the granularity of the predictive models against the solving time. Decreased prediction accuracy is usually rewarded with savings in solving time and in the resources required for solving.
• Users can conduct performance analysis using multiple different prediction methods without requiring expertise and experience in each of the modeling formalisms.
The components of the DNI approach can also be applied to scenarios that are not considered in this thesis. The approach is generalizable, for example: (a) networks outside of data centers may be analyzed with DNI as long as the background traffic profile is known; (b) uncalibrated DNI models may serve as a basis for design-time performance analysis; (c) the method for extracting and compacting traffic profiles may be used for other, non-network workloads as well.
Virtualization allows the creation of virtual instances of physical devices, such as network and processing units. In a virtualized system governed by a hypervisor, resources are shared among virtual machines (VMs). Virtualization has been receiving increasing interest as a way to reduce costs through server consolidation and to enhance the flexibility of physical infrastructures. Although virtualization provides many benefits, it introduces new security challenges: hypervisors expose new attack surfaces and thus introduce new threats.
Intrusion detection is a common cyber security mechanism whose task is to detect malicious activities in host and/or network environments. This enables a timely reaction to stop an ongoing attack or to mitigate the impact of a security breach. The wide adoption of virtualization has made it increasingly common to deploy conventional intrusion detection systems (IDSs), for example, hardware IDS appliances or common software-based IDSs, in designated VMs as virtual network functions (VNFs). In addition, the research and industrial communities have developed IDSs specifically designed to operate in virtualized environments (i.e., hypervisor-based IDSs), with components both inside the hypervisor and in a designated VM. The latter are becoming increasingly common with the growing proliferation of virtualized data centers and the adoption of the cloud computing paradigm, for which virtualization is a key enabling technology.
To minimize the risk of security breaches, methods and techniques for evaluating IDSs accurately are essential. For instance, one may compare different IDSs in terms of their attack detection accuracy in order to identify and deploy the IDS that performs best in a given environment, thereby reducing the risk of a security breach. However, methods and techniques for realistic and accurate evaluation of the attack detection accuracy of IDSs in virtualized environments (i.e., IDSs deployed as VNFs or hypervisor-based IDSs) are lacking. In particular, workloads are needed that exercise the sensors of an evaluated IDS and contain attacks targeting hypervisors. Attacks targeting hypervisors are of high severity, since they may, for example, alter the hypervisor's memory and thus enable the execution of malicious code with hypervisor privileges. In addition, there are no metrics and measurement methodologies for accurately quantifying the attack detection accuracy of IDSs in virtualized environments with elastic resource provisioning (i.e., on-demand allocation or deallocation of virtualized hardware resources to VMs). Modern hypervisors allow hotplugging virtual CPUs and memory on the designated VM where the intrusion detection engine of hypervisor-based IDSs, as well as of IDSs deployed as VNFs, typically operates. Resource hotplugging may have a significant impact on the attack detection accuracy of an evaluated IDS, which existing metrics for quantifying IDS attack detection accuracy do not take into account. This may lead to inaccurate measurements, which, in turn, may result in the deployment of misconfigured or poorly performing IDSs, increasing the risk of security breaches.
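Why a single aggregate detection rate can mislead under resource hotplugging can be seen by breaking the standard true positive rate (TPR = detected attacks / attacks) and false positive rate (FPR = false alerts / benign events) down by resource-configuration phase. The event format, phase labels, and data below are hypothetical, and this is not the metric proposed in the thesis.

```python
from collections import defaultdict

def rates_by_phase(events):
    """True/false positive rates overall and per resource-configuration phase.

    events: iterable of (phase, is_attack, alerted) tuples, where phase labels the
            VM's resource configuration (e.g. number of vCPUs) when the event occurred.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])  # tp, fn, fp, tn
    for phase, is_attack, alerted in events:
        idx = (0 if alerted else 1) if is_attack else (2 if alerted else 3)
        counts[phase][idx] += 1
        counts["overall"][idx] += 1

    def summarize(tp, fn, fp, tn):
        tpr = tp / (tp + fn) if tp + fn else float("nan")
        fpr = fp / (fp + tn) if fp + tn else float("nan")
        return tpr, fpr

    return {phase: summarize(*c) for phase, c in counts.items()}

# hypothetical run: the IDS VM loses a vCPU halfway through and misses more attacks
events = ([("2vCPU", True, True)] * 4 + [("2vCPU", False, False)] * 10
          + [("1vCPU", True, True)] + [("1vCPU", True, False)] * 3
          + [("1vCPU", False, False)] * 10)
result = rates_by_phase(events)
```

Here the overall TPR of 0.625 hides that detection was perfect with two vCPUs but dropped to 0.25 after the VM was scaled down.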
This thesis presents contributions that span the standard components of any system evaluation scenario: workloads, metrics, and measurement methodologies. The scientific contributions of this thesis are:
• A comprehensive systematization of common practices and the state of the art in IDS evaluation. This includes (i) a definition of an IDS evaluation design space that allows putting existing practical and theoretical work into a common context in a systematic manner; (ii) an overview of common practices in IDS evaluation, reviewing evaluation approaches and methods related to each part of the design space; and (iii) a set of case studies demonstrating how different IDS evaluation approaches are applied in practice. Given the significant amount of existing practical and theoretical work related to IDS evaluation, the presented systematization improves the general understanding of the topic by providing an overview of the current state of the field. In addition, it helps to identify and contrast the advantages and disadvantages of different IDS evaluation methods and practices, and to identify specific requirements and best practices for evaluating current and future IDSs.
• An in-depth analysis of common vulnerabilities of modern hypervisors, as well as a set of attack models capturing the activities of attackers triggering these vulnerabilities. The analysis includes 35 representative vulnerabilities of hypercall handlers (i.e., hypercall vulnerabilities). Hypercalls are software traps from the kernel of a VM to the hypervisor. The hypercall interface, alongside device drivers and VM exit events, is one of the attack surfaces that hypervisors expose. Triggering a hypercall vulnerability may lead to a crash of the hypervisor or to altering the hypervisor's memory. We analyze the origins of the considered hypercall vulnerabilities, demonstrate and analyze possible attacks that trigger them (i.e., hypercall attacks), develop hypercall attack models (i.e., systematized activities of attackers targeting the hypercall interface), and discuss future research directions focusing on approaches for securing hypercall interfaces.
• A novel approach for evaluating IDSs that enables the generation of workloads containing attacks targeting hypervisors, that is, hypercall attacks. We propose an approach for evaluating IDSs using attack injection (i.e., controlled execution of attacks during regular operation of the environment where an IDS under test is deployed). The injection of attacks is performed based on attack models that capture realistic attack scenarios. We use the hypercall attack models developed as part of this thesis for injecting hypercall attacks.
• A novel metric and measurement methodology for quantifying the attack detection accuracy of IDSs in virtualized environments that feature elastic resource provisioning. We demonstrate how the elasticity of resource allocations in such environments may impact IDS attack detection accuracy and show that using existing metrics in such environments may lead to practically challenging and inaccurate measurements. We also demonstrate the practical use of the proposed metric through a set of case studies in which we evaluate common conventional IDSs deployed as VNFs.
In summary, this thesis presents the first systematization of the state-of-the-art on IDS evaluation, considering workloads, metrics and measurement methodologies as integral parts of every IDS evaluation approach. In addition, we are the first to examine the hypercall attack surface of hypervisors in detail and to propose an approach using attack injection for evaluating IDSs in virtualized environments. Finally, this thesis presents the first metric and measurement methodology for quantifying the attack detection accuracy of IDSs in virtualized environments that feature elastic resource provisioning.
From a technical perspective, as part of the proposed approach for evaluating IDSs, this thesis presents hInjector, a tool for injecting hypercall attacks. We designed hInjector to enable the rigorous, representative, and practically feasible evaluation of IDSs using attack injection. We demonstrate the application and practical usefulness of hInjector, as well as of the proposed approach, by evaluating a representative hypervisor-based IDS designed to detect hypercall attacks. While we focus on evaluating the capabilities of IDSs to detect hypercall attacks, the proposed IDS evaluation approach can be generalized and applied in a broader context. For example, it may be used directly to evaluate security mechanisms of hypervisors, such as hypercall access control (AC) mechanisms. It may also be applied to evaluate the capabilities of IDSs to detect attacks involving operations that are functionally similar to hypercalls, for example, the input/output control (ioctl) calls that the Kernel-based Virtual Machine (KVM) hypervisor supports. For IDSs in virtualized environments featuring elastic resource provisioning, our approach for injecting hypercall attacks can be applied in combination with the attack detection accuracy metric and measurement methodology we propose. Our approach for injecting hypercall attacks, and our metric and measurement methodology, can also be applied independently beyond the scenarios considered in this thesis. The wide spectrum of security mechanisms in virtualized environments whose evaluation can directly benefit from the contributions of this thesis (e.g., hypervisor-based IDSs, IDSs deployed as VNFs, and AC mechanisms) reflects the practical implications of the thesis.
Computer systems have replaced the human workforce in many parts of everyday life, but a large number of tasks still cannot be automated. These include tasks that we consider rather simple, such as categorizing image content or giving subjective ratings. Traditionally, such tasks have been completed by designated employees or outsourced to specialized companies. Recently, however, the crowdsourcing paradigm has increasingly been applied to complete such labor-intensive tasks. Crowdsourcing aims at leveraging the huge number of Internet users around the globe, who form a potentially highly available, low-cost, and easily accessible workforce.
To enable the distribution of work on a global scale, new web-based services, so-called crowdsourcing platforms, have emerged that act as mediators between employers posting tasks and workers completing them. However, the crowdsourcing approach, especially the large anonymous worker crowd, results in two types of challenges. On the one hand, there are technical challenges, such as dimensioning the crowdsourcing platform infrastructure or interconnecting crowdsourcing platforms and machine clouds to build hybrid services. On the other hand, there are conceptual challenges, such as identifying reliable workers or migrating traditional offline work to the crowdsourcing environment. To tackle these challenges, this monograph analyzes and models current crowdsourcing systems to optimize crowdsourcing workflows and the underlying infrastructure. First, a categorization of crowdsourcing tasks and platforms is developed to derive generalizable properties. Based on this categorization and an exemplary analysis of a commercial crowdsourcing platform, models for different aspects of crowdsourcing platforms and crowdsourcing mechanisms are developed. A special focus is put on quality assurance mechanisms for crowdsourcing tasks, where the models are used to assess the suitability and costs of existing approaches for different types of tasks. Further, a novel quality assurance mechanism based solely on user interactions is proposed and its feasibility is shown. The findings from the analysis of existing platforms, the derived models, and the developed quality assurance mechanisms are finally used to derive best practices for two crowdsourcing use cases: crowdsourcing-based network measurements and crowdsourcing-based subjective user studies. These two exemplary use cases cover aspects typical of a large range of crowdsourcing tasks and illustrate the potential benefits, but also the resulting challenges, of using crowdsourcing.
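Redundant assignment with majority voting is one widely used quality assurance mechanism whose cost and quality can be modeled analytically. The following minimal sketch is not the monograph's model; it assumes that each worker answers a binary question correctly and independently with probability `p`, computes the probability that a majority of an odd number `n` of workers is correct, and finds the smallest redundancy reaching a target accuracy at a hypothetical price per assignment.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent workers,
    each correct with probability p, returns the correct binary answer."""
    if n % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct answers forming a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def cheapest_redundancy(p: float, target: float, price: float, n_max: int = 51):
    """Smallest odd n whose majority accuracy reaches `target`, together with
    that accuracy and the cost per task (price is an assumed placeholder)."""
    for n in range(1, n_max + 1, 2):
        acc = majority_vote_accuracy(p, n)
        if acc >= target:
            return n, acc, n * price
    return None  # target not reachable within n_max workers


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, round(majority_vote_accuracy(0.7, n), 4))
    print(cheapest_redundancy(p=0.7, target=0.9, price=0.05))
```

In this model each additional pair of workers raises accuracy by less than the previous pair, while the cost grows linearly, and for workers with `p` below 0.5 redundancy makes results worse; this is one reason why the suitability of a mechanism depends on the task type and the worker population.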
With the ongoing digitalization and globalization of the labor markets, the crowdsourcing paradigm is expected to gain even more importance in the coming years. This is already evident in newly emerging fields of crowdsourcing, such as enterprise crowdsourcing and mobile crowdsourcing. The models developed in this monograph enable platform providers to optimize their current systems and employers to optimize their workflows to increase their commercial success. Moreover, the results help to improve the general understanding of crowdsourcing systems, which is key to identifying necessary adaptations and future improvements.
Carrying out public construction projects requires intensive collaboration among many parties: the project management located in the client's building administration, the user organization (e.g., a university or public authority), the client's governing bodies (municipal, district, or federal parliament), its budget department, architects and specialist planners (freelance or employed by the building administration), expert assessors, construction companies, suppliers and service providers, and regional planning, plan approval, and permitting authorities. The planning, approval, and implementation process usually spans several years. Throughout, an intensive exchange of information and communication among the parties is required. Construction drawings, bills of quantities, bids, contracts, minutes, construction schedules, and invoices are still exchanged by e-mail or on paper. Because most parties handle a fairly large number of construction projects at the same time, this regularly leads to a challengingly large volume of correspondence for almost all of them and to an overview of the current project data that can only be described as inadequate.
Because the sub-processes are highly interdependent across all phases, the smoothest possible coordination and the continuous availability of current data to all parties are indispensable prerequisites for completing a construction project quickly and within the planned budget. While data exchange and coordination in large commercial construction projects are already supported successfully by virtual project rooms, public building administrations are still hesitant in this respect. Creating a uniform, cross-process data model tailored specifically to the workflows of public clients, which is the goal of this thesis, could help make the advantages of a central database accessible to all parties available to building administrations and their projects as well, and merge previously separate data stocks into a single one (data integration). A thorough analysis of the workflows and information flows among the parties across all phases of a public construction project, together with a survey of the virtual project rooms currently available on the market, forms the first part of the thesis and provides the basis for modeling the data and their relationships in the second part.
The comprehensive representation of the parties, their roles and tasks, the documents, and the associated metadata across all phases and construction disciplines constitutes a new research contribution. The differing terminology used, e.g., in building construction and civil engineering projects, has been retained for the sake of comprehensibility but merged into a common structure. This modeling is the prerequisite for improved IT support of public construction projects and, at the same time, the very essence of the business informatics specialist's role as a mediator between users and developers.
Because it is designed to span administrations and construction disciplines, the data model developed in this thesis can serve, in the sense of a reference model, as the basis for standard application software that can be deployed with little customization effort at a large number of public-sector customers. Examples include project room applications and workflow management systems. At the same time, it is a reference proposal to developers of existing applications for defining interfaces and, ultimately, for implementing cross-application integration approaches.
In our everyday lives, we are constantly in contact with information and communication technology systems. These often consist of several interacting and communicating components, such as concurrent software that makes efficient use of multi-core processors, or sensor networks. Systems composed of several interacting and communicating components are often complex and therefore highly error-prone. It is therefore important to have reliable methods that help ensure that such systems function correctly.
In this doctoral thesis, new methods were developed to improve the verifiability of asynchronous concurrent systems using symbolic model checking with binary decision diagrams (BDDs). An asynchronous concurrent system consists of several components, of which only one can execute transitions at any given time. Model checking is a formal verification technique in which software tools called model checkers automatically decide whether a set of properties holds for a given system description. The main problem of symbolic model checking is state-space explosion, and further improvements are needed so that symbolic model checking can be applied successfully more often.
In BDD-based symbolic model checking, sets of system states and sets of transitions are each represented by BDDs. Its central operations are the computation of the successor and predecessor states of given sets of states, which are called image computations. To check whether properties hold for a given system description, image computations are performed repeatedly. Their efficient computation is therefore crucial for low run time and low memory consumption of model checking. In an image computation, a BDD representing a set of transitions and a BDD representing a set of states are combined to compute a set of successor or predecessor states. The size of the BDDs representing a system's transition relation is also often decisive for whether model checking can be applied successfully.
This thesis presents new data structures for representing the transition relation of asynchronous concurrent systems in BDD-based symbolic model checking. In addition, new algorithms for performing image computations are presented. Both can lead to large reductions in run time and memory consumption. Asynchronous concurrent systems often exhibit symmetries, and symmetry reduction is one technique for mitigating state-space explosion. This thesis also presents a new, efficient algorithm for symmetry reduction in symbolic model checking with BDDs.
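The interplay of a partitioned transition relation, repeated image computation, and symmetry reduction can be illustrated without BDDs. The sketch below uses explicit Python sets as a stand-in for BDD-encoded state sets (an assumption made for readability; symbolic model checkers operate on BDDs, and this is not one of the thesis's data structures or algorithms). Because only one component moves per step, the transition relation is kept as one partition per component, and the image of a state set is the union of the per-component images; reachability is the least fixpoint of repeated image computations. The token-ring system and the rotation-based orbit representative are hypothetical examples.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

State = Tuple[int, ...]                         # one local value per component
LocalStep = Callable[[State], Iterable[State]]  # successors when ONE component moves


def image(states: Set[State], partitions: List[LocalStep]) -> Set[State]:
    """Successors of `states`: union of the images under every component's
    partition of the transition relation (asynchronous interleaving)."""
    result: Set[State] = set()
    for step in partitions:
        for s in states:
            result.update(step(s))
    return result


def reachable(init: Set[State], partitions: List[LocalStep]) -> FrozenSet[State]:
    """Least fixpoint of repeated image computations on the frontier."""
    reached = set(init)
    frontier = set(init)
    while frontier:
        new = image(frontier, partitions) - reached
        reached |= new
        frontier = new
    return frozenset(reached)


def canonical(s: State) -> State:
    """Orbit representative under rotation of component indices;
    only sound for systems that are symmetric under rotation."""
    return min(s[k:] + s[:k] for k in range(len(s)))


def reachable_reduced(init: Set[State], partitions: List[LocalStep]) -> FrozenSet[State]:
    """Same fixpoint, but every state is replaced by its orbit representative."""
    reached = {canonical(s) for s in init}
    frontier = set(reached)
    while frontier:
        new = {canonical(t) for t in image(frontier, partitions)} - reached
        reached |= new
        frontier = new
    return frozenset(reached)


def make_token_ring(n: int) -> List[LocalStep]:
    """Hypothetical example: component i passes a token to component i+1."""
    def step_for(i: int) -> LocalStep:
        def step(s: State) -> Iterable[State]:
            if s[i] == 1:  # component i holds the token
                t = list(s)
                t[i] = 0
                t[(i + 1) % n] = 1
                yield tuple(t)
        return step
    return [step_for(i) for i in range(n)]
```

For a ring of four components started in `(1, 0, 0, 0)`, `reachable` finds the four rotations of the token position, whereas `reachable_reduced` keeps a single representative of that orbit.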
Today's Internet is no longer only controlled by a single stakeholder, e.g. a standard body or a telecommunications company.
Rather, the interests of a multitude of stakeholders, e.g. application developers, hardware vendors, cloud operators, and network operators, collide during the development and operation of applications in the Internet.
Each of these stakeholders considers different KPIs to be important and attempts to optimise scenarios in its favour.
This results in different, often opposing views and can cause problems for the complete network ecosystem.
One example of such a scenario is signalling storms in the mobile Internet; one of the largest occurred in Japan in 2012 due to the release and high popularity of a free instant messaging application.
The network traffic generated by the application caused a high number of connections to the Internet being established and terminated.
This resulted in a similarly high number of signalling messages in the mobile network, causing overload and a loss of service for 2.5 million users over 4 hours.
While the network operator suffers the largest impact of this signalling overload, it does not control the application.
Thus, the network operator cannot change the application's traffic characteristics to generate less signalling traffic.
The stakeholders who could prevent, or at least reduce, such behaviour, i.e. application developers or hardware vendors, have no direct benefit from modifying their products in such a way.
This results in a clash of interests which negatively impacts the network performance for all participants.
The goal of this monograph is to provide an overview of the complex structures of stakeholder relationships in today's Internet applications in mobile networks.
To this end, we study different scenarios where such interests clash and suggest methods where tradeoffs can be optimised for all participants.
If such an optimisation is not possible or attempts at it might lead to adverse effects, we discuss the reasons.
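The mechanism behind such signalling storms can be illustrated with a deliberately simple model. The sketch below is a hypothetical simplification, not the monograph's model: the radio connection is assumed to be released once an inactivity timer expires, every packet arriving on a released connection triggers a new connection setup, and each setup/release cycle is assumed to cost a fixed number of signalling messages (`msgs_per_cycle` is a placeholder, not a measured figure).

```python
from typing import List, Optional


def count_connection_setups(packet_times: List[float], inactivity_timer: float) -> int:
    """Count connection setups for sorted packet timestamps (seconds).
    A packet needs a new setup if it is the first one or if the gap to the
    previous packet exceeds the inactivity timer (connection already released)."""
    setups = 0
    last: Optional[float] = None
    for t in packet_times:
        if last is None or t - last > inactivity_timer:
            setups += 1
        last = t
    return setups


def signalling_per_hour(keepalive_interval: float, inactivity_timer: float,
                        msgs_per_cycle: int) -> int:
    """Signalling messages per hour caused by one device that sends periodic
    keep-alive packets (msgs_per_cycle is an assumed constant)."""
    n_packets = int(3600 / keepalive_interval)
    times = [i * keepalive_interval for i in range(n_packets)]
    return count_connection_setups(times, inactivity_timer) * msgs_per_cycle
```

In this model, a keep-alive every 30 s with a 10 s inactivity timer causes a setup for every one of the 120 packets per hour, while the same traffic with a 60 s timer needs only one setup. Multiplied by millions of devices, the application's choice of keep-alive interval determines a signalling load that only the network operator has to carry.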
Social interactions as introduced by Web 2.0 applications during the last decade have changed the way the Internet is used. Today, it is part of our daily lives to maintain contacts through social networks, to comment on the latest developments in microblogging services or to save and share information snippets such as photos or bookmarks online.
Social bookmarking systems are part of this development. Users can share links to interesting web pages by publishing bookmarks and providing descriptive keywords for them. The structure that evolves from the collection of annotated bookmarks is called a folksonomy. The sharing of interesting and relevant posts enables new ways of retrieving information from the Web. Users can search or browse the folksonomy, looking at resources related to specific tags or users. Ranking methods known from search engines have been adapted to facilitate retrieval in social bookmarking systems. Hence, social bookmarking systems have become an alternative or complement to search engines.
To better understand the commonalities and differences of social bookmarking systems and search engines, this thesis compares several aspects of the two systems' structure, usage behaviour, and content. This includes the use of tags and query terms, the composition of the document collections, and the rankings of bookmarks and search engine URLs. Searchers (identified by session IDs), their search terms, and the URLs they clicked can be extracted from a search engine query log file. These form links similar to those found in folksonomies, where a user annotates a resource with tags. We use this analogy to build a tripartite hypergraph from query log files (a logsonomy) and compare the structural and semantic properties of logsonomies and folksonomies. Overall, we found similar behavioural, structural, and semantic characteristics in both systems. Driven by this insight, we investigate whether folksonomy data can be used in web information retrieval in a similar way to query log data: we construct training data from query logs and from a folksonomy to build models for a learning-to-rank algorithm. First experiments show a positive correlation between the ranking results generated by the ranking models of both systems. The research is based on various data collections from the social bookmarking systems BibSonomy and Delicious, Microsoft's search engine MSN (now Bing), and Google data.
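The analogy between folksonomies and query logs can be made concrete. A folksonomy is a set of (user, tag, resource) assignments; a logsonomy is obtained by mapping session IDs to users, query terms to tags, and clicked URLs to resources. The sketch below assumes a simplified log format of (session ID, query string, clicked URL) records and plain whitespace tokenization; the record layout and the sample data are hypothetical, not the MSN log format used in the thesis.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (user, tag, resource)


def build_logsonomy(log: Iterable[Tuple[str, str, str]]) -> Set[Triple]:
    """Turn (session_id, query, clicked_url) records into hyperedges of a
    tripartite hypergraph, one per query term, mirroring (user, tag, resource)."""
    triples: Set[Triple] = set()
    for session_id, query, url in log:
        for term in query.lower().split():
            triples.add((session_id, term, url))
    return triples


def tags_per_resource(triples: Set[Triple]) -> Dict[str, Set[str]]:
    """Project the hypergraph onto the tag-resource relation."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for _, tag, resource in triples:
        index[resource].add(tag)
    return dict(index)


sample_log = [
    ("s1", "Python tutorial", "http://a.example.org"),
    ("s2", "python docs", "http://a.example.org"),
    ("s1", "tutorial", "http://b.example.org"),
]
logsonomy = build_logsonomy(sample_log)
```

Once both data sources are in this triple form, the same structural measures (degree distributions, co-occurrence of tags, and so on) can be computed for folksonomies and logsonomies alike.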
To keep social bookmarking systems a good source for information retrieval, providers need to fight spam. This thesis introduces and analyses different features, derived from the specific characteristics of social bookmarking systems, for use in spam detection classification algorithms. The best results are obtained from a combination of profile, activity, semantic, and location-based features. Based on the experiments, a spam detection framework that identifies and eliminates spam activities in the social bookmarking system BibSonomy has been developed.
The storage and publication of user-related bookmarks and profile information raise questions about user data privacy. What kinds of personal information are collected, and how do systems handle user-related items? To answer these questions, the thesis examines the handling of data privacy in the social bookmarking system BibSonomy. Legal guidelines on how to deal with the private data collected and processed in social bookmarking systems are also presented. Experiments show that considering user data privacy in the process of feature design can be a first step towards strengthening data privacy.
This thesis presents a new philosophy for monitoring spacecraft: the unification of the various kinds of monitoring techniques used during the different life-cycle phases of a spacecraft.
The demanding requirements for this monitoring framework are:
- "separation of concerns" as a design principle (separating the logging of information from registered sources, its transmission to connected sinks, and its display),
- usage during all mission phases,
- usage by all actors (EGSE engineers, ground station operators, etc.),
- configurability at runtime, especially regarding the level of detail of the logged information, and
- very low resource consumption.
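The separation of concerns and the runtime-configurable level of detail listed above can be sketched as follows. This is a hypothetical Python illustration, not the RODOS-based flight implementation: sources only produce records, a dispatcher filters them against a per-source level that can be changed at runtime, and sinks (for example a telemetry downlink or an EGSE console) only consume them. The source name `aocs` and the log messages are invented for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List


class Level(IntEnum):
    DEBUG = 0
    INFO = 1
    WARNING = 2
    ERROR = 3


@dataclass(frozen=True)
class Record:
    source: str
    level: Level
    message: str


Sink = Callable[[Record], None]


class Dispatcher:
    """Connects registered sources to connected sinks; filtering is its only
    logic, so sources and sinks stay independent of each other."""

    def __init__(self) -> None:
        self._levels: Dict[str, Level] = {}
        self._sinks: List[Sink] = []

    def register_source(self, name: str, level: Level = Level.INFO) -> None:
        self._levels[name] = level

    def connect_sink(self, sink: Sink) -> None:
        self._sinks.append(sink)

    def set_level(self, name: str, level: Level) -> None:
        """Change the level of detail for one source at runtime."""
        self._levels[name] = level

    def log(self, source: str, level: Level, message: str) -> None:
        threshold = self._levels.get(source)
        if threshold is None or level < threshold:
            return  # unknown source or below the configured detail: drop early
        record = Record(source, level, message)
        for sink in self._sinks:
            sink(record)


# Usage: a list stands in for a real sink such as a downlink buffer.
received: List[Record] = []
dispatcher = Dispatcher()
dispatcher.register_source("aocs", Level.WARNING)
dispatcher.connect_sink(received.append)
dispatcher.log("aocs", Level.DEBUG, "wheel speed 1200 rpm")  # dropped
dispatcher.set_level("aocs", Level.DEBUG)
dispatcher.log("aocs", Level.DEBUG, "wheel speed 1210 rpm")  # forwarded
```

Dropping records before any formatting or transmission takes place keeps the cost of disabled detail levels low, which matters for the resource consumption requirement on embedded hardware.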
First, a prototype of the monitoring framework was developed as a support library for the real-time operating system RODOS. This prototype was tested on dedicated hardware platforms relevant for space, and also on a satellite demonstrator used for educational purposes.
As a second step, the results and lessons learned from developing and using this prototype were transferred to a real space mission: the first satellite of the DLR compact satellite series, a space-based platform for DLR's own research activities. Within this project, the software of the avionics subsystem was supplemented by a powerful logging component, which enhances the traditional housekeeping capabilities and offers extensive filtering and debugging techniques for monitoring and FDIR needs. This logging component is the major part of the flight version of the monitoring framework. It is complemented by counterparts running on the development computers as well as on the EGSE hardware in the integration room, which makes the framework valuable already in the earliest stages of traditional spacecraft development.
Future plans to add support from the ground station as well will lead to a seamless integration of the monitoring framework not only into the spacecraft itself, but into the whole space system.
The general map-labeling problem is as follows: given a set of geometric objects to be labeled, or features, in the plane, and for each feature a set of label positions, maximize the number of placed labels such that there is at most one label per feature and no two labels overlap. There are three types of features in a map: point, line, and area features. Unfortunately, the problem is NP-hard, so one cannot expect to find efficient algorithms that solve it optimally.
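Because exact optimization is out of reach, practical labelers rely on heuristics. The sketch below is a simple greedy heuristic for point features in the common four-position model (the label box has one corner at the point); it is not one of the thesis's algorithms, and the uniform label size and the candidate order are assumptions. It tries each candidate position in turn and keeps the first one that overlaps no previously placed label, so the result is feasible but not necessarily maximum.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def overlaps(a: Box, b: Box) -> bool:
    """Interior overlap test for axis-parallel rectangles (touching is allowed)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def candidates(x: float, y: float, w: float, h: float) -> List[Box]:
    """Four-position model: the point lies at one corner of the label."""
    return [
        (x, y, x + w, y + h),  # upper right
        (x - w, y, x, y + h),  # upper left
        (x - w, y - h, x, y),  # lower left
        (x, y - h, x + w, y),  # lower right
    ]


def greedy_label(points: List[Tuple[float, float]], w: float, h: float) -> Dict[int, Box]:
    """Place at most one label per point and never two overlapping labels."""
    placed: Dict[int, Box] = {}
    for i, (x, y) in enumerate(points):
        for box in candidates(x, y, w, h):
            if not any(overlaps(box, other) for other in placed.values()):
                placed[i] = box
                break  # first conflict-free candidate wins
    return placed
```

The outcome depends on the order in which points are processed, which illustrates why greedy placement gives no optimality guarantee.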
Interactive maps are digital maps that show only a small part of the entire map, while the user can manipulate the shown part, the view, by continuously panning, zooming, rotating, and tilting (that is, changing the perspective between a top-down view and a bird's-eye view). Navigation devices are one application of interactive maps. Interactive maps are challenging in that the labeling must be updated whenever labels leave the view and, while zooming, the label size must stay constant on the screen (which either makes space for further labels or makes labels overlap when zooming in or out, respectively). These updates must be computed in real time, that is, fast enough that the user does not notice any delay caused by the computation. Additionally, labels must not jump or flicker, that is, labels must not suddenly change their positions, and, while zooming out, a label that has vanished must not reappear.
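One way to rule out jumping and flickering while zooming, sketched here under simplifying assumptions that are not taken from the thesis, is to give each label a fixed position and a single scale threshold above which it is hidden. Assume labels have a fixed screen size of w × h pixels and are centered on their points, and let s be the map scale in world units per pixel (larger s means zoomed out). Two labels whose points differ by (Δx, Δy) then overlap exactly when s > max(|Δx|/w, |Δy|/h). Processing these conflict scales in increasing order and hiding the lower-priority label at the first unresolved conflict yields thresholds under which visible labels never overlap, and each label disappears once when zooming out and reappears only when zooming back in. The priorities are hypothetical inputs; rotation and tilting are ignored.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def conflict_scale(p: Point, q: Point, w: float, h: float) -> float:
    """Smallest scale above which the centered labels of p and q overlap."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return max(dx / w, dy / h)


def vanish_scales(points: List[Point], priorities: List[float],
                  w: float, h: float) -> List[float]:
    """For each label, the scale t_i at which it vanishes when zooming out
    (the label is visible for all scales s < t_i)."""
    vanish = [math.inf] * len(points)
    conflicts = sorted(
        (conflict_scale(points[i], points[j], w, h), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    for scale, i, j in conflicts:
        if vanish[i] > scale and vanish[j] > scale:  # both still visible here
            loser = i if priorities[i] < priorities[j] else j
            vanish[loser] = scale
    return vanish


def visible(vanish: List[float], scale: float) -> List[int]:
    """Indices of labels shown at the given scale."""
    return [i for i, t in enumerate(vanish) if scale < t]
```

Because every label has one threshold, the set of visible labels only shrinks while zooming out, which excludes flickering by construction; the quality of the result then depends on how the priorities are chosen.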
In this thesis, we present efficient algorithms that dynamically label point and line features in interactive maps. We try to label as many features as possible while we prohibit labels that overlap, jump, and flicker. We have implemented all our approaches and tested them on real-world data. We conclude that our algorithms are indeed real-time capable.
The present paper describes an improved 4 DOF (x/y/z/yaw) vision-based positioning solution for fully 6 DOF autonomous UAVs, optimised in terms of computation and development costs as well as robustness and performance. The positioning system combines Fourier transform-based image registration (Fourier Tracking) and differential optical flow computation to overcome the drawbacks of each single approach. The first method is capable of recognizing movement in four degrees of freedom under variable lighting conditions, but suffers from a low sample rate and high computational costs. Differential optical flow computation, on the other hand, enables a very high sample rate, which improves control robustness. This method, however, is limited to translational movement and performs poorly in bad lighting conditions. Fusing both techniques yields a reliable positioning system for autonomous flights with free heading. Although the vision system can measure the varying altitude during flight, infrared and ultrasonic sensors are used for robustness. This work is part of the AQopterI8 project, which aims to develop an autonomous quadrocopter for indoor applications and makes autonomous directed flight possible.
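The core of Fourier transform-based image registration is phase correlation. The sketch below, assuming NumPy and two grayscale frames of equal size, recovers only the integer 2-D translation between the frames. Estimating yaw (and scale, from which altitude changes can be inferred) additionally requires resampling the magnitude spectra to log-polar coordinates, which is omitted here; this is a textbook illustration, not the AQopterI8 implementation.

```python
from typing import Tuple

import numpy as np


def phase_correlation(ref: np.ndarray, cur: np.ndarray) -> Tuple[int, int]:
    """Estimate the integer shift (dy, dx) such that
    cur is approximately np.roll(ref, (dy, dx), axis=(0, 1))."""
    f_ref = np.fft.fft2(ref)
    f_cur = np.fft.fft2(cur)
    cross = f_cur * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12   # keep only the phase difference
    corr = np.fft.ifft2(cross).real  # sharp peak at the translation
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # peaks beyond half the image size correspond to negative shifts
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Because the normalized cross-power spectrum discards magnitude, the peak is largely insensitive to global brightness changes, which matches the robustness to variable lighting described above; the cost of two forward FFTs and one inverse FFT per frame is consistent with its lower sample rate compared with differential optical flow.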
A number of public codes exist for GPS positioning and baseline determination in offline mode. However, no software code exists for DGPS that exploits correction factors computed at base stations without relying on double-difference information. To fill this gap, a methodology for DGPS is introduced in the MATLAB environment that uses C/A-code pseudoranges on the single frequency L1 only, making it feasible for low-cost GPS receivers. Our base station is located at an accurately surveyed reference point. Pseudoranges and geometric ranges are compared at the base station to compute the correction factors. These correction factors are then handed over to the rover for all valid satellites observed during an epoch. The rover applies them when determining its own true position for the corresponding epoch. To validate the proposed algorithm, our rover is also placed at a pre-determined location. The proposed code is an appropriate and easy-to-use tool for post-processing GPS raw data to determine the accurate position of a rover, e.g., an Unmanned Aerial Vehicle, during post-mission analysis.
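The correction scheme described above can be sketched numerically. This is a Python/NumPy sketch rather than the authors' MATLAB code, and it assumes that satellite positions in ECEF coordinates are already known for the epoch (e.g., computed from broadcast ephemerides) and that base and rover observe the same satellites. The base station's per-satellite correction is its geometric range minus its measured pseudorange; the rover adds these corrections to its own pseudoranges and then solves for its position and a remaining clock term by iterative least squares (Gauss-Newton). Errors common to base and rover cancel, while the difference of the two receiver clock offsets remains and is estimated.

```python
import numpy as np


def base_corrections(sat_pos: np.ndarray, base_pos: np.ndarray,
                     base_pr: np.ndarray) -> np.ndarray:
    """Per-satellite correction = geometric range from the surveyed base
    position minus the pseudorange measured at the base (metres)."""
    geometric = np.linalg.norm(sat_pos - base_pos, axis=1)
    return geometric - base_pr


def solve_position(sat_pos: np.ndarray, pr: np.ndarray, x0: np.ndarray,
                   iterations: int = 10) -> np.ndarray:
    """Gauss-Newton least squares for [x, y, z, b], where b (metres) absorbs
    the remaining receiver clock offset. Needs at least four satellites."""
    est = np.append(np.asarray(x0, dtype=float), 0.0)
    for _ in range(iterations):
        diff = est[:3] - sat_pos                 # receiver minus satellite
        ranges = np.linalg.norm(diff, axis=1)
        residual = pr - (ranges + est[3])        # measured minus modeled
        jac = np.hstack([diff / ranges[:, None], np.ones((len(pr), 1))])
        delta, *_ = np.linalg.lstsq(jac, residual, rcond=None)
        est += delta
    return est


def dgps_rover_position(sat_pos: np.ndarray, base_pos: np.ndarray,
                        base_pr: np.ndarray, rover_pr: np.ndarray,
                        x0: np.ndarray) -> np.ndarray:
    """Apply the base corrections to the rover pseudoranges of the same
    epoch and solve for the rover position."""
    corrected = rover_pr + base_corrections(sat_pos, base_pos, base_pr)
    return solve_position(sat_pos, corrected, x0)[:3]
```

With synthetic pseudoranges in which both receivers share the same satellite-dependent errors, the rover position is recovered exactly up to numerical precision; with real data, errors that are not common to both receivers (multipath, receiver noise) remain in the solution.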
The dissertation "Ontologiebasiertes Cloud Computing" (Ontology-Based Cloud Computing), in the field of business informatics, addresses cloud computing and illustrates the possibilities of using an ontology for cloud computing in theory and in practice.
Alongside private and public clouds and hybrid solutions, sophisticated virtualization technology in particular will shape the future of IT. The variety and number of services offered continue to grow strongly, especially in the public cloud sector, while attractive solutions in the hybrid space are still lacking. Using a cloud service is usually simple and is becoming increasingly attractive as prices fall. Economically sound and legally secure use of a cloud service requires a number of points to be examined carefully and settled in advance, such as aspects of IT security, data protection, and costs. Before using a service, the value, frequency of use, and confidentiality level of one's own data must also be known in order to determine reliably whether all of the information or only part of it is suitable for outsourcing. This requires clearly defined contractual terms and an agreement on compensation for damages in the event of an outage. Active change management should build acceptance for the changing responsibilities in the IT environment even before a service is introduced.
Finding comparable alternatives was the objective of the broad survey of 15 service providers that was carried out together with the construction of an ontology. In a highly dynamic cloud computing market, such a survey can of course capture only a snapshot, since new providers emerge and established ones change and improve their offerings. To keep this snapshot from remaining a static end state, an ontology was built that allows changes to be incorporated consistently. Ideally, every new piece of information is fed into the ontology as soon as it becomes known. The provider survey shows that cloud services already have high potential today. An ontology is particularly well suited for gaining an overall picture of the various services and their offerings.
The resulting cloud ontology contains a service selection based on the literature and provider surveys. Much like a search engine, it helps users learn about existing offerings on the market. It also simplifies selection, clearly defines known technical details, makes it easier to search, e.g., for required additional services via standardized interfaces, and attempts to establish transparency and traceability in billing models so that comparisons become possible in the first place. The greatest advantage is the time saved: formalized and therefore comparable criteria shorten the search for suitable cloud services. If several providers match, further queries or cost comparisons can pinpoint the provider best suited to the user. Likewise, services that meet significant exclusion criteria can be removed from the selection early on. By prohibiting certain assignments or requiring minimum conditions within the ontology, the entry of incorrect facts is prevented, which makes the ontology considerably more robust than many programs. The challenge in building the model was to achieve general validity of the modeled dependencies. In addition, the cloud ontology fulfills the four typical requirements of an ontology: it is described exclusively in the standardized language OWL, it can be evaluated by an inference engine (e.g., Pellet), it distinguishes clearly between 80 classes and 342 individuals, and it captures extensive information through 2657 relations. As the implemented prototype demonstrates, the ontology can also be turned with little effort into a program with an appealing user interface.
In practice, companies need to be offered more tools, or such tools need to be given greater prominence, such as cloud ontologies that simplify the selection of services, make comparisons possible in the first place, shorten the search, and ultimately lead to results that match the expectations of the future user.
Mini Unmanned Aerial Vehicles (MUAVs) are becoming popular research platforms and have drawn considerable attention, particularly during the last decade, due to their multi-dimensional applications in almost every walk of life. MUAVs range from simple toys found at electronics stores for entertainment purposes to highly sophisticated commercial platforms performing novel assignments such as offshore wind power station inspection and 3D modelling of buildings. This paper presents an overview of the main aspects of distributed control of cooperating MUAVs to help potential users enter this fascinating field. Furthermore, it gives an overview of the state of the art in MUAV technologies, e.g., Photonic Mixer Device (PMD) cameras and distributed control methods, as well as of ongoing work and open challenges, which motivate many researchers all over the world to work in this field.
Since the early 1990s, a Doctoral Consortium has been offered to young researchers as a supportive forum ahead of the "Wirtschaftsinformatik" conference. This tradition was continued at WI 2015 in Osnabrück, the largest international conference on business informatics (Wirtschaftsinformatik). This volume collects the contributions selected for presentation.
Large volumes of data are collected today in many domains. Often, so much data is available that it is difficult to identify the relevant pieces of information. Knowledge discovery seeks to obtain novel, interesting, and useful information from large datasets.
One key technique for that purpose is subgroup discovery. It aims at identifying descriptions for subsets of the data, which have an interesting distribution with respect to a predefined target concept. This work improves the efficiency and effectiveness of subgroup discovery in different directions.
For efficient exhaustive subgroup discovery, algorithmic improvements are proposed for three important variations of the standard setting. First, novel optimistic estimate bounds are derived for subgroup discovery with numeric target concepts. These allow the evaluation of large parts of the search space to be skipped without influencing the results. Additionally, necessary adaptations of data structures for this setting are discussed. Second, for exceptional model mining, that is, subgroup discovery with a model over multiple attributes as the target concept, a generic extension of the well-known FP-tree data structure is introduced. The modified data structure stores intermediate condensed data representations, which depend on the chosen model class, in the nodes of the trees. This allows the approach to be applied to many popular model classes. Third, subgroup discovery with generalization-aware measures is investigated.
These interestingness measures compare the target share or mean value in the subgroup with the respective maximum value in all its generalizations. For this setting, a novel method for deriving optimistic estimates is proposed. In contrast to previous approaches, the novel estimates are not based exclusively on the anti-monotonicity of instance coverage but also take into account the difference in coverage between the subgroup and its generalizations. In all three areas, the advances lead to runtime improvements of more than an order of magnitude.
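The effect of an optimistic estimate can be illustrated for a numeric target. Assume, as one common choice that is not necessarily the thesis's, the quality function q(P) = Σ_{i∈P} (y_i − μ₀), that is, subgroup size times the difference between the subgroup mean and the population mean μ₀. No refinement of P can score more than the sum of its positive terms, oe(P) = Σ_{i∈P, y_i>μ₀} (y_i − μ₀), so a branch whose estimate does not exceed the best quality found so far can be skipped without changing the result. The sketch below applies this bound in a depth-first search over conjunctions of attribute=value selectors for the single best subgroup; the row format and the example data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Selector = Tuple[str, object]  # (attribute, value)


def quality(ys: Sequence[float], mu0: float) -> float:
    """Size times mean difference, written as a sum of deviations."""
    return sum(y - mu0 for y in ys)


def optimistic_estimate(ys: Sequence[float], mu0: float) -> float:
    """Upper bound on quality() for every subset of ys: keep positive terms."""
    return sum(y - mu0 for y in ys if y > mu0)


def best_subgroup(rows: List[Dict[str, object]], target: str, max_depth: int = 2):
    ys_all = [float(r[target]) for r in rows]
    mu0 = sum(ys_all) / len(ys_all)
    selectors: List[Selector] = sorted(
        {(a, v) for r in rows for a, v in r.items() if a != target}, key=repr
    )
    best: Tuple[float, Tuple[Selector, ...]] = (float("-inf"), ())
    evaluated = 0

    def search(desc: Tuple[Selector, ...], covered: List[int], start: int) -> None:
        nonlocal best, evaluated
        for k in range(start, len(selectors)):
            attr, val = selectors[k]
            sub = [i for i in covered if rows[i][attr] == val]
            if not sub:
                continue
            ys = [ys_all[i] for i in sub]
            evaluated += 1
            q = quality(ys, mu0)
            new_desc = desc + (selectors[k],)
            if q > best[0]:
                best = (q, new_desc)
            # prune: no refinement of this subgroup can beat the current best
            if len(new_desc) < max_depth and optimistic_estimate(ys, mu0) > best[0]:
                search(new_desc, sub, k + 1)

    search((), list(range(len(rows))), 0)
    return best, evaluated
```

The function returns the best subgroup with its quality and the number of evaluated candidates, which makes the effect of pruning visible when comparing runs with and without the estimate.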
The second part of the contributions focuses on the effectiveness of subgroup discovery. These improvements aim to identify more interesting subgroups in practical applications. For that purpose, expectation-driven subgroup discovery is introduced as a new family of interestingness measures. These measures score a subgroup by the difference between its actual target share and the target share that could be expected given the statistics of the separate influence factors that are combined to describe the subgroup.
In doing so, previously undetected interesting subgroups are discovered, while other, partially redundant findings are suppressed.
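The idea of an "expected" target share can be illustrated with a short sketch. It assumes a leaky noisy-or model for combining the individual influence factors. This is one common way to combine independent factors; the abstract does not state the thesis's concrete expectation model, and the function names are hypothetical.

```python
def expected_share_noisy_or(p0, factor_shares):
    """Expected target share of a conjunction under a leaky noisy-or model.

    p0: target share in the whole dataset (the 'leak').
    factor_shares: target share of each single selector on its own.
    p_exp = 1 - (1 - p0) * prod_i (1 - p_i) / (1 - p0)
    """
    if p0 >= 1.0:
        return 1.0
    remaining = 1.0 - p0
    for p in factor_shares:
        remaining *= (1.0 - p) / (1.0 - p0)
    # Factors with p_i < p0 can push the value below 0; clamp to [0, 1].
    return min(1.0, max(0.0, 1.0 - remaining))


def expectation_driven_score(n, actual_share, p0, factor_shares, a=1.0):
    """n^a * (actual share - expected share) for a subgroup covering n instances."""
    return n ** a * (actual_share - expected_share_noisy_or(p0, factor_shares))
```

With a population share of 0.2 and two factors that each reach 0.5 on their own, the combination is expected to reach 0.6875. A subgroup of 10 instances with an actual share of 0.7 exceeds both single factors, yet it scores only about 0.125 because its share is close to what the factors already explain. The same subgroup with a share of 0.9 scores about 2.125.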
Furthermore, this work also addresses practical issues of subgroup discovery. In that direction, the VIKAMINE II tool is presented, which extends its predecessor with a rebuilt user interface, novel algorithms for automatic discovery, new interactive mining techniques, as well as novel options for result presentation and introspection. Finally, real-world applications that utilized the presented techniques are described. These include the identification of influence factors on the success and satisfaction of university students and the description of locations using tagging data of geo-referenced images.
Performance Assessment of Resource Management Strategies for Cellular and Wireless Mesh Networks
(2015)
The growth of communication networks over the last decades has been truly remarkable, and it continues with increasing traffic and the emergence of new fields of application. The latter is particularly interesting: advances in the networks and new devices such as smartphones, tablet PCs, and all kinds of Internet-connected devices give rise to additional applications from many different areas. These services come from very different directions and belong to different user groups. The result is a very heterogeneous application mix with different requirements on the access networks.
The applications within these networks typically use the network technology as a matter of course, and expect that it works in all situations and for all sorts of purposes without any further intervention. Mobile TV, for example, assumes that the cellular networks support the streaming of video data. Likewise, mobile-connected electricity meters rely on the timely transmission of accounting data for electricity billing. From the perspective of the communication networks, this requires not only the technical realization for the individual case, but a broad consideration of all circumstances and all requirements of special devices and applications of the users.
Such a comprehensive consideration of all eventualities can only be achieved by a dynamic, customized, and intelligent management of the transmission resources. This management must exploit the theoretical capacity as far as possible while also taking into account the system and network architecture as well as user and application demands. Hence, for a high level of customer satisfaction, all requirements of the customers and the applications need to be considered, which requires a multi-faceted resource management.
The prerequisite for supporting all devices and applications is consequently a holistic resource management at different levels. At the physical level, the technical possibilities provided by different access technologies (e.g., more transmission antennas, modulation and coding of data, cooperation between network elements) need to be exploited on the one hand. On the other hand, interference and changing network conditions have to be counteracted at the physical level. At the application and user level, the focus should be on customer demands, given the growing number of different devices and diverse applications (medical, hobby, entertainment, business, civil protection, etc.).
The intention of this thesis is the development, investigation, and evaluation of a holistic resource management with respect to new application use cases and requirements for the networks. Therefore, different communication layers are investigated and corresponding approaches are developed using simulative methods as well as practical emulation in testbeds. The new approaches are designed with respect to different complexity and implementation levels in order to cover the design space of resource management in a systematic way. Since the approaches cannot be evaluated generally for all types of access networks, network-specific use cases and evaluations are finally carried out in addition to the conceptual design and the modeling of the scenario.
The first part is concerned with the management of resources at the physical layer. We study distributed resource allocation approaches under different settings. Due to ambitious performance objectives, current cellular networks employ a high degree of spectrum reuse. This can cause interference between cells that transmit on the same frequencies. The focus is on identifying approaches that are able to mitigate such interference.
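Why spectrum reuse causes interference problems can be seen from the signal-to-interference-plus-noise ratio (SINR). The sketch below is a minimal illustration that assumes linear received powers in mW with hypothetical values. It uses the Shannon bound only as an upper limit on spectral efficiency and does not model any specific allocation scheme from the thesis.

```python
import math


def sinr_db(signal_mw, interferers_mw, noise_mw):
    """SINR = S / (sum of co-channel interference + noise), in dB."""
    return 10 * math.log10(signal_mw / (sum(interferers_mw) + noise_mw))


def spectral_efficiency(sinr_db_value):
    """Shannon upper bound in bit/s/Hz for a given SINR in dB."""
    return math.log2(1 + 10 ** (sinr_db_value / 10))
```

With a received signal of 10 mW, two co-channel neighbours at 2 mW and 3 mW, and 5 mW of noise, the user sees 0 dB (at most 1 bit/s/Hz). If the neighbours are silent on that resource, the SINR rises to about 3 dB. Interference mitigation aims to recover this difference.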
Due to the heterogeneity of the applications, the networks increasingly face diverse application-specific requirements. Consequently, the focus of the second part shifts from optimizing network parameters to considering and integrating application and user needs by adjusting network parameters. To this end, application-aware resource management is introduced to enable efficient and customized access networks.
As indicated before, approaches cannot be evaluated generally for all types of access networks. Consequently, the third contribution is the definition and realization of the application-aware paradigm in different access networks, starting with multi-hop wireless mesh networks. The fourth contribution focuses on cellular networks, where application-aware resource management is applied to the air interface between the user device and the base station. Especially in cellular networks, the intense cost-driven competition among operators encourages the use of such resource management to provide cost-efficient networks customized to the running applications.
The first part of this thesis deals with the approximability of the traveling salesman problem. This problem is defined on a complete graph with edge weights, and the task is to find a Hamiltonian cycle (a cycle that visits each vertex exactly once) of minimum weight. We study the most important multiobjective variants of this problem. In the multiobjective case, the edge weights are vectors of natural numbers with one component for each objective. Since weight vectors are typically incomparable, a single optimal Hamiltonian cycle generally does not exist. Instead, we consider the Pareto set, which consists of those Hamiltonian cycles that are not dominated by any other Hamiltonian cycle, i.e., for which no other cycle is at least as good in every objective and strictly better in at least one. The central goal in multiobjective optimization, and in the first part of this thesis in particular, is the approximation of such Pareto sets.
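The dominance relation that defines the Pareto set can be written in a few lines. This minimal sketch assumes minimization and takes a list of weight vectors of candidate tours (hypothetical values). Computing the exact Pareto set of all tours is exponential, which is why the thesis studies approximations instead.

```python
def dominates(a, b):
    """For minimization: a is at least as good as b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_set(weight_vectors):
    """Keep the vectors that no other vector dominates."""
    return [w for w in weight_vectors if not any(dominates(v, w) for v in weight_vectors)]
```

For the two-objective tour weights [(3, 5), (4, 4), (5, 3), (4, 6), (6, 6)], the Pareto set is [(3, 5), (4, 4), (5, 3)]. These three vectors are pairwise incomparable, while (4, 6) and (6, 6) are dominated.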
We first develop improved approximation algorithms, inspired by the single-objective Christofides' heuristic, for the two-objective metric traveling salesman problem on multigraphs and for related Hamiltonian path problems. We further give arguments indicating that our algorithms are difficult to improve. Furthermore, we consider multiobjective maximization versions of the traveling salesman problem, where the task is to find Hamiltonian cycles with high weight in each objective. We generalize single-objective techniques to the multiobjective case: we first compute a cycle cover with high weight and then remove an edge with low weight from each cycle. Since weight vectors are often incomparable, the choice of the low-weight edges is non-trivial. We develop a general lemma that solves this problem and enables us to generalize the single-objective maximization algorithms to the multiobjective case. We obtain improved, randomized approximation algorithms for the multiobjective maximization variants of the traveling salesman problem. We conclude the first part by developing deterministic algorithms for these problems.
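The single-objective step that the thesis generalizes (turning a heavy cycle cover into a tour) can be sketched as follows. The sketch assumes scalar, non-negative edge weights in a complete graph and hypothetical function names. With weight vectors, choosing the "lightest" edge per cycle is exactly the part that becomes non-trivial.

```python
def remove_lightest_edges(cycles, weight):
    """Cut the minimum-weight edge of each cycle; a cycle of length k keeps at least (k-1)/k of its weight."""
    paths = []
    for cycle in cycles:
        k = len(cycle)
        # Edge j connects cycle[j] and cycle[(j + 1) % k].
        j = min(range(k), key=lambda i: weight(cycle[i], cycle[(i + 1) % k]))
        # Cutting edge j leaves the path cycle[j+1], ..., cycle[k-1], cycle[0], ..., cycle[j].
        paths.append(cycle[j + 1:] + cycle[:j + 1])
    return paths


def concatenate(paths):
    """Join the paths into one Hamiltonian cycle (vertex order); in a complete graph
    with non-negative weights, the connecting edges can only add weight."""
    return [v for path in paths for v in path]
```

For the cycles [0, 1, 2] and [3, 4, 5] with weights {0,1}: 5, {1,2}: 1, {2,0}: 4, {3,4}: 2, {4,5}: 6, {5,3}: 7, the edges {1,2} and {3,4} are cut, giving the paths [2, 0, 1] and [4, 5, 3].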
The second part of this thesis deals with redundancy properties of complete sets. We call a set autoreducible if for every input instance x we can efficiently compute some y that is different from x but has the same membership in the set. If the set can be split into two equivalent parts, then it is called weakly mitotic, and if the splitting is obtained by an efficiently decidable separator set, then it is called mitotic. For different reducibility notions and complexity classes, we analyze how redundant their complete sets are.
Previous research in this field has concentrated on polynomial-time computable reducibility notions. The main contribution of this part of the thesis is a systematic study of the redundancy properties of complete sets for typical complexity classes and reducibility notions that are computable in logarithmic space. We use different techniques to show autoreducibility and mitoticity that depend on the size of the complexity class and the strength of the reducibility notion considered. For small complexity classes such as NL and P, we use self-reducible complete sets to show that all complete sets are autoreducible. For large complexity classes such as PSPACE and EXP, we apply diagonalization methods to show that all complete sets are even mitotic. For intermediate complexity classes such as NP and the remaining levels of the polynomial-time hierarchy, we establish autoreducibility of complete sets by locally checking computational transcripts. In many cases we can show autoreducibility of complete sets, while mitoticity is not known to hold. We conclude the second part by showing that in some cases, autoreducibility of complete sets at least implies weak mitoticity.
Background
Information about genes, transcripts and proteins is spread over a wide variety of databases. Different tools have been developed that use these databases to identify biological signals in gene lists from large-scale analyses. Mostly, they search for enrichments of specific features. However, these tools do not allow an exploratory walk through different views, nor do they allow the gene lists to be changed as new findings emerge.
Results
To fill this niche, we have developed ISAAC, the InterSpecies Analysing Application using Containers. The central idea of this web-based tool is to enable the analysis of sets of genes, transcripts and proteins from different biological viewpoints and to interactively modify these sets at any point of the analysis. Detailed history and snapshot information allows each action to be traced. Furthermore, one can easily switch back to previous states and perform new analyses. Currently, sets can be viewed in the context of genomes, protein functions, protein interactions, pathways, regulation, diseases and drugs. Additionally, users can switch between species with an automatic, orthology-based translation of existing gene sets. As today's research is usually performed in larger teams and consortia, ISAAC provides group-based functionality, with which sets as well as analysis results can be exchanged between group members.
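The container idea (a gene set with a traceable history, revertible snapshots, and orthology-based translation between species) can be sketched as follows. ISAAC itself is a JavaEE application; this Python sketch is not its API. The class, function names, orthology table and the placeholder gene "GENE_X" are hypothetical.

```python
from dataclasses import dataclass, field


def translate(genes, orthologs):
    """Map genes via an orthology table; also report genes without a known ortholog."""
    translated, unmapped = set(), set()
    for gene in genes:
        hits = orthologs.get(gene)
        if hits:
            translated |= hits
        else:
            unmapped.add(gene)
    return translated, unmapped


@dataclass
class GeneSetContainer:
    species: str
    genes: frozenset
    # Each entry stores (action, species, genes) as they were *before* the action.
    history: list = field(default_factory=list)

    def apply(self, action, new_genes, new_species=None):
        self.history.append((action, self.species, self.genes))
        self.genes = frozenset(new_genes)
        if new_species is not None:
            self.species = new_species

    def revert(self, step):
        """Return to the state before history[step] and drop all later steps."""
        _, self.species, self.genes = self.history[step]
        del self.history[step:]
```

Translating a human set {TP53, BRCA1, GENE_X} with the table {TP53: {Trp53}, BRCA1: {Brca1}} yields the mouse set {Trp53, Brca1} and reports GENE_X as unmapped. Calling revert(0) restores the original human set.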
Conclusions
ISAAC fills the gap between primary databases and tools for the analysis of large gene lists. With its highly modular, JavaEE-based design, the implementation of new modules is straightforward. Furthermore, ISAAC comes with an extensive web-based administration interface including tools for the integration of third-party data. Thus, a local installation is easily feasible. In summary, ISAAC is tailor-made for highly exploratory, interactive analyses of gene, transcript and protein sets in a collaborative environment.
Context-specific Consistencies in Information Extraction: Rule-based and Probabilistic Approaches
(2015)
Large amounts of communication, documentation as well as knowledge and information are stored in textual documents. Most often, these texts, such as webpages, books, tweets or reports, are only available in an unstructured representation since they are created and interpreted by humans. In order to take advantage of this huge amount of concealed information and to include it in analytic processes, it needs to be transformed into a structured representation. Information extraction addresses exactly this task. It tries to identify well-defined entities and relations in unstructured data, especially in textual documents.
Interesting entities are often consistently structured within a certain context, especially in semi-structured texts. However, their actual composition varies and may be inconsistent across different contexts. Information extraction models fall short of their potential and return inferior results if they do not consider these consistencies during processing. This work presents a selection of practical and novel approaches for exploiting these context-specific consistencies in information extraction tasks. The approaches are not restricted to a single technique but are based on handcrafted rules as well as probabilistic models.
A new rule-based system called UIMA Ruta has been developed in order to provide optimal conditions for rule engineers. This system consists of a compact rule language with a high expressiveness and strong development support. Both elements facilitate rapid development of information extraction applications and improve the general engineering experience, which reduces the necessary efforts and costs when specifying rules.
The advantages and applicability of UIMA Ruta for exploiting context-specific consistencies are illustrated in three case studies. They utilize different engineering approaches for including the consistencies in the information extraction task. Either the recall is increased by finding additional entities with similar composition, or the precision is improved by filtering inconsistent entities. Furthermore, another case study highlights how transformation-based approaches are able to correct preliminary entities using the knowledge about the occurring consistencies.
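The two ways of exploiting consistency (raising recall by finding entities with the same composition, raising precision by filtering entities that break it) can be illustrated with a small sketch. This is Python, not UIMA Ruta rule syntax. The coarse "shape" pattern used as the consistency model and all function names are hypothetical choices for illustration.

```python
import re
from collections import Counter


def shape(text):
    """Coarse composition pattern: A = upper case, a = lower case, 9 = digit; runs collapsed."""
    s = re.sub(r"[A-Z]", "A", text)
    s = re.sub(r"[a-z]", "a", s)
    s = re.sub(r"[0-9]", "9", s)
    return re.sub(r"(.)\1+", r"\1", s)


def filter_inconsistent(entities, min_share=0.5):
    """Precision: drop entities whose pattern is rare within this document."""
    counts = Counter(shape(e) for e in entities)
    return [e for e in entities if counts[shape(e)] / len(entities) >= min_share]


def find_additional(entities, candidates):
    """Recall: add candidate spans that share the document's dominant pattern."""
    dominant = Counter(shape(e) for e in entities).most_common(1)[0][0]
    return [c for c in candidates if c not in entities and shape(c) == dominant]
```

In a reference list where the author entities "Smith, J.", "Miller, K." and "Jones, A." share the pattern "Aa, A.", a spurious "2014" is filtered out, while the candidates "Brown, T." and "Lee, S." are added and "in press" is not.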
The machine-learning approaches of this work rely on Conditional Random Fields, popular probabilistic graphical models for sequence labeling. They take advantage of a consistency model that is automatically induced while the document is processed. The approach based on stacked graphical models uses the learnt descriptions as feature functions that have a static meaning for the model but change their actual function for each document. The other two models extend the graph structure with additional factors that depend on the learnt consistency model. They include feature functions for consistent and inconsistent entities as well as for additional positions that fulfill the consistencies.
The presented approaches are evaluated in three real-world domains: segmentation of scientific references, template extraction in curricula vitae, and identification and categorization of sections in clinical discharge letters. They are able to achieve remarkable results and provide an error reduction of up to 30% compared to usually applied techniques.
The basis for high inventory accuracy is the cross-company identification and tracking of goods, which is made possible by automatic identification technologies (Auto-ID technologies). The introduction of the barcode as an Auto-ID technology fundamentally changed the industry more than 30 years ago. Building on this, newer Auto-ID technologies such as Radio Frequency Identification (RFID) promise to solve problems such as the unavailability of goods, non-transparent theft rates, or shrinkage through better tracking of all goods and higher inventory accuracy. The advantages of RFID over the barcode include higher data density, greater robustness to environmental influences, and faster, simultaneous capture of multiple items.
However, many companies face a multitude of problems, particularly after implementing an RFID infrastructure. Issues such as little support from management, internal resistance from employees, problems integrating hardware and software, and above all insufficient data quality prevent the predicted positive effects from being achieved. Such phenomena are aptly summarized under the term "credibility gap". It describes the problem that there is an overall lack of procedures, methods, and targeted support for actually and sustainably realizing the positive effects extensively promised in the literature. Accordingly, the savings and improvements expected from RFID deployment are often described as expert estimates and even as largely speculative.
The goal of this dissertation is to enable practitioners to achieve the positive effects of RFID. To this end, a range of studies was carried out based on a long-term cooperation with one of the world's largest apparel retailers, in which an RFID implementation project was accompanied and actively shaped. First, it is confirmed that the predicted benefits of RFID technology cannot be achieved solely by implementing the required infrastructure. The reason identified is high inaccuracy in the inventory systems in use, which stems from both technical and human errors. As a consequence, RFID data quality is not reliable.
The dissertation addresses the problems of the credibility gap and, for an already implemented RFID infrastructure, first diagnoses the errors and causes of the insufficient data quality. Building on this, measures and instructions are presented that help eliminate the errors and ultimately improve and monitor the infrastructure.
To successfully combine the requirements of practice and research, a novel combination of two forms of action research is used as the research method. As a result, on the one hand, frameworks and tests for troubleshooting, monitoring metrics, and rules for effective RFID system management that are helpful to practitioners are described. All measures carried out and presented in the dissertation demonstrably lead to higher data quality of an implemented RFID system and provide means for metric-based visualization of RFID process performance. On the other hand, a model for the use of action research is proposed and an extensive validation of the methodology is carried out. In this way, both the practical relevance of the results and the rigor of the research findings are ensured.
All results serve as a basis for a variety of further research. Higher reliability and data quality of RFID information enable more meaningful analyses. Furthermore, error-corrected process data make novel methods of RFID data mining conceivable. This research area remains largely unexplored and offers enormous potential for realizing further benefits promised by RFID.
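A monitoring metric for inventory accuracy can be as simple as the share of articles whose system stock matches a physical count. The sketch below assumes this common "inventory record accuracy" definition for illustration. The abstract does not specify the metrics used in the dissertation, and the function name and SKU data are hypothetical.

```python
def inventory_record_accuracy(recorded, counted):
    """Share of SKUs whose recorded stock equals the physically counted stock.

    recorded, counted: dicts mapping SKU -> quantity; a missing SKU counts as 0.
    """
    skus = recorded.keys() | counted.keys()
    if not skus:
        return 1.0
    matches = sum(1 for sku in skus if recorded.get(sku, 0) == counted.get(sku, 0))
    return matches / len(skus)
```

If the system records {A: 5, B: 3, C: 0} and an RFID count finds {A: 5, B: 2, D: 1}, only A and C match, so the accuracy is 0.5. Tracked over time per store, such a figure can show whether error-correction measures take effect.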
This dissertation presents controller design methodologies for a formation of cooperative mobile robots to perform trajectory tracking and convoy protection tasks. Two major problems related to multi-agent formation control are addressed, namely the time-delay and optimality problems. For the task of trajectory tracking, a leader-follower system structure is adopted for the controller design, and the selection criteria for controller parameters are derived by analyzing characteristic polynomials. The resulting parameters ensure the stability of the system and eliminate the steady-state error as well as oscillations under time delay. In the convoy protection scenario, a decentralized coordination strategy for the balanced deployment of mobile robots is first proposed. Based on this coordination scheme, optimal controller parameters are generated in both a centralized and a decentralized fashion to achieve dynamic convoy protection in a unified framework, where a distributed optimization technique is applied in the decentralized strategy. This unified framework takes into account the motion of the target to be protected and the desired system performance, for instance, minimal energy expenditure or equal inter-vehicle distances.
Both trajectory tracking and convoy protection are demonstrated in simulations and real-world hardware experiments using the robotic equipment at the Department of Computer Science VII, University of Würzburg.
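How controller gains follow from a characteristic polynomial can be shown for the simplest case. This minimal sketch assumes a delay-free double-integrator follower with a PD controller, whose tracking error obeys e'' + k_d·e' + k_p·e = 0 with characteristic polynomial s² + k_d·s + k_p. The thesis analyzes the time-delayed case, where the characteristic equation becomes a quasi-polynomial and requires further analysis. The gains used below are hypothetical.

```python
import cmath


def characteristic_roots(kp, kd):
    """Roots of s^2 + kd*s + kp, the error dynamics of a PD-controlled follower."""
    disc = cmath.sqrt(kd * kd - 4 * kp)
    return (-kd + disc) / 2, (-kd - disc) / 2


def classify_gains(kp, kd, tol=1e-12):
    roots = characteristic_roots(kp, kd)
    if any(r.real >= 0 for r in roots):
        return "not asymptotically stable"  # a root on or right of the imaginary axis
    if any(abs(r.imag) > tol for r in roots):
        return "stable, oscillatory"  # complex-conjugate roots
    return "stable, non-oscillatory"  # real negative roots
```

The choice k_d = 2·√k_p (critical damping, e.g. k_p = 1, k_d = 2) is the smallest derivative gain that avoids oscillation. With k_p = 4 and k_d = 1, the error decays but oscillates, and with k_d = 0 the roots lie on the imaginary axis.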
Numerous digitization projects make the knowledge of past centuries available at any time. However, the full potential of digitizing documents is only realized when they are made available as searchable full texts. With OCR software, this capture can be largely automated. Fraktur was the prevalent typeface in the German-speaking world from the 16th century until the middle of the 20th century. Due to some peculiarities of Fraktur, however, recognition rates for Fraktur texts usually remain considerably below those for Antiqua texts.
This thesis focuses on improving the recognition results of the OCR software Tesseract on Fraktur texts. To this end, the software and existing language packs were analyzed specifically with respect to the characteristics of Fraktur. Special training and modifications to the software were then used to try to improve the results and to gain insights into the effectiveness of different approaches.
Through various experiments, the character error rate was reduced from 2.5 percent to 1.85 percent. In addition, tools are presented that facilitate training new fonts for Tesseract and enable the evaluation of the improvements achieved.
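Character error rates such as the 2.5 and 1.85 percent above are usually computed as the edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The sketch assumes this standard definition. The thesis's evaluation tools may normalize differently, for example by ignoring whitespace.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (or match)
        prev = cur
    return prev[-1]


def character_error_rate(reference, hypothesis):
    return levenshtein(reference, hypothesis) / len(reference)
```

A typical Fraktur confusion is reading the long s as "f": for the reference "Waſſer" and the OCR output "Waffer", the CER is 2/6. The reported reduction from 2.5 to 1.85 percent corresponds to a relative reduction of 26 percent.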
In learning processes, applying the activity to be learned plays an important role. In the context of education at schools and universities, this means that it is important to offer pupils and students sufficient opportunities to practice. However, the feedback that teaching staff produce when "correcting" submissions is expensive, since the time required can be considerable depending on the type of exercise.
E-learning systems offer a solution to this problem. Suitable systems can not only present learning material but also offer exercises and generate appropriate feedback almost immediately after they have been completed. In general, however, it is not easy to implement automated procedures that correct solutions to exercises and create corresponding feedback. For some exercise types, such as multiple-choice questions, this is trivial, but these are mainly suited to testing so-called factual knowledge. They hardly allow learners to practice learning objectives at the application level.
These learning objectives, which rank higher in common cognitive taxonomies, can be addressed by so-called open exercise types, which are usually answered by writing free text in natural language. The information or knowledge that learners enter is therefore in so-called "unstructured" form. This unstructured knowledge is difficult to process automatically, so training systems that pose exercises of this type and give corresponding feedback have not yet become established. However, there are also open exercise types in which learners enter knowledge in structured form, which makes it easier to process automatically. For exercises of this kind, training systems can therefore be built that offer a good way to give pupils and students many practice opportunities, including for practice-oriented applications, without placing an additional burden on teaching staff.
This thesis describes how certain properties of exercises can be exploited to design and implement such training systems. These are exercises whose solutions are structured and machine-interpretable.
The main part of the thesis describes four training systems or their components and reports on experiences with their use in practice. A component of the training system "CaseTrain" can generate feedback on UML class diagrams. The novel training system "WARP" generates feedback on UML activity diagrams at several levels, among other things by visualizing, in virtual environments, the robot behavior defined by the activity diagrams. "ÜPS" is a training system for practicing writing SQL queries. A further component implemented in "CaseTrain" for image-marking exercises enables immediate, automatic grading of such exercises.
The systems were used and evaluated at the University of Würzburg between 2011 and 2014 in lectures with up to 300 students. The evaluation showed high usage and good student ratings of the deployed concepts, demonstrating that electronic training systems for open exercises can be used in practice.
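SQL answers are a good example of structured, machine-interpretable solutions: one simple way to check them is to run the student's query and the reference query on the same database and compare the result multisets. The sketch below assumes this result-comparison approach using Python's built-in sqlite3 module. The abstract does not describe how ÜPS actually grades queries, and the function name and table are hypothetical. Matching results on one database instance is only a heuristic (a wrong query can coincide on the sample data), and student queries should run on a disposable copy of the database.

```python
import sqlite3
from collections import Counter


def check_query(conn, reference_sql, student_sql, ordered=False):
    """Compare the student's result with the reference result (duplicates count)."""
    expected = conn.execute(reference_sql).fetchall()
    try:
        actual = conn.execute(student_sql).fetchall()
    except sqlite3.Error as exc:
        return False, f"SQL error: {exc}"
    if ordered:
        ok = actual == expected
    else:
        ok = Counter(actual) == Counter(expected)
    return ok, "correct" if ok else "result differs from the reference solution"
```

For a table students(name, semester), the student answer "WHERE semester >= 3" is accepted for the reference "WHERE semester > 2" on integer data, while a typo in the table name yields an "SQL error" message as immediate feedback.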
Extracting metadata from historical documents is a time-consuming, complex, and highly error-prone task that usually has to be performed by human experts. It is nevertheless necessary in order to establish links between documents, to answer queries about historical events correctly, or to build semantic connections. To reduce the manual effort involved, methods of Named Entity Recognition are to be applied. Classifying terms in historical manuscripts, however, is a major challenge, since the domain exhibits high spelling variance, partly because orthography was governed only by convention. This thesis presents methods that can also operate in complex syntactic environments by drawing on information from the context of the terms to be classified and combining it with domain-specific heuristics. Furthermore, it evaluates how the metadata obtained in this way can be used to add value in workflow systems for digitizing historical manuscripts, through heuristics for detecting production errors.
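Spelling variance is one reason why exact dictionary lookup fails on historical manuscripts. One simple remedy is to normalize spellings and then match approximately against a gazetteer, which is sketched here with the standard-library difflib and unicodedata modules. The normalization rules, cutoff, and gazetteer are hypothetical, and the sketch ignores the context features and domain heuristics that the thesis combines.

```python
import difflib
import unicodedata

# Hypothetical normalization rules for historical German spelling variants.
RULES = (("th", "t"), ("tz", "z"), ("y", "i"))


def normalize(term):
    # NFKD splits umlauts into base letter + combining mark and maps the long s 'ſ' to 's'.
    term = unicodedata.normalize("NFKD", term.lower())
    term = "".join(ch for ch in term if not unicodedata.combining(ch))
    for old, new in RULES:
        term = term.replace(old, new)
    return term


def link_place(term, gazetteer, cutoff=0.75):
    """Return the gazetteer entry closest to the normalized term, or None."""
    index = {normalize(name): name for name in gazetteer}
    match = difflib.get_close_matches(normalize(term), list(index), n=1, cutoff=cutoff)
    return index[match[0]] if match else None
```

With the gazetteer ["Würzburg", "Bamberg", "Nürnberg"], the historical spelling "Wirtzburg" normalizes to "wirzburg" and is linked to "Würzburg" (similarity 0.875), while "Paris" stays unlinked.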
Despite the internet's dynamic and collaborative nature, scientists continue to produce grant proposals, lab notebooks, data files, conclusions etc. that stay in static formats or are not published online and are therefore not always easily accessible to the interested public. Because tools that seamlessly integrate all aspects of a research project (conception, data generation, data evaluation, peer review and publishing of conclusions) are not widely adopted, much effort is later spent on reproducing or reformatting individual entities before they can be repurposed independently or as parts of articles.
We propose that workflows - performed both individually and collaboratively - could potentially become more efficient if all steps of the research cycle were coherently represented online and the underlying data were formatted, annotated and licensed for reuse. Such a system would accelerate the process of taking projects from conception to publication stages and allow for continuous updating of the data sets and their interpretation as well as their integration into other independent projects.
A major advantage of such workflows is the increased transparency, both with respect to the scientific process and to the contribution of each participant. The latter point is important from a motivational perspective, as it enables the allocation of reputation, which creates incentives for scientists to contribute to projects. Such workflow platforms, offering possibilities to fine-tune the accessibility of their content, could gradually pave the way from the current static mode of research presentation to a more coherent practice of open science.
Scientific research is a process concerned with the creation, collective accumulation, contextualization, updating and maintenance of knowledge. Wikis provide an environment that allows knowledge to be collectively accumulated, contextualized, updated and maintained in a coherent and transparent fashion. Here, we examine the potential of wikis as platforms for scholarly publishing. In the hope of stimulating further discussion, the article itself was drafted on Species-ID, a wiki that hosts a prototype for wiki-based scholarly publishing, where it can be updated, expanded or otherwise improved.
Streaming of videos has become the major traffic generator in today's Internet, and the video traffic share is still increasing. According to Cisco's annual Visual Networking Index report, in 2012, 60% of the global Internet IP traffic was generated by video streaming services. Furthermore, the study predicts a further increase to 73% by 2017. At the same time, advances in mobile communications and embedded devices have led to a widespread adoption of Internet-video-enabled mobile and wireless devices (e.g., smartphones). The report predicts that by 2017, the traffic originating from mobile and wireless devices will exceed the traffic from wired devices, and states that mobile video traffic was the source of roughly half of the mobile IP traffic at the end of 2012.
With the increasing importance of Internet video streaming, video content providers find themselves in a highly competitive market where user expectations are high and customer loyalty depends strongly on the user's satisfaction with the provided service. In particular, paying customers expect their viewing experience to be the same across all their viewing devices, independently of the Internet access technology they are currently using. However, providing video streaming services is costly in terms of storage space, required bandwidth and generated traffic. Therefore, content providers face a trade-off between the user-perceived Quality of Experience (QoE) and the costs of providing the service.
Today, a variety of transport and application protocols exist for providing video streaming services; which one is used depends on the scenario. Video streaming services can be divided into three categories: video conferencing, IPTV and Video-on-Demand services. IPTV and video conferencing have severe real-time constraints and thus mostly use datagram-based protocols such as RTP/UDP for video transmission. Video-on-Demand services, in contrast, can profit from pre-encoded content and buffers at the end user's device, and mostly use TCP-based protocols in combination with progressive streaming for media delivery.
In recent years, the HTTP protocol on top of the TCP protocol gained widespread popularity as a cost-efficient way to distribute pre-encoded video content to customers via progressive streaming. This is due to the fact that HTTP-based video streaming profits from a well-established infrastructure which was originally implemented to efficiently satisfy the increasing demand for web browsing and file downloads. Large Content Delivery Networks (CDN) are the key components of that distribution infrastructure. CDNs prevent expensive long-haul data traffic and delays by distributing HTTP content to world-wide locations close to the customers. As of 2012, already 53% of the global video traffic in the Internet originates from Content Delivery Networks and that percentage is expected to increase to 65% by the year 2017. Furthermore, HTTP media streaming profits from existing HTTP caching infrastructure, ease of NAT and proxy traversal and firewall friendliness.
Video delivery through heterogeneous wired and wireless communication networks is prone to distortions caused by insufficient network resources. This is especially true in wireless scenarios, where user mobility and insufficient signal strength can result in very poor transport performance (e.g. high packet loss, long delays, and low or fluctuating bandwidth). Poor transport performance in turn may degrade the QoE perceived by the user, either through buffer underruns (i.e. playback interruptions) in TCP-based delivery or through image distortions in datagram-based real-time video delivery.
To overcome QoE degradations caused by insufficient network resources, content providers have to consider adaptive video streaming. One way to implement this for HTTP/TCP streaming is to partition the content into small segments, encode each segment at different quality levels, and provide access to both the segments and the quality level details (e.g. resolution, average bitrate). During the streaming session, a client-centric adaptation algorithm can use these details to adapt the playback to the current environment. However, the lack of a common HTTP adaptive streaming standard led to multiple proprietary solutions developed by major Internet companies such as Microsoft (Smooth Streaming), Apple (HTTP Live Streaming) and Adobe (HTTP Dynamic Streaming), all loosely based on this principle. In 2012, ISO/IEC published the Dynamic Adaptive Streaming over HTTP (MPEG-DASH) standard. DASH is now gaining wide acceptance, with major companies announcing their support or having already implemented the standard in their products. MPEG-DASH is typically used with single-layer codecs such as H.264/AVC, but recent publications show that scalable video coding (SVC) can use the existing HTTP infrastructure more efficiently. Furthermore, the layered approach of scalable video coding extends the client's adaptation options, since segments that have already been downloaded can be enhanced at a later time.
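The client-centric adaptation principle in the preceding paragraph can be sketched as follows. This is a minimal, hypothetical illustration assuming a simple throughput-based rule: pick the highest quality level whose average bitrate fits within a safety margin of the throughput measured for the last segment download. The function names, the 0.8 safety factor and the example bitrates are assumptions for illustration; this is not the SVC-based algorithm proposed in the thesis.

```python
from typing import Sequence


def estimate_throughput(segment_bits: float, download_s: float) -> float:
    """Throughput of the last segment download, in kbit/s."""
    return segment_bits / download_s / 1000.0


def select_quality(bitrates_kbps: Sequence[float],
                   throughput_kbps: float,
                   safety_factor: float = 0.8) -> int:
    """Hypothetical throughput-based rule (not the thesis's algorithm).

    bitrates_kbps lists the average bitrate of each quality level in
    ascending order. Returns the index of the highest level that fits
    within safety_factor * throughput, falling back to the lowest level.
    """
    if not bitrates_kbps:
        raise ValueError("at least one quality level is required")
    budget = throughput_kbps * safety_factor
    chosen = 0
    for i, rate in enumerate(bitrates_kbps):
        if rate > budget:
            break  # levels are sorted, so no higher level can fit either
        chosen = i
    return chosen


if __name__ == "__main__":
    levels = [300, 750, 1500, 3000]  # hypothetical average bitrates in kbit/s
    tput = estimate_throughput(segment_bits=4_000_000, download_s=2.0)  # 2000 kbit/s
    print(select_quality(levels, tput))  # -> 2 (1500 kbit/s fits within 0.8 * 2000)
```

The safety margin leaves headroom for throughput fluctuations; a real client would typically also take its current buffer level into account.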
The influence of distortions on the perceived QoE of non-adaptive video streaming has been well studied and published. For HTTP streaming, the user's QoE is influenced by the initial delay (i.e. the time during which the client pre-buffers video data) and by the length and frequency of playback interruptions (stalls) caused by a depleted playback buffer. Studies show that even short and infrequent stalls have a negative impact on the user's QoE and should therefore be avoided. The first contribution of this thesis is the identification of QoE influence factors of adaptive video streaming by means of a crowdsourcing study and a laboratory study.
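The trade-off between initial delay and stalling can be illustrated with a small playback buffer model. This is a hypothetical sketch assuming fixed-length segments downloaded back-to-back, an unbounded playback buffer, and playback that starts once a configurable number of segments has been pre-buffered; all names and parameters are illustrative and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StallReport:
    initial_delay_s: float
    stall_durations_s: List[float]

    @property
    def num_stalls(self) -> int:
        return len(self.stall_durations_s)

    @property
    def total_stall_s(self) -> float:
        return sum(self.stall_durations_s)


def simulate_playback(download_times_s: Sequence[float],
                      segment_duration_s: float,
                      prebuffer_segments: int = 1) -> StallReport:
    """Replay sequential segment downloads against a playback buffer.

    Assumptions (hypothetical model): downloads run back-to-back, the
    buffer is unbounded, and playback starts after prebuffer_segments
    segments have arrived. While the next segment downloads, the buffer
    drains in real time; if it runs dry, playback stalls until that
    segment arrives.
    """
    n = min(prebuffer_segments, len(download_times_s))
    initial_delay = sum(download_times_s[:n])
    buffer_s = n * segment_duration_s  # seconds of video buffered at playback start
    stalls: List[float] = []
    for d in download_times_s[n:]:
        if d > buffer_s:
            stalls.append(d - buffer_s)  # buffer empty before the segment arrives
            buffer_s = 0.0
        else:
            buffer_s -= d
        buffer_s += segment_duration_s  # downloaded segment joins the buffer
    return StallReport(initial_delay, stalls)
```

With download times of 1.0, 4.0 and 1.0 s for 2 s segments, starting after one segment gives a 1 s initial delay and one 2 s stall, whereas pre-buffering two segments raises the initial delay to 5 s and avoids the stall.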
MPEG-DASH does not specify how to adapt the playback to the available bandwidth, so the design of the download/adaptation algorithm is left to the developer of the client logic. The second contribution of this thesis is the design of a novel user-centric adaptation logic for DASH with SVC. Other download algorithms for segmented HTTP streaming with single-layer and scalable video coding have been published recently. However, little is known about how these algorithms behave with respect to the identified QoE influence factors. The third contribution is a user-centric performance evaluation of three existing adaptation algorithms and a comparison with the proposed algorithm. The performance evaluation also covers the fairness of the algorithms. In the first fairness scenario, two clients deploy the same adaptation algorithm and share one Internet connection. For a fair adaptation algorithm, we expect the behavior of the two clients to be identical. In the second fairness scenario, one client shares the Internet connection with a large HTTP file download, and we expect the bandwidth to be split evenly between the video streaming and the file download. The fourth contribution of this thesis is an evaluation of the algorithms' behavior in the two-client and HTTP cross-traffic scenarios.
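The abstract does not state which fairness metric is used. As one common option, Jain's fairness index, J = (Σx)² / (n·Σx²), can quantify how evenly competing flows (two video clients, or a video client and a file download) share a bottleneck: it equals 1 for identical throughputs and 1/n when a single flow takes everything. The sketch below is an illustration under that assumption, not the evaluation method of the thesis.

```python
from typing import Sequence


def jain_fairness(throughputs: Sequence[float]) -> float:
    """Jain's fairness index over per-flow throughputs.

    Assumption: used here only as an illustrative fairness metric.
    Returns 1.0 for perfectly equal shares and 1/n when one of n flows
    receives all the bandwidth.
    """
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one flow")
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        return 1.0  # all flows idle: treat as trivially fair
    return total * total / (n * squares)


if __name__ == "__main__":
    print(jain_fairness([2000, 2000]))  # two identical clients -> 1.0
    print(jain_fairness([3000, 1000]))  # video vs. download, uneven split -> 0.8
```

For the two-client scenario, an index close to 1 indicates the identical behavior expected from a fair algorithm; for the cross-traffic scenario, it reflects how evenly the video stream and the file download share the connection.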
The remainder of this thesis is structured as follows. Chapter II gives a brief introduction to video coding with H.264, the HTTP adaptive streaming standard MPEG-DASH, the investigated adaptation algorithms, and QoE metrics for video streaming. Chapter III presents the methodology and results of the subjective studies conducted in the course of this thesis to identify the QoE influence factors of adaptive video streaming. Chapter IV introduces the proposed adaptation algorithm and the methodology of the performance evaluation. Chapter V presents the results of the performance evaluation and compares the investigated adaptation algorithms. Chapter VI summarizes the main findings and gives an outlook on QoE-centric management of DASH with SVC.
The Universitätsbibliothek Würzburg (Würzburg University Library) has compiled a catalogue of its extensive collection of old Würzburg university theses. It mainly lists dissertations and theses, but also other examination papers that were written and published to obtain various academic degrees and titles. This is the second volume of the records, covering the years 1804 to 1885 with 2,510 titles.
The Universitätsbibliothek Würzburg (Würzburg University Library) has compiled a catalogue of its extensive collection of old Würzburg university theses. It mainly lists dissertations and theses, but also other examination papers that were written and published to obtain various academic degrees and titles and that date from the prince-bishopric era of our university (1582 to 1803).
Many students of history and other humanities aspire to a teaching career. Gaining a foothold there will become increasingly difficult in the coming years. Other students have no idea at all about their professional future. This guide aims to provide orientation in choosing a career and, with the help of experts, to open up new perspectives.
No abstract available
In an elegant essay published in Nature in 1993, the physicist Richard Gott III started from a human observer and drew a number of witty conclusions about our future prospects, giving estimates for how long the Berlin Wall, the human race and all the rest of the universe will exist. In the same spirit, we derive implications for "the meaning of life, the universe and all the rest" from a few principles. Adams' absurd answer, "42", teaches the lesson "garbage in, garbage out", or suggests that the question is not calculable. We show that experiencing "meaning" and deciding fundamental questions that cannot be decided by formal systems imply central properties of life: ever higher levels of internal representation of the world and an escalating tendency to become more complex. An observer "collecting observations" and three measures of complexity are examined. A theory of living systems is derived, focusing on their internal representation of information. Living systems are more complex than Kolmogorov complexity ("life is NOT simple") and overcome the decision limits of formal systems (Gödel's theorem), as illustrated for the cell cycle. Only a world with very finely tuned environments allows life. Such a world is itself rather complex and hence has an excessively large space of different states; a living observer therefore has a high probability of residing in a complex and finely tuned universe.
No abstract available