OPUS Würzburg | Institut für Informatik

Institut für Informatik

372 search hits

291 to 300

Sort by

Novel Techniques for Efficient and Effective Subgroup Discovery (2014)

Large volumes of data are collected today in many domains. Often, there is so much data available, that it is difficult to identify the relevant pieces of information. Knowledge discovery seeks to obtain novel, interesting and useful information from large datasets. One key technique for that purpose is subgroup discovery. It aims at identifying descriptions for subsets of the data, which have an interesting distribution with respect to a predefined target concept. This work improves the efficiency and effectiveness of subgroup discovery in different directions. For efficient exhaustive subgroup discovery, algorithmic improvements are proposed for three important variations of the standard setting: First, novel optimistic estimate bounds are derived for subgroup discovery with numeric target concepts. These allow for skipping the evaluation of large parts of the search space without influencing the results. Additionally, necessary adaptations to data structures for this setting are discussed. Second, for exceptional model mining, that is, subgroup discovery with a model over multiple attributes as target concept, a generic extension of the well-known FP-tree data structure is introduced. The modified data structure stores intermediate condensed data representations, which depend on the chosen model class, in the nodes of the trees. This allows the application for many popular model classes. Third, subgroup discovery with generalization-aware measures is investigated. These interestingness measures compare the target share or mean value in the subgroup with the respective maximum value in all its generalizations. For this setting, a novel method for deriving optimistic estimates is proposed. In contrast to previous approaches, the novel measures are not exclusively based on the anti-monotonicity of instance coverage, but also takes the difference of coverage between the subgroup and its generalizations into account. In all three areas, the advances lead to runtime improvements of more than an order of magnitude. The second part of the contributions focuses on the \emph{effectiveness} of subgroup discovery. These improvements aim to identify more interesting subgroups in practical applications. For that purpose, the concept of expectation-driven subgroup discovery is introduced as a new family of interestingness measures. It computes the score of a subgroup based on the difference between the actual target share and the target share that could be expected given the statistics for the separate influence factors that are combined to describe the subgroup. In doing so, previously undetected interesting subgroups are discovered, while other, partially redundant findings are suppressed. Furthermore, this work also approaches practical issues of subgroup discovery: In that direction, the VIKAMINE II tool is presented, which extends its predecessor with a rebuild user interface, novel algorithms for automatic discovery, new interactive mining techniques, as well novel options for result presentation and introspection. Finally, some real-world applications are described that utilized the presented techniques. These include the identification of influence factors on the success and satisfaction of university students and the description of locations using tagging data of geo-referenced images.

Performance Assessment of Resource Management Strategies for Cellular and Wireless Mesh Networks (2015)

Wamser, Florian

The rapid growth in the field of communication networks has been truly amazing in the last decades. We are currently experiencing a continuation thereof with an increase in traffic and the emergence of new fields of application. In particular, the latter is interesting since due to advances in the networks and new devices, such as smartphones, tablet PCs, and all kinds of Internet-connected devices, new additional applications arise from different areas. What applies for all these services is that they come from very different directions and belong to different user groups. This results in a very heterogeneous application mix with different requirements and needs on the access networks. The applications within these networks typically use the network technology as a matter of course, and expect that it works in all situations and for all sorts of purposes without any further intervention. Mobile TV, for example, assumes that the cellular networks support the streaming of video data. Likewise, mobile-connected electricity meters rely on the timely transmission of accounting data for electricity billing. From the perspective of the communication networks, this requires not only the technical realization for the individual case, but a broad consideration of all circumstances and all requirements of special devices and applications of the users. Such a comprehensive consideration of all eventualities can only be achieved by a dynamic, customized, and intelligent management of the transmission resources. This management requires to exploit the theoretical capacity as much as possible while also taking system and network architecture as well as user and application demands into account. Hence, for a high level of customer satisfaction, all requirements of the customers and the applications need to be considered, which requires a multi-faceted resource management. The prerequisite for supporting all devices and applications is consequently a holistic resource management at different levels. At the physical level, the technical possibilities provided by different access technologies, e.g., more transmission antennas, modulation and coding of data, possible cooperation between network elements, etc., need to be exploited on the one hand. On the other hand, interference and changing network conditions have to be counteracted at physical level. On the application and user level, the focus should be on the customer demands due to the currently increasing amount of different devices and diverse applications (medical, hobby, entertainment, business, civil protection, etc.). The intention of this thesis is the development, investigation, and evaluation of a holistic resource management with respect to new application use cases and requirements for the networks. Therefore, different communication layers are investigated and corresponding approaches are developed using simulative methods as well as practical emulation in testbeds. The new approaches are designed with respect to different complexity and implementation levels in order to cover the design space of resource management in a systematic way. Since the approaches cannot be evaluated generally for all types of access networks, network-specific use cases and evaluations are finally carried out in addition to the conceptual design and the modeling of the scenario. The first part is concerned with management of resources at physical layer. We study distributed resource allocation approaches under different settings. Due to the ambiguous performance objectives, a high spectrum reuse is conducted in current cellular networks. This results in possible interference between cells that transmit on the same frequencies. The focus is on the identification of approaches that are able to mitigate such interference. Due to the heterogeneity of the applications in the networks, increasingly different application-specific requirements are experienced by the networks. Consequently, the focus is shifted in the second part from optimization of network parameters to consideration and integration of the application and user needs by adjusting network parameters. Therefore, application-aware resource management is introduced to enable efficient and customized access networks. As indicated before, approaches cannot be evaluated generally for all types of access networks. Consequently, the third contribution is the definition and realization of the application-aware paradigm in different access networks. First, we address multi-hop wireless mesh networks. Finally, we focus with the fourth contribution on cellular networks. Application-aware resource management is applied here to the air interface between user device and the base station. Especially in cellular networks, the intensive cost-driven competition among the different operators facilitates the usage of such a resource management to provide cost-efficient and customized networks with respect to the running applications.

Multiobjective Traveling Salesman Problems and Redundancy of Complete Sets (2014)

Witek, Maximilian

The first part of this thesis deals with the approximability of the traveling salesman problem. This problem is defined on a complete graph with edge weights, and the task is to find a Hamiltonian cycle of minimum weight that visits each vertex exactly once. We study the most important multiobjective variants of this problem. In the multiobjective case, the edge weights are vectors of natural numbers with one component for each objective, and since weight vectors are typically incomparable, the optimal Hamiltonian cycle does not exist. Instead we consider the Pareto set, which consists of those Hamiltonian cycles that are not dominated by some other, strictly better Hamiltonian cycles. The central goal in multiobjective optimization and in the first part of this thesis in particular is the approximation of such Pareto sets. We first develop improved approximation algorithms for the two-objective metric traveling salesman problem on multigraphs and for related Hamiltonian path problems that are inspired by the single-objective Christofides' heuristic. We further show arguments indicating that our algorithms are difficult to improve. Furthermore we consider multiobjective maximization versions of the traveling salesman problem, where the task is to find Hamiltonian cycles with high weight in each objective. We generalize single-objective techniques to the multiobjective case, where we first compute a cycle cover with high weight and then remove an edge with low weight in each cycle. Since weight vectors are often incomparable, the choice of the edges of low weight is non-trivial. We develop a general lemma that solves this problem and enables us to generalize the single-objective maximization algorithms to the multiobjective case. We obtain improved, randomized approximation algorithms for the multiobjective maximization variants of the traveling salesman problem. We conclude the first part by developing deterministic algorithms for these problems. The second part of this thesis deals with redundancy properties of complete sets. We call a set autoreducible if for every input instance x we can efficiently compute some y that is different from x but that has the same membership to the set. If the set can be split into two equivalent parts, then it is called weakly mitotic, and if the splitting is obtained by an efficiently decidable separator set, then it is called mitotic. For different reducibility notions and complexity classes, we analyze how redundant its complete sets are. Previous research in this field concentrates on polynomial-time computable reducibility notions. The main contribution of this part of the thesis is a systematic study of the redundancy properties of complete sets for typical complexity classes and reducibility notions that are computable in logarithmic space. We use different techniques to show autoreducibility and mitoticity that depend on the size of the complexity class and the strength of the reducibility notion considered. For small complexity classes such as NL and P we use self-reducible, complete sets to show that all complete sets are autoreducible. For large complexity classes such as PSPACE and EXP we apply diagonalization methods to show that all complete sets are even mitotic. For intermediate complexity classes such as NP and the remaining levels of the polynomial-time hierarchy we establish autoreducibility of complete sets by locally checking computational transcripts. In many cases we can show autoreducibility of complete sets, while mitoticity is not known to hold. We conclude the second part by showing that in some cases, autoreducibility of complete sets at least implies weak mitoticity.

Context-specific Consistencies in Information Extraction: Rule-based and Probabilistic Approaches (2015)

Klügl, Peter

Large amounts of communication, documentation as well as knowledge and information are stored in textual documents. Most often, these texts like webpages, books, tweets or reports are only available in an unstructured representation since they are created and interpreted by humans. In order to take advantage of this huge amount of concealed information and to include it in analytic processes, it needs to be transformed into a structured representation. Information extraction considers exactly this task. It tries to identify well-defined entities and relations in unstructured data and especially in textual documents. Interesting entities are often consistently structured within a certain context, especially in semi-structured texts. However, their actual composition varies and is possibly inconsistent among different contexts. Information extraction models stay behind their potential and return inferior results if they do not consider these consistencies during processing. This work presents a selection of practical and novel approaches for exploiting these context-specific consistencies in information extraction tasks. The approaches direct their attention not only to one technique, but are based on handcrafted rules as well as probabilistic models. A new rule-based system called UIMA Ruta has been developed in order to provide optimal conditions for rule engineers. This system consists of a compact rule language with a high expressiveness and strong development support. Both elements facilitate rapid development of information extraction applications and improve the general engineering experience, which reduces the necessary efforts and costs when specifying rules. The advantages and applicability of UIMA Ruta for exploiting context-specific consistencies are illustrated in three case studies. They utilize different engineering approaches for including the consistencies in the information extraction task. Either the recall is increased by finding additional entities with similar composition, or the precision is improved by filtering inconsistent entities. Furthermore, another case study highlights how transformation-based approaches are able to correct preliminary entities using the knowledge about the occurring consistencies. The approaches of this work based on machine learning rely on Conditional Random Fields, popular probabilistic graphical models for sequence labeling. They take advantage of a consistency model, which is automatically induced during processing the document. The approach based on stacked graphical models utilizes the learnt descriptions as feature functions that have a static meaning for the model, but change their actual function for each document. The other two models extend the graph structure with additional factors dependent on the learnt model of consistency. They include feature functions for consistent and inconsistent entities as well as for additional positions that fulfill the consistencies. The presented approaches are evaluated in three real-world domains: segmentation of scientific references, template extraction in curricula vitae, and identification and categorization of sections in clinical discharge letters. They are able to achieve remarkable results and provide an error reduction of up to 30% compared to usually applied techniques.

A Meta-Engineering Approach for Document-Centered Knowledge Acquisition (2014)

Reutelshöfer, Jochen

Today knowledge base authoring for the engineering of intelligent systems is performed mainly by using tools with graphical user interfaces. An alternative human-computer interaction para- digm is the maintenance and manipulation of electronic documents, which provides several ad- vantages with respect to the social aspects of knowledge acquisition. Until today it hardly has found any attention as a method for knowledge engineering. This thesis provides a comprehensive discussion of document-centered knowledge acquisition with knowledge markup languages. There, electronic documents are edited by the knowledge authors and the executable knowledge base entities are captured by markup language expressions within the documents. The analysis of this approach reveals significant advantages as well as new challenges when compared to the use of traditional GUI-based tools. Some advantages of the approach are the low barriers for domain expert participation, the simple integration of informal descriptions, and the possibility of incremental knowledge for- malization. It therefore provides good conditions for building up a knowledge acquisition pro- cess based on the mixed-initiative strategy, being a flexible combination of direct and indirect knowledge acquisition. Further it turns out that document-centered knowledge acquisition with knowledge markup languages provides high potential for creating customized knowledge au- thoring environments, tailored to the needs of the current knowledge engineering project and its participants. The thesis derives a process model to optimally exploit this customization po- tential, evolving a project specific authoring environment by an agile process on the meta level. This meta-engineering process continuously refines the three aspects of the document space: The employed markup languages, the scope of the informal knowledge, and the structuring and organization of the documents. The evolution of the first aspect, the markup languages, plays a key role, implying the design of project specific markup languages that are easily understood by the knowledge authors and that are suitable to capture the required formal knowledge precisely. The goal of the meta-engineering process is to create a knowledge authoring environment, where structure and presentation of the domain knowledge comply well to the users’ mental model of the domain. In that way, the approach can help to ease major issues of knowledge-based system development, such as high initial development costs and long-term maintenance problems. In practice, the application of the meta-engineering approach for document-centered knowl- edge acquisition poses several technical challenges that need to be coped with by appropriate tool support. In this thesis KnowWE, an extensible document-centered knowledge acquisition environment is presented. The system is designed to support the technical tasks implied by the meta-engineering approach, as for instance design and implementation of new markup lan- guages, content refactoring, and authoring support. It is used to evaluate the approach in several real-world case-studies from different domains, such as medicine or engineering for instance. We end the thesis by a summary and point out further interesting research questions consid- ering the document-centered knowledge acquisition approach.

Cooperative Formation Controller Design for Time-Delay and Optimality Problems (2014)

Xu, Zhihao

This dissertation presents controller design methodologies for a formation of cooperative mobile robots to perform trajectory tracking and convoy protection tasks. Two major problems related to multi-agent formation control are addressed, namely the time-delay and optimality problems. For the task of trajectory tracking, a leader-follower based system structure is adopted for the controller design, where the selection criteria for controller parameters are derived through analyses of characteristic polynomials. The resulting parameters ensure the stability of the system and overcome the steady-state error as well as the oscillation behavior under time-delay effect. In the convoy protection scenario, a decentralized coordination strategy for balanced deployment of mobile robots is first proposed. Based on this coordination scheme, optimal controller parameters are generated in both centralized and decentralized fashion to achieve dynamic convoy protection in a unified framework, where distributed optimization technique is applied in the decentralized strategy. This unified framework takes into account the motion of the target to be protected, and the desired system performance, for instance, minimal energy to spend, equal inter-vehicle distance to keep, etc. Both trajectory tracking and convoy protection tasks are demonstrated through simulations and real-world hardware experiments based on the robotic equipment at Department of Computer Science VII, University of Würzburg.

Analysen und Heuristiken zur Verbesserung von OCR-Ergebnissen bei Frakturtexten (2014)

Vorbach, Paul

Zahlreiche Digitalisierungsprojekte machen das Wissen vergangener Jahrhunderte jederzeit verfügbar. Das volle Potenzial der Digitalisierung von Dokumenten entfaltet sich jedoch erst, wenn diese als durchsuchbare Volltexte verfügbar gemacht werden. Mithilfe von OCR-Software kann die Erfassung weitestgehend automatisiert werden. Fraktur war ab dem 16. Jahrhundert bis zur Mitte des 20. Jahrhunderts die verbreitete Schrift des deutschen Sprachraums. Durch einige Besonderheiten von Fraktur bleiben die Erkennungsraten bei Frakturtexten aber meist deutlich hinter den Erkennungsergebnissen bei Antiquatexten zurück. Diese Arbeit konzentriert sich auf die Verbesserung der Erkennungsergebnisse der OCR-Software Tesseract bei Frakturtexten. Dazu wurden die Software und bestehende Sprachpakete gesondert auf die Eigenschaften von Fraktur hin analysiert. Durch spezielles Training und Anpassungen an der Software wurde anschließend versucht, die Ergebnisse zu verbessern und Erkenntnisse über die Effektivität verschiedener Ansätze zu gewinnen. Die Zeichenfehlerraten konnten durch verschiedene Experimente von zuvor 2,5 Prozent auf 1,85 Prozent gesenkt werden. Außerdem werden Werkzeuge vorgestellt, die das Training neuer Schriftarten für Tesseract erleichtern und eine Evaluation der erzielten Verbesserungen ermöglichen.

Six Degrees of Freedom Object Pose Estimation with Fusion Data from a Time-of-flight Camera and a Color Camera (2014)

Sun, Kaipeng

Object six Degrees of Freedom (6DOF) pose estimation is a fundamental problem in many practical robotic applications, where the target or an obstacle with a simple or complex shape can move fast in cluttered environments. In this thesis, a 6DOF pose estimation algorithm is developed based on the fused data from a time-of-flight camera and a color camera. The algorithm is divided into two stages, an annealed particle filter based coarse pose estimation stage and a gradient decent based accurate pose optimization stage. In the first stage, each particle is evaluated with sparse representation. In this stage, the large inter-frame motion of the target can be well handled. In the second stage, the range data based conventional Iterative Closest Point is extended by incorporating the target appearance information and used for calculating the accurate pose by refining the coarse estimate from the first stage. For dealing with significant illumination variations during the tracking, spherical harmonic illumination modeling is investigated and integrated into both stages. The robustness and accuracy of the proposed algorithm are demonstrated through experiments on various objects in both indoor and outdoor environments. Moreover, real-time performance can be achieved with graphics processing unit acceleration.

Feedback-Generierung für offene, strukturierte Aufgaben in E-Learning-Systemen (2014)

Ifland, Marianus

Bei Lernprozessen spielt das Anwenden der zu erlernenden Tätigkeit eine wichtige Rolle. Im Kontext der Ausbildung an Schulen und Hochschulen bedeutet dies, dass es wichtig ist, Schülern und Studierenden ausreichend viele Übungsmöglichkeiten anzubieten. Die von Lehrpersonal bei einer "Korrektur" erstellte Rückmeldung, auch Feedback genannt, ist jedoch teuer, da der zeitliche Aufwand je nach Art der Aufgabe beträchtlich ist. Eine Lösung dieser Problematik stellen E-Learning-Systeme dar. Geeignete Systeme können nicht nur Lernstoff präsentieren, sondern auch Übungsaufgaben anbieten und nach deren Bearbeitung quasi unmittelbar entsprechendes Feedback generieren. Es ist jedoch im Allgemeinen nicht einfach, maschinelle Verfahren zu implementieren, die Bearbeitungen von Übungsaufgaben korrigieren und entsprechendes Feedback erstellen. Für einige Aufgabentypen, wie beispielsweise Multiple-Choice-Aufgaben, ist dies zwar trivial, doch sind diese vor allem dazu gut geeignet, sogenanntes Faktenwissen abzuprüfen. Das Einüben von Lernzielen im Bereich der Anwendung ist damit kaum möglich. Die Behandlung dieser nach gängigen Taxonomien höheren kognitiven Lernziele erlauben sogenannte offene Aufgabentypen, deren Bearbeitung meist durch die Erstellung eines Freitexts in natürlicher Sprache erfolgt. Die Information bzw. das Wissen, das Lernende eingeben, liegt hier also in sogenannter „unstrukturierter“ Form vor. Dieses unstrukturierte Wissen ist maschinell nur schwer verwertbar, sodass sich Trainingssysteme, die Aufgaben dieser Art stellen und entsprechende Rückmeldung geben, bisher nicht durchgesetzt haben. Es existieren jedoch auch offene Aufgabentypen, bei denen Lernende das Wissen in strukturierter Form eingeben, so dass es maschinell leichter zu verwerten ist. Für Aufgaben dieser Art lassen sich somit Trainingssysteme erstellen, die eine gute Möglichkeit darstellen, Schülern und Studierenden auch für praxisnahe Anwendungen viele Übungsmöglichkeiten zur Verfügung zu stellen, ohne das Lehrpersonal zusätzlich zu belasten. In dieser Arbeit wird beschrieben, wie bestimmte Eigenschaften von Aufgaben ausgenutzt werden, um entsprechende Trainingssysteme konzipieren und implementieren zu können. Es handelt sich dabei um Aufgaben, deren Lösungen strukturiert und maschinell interpretierbar sind. Im Hauptteil der Arbeit werden vier Trainingssysteme bzw. deren Komponenten beschrieben und es wird von den Erfahrungen mit deren Einsatz in der Praxis berichtet: Eine Komponente des Trainingssystems „CaseTrain“ kann Feedback zu UML Klassendiagrammen erzeugen. Das neuartige Trainingssystem „WARP“ generiert zu UML Aktivitätsdiagrammen Feedback in mehreren Ebenen, u.a. indem es das durch Aktivitätsdiagramme definierte Verhalten von Robotern in virtuellen Umgebungen visualisiert. Mit „ÜPS“ steht ein Trainingssystem zur Verfügung, mit welchem die Eingabe von SQL-Anfragen eingeübt werden kann. Eine weitere in „CaseTrain“ implementierte Komponente für Bildmarkierungsaufgaben ermöglicht eine unmittelbare, automatische Bewertung entsprechender Aufgaben. Die Systeme wurden im Zeitraum zwischen 2011 und 2014 an der Universität Würzburg in Vorlesungen mit bis zu 300 Studierenden eingesetzt und evaluiert. Die Evaluierung ergab eine hohe Nutzung und eine gute Bewertung der Studierenden der eingesetzten Konzepte, womit belegt wurde, dass elektronische Trainingssysteme für offene Aufgaben in der Praxis eingesetzt werden können.

Relative pose estimation of known rigid objects using a novel approach to high-level PMD-/CCD- sensor data fusion with regard to applications in space (2014)

Tzschichholz, Tristan

In this work, a novel method for estimating the relative pose of a known object is presented, which relies on an application-specific data fusion process. A PMD-sensor in conjunction with a CCD-sensor is used to perform the pose estimation. Furthermore, the work provides a method for extending the measurement range of the PMD sensor along with the necessary calibration methodology. Finally, extensive measurements on a very accurate Rendezvous and Docking testbed are made to evaluate the performance, what includes a detailed discussion of lighting conditions.

291 to 300

Institut für Informatik

Refine

Has Fulltext

Is part of the Bibliography

Year of publication

Document Type

Language

Keywords

Author

Institute

Schriftenreihe

Sonstige beteiligte Institutionen

EU-Project number / Contract (GA) number

372 search hits