This thesis is divided into two parts: the first part covers the theoretical background and empirical findings on the topic of complex problem solving, while the second part describes the methodology and results of the study conducted. Following the introduction in Chapter 1, Chapter 2 presents the basic concepts of complex problem solving, beginning with a delineation of the field of "complex problem solving". Subsequently, the properties of complex systems and the demands they place on problem solvers are described, based on the taxonomy of Dörner et al. (1994). Chapter 3 presents models of knowledge representation and problem solving; in this context, the notion of "strategy" is discussed and explained in relation to various general models of problem solving. Chapter 4 deals with the concept of delegation. In this thesis, delegation is used as a method to prompt participants to formalize their strategies while being able to observe the execution of these strategies at the same time. Findings mainly from organizational psychology and management are reported, and the use of delegation in the interaction between humans and artificial agents is discussed. Chapter 5 deals with forest fire simulations, which are among the classic simulations used to study complex problem solving. First, computer-based simulation in general is addressed, pointing out differences to traditional research methods; the importance of multi-agent simulation for complex problem solving research is also emphasized. Subsequently, fire behavior and firefighting are described as the real-world model for forest fire simulations, which yields reference points both for judging the plausibility of a forest fire simulation and for implementing one. Three well-known examples of forest fire simulations are then presented, including domain- and simulation-specific strategies. Chapter 6 gives an overview of empirical findings in the field of complex problem solving, concerning both properties of complex systems and characteristics of the problem solver. Chapter 7 summarizes the main points of criticism and the problems that complex problem solving research has to contend with. The specific research questions of the study are presented in Chapter 8, while Chapters 9 and 10 explain the methodology used to investigate them; in this context, the simulation environment SeSAm is introduced. Chapter 11 describes the properties of the implemented forest fire simulation. Chapter 12 describes the design and procedure of the study with which the data reported in Chapter 13 were collected. Chapter 14 discusses the findings with respect to the research questions and their implications for future research.
The extraction of metadata from historical documents is a time-consuming, complex, and highly error-prone task that usually has to be carried out by human experts. It is nevertheless necessary in order to establish relations between documents, to answer search queries about historical events correctly, or to build semantic links. To reduce the manual effort of this task, named entity recognition methods are to be applied. The classification of terms in historical manuscripts, however, poses a major challenge, since the domain exhibits a high variance in spelling, caused among other things by an orthography that was only agreed upon by convention. This thesis presents methods that can also operate in complex syntactic environments by drawing on information from the context of the terms to be classified and combining it with domain-specific heuristics. Furthermore, it is evaluated how the metadata obtained in this way can be used to add value in workflow systems for the digitization of historical manuscripts through heuristics for detecting production errors.
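As a purely illustrative aside (not the methods actually developed in the thesis), the following minimal Python sketch shows the general flavor of combining a fuzzy gazetteer lookup, which tolerates spelling variance, with a simple context heuristic; the gazetteer, cue words, example sentence, and threshold are invented for this sketch.

```python
# Illustrative sketch only: label tokens as places by combining a fuzzy gazetteer
# lookup (tolerating historical spelling variance) with a simple context heuristic.
# Gazetteer, cue words, example sentence, and the 0.8 cutoff are invented here.
import difflib

GAZETTEER = ["Würzburg", "Bamberg", "Nürnberg"]   # hypothetical list of known place names
CONTEXT_CUES = {"in", "zu", "bei", "von"}         # prepositions that often precede places

def classify_tokens(tokens, cutoff=0.8):
    labels = []
    for i, token in enumerate(tokens):
        fuzzy_hit = difflib.get_close_matches(token, GAZETTEER, n=1, cutoff=cutoff)
        has_cue = i > 0 and tokens[i - 1].lower() in CONTEXT_CUES
        if fuzzy_hit and has_cue:                  # both signals agree: confident label
            labels.append((token, "PLACE", fuzzy_hit[0]))
        elif fuzzy_hit or has_cue:                 # only one signal: uncertain label
            labels.append((token, "PLACE?", fuzzy_hit[0] if fuzzy_hit else None))
        else:
            labels.append((token, "O", None))
    return labels

print(classify_tokens("geboren zu Wirtzburg im Jahre 1620".split()))
```

In the sketch, a token only receives a confident label when the contextual cue and the domain-specific fuzzy match agree, mirroring the combination of context information and heuristics described above.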
Today, knowledge base authoring for the engineering of intelligent systems is performed mainly by using tools with graphical user interfaces. An alternative human-computer interaction paradigm is the maintenance and manipulation of electronic documents, which provides several advantages with respect to the social aspects of knowledge acquisition. Until today, however, it has hardly received any attention as a method for knowledge engineering.
This thesis provides a comprehensive discussion of document-centered knowledge acquisition with knowledge markup languages. In this paradigm, electronic documents are edited by the knowledge authors, and the executable knowledge base entities are captured by markup language expressions within the documents. The analysis of this approach reveals significant advantages as well as new challenges when compared to the use of traditional GUI-based tools.
Some advantages of the approach are the low barriers for domain expert participation, the simple integration of informal descriptions, and the possibility of incremental knowledge formalization. It therefore provides good conditions for building up a knowledge acquisition process based on the mixed-initiative strategy, a flexible combination of direct and indirect knowledge acquisition. Further, it turns out that document-centered knowledge acquisition with knowledge markup languages provides high potential for creating customized knowledge authoring environments, tailored to the needs of the current knowledge engineering project and its participants. The thesis derives a process model to optimally exploit this customization potential, evolving a project-specific authoring environment by an agile process on the meta level. This meta-engineering process continuously refines the three aspects of the document space: the employed markup languages, the scope of the informal knowledge, and the structuring and organization of the documents. The evolution of the first aspect, the markup languages, plays a key role, implying the design of project-specific markup languages that are easily understood by the knowledge authors and that are suitable to capture the required formal knowledge precisely. The goal of the meta-engineering process is to create a knowledge authoring environment where structure and presentation of the domain knowledge comply well with the users' mental model of the domain. In that way, the approach can help to ease major issues of knowledge-based system development, such as high initial development costs and long-term maintenance problems.
In practice, the application of the meta-engineering approach for document-centered knowledge acquisition poses several technical challenges that need to be coped with by appropriate tool support. In this thesis KnowWE, an extensible document-centered knowledge acquisition environment, is presented. The system is designed to support the technical tasks implied by the meta-engineering approach, for instance the design and implementation of new markup languages, content refactoring, and authoring support. It is used to evaluate the approach in several real-world case studies from different domains, such as medicine and engineering.
We conclude the thesis with a summary and point out further interesting research questions concerning the document-centered knowledge acquisition approach.
In learning processes, practicing the activity to be learned plays an important role. In the context of education at schools and universities, this means that it is important to offer pupils and students a sufficient number of practice opportunities. The feedback produced by teaching staff when marking such exercises, however, is expensive, since the time required can be considerable depending on the type of task.
E-learning systems offer a solution to this problem. Suitable systems can not only present learning material but also offer exercises and generate corresponding feedback almost immediately after they have been completed. In general, however, it is not easy to implement automated methods that correct submitted solutions and generate appropriate feedback. For some task types, such as multiple-choice questions, this is trivial, but these are mainly suited to testing factual knowledge; practicing learning objectives at the level of application is hardly possible with them.
These learning objectives, which rank higher in common cognitive taxonomies, can be addressed by so-called open task types, which are usually answered by writing a free text in natural language. The information or knowledge entered by learners is thus available in so-called "unstructured" form. This unstructured knowledge is difficult to process automatically, so training systems that pose tasks of this kind and give corresponding feedback have not become established so far. However, there are also open task types in which learners enter their knowledge in a structured form, so that it can be processed automatically more easily. For tasks of this kind, training systems can be built that offer pupils and students plenty of practice opportunities, including for practice-oriented applications, without placing an additional burden on teaching staff.
This thesis describes how certain properties of tasks can be exploited to design and implement corresponding training systems. These are tasks whose solutions are structured and can be interpreted automatically.
The main part of the thesis describes four training systems, or components thereof, and reports on the experience gained from their practical use: a component of the training system "CaseTrain" can generate feedback on UML class diagrams. The novel training system "WARP" generates feedback on UML activity diagrams at several levels, among other things by visualizing the behavior of robots in virtual environments as defined by the activity diagrams. "ÜPS" is a training system for practicing the formulation of SQL queries. A further component implemented in "CaseTrain" for image annotation tasks enables immediate, automatic assessment of such tasks.
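To illustrate the basic principle behind automatic feedback for structured solutions, the following generic Python sketch (not the actual ÜPS or CaseTrain implementation) runs a student's SQL query and a reference solution against the same sample database and compares the result sets; schema, data, and queries are made up for this example.

```python
# Generic illustration (not the actual ÜPS/CaseTrain implementation) of how feedback
# for SQL exercises can be generated automatically: the student query and the
# reference solution are run on the same sample database and their result sets are
# compared. Ordering and duplicate rows are deliberately ignored in this simplification.
import sqlite3

def check_sql_answer(student_sql, reference_sql):
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE student(id INTEGER PRIMARY KEY, name TEXT, semester INTEGER);
        INSERT INTO student VALUES (1, 'Anna', 3), (2, 'Ben', 1), (3, 'Cara', 3);
    """)
    try:
        got = set(map(tuple, con.execute(student_sql).fetchall()))
    except sqlite3.Error as e:
        return f"Your query could not be executed: {e}"
    expected = set(map(tuple, con.execute(reference_sql).fetchall()))
    if got == expected:
        return "Correct: your query returns the expected rows."
    missing, extra = expected - got, got - expected
    return f"Not quite: {len(missing)} expected row(s) missing, {len(extra)} unexpected row(s)."

print(check_sql_answer(
    "SELECT name FROM student WHERE semester = 1",   # student's attempt
    "SELECT name FROM student WHERE semester = 3",   # instructor's reference solution
))
```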
The systems were used and evaluated between 2011 and 2014 at the University of Würzburg in lectures with up to 300 students. The evaluation showed high usage and good ratings of the implemented concepts by the students, demonstrating that electronic training systems for open tasks can be used in practice.
Context-specific Consistencies in Information Extraction: Rule-based and Probabilistic Approaches
(2015)
Large amounts of communication, documentation, as well as knowledge and information are stored in textual documents. Most often, these texts, such as webpages, books, tweets or reports, are only available in an unstructured representation since they are created and interpreted by humans. In order to take advantage of this huge amount of concealed information and to include it in analytic processes, it needs to be transformed into a structured representation. Information extraction addresses exactly this task: it tries to identify well-defined entities and relations in unstructured data, especially in textual documents.
Interesting entities are often consistently structured within a certain context, especially in semi-structured texts. However, their actual composition varies and is possibly inconsistent across different contexts. Information extraction models fall short of their potential and return inferior results if they do not consider these consistencies during processing. This work presents a selection of practical and novel approaches for exploiting such context-specific consistencies in information extraction tasks. The approaches are not restricted to a single technique, but are based on handcrafted rules as well as probabilistic models.
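The following minimal Python sketch, independent of the concrete systems described below, illustrates the underlying idea on a toy example: the dominant field order of reference entries within one document is induced from already labeled entries and then used to flag entries that deviate from this document-specific pattern; labels and entries are made up.

```python
# Toy illustration of a context-specific consistency: within one document, reference
# entries usually share one field order. The dominant order is induced from already
# labeled entries, and deviating entries are flagged so they can be re-examined or
# corrected. Labels and entries are invented for this sketch.
from collections import Counter

def dominant_order(labeled_refs):
    """labeled_refs: one label sequence per reference entry in the document."""
    counts = Counter(tuple(seq) for seq in labeled_refs)
    return list(counts.most_common(1)[0][0])

def find_inconsistent(labeled_refs):
    order = dominant_order(labeled_refs)
    return [i for i, seq in enumerate(labeled_refs) if list(seq) != order]

refs = [
    ["AUTHOR", "TITLE", "YEAR"],
    ["AUTHOR", "TITLE", "YEAR"],
    ["TITLE", "AUTHOR", "YEAR"],   # deviates from the document-wide pattern
]
print(dominant_order(refs))     # ['AUTHOR', 'TITLE', 'YEAR']
print(find_inconsistent(refs))  # [2]
```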
A new rule-based system called UIMA Ruta has been developed in order to provide optimal conditions for rule engineers. This system consists of a compact rule language with high expressiveness and strong development support. Both elements facilitate the rapid development of information extraction applications and improve the general engineering experience, which reduces the necessary effort and cost of specifying rules.
The advantages and applicability of UIMA Ruta for exploiting context-specific consistencies are illustrated in three case studies. They utilize different engineering approaches for including the consistencies in the information extraction task. Either the recall is increased by finding additional entities with similar composition, or the precision is improved by filtering inconsistent entities. Furthermore, another case study highlights how transformation-based approaches are able to correct preliminary entities using the knowledge about the occurring consistencies.
The approaches of this work based on machine learning rely on Conditional Random Fields, popular probabilistic graphical models for sequence labeling. They take advantage of a consistency model, which is automatically induced during processing the document. The approach based on stacked graphical models utilizes the learnt descriptions as feature functions that have a static meaning for the model, but change their actual function for each document. The other two models extend the graph structure with additional factors dependent on the learnt model of consistency. They include feature functions for consistent and inconsistent entities as well as for additional positions that fulfill the consistencies.
The presented approaches are evaluated in three real-world domains: segmentation of scientific references, template extraction in curricula vitae, and identification and categorization of sections in clinical discharge letters. They are able to achieve remarkable results and provide an error reduction of up to 30% compared to usually applied techniques.
Large volumes of data are collected today in many domains. Often, there is so much data available, that it is difficult to identify the relevant pieces of information. Knowledge discovery seeks to obtain novel, interesting and useful information from large datasets.
One key technique for that purpose is subgroup discovery. It aims at identifying descriptions for subsets of the data, which have an interesting distribution with respect to a predefined target concept. This work improves the efficiency and effectiveness of subgroup discovery in different directions.
For efficient exhaustive subgroup discovery, algorithmic improvements are proposed for three important variations of the standard setting: First, novel optimistic estimate bounds are derived for subgroup discovery with numeric target concepts. These allow for skipping the evaluation of large parts of the search space without influencing the results. Additionally, necessary adaptations to data structures for this setting are discussed. Second, for exceptional model mining, that is, subgroup discovery with a model over multiple attributes as target concept, a generic extension of the well-known FP-tree data structure is introduced. The modified data structure stores intermediate condensed data representations, which depend on the chosen model class, in the nodes of the trees. This allows the technique to be applied to many popular model classes. Third, subgroup discovery with generalization-aware measures is investigated.
These interestingness measures compare the target share or mean value in the subgroup with the respective maximum value in all its generalizations. For this setting, a novel method for deriving optimistic estimates is proposed. In contrast to previous approaches, the novel estimates are not exclusively based on the anti-monotonicity of instance coverage, but also take the difference in coverage between the subgroup and its generalizations into account. In all three areas, the advances lead to runtime improvements of more than an order of magnitude.
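To give a flavor of how optimistic estimate pruning works, the following sketch returns to the first of the three settings, numeric target concepts, and uses the simple impact measure q(P) = |P| * (mean_P - mean_0) as an example; the bounds derived in the thesis are more general, so this is only an illustration.

```python
# Illustrative only: pruning with an optimistic estimate for a numeric target.
# Quality used here: q(P) = |P| * (mean_P - mean_0), where mean_0 is the overall mean.
# Since q(P) equals the sum of the deviations (v - mean_0) over the subgroup's instances,
# no specialization of P can score higher than the sum of the positive deviations.

def quality(values_in_subgroup, overall_mean):
    n = len(values_in_subgroup)
    return 0.0 if n == 0 else n * (sum(values_in_subgroup) / n - overall_mean)

def optimistic_estimate(values_in_subgroup, overall_mean):
    # Upper bound for the quality of the subgroup and all of its specializations.
    return sum(v - overall_mean for v in values_in_subgroup if v > overall_mean)

overall_mean = 10.0
subgroup_values = [14.0, 9.0, 12.0, 7.0]   # target values of the instances covered by P
print(quality(subgroup_values, overall_mean))             # 2.0
print(optimistic_estimate(subgroup_values, overall_mean)) # 6.0
# If 6.0 is below the quality of the k-th best subgroup found so far,
# the whole search branch below this subgroup can be skipped.
```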
The second part of the contributions focuses on the effectiveness of subgroup discovery. These improvements aim to identify more interesting subgroups in practical applications. For that purpose, the concept of expectation-driven subgroup discovery is introduced as a new family of interestingness measures. It computes the score of a subgroup based on the difference between the actual target share and the target share that could be expected given the statistics for the separate influence factors that are combined to describe the subgroup.
In doing so, previously undetected interesting subgroups are discovered, while other, partially redundant findings are suppressed.
Furthermore, this work also addresses practical issues of subgroup discovery: in that direction, the VIKAMINE II tool is presented, which extends its predecessor with a rebuilt user interface, novel algorithms for automatic discovery, new interactive mining techniques, as well as novel options for result presentation and introspection. Finally, some real-world applications are described that utilized the presented techniques. These include the identification of influence factors on the success and satisfaction of university students and the description of locations using tagging data of geo-referenced images.
Knowledge-based systems (KBS) attract ever-increasing interest in various disciplines and contexts. Yet, the former aim of constructing the 'perfect intelligent software' continuously shifts to user-centered, participative solutions. Such systems enable users to contribute their personal knowledge to the problem-solving process for increased efficiency and an improved user experience. More precisely, we define the non-functional key requirements of participative KBS as: transparency (encompassing KBS status mediation), configurability (user adaptability, degree of user control/exploration), quality of the KB and UI, and evolvability (enabling the KBS to grow mature with their users). Many of those requirements depend on the respective target users, thus calling for a more user-centered development. Often, highly specialized expert domains are targeted (inducing highly complex KBs), which requires a more careful and considerate UI/interaction design. Still, current KBS engineering (KBSE) approaches mostly focus on knowledge acquisition (KA). This often leads to non-optimal, hardly reusable, and insufficiently evaluated KBS front-end solutions.
In this thesis we propose a more encompassing KBSE approach. Due to the strong mutual influences between KB and UI, we suggest a novel form of intertwined UI and KB development. We base the approach on three core components for encompassing KBSE:
(1) Extensible prototyping, a tailored form of evolutionary prototyping; this builds on mature UI prototypes and offers two extension steps for the anytime creation of core KBS prototypes (KB + core UI) and fully productive KBS (core KBS prototype + common framing functionality). (2) KBS UI patterns, that define reusable solutions for the core KBS UI/interaction; we provide a basic collection of such patterns in this work. (3) Suitable usability instruments for the assessment of the KBS artifacts. Therewith, we do not strive for ’yet another’ self-contained KBS engineering methodology. Rather, we motivate to extend existing approaches by the proposed key components. We demonstrate this based on an agile KBSE model.
For practical support, we introduce the tailored KBSE tool ProKEt. ProKEt offers a basic selection of KBS core UI patterns and corresponding configuration options out of the box; their further adaptation/extension is possible at various levels of expertise. For practical usability support, ProKEt offers facilities for quantitative and qualitative data collection. ProKEt explicitly fosters the suggested intertwined development of UI and KB. For seamlessly integrating KA activities, it provides extension points for two selected external KA tools: KnowOF, a standard office-based KA environment, and KnowWE, a semantic wiki for collaborative KA. Therewith, ProKEt offers powerful support for encompassing, user-centered KBSE.
Finally, based on the approach and the tool, we also developed a novel KBS type: Clarification KBS, a mashup of consultation and justification KBS modules. These constitute a particularly suitable realization of participative KBS in highly specialized expert contexts and consequently require a specific design. In this thesis, apart from more common UI solutions, we also introduce KBS UI patterns especially tailored towards Clarification KBS.
The potential of knowledge discovery in data is often left unexploited, which is mainly due to barriers between the development team and the end users of data mining. This thesis presents a transparent approach to describing and explaining data for decision makers. In decision-maker-centered tasks, the project requirements are defined and the results are compiled into a story. A requirement consists of a tabular report and, where applicable, patterns in its content, each understandable to a decision maker. The technical tasks consist of data inspection, the integration of the data into a data warehouse, and the generation of reports and discovery of patterns as specified in the requirements. Several data mining projects can benefit from one another through knowledge management and a suitable infrastructure. The approach was applied in two projects using exclusively open-source software.
The importance of Clinical Data Warehouses (CDW) has increased significantly in recent years as they support or enable many applications such as clinical trials, data mining, and decision making.
CDWs integrate Electronic Health Records, which, in addition to structured and coded data such as ICD codes of diagnoses, still contain a large amount of text data, such as discharge letters or reports on diagnostic findings.
Existing CDWs hardly provide features for accessing the information contained in these texts.
Information extraction methods offer a solution to this problem, but they involve a high and lengthy development effort, which can only be carried out by computer scientists.
Moreover, such systems only exist for a few medical domains.
This work presents a method that empowers clinicians to extract information from texts on their own. Medical concepts can be extracted ad hoc from, for example, discharge letters, so that physicians can work promptly and autonomously. The proposed system achieves these improvements through efficient data storage, preprocessing, and powerful query features. Negations in texts are recognized and automatically excluded; in addition, the context of information is determined and undesired facts are filtered out, such as historical events or references to other persons (family history).
Context-sensitive queries ensure the semantic integrity of the concepts to be extracted.
A new feature not available in other CDWs is to query numerical concepts in texts and even filter them (e.g. BMI > 25).
The retrieved values can be extracted and exported for further analysis.
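A strongly simplified Python sketch of such a numeric concept query is shown below; the regular expression, example sentences, and value handling are illustrative only, and the actual system described below additionally handles negation, context, units, and much more.

```python
# Simplified sketch of querying a numeric concept in free text (e.g. "BMI > 25").
# Pattern, example sentences, and value handling are invented for this illustration;
# a production system would additionally resolve negation, context, and units.
import re

def query_numeric_concept(texts, concept_pattern, op, threshold):
    # Match the concept, then up to 15 non-digit characters, then a number.
    rx = re.compile(rf"{concept_pattern}\D{{0,15}}?(\d+(?:[.,]\d+)?)", re.IGNORECASE)
    compare = {">": lambda x, t: x > t, "<": lambda x, t: x < t}[op]
    hits = []
    for doc_id, text in texts.items():
        for match in rx.finditer(text):
            value = float(match.group(1).replace(",", "."))
            if compare(value, threshold):
                hits.append((doc_id, value))
    return hits

letters = {
    "doc1": "Adipositas, BMI 31,2 kg/m2, RR 140/90 mmHg.",
    "doc2": "Normalgewicht, der BMI liegt bei 23.",
}
print(query_numeric_concept(letters, r"\bBMI\b", ">", 25))  # -> [('doc1', 31.2)]
```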
This technique is implemented within the efficient architecture of the PaDaWaN CDW and evaluated with comprehensive and complex tests.
The results outperform similar approaches reported in the literature.
Ad hoc IE returns results within (milli)seconds, and a user-friendly GUI enables interactive work, allowing flexible adaptation of the extraction.
In addition, the applicability of this system is demonstrated in three real-world applications at the Würzburg University Hospital (UKW).
Several drug trend studies are replicated: findings of five studies on high blood pressure, atrial fibrillation and chronic renal failure can be partially or completely confirmed at the UKW. Another case study evaluates the prevalence of heart failure among hospital inpatients using an algorithm that extracts information with ad hoc IE from discharge letters and echocardiography reports (e.g. LVEF < 45) and from other sources of the hospital information system.
This study reveals that the use of ICD codes leads to a significant underestimation (31%) of the true prevalence of heart failure.
The third case study evaluates the consistency of diagnoses by comparing structured ICD-10-coded diagnoses with the diagnoses described in the diagnostic section of the discharge letter.
These diagnoses are extracted from texts with ad hoc IE, using synonyms generated with a novel method.
The developed approach can extract diagnoses from the discharge letter with high accuracy and, furthermore, it can determine the degree of consistency between the coded and reported diagnoses.
The success of semantic systems has been proven over the last years.
Nowadays, Linked Data is the driver for the rapid development of ever new intelligent systems.
Especially in enterprise environments semantic systems successfully support more and more business processes.
This is especially true for after sales service in the mechanical engineering domain.
Here, service technicians need effective access to relevant technical documentation in order to diagnose and solve problems and defects.
Therefore, the usage of semantic information retrieval systems has become the new system metaphor.
Unlike classical retrieval software, these systems exploit Linked Enterprise Data graphs to grant targeted and problem-oriented access to relevant documents.
However, huge parts of legacy technical documents have not yet been integrated into Linked Enterprise Data graphs.
Additionally, a plethora of information models for the semantic representation of technical information exists.
The semantic maturity of these information models can hardly be measured.
This thesis argues that there is an inherent need for a self-contained semantification approach for technical documents.
This work introduces a maturity model that allows existing documentation to be assessed quickly.
Additionally, the approach comprises an abstracting semantic representation for technical documents that is aligned to all major standard information models.
The semantic representation combines structural and rhetorical aspects to provide access to so-called Core Documentation Entities.
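Purely as an illustration of what a linked representation of a Core Documentation Entity might look like, the following Python/rdflib sketch combines a structural aspect (the entity's position in the manual) with a rhetorical one (its role for the reader); the vocabulary URIs are invented for this sketch and are not the information model defined in the thesis.

```python
# Hypothetical illustration of a linked representation of one Core Documentation Entity.
# The vocabulary below is invented for this sketch, not the thesis's information model.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

DOC = Namespace("http://example.org/techdoc#")   # made-up vocabulary
g = Graph()
g.bind("doc", DOC)

entity = URIRef("http://example.org/manual/warning-42")
g.add((entity, RDF.type, DOC.CoreDocumentationEntity))
g.add((entity, DOC.rhetoricalRole, DOC.Warning))                              # rhetorical aspect
g.add((entity, DOC.partOfSection, URIRef("http://example.org/manual/ch3")))   # structural aspect
g.add((entity, DOC.concernsComponent, DOC.HydraulicPump))
g.add((entity, RDFS.label, Literal("Depressurize the system before maintenance.", lang="en")))

print(g.serialize(format="turtle"))
```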
A novel and holistic semantification process describes how technical documents in different legacy formats can be transformed to a semantic and linked representation.
The practical significance of the semantification approach depends on tools supporting its application.
This work presents an accompanying tool chain of semantification applications, in particular the semantification framework CAPLAN, a highly integrated development and runtime environment for semantification processes.
The complete semantification approach is evaluated in four real-life projects: in a spare part augmentation project, semantification projects for earth moving technology and harvesting technology, as well as an ontology population project for special purpose vehicles.
Three additional case studies underline the broad applicability of the presented ideas.