TY  - CHAP
A1  - Jannidis, Fotis
A1  - Reger, Isabella
A1  - Weimer, Lukas
A1  - Krug, Markus
A1  - Puppe, Frank
T1  - Automatische Erkennung von Figuren in deutschsprachigen Romanen
N2  - Eine wichtige Grundlage für die quantitative Analyse von Erzähltexten, etwa eine Netzwerkanalyse der Figurenkonstellation, ist die automatische Erkennung von Referenzen auf Figuren in Erzähltexten, ein Sonderfall des generischen NLP-Problems der Named Entity Recognition. Bestehende, auf Zeitungstexten trainierte Modelle sind für literarische Texte nur eingeschränkt brauchbar, da die Einbeziehung von Appellativen in die Named Entity-Definition und deren häufige Verwendung in Romantexten zu einem schlechten Ergebnis führt. Dieses Paper stellt eine anhand eines manuell annotierten Korpus auf deutschsprachige Romane des 19. Jahrhunderts angepasste NER-Komponente vor.
KW  - Digital Humanities
KW  - Figurenerkennung
KW  - Named-Entity-Recognition
KW  - Domänenadaption
KW  - Literatur
Y1  - 2015
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-143332
UR  - https://dhd2015.uni-graz.at/
ER  - 
TY  - JOUR
A1  - Toepfer, Martin
A1  - Corovic, Hamo
A1  - Fette, Georg
A1  - Klügl, Peter
A1  - Störk, Stefan
A1  - Puppe, Frank
T1  - Fine-grained information extraction from German transthoracic echocardiography reports
JF  - BMC Medical Informatics and Decision Making
N2  - Background
Information extraction techniques that get structured representations out of unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide this process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite of their importance for clinical trials. This work therefore aimed at development and evaluation of an information extraction component with a fine-grained terminology that enables to recognize almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg.

Methods
A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute value pairs. The final component has been mapped to the central elements of a standardized terminology, and it has been evaluated according to documents with different layouts.

Results
The final system achieved state-of-the-art precision (micro average.996) and recall (micro average.961) on 100 test documents that represent more than 90 % of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with f 1=.989 (micro average) and f 1=.963 (macro average). As a result of keyword matching and restraint concept extraction, the system obtained high precision also on unstructured or exceptionally short documents, and documents with uncommon layout.

Conclusions
The developed terminology and the proposed information extraction system allow to extract fine-grained information from German semi-structured transthoracic echocardiography reports with very high precision and high recall on the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse which supports clinical research.
Y1  - 2015
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-125509
VL  - 15
IS  - 91
ER  -