TY  - JOUR
A1  - Ahmed, Zeeshan
A1  - Zeeshan, Saman
A1  - Dandekar, Thomas
T1  - Mining biomedical images towards valuable information retrieval in biomedical and life sciences
JF  - Database - The Journal of Biological Databases and Curation
N2  - Biomedical images are helpful sources for the scientists and practitioners in drawing significant hypotheses, exemplifying approaches and describing experimental results in published biomedical literature. In last decades, there has been an enormous increase in the amount of heterogeneous biomedical image production and publication, which results in a need for bioimaging platforms for feature extraction and analysis of text and content in biomedical images to take advantage in implementing effective information retrieval systems. In this review, we summarize technologies related to data mining of figures. We describe and compare the potential of different approaches in terms of their developmental aspects, used methodologies, produced results, achieved accuracies and limitations. Our comparative conclusions include current challenges for bioimaging software with selective image mining, embedded text extraction and processing of complex natural language queries.
KW  - humans
KW  - software
KW  - image processing
KW  - animals
KW  - computer-assisted
KW  - data mining/methods
KW  - natural language processing
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-162697
VL  - 2016
ER  - 
TY  - JOUR
A1  - Loda, Sophia
A1  - Krebs, Jonathan
A1  - Danhof, Sophia
A1  - Schreder, Martin
A1  - Solimando, Antonio G.
A1  - Strifler, Susanne
A1  - Rasche, Leo
A1  - Kortüm, Martin
A1  - Kerscher, Alexander
A1  - Knop, Stefan
A1  - Puppe, Frank
A1  - Einsele, Hermann
A1  - Bittrich, Max
T1  - Exploration of artificial intelligence use with ARIES in multiple myeloma research
JF  - Journal of Clinical Medicine
N2  - Background: Natural language processing (NLP) is a powerful tool supporting the generation of Real-World Evidence (RWE). There is no NLP system that enables the extensive querying of parameters specific to multiple myeloma (MM) out of unstructured medical reports. We therefore created a MM-specific ontology to accelerate the information extraction (IE) out of unstructured text. Methods: Our MM ontology consists of extensive MM-specific and hierarchically structured attributes and values. We implemented “A Rule-based Information Extraction System” (ARIES) that uses this ontology. We evaluated ARIES on 200 randomly selected medical reports of patients diagnosed with MM. Results: Our system achieved a high F1-Score of 0.92 on the evaluation dataset with a precision of 0.87 and recall of 0.98. Conclusions: Our rule-based IE system enables the comprehensive querying of medical reports. The IE accelerates the extraction of data and enables clinicians to faster generate RWE on hematological issues. RWE helps clinicians to make decisions in an evidence-based manner. Our tool easily accelerates the integration of research evidence into everyday clinical practice.
KW  - natural language processing
KW  - ontology
KW  - artificial intelligence
KW  - multiple myeloma
KW  - real world evidence
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-197231
SN  - 2077-0383
VL  - 8
IS  - 7
ER  - 
TY  - THES
A1  - Dietrich, Georg
T1  - Ad Hoc Information Extraction in a Clinical Data Warehouse with Case Studies for Data Exploration and Consistency Checks
T1  - Ad Hoc Informationsextraktion in einem Klinischen Data-Warehouse mit Fallstudien zur Datenexploration und Konsistenzüberprüfungen
N2  - The importance of Clinical Data Warehouses (CDW) has increased significantly in recent years as they support or enable many applications such as clinical trials, data mining, and decision making. 
CDWs integrate Electronic Health Records which still contain a large amount of text data, such as discharge letters or reports on diagnostic findings in addition to structured and coded data like ICD-codes of diagnoses. 
Existing CDWs hardly support features to gain information covered in  texts. 
Information extraction methods offer a solution for this problem but they have a high and long development effort, which can only be carried out by computer scientists. 
Moreover, such systems only exist for a few medical domains. 

This paper presents a method empowering clinicians to extract information from texts on their own. Medical concepts can be extracted ad hoc from e.g. discharge letters, thus physicians can work promptly and autonomously. The proposed system achieves these improvements by efficient data storage, preprocessing, and with powerful query features. Negations in texts are recognized and automatically excluded, as well as the context of information is determined and undesired facts are filtered, such as historical events or references to other persons (family history). 
Context-sensitive queries ensure the semantic integrity of the concepts to be extracted. 
A new feature not available in other CDWs is to query numerical concepts in texts and even  filter them (e.g. BMI > 25). 
The retrieved values can be extracted and exported for further analysis.

This technique is implemented within the efficient architecture of the PaDaWaN CDW and evaluated with comprehensive and complex tests.
The results outperform similar approaches reported in the literature. 
Ad hoc IE determines the results in a few (milli-) seconds and a user friendly GUI enables interactive working, allowing flexible adaptation of the extraction. 

In addition, the applicability of this system is demonstrated in three real-world applications at the Würzburg University Hospital (UKW). 
Several drug trend studies are replicated: Findings of five studies on high blood pressure, atrial fibrillation and chronic renal failure can be partially or completely confirmed in the UKW. Another case study evaluates the prevalence of heart failure in inpatient hospitals using an algorithm that extracts information with ad hoc IE from discharge letters and echocardiogram report (e.g. LVEF < 45 ) and other sources of the hospital information system. 
This study reveals that the use of ICD codes leads to a significant underestimation (31%) of the true prevalence of heart failure. 
The third case study evaluates the consistency of diagnoses by comparing structured ICD-10-coded diagnoses with the diagnoses described in the diagnostic section of the discharge letter. 
These diagnoses are extracted from  texts with ad hoc IE, using synonyms generated with a novel method.
The developed approach can extract diagnoses from the discharge letter with a high accuracy and furthermore it can prove the degree of consistency between the coded and reported diagnoses.
N2  - Die Bedeutung von Clinical Data Warehouses (CDW) hat in den letzten Jahren stark zugenommen, da sie viele Anwendungen wie klinische Studien, Data Mining und Entscheidungsfindung unterstützen oder ermöglichen. CDWs integrieren elektronische Patientenakten, die  neben strukturierten und kodierten Daten wie ICD-Codes von Diagnosen immer noch sehr vielen Textdaten enthalten, sowie Arztbriefe oder Befundberichte.  Bestehende CDWs unterstützen kaum Funktionen, um die in den Texten enthaltenen Informationen zu nutzen. Informationsextraktionsmethoden bieten zwar eine Lösung für dieses Problem, erfordern aber einen hohen und langen Entwicklungsaufwand, der nur von Informatikern durchgeführt werden kann. Außerdem gibt es solche Systeme nur für wenige medizinische Bereiche. 

Diese Arbeit stellt eine Methode vor, die es Ärzten ermöglicht, Informationen aus Texten selbstständig zu extrahieren. Medizinische Konzepte können ad hoc aus Texten (z. B. Arztbriefen) extrahiert werden, so dass Ärzte unverzüglich und autonom arbeiten können. Das vorgestellte System erreicht diese Verbesserungen durch effiziente Datenspeicherung, Vorverarbeitung und leistungsstarke Abfragefunktionen. 
Negationen in Texten werden erkannt und automatisch ausgeschlossen, ebenso wird der Kontext von Informationen bestimmt und unerwünschte Fakten gefiltert, wie z. B. historische Ereignisse oder ein Bezug zu anderen Personen (Familiengeschichte). 
Kontextsensitive Abfragen gewährleisten die semantische Integrität der zu extrahierenden Konzepte. Eine neue Funktion, die in anderen CDWs nicht verfügbar ist, ist die Abfrage numerischer Konzepte in Texten und sogar deren Filterung (z. B. BMI > 25). Die abgerufenen Werte können extrahiert und zur weiteren Analyse exportiert werden. 

Diese Technik wird innerhalb der effizienten Architektur des PaDaWaN-CDW implementiert und mit umfangreichen und aufwendigen Tests evaluiert. Die Ergebnisse übertreffen ähnliche Ansätze, die in der Literatur beschrieben werden. Ad hoc IE ermittelt die Ergebnisse in wenigen (Milli-)Sekunden und die benutzerfreundliche Oberfläche ermöglicht interaktives Arbeiten und eine flexible Anpassung der Extraktion. 

Darüber hinaus wird die Anwendbarkeit dieses Systems in drei realen Anwendungen am Universitätsklinikum Würzburg (UKW) demonstriert: Mehrere Medikationstrendstudien werden repliziert: Die Ergebnisse aus fünf Studien zu Bluthochdruck, Vorhofflimmern und chronischem Nierenversagen können in dem UKW teilweise oder vollständig bestätigt werden. Eine weitere Fallstudie bewertet die Prävalenz von Herzinsuffizienz in stationären Patienten in Krankenhäusern mit einem Algorithmus, der Informationen mit Ad-hoc-IE aus Arztbriefen, Echokardiogrammbericht und  aus anderen Quellen des Krankenhausinformationssystems  extrahiert (z. B. LVEF < 45). Diese Studie zeigt, dass die Verwendung von ICD-Codes zu einer signifikanten Unterschätzung (31%) der tatsächlichen Prävalenz von Herzinsuffizienz führt. Die dritte Fallstudie bewertet die Konsistenz von Diagnosen, indem sie strukturierte ICD-10-codierte Diagnosen mit den Diagnosen, die im Diagnoseabschnitt des Arztbriefes beschriebenen, vergleicht. Diese Diagnosen werden mit Ad-hoc-IE  aus den Texten gewonnen, dabei werden Synonyme verwendet, die mit einer neuartigen Methode generiert werden. Der verwendete Ansatz kann Diagnosen mit hoher Genauigkeit aus  Arztbriefen extrahieren und darüber hinaus den Grad der Übereinstimmung zwischen den kodierten und beschriebenen Diagnosen bestimmen.
KW  - Information Extraction
KW  - information extraction
KW  - information retrieval
KW  - Clinical Data Warehouse
KW  - negation detection
KW  - natural language processing
KW  - Data-Warehouse-Konzept
KW  - Klinisches Experiment
KW  - Data Warehouse
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-184642
ER  -