TY  - JOUR
A1  - Gehrke, Alexander
A1  - Balbach, Nico
A1  - Rauch, Yong-Mi
A1  - Degkwitz, Andreas
A1  - Puppe, Frank
T1  - Erkennung von handschriftlichen Unterstreichungen in Alten Drucken
JF  - Bibliothek Forschung und Praxis
N2  - Die Erkennung handschriftlicher Artefakte wie Unterstreichungen in Buchdrucken ermöglicht Rückschlüsse auf das Rezeptionsverhalten und die Provenienzgeschichte und wird auch für eine OCR benötigt. Dabei soll zwischen handschriftlichen Unterstreichungen und waagerechten Linien im Druck (z. B. Trennlinien usw.) unterschieden werden, da letztere nicht ausgezeichnet werden sollen. Im Beitrag wird ein Ansatz basierend auf einem auf Unterstreichungen trainierten Neuronalen Netz gemäß der U-Net Architektur vorgestellt, dessen Ergebnisse in einem zweiten Schritt mit heuristischen Regeln nachbearbeitet werden. Die Evaluationen zeigen, dass Unterstreichungen sehr gut erkannt werden, wenn bei der Binarisierung der Scans nicht zu viele Pixel der Unterstreichung wegen geringem Kontrast verloren gehen. Zukünftig sollen die Worte oberhalb der Unterstreichung mit OCR transkribiert werden und auch andere Artefakte wie handschriftliche Notizen in alten Drucken erkannt werden.
N2  - The recognition of handwritten artefacts like underlines in historical printings allows inference on the reception and provenance history and is necessary for OCR (optical character recognition). In this context it is important to differentiate between handwritten and printed lines, since the latter are common in printings, but should be ignored. We present an approach based on neural nets with the U-Net architecture, whose segmentation results are post processed with heuristic rules. The evaluations show that handwritten underlines are very well recognized if the binarisation of the scans is adequate. Future work includes transcription of the underlined words with OCR and recognition of other artefacts like handwritten notes in historical printings.
T2  - Recognition of handwritten underlines in historical printings
KW  - Brüder Grimm Privatbibliothek
KW  - Erkennung handschriftlicher Artefakte
KW  - Convolutional Neural Network
KW  - regelbasierte Nachbearbeitung
KW  - Grimm brothers personal library
KW  - handwritten artefact recognition
KW  - convolutional neural network
KW  - rule based post processing
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-193377
SN  - 1865-7648
SN  - 0341-4183
N1  - Dieser Beitrag ist mit Zustimmung des Rechteinhabers aufgrund einer (DFG-geförderten) Allianz- bzw. Nationallizenz frei zugänglich.
VL  - 43
IS  - 3
SP  - 447
EP  - 452
ER  - 
TY  - JOUR
A1  - Wick, Christoph
A1  - Hartelt, Alexander
A1  - Puppe, Frank
T1  - Staff, symbol and melody detection of Medieval manuscripts written in square notation using deep Fully Convolutional Networks
JF  - Applied Sciences
N2  - Even today, the automatic digitisation of scanned documents in general, but especially the automatic optical music recognition (OMR) of historical manuscripts, still remains an enormous challenge, since both handwritten musical symbols and text have to be identified. This paper focuses on the Medieval so-called square notation developed in the 11th–12th century, which is already composed of staff lines, staves, clefs, accidentals, and neumes that are roughly spoken connected single notes. The aim is to develop an algorithm that captures both the neumes, and in particular its melody, which can be used to reconstruct the original writing. Our pipeline is similar to the standard OMR approach and comprises a novel staff line and symbol detection algorithm based on deep Fully Convolutional Networks (FCN), which perform pixel-based predictions for either staff lines or symbols and their respective types. Then, the staff line detection combines the extracted lines to staves and yields an F\(_1\) -score of over 99% for both detecting lines and complete staves. For the music symbol detection, we choose a novel approach that skips the step to identify neumes and instead directly predicts note components (NCs) and their respective affiliation to a neume. Furthermore, the algorithm detects clefs and accidentals. Our algorithm predicts the symbol sequence of a staff with a diplomatic symbol accuracy rate (dSAR) of about 87%, which includes symbol type and location. If only the NCs without their respective connection to a neume, all clefs and accidentals are of interest, the algorithm reaches an harmonic symbol accuracy rate (hSAR) of approximately 90%. In general, the algorithm recognises a symbol in the manuscript with an F\(_1\) -score of over 96%.
KW  - optical music recognition
KW  - historical document analysis
KW  - medieval manuscripts
KW  - neume notation
KW  - fully convolutional neural networks
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-197248
SN  - 2076-3417
VL  - 9
IS  - 13
ER  - 
TY  - JOUR
A1  - Reul, Christian
A1  - Christ, Dennis
A1  - Hartelt, Alexander
A1  - Balbach, Nico
A1  - Wehner, Maximilian
A1  - Springmann, Uwe
A1  - Wick, Christoph
A1  - Grundig, Christine
A1  - Büttner, Andreas
A1  - Puppe, Frank
T1  - OCR4all—An open-source tool providing a (semi-)automatic OCR workflow for historical printings
JF  - Applied Sciences
N2  - Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years, great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout analysis and segmentation, character recognition, and post-processing. The drawback of these tools often is their limited applicability by non-technical users like humanist scholars and in particular the combined use of several tools in a workflow. In this paper, we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. While a variety of materials can already be processed fully automatically, books with more complex layouts require manual intervention by the users. This is mostly due to the fact that the required ground truth for training stronger mixed models (for segmentation, as well as text recognition) is not available, yet, neither in the desired quantity nor quality. To deal with this issue in the short run, OCR4all offers a comfortable GUI that allows error corrections not only in the final output, but already in early stages to minimize error propagations. In the long run, this constant manual correction produces large quantities of valuable, high quality training material, which can be used to improve fully automatic approaches. Further on, extensive configuration capabilities are provided to set the degree of automation of the workflow and to make adaptations to the carefully selected default parameters for specific printings, if necessary. During experiments, the fully automated application on 19th Century novels showed that OCR4all can considerably outperform the commercial state-of-the-art tool ABBYY Finereader on moderate layouts if suitably pretrained mixed OCR models are available. Furthermore, on very complex early printed books, even users with minimal or no experience were able to capture the text with manageable effort and great quality, achieving excellent Character Error Rates (CERs) below 0.5%. The architecture of OCR4all allows the easy integration (or substitution) of newly developed tools for its main components by standardized interfaces like PageXML, thus aiming at continual higher automation for historical printings.
KW  - optical character recognition
KW  - document analysis
KW  - historical printings
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-193103
SN  - 2076-3417
VL  - 9
IS  - 22
ER  - 
TY  - JOUR
A1  - Djebko, Kirill
A1  - Puppe, Frank
A1  - Kayal, Hakan
T1  - Model-based fault detection and diagnosis for spacecraft with an application for the SONATE triple cube nano-satellite
JF  - Aerospace
N2  - The correct behavior of spacecraft components is the foundation of unhindered mission operation. However, no technical system is free of wear and degradation. A malfunction of one single component might significantly alter the behavior of the whole spacecraft and may even lead to a complete mission failure. Therefore, abnormal component behavior must be detected early in order to be able to perform counter measures. A dedicated fault detection system can be employed, as opposed to classical health monitoring, performed by human operators, to decrease the response time to a malfunction. In this paper, we present a generic model-based diagnosis system, which detects faults by analyzing the spacecraft’s housekeeping data. The observed behavior of the spacecraft components, given by the housekeeping data is compared to their expected behavior, obtained through simulation. Each discrepancy between the observed and the expected behavior of a component generates a so-called symptom. Given the symptoms, the diagnoses are derived by computing sets of components whose malfunction might cause the observed discrepancies. We demonstrate the applicability of the diagnosis system by using modified housekeeping data of the qualification model of an actual spacecraft and outline the advantages and drawbacks of our approach.
KW  - fault detection
KW  - model-based diagnosis
KW  - nano-satellite
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-198836
SN  - 2226-4310
VL  - 6
IS  - 10
ER  - 
TY  - JOUR
A1  - Dietrich, Georg
A1  - Krebs, Jonathan
A1  - Liman, Leon
A1  - Fette, Georg
A1  - Ertl, Maximilian
A1  - Kaspar, Mathias
A1  - Störk, Stefan
A1  - Puppe, Frank
T1  - Replicating medication trend studies using ad hoc information extraction in a clinical data warehouse
JF  - BMC Medical Informatics and Decision Making
N2  - Background
Medication trend studies show the changes of medication over the years and may be replicated using a clinical Data Warehouse (CDW). Even nowadays, a lot of the patient information, like medication data, in the EHR is stored in the format of free text. As the conventional approach of information extraction (IE) demands a high developmental effort, we used ad hoc IE instead. This technique queries information and extracts it on the fly from texts contained in the CDW.

Methods
We present a generalizable approach of ad hoc IE for pharmacotherapy (medications and their daily dosage) presented in hospital discharge letters. We added import and query features to the CDW system, like error tolerant queries to deal with misspellings and proximity search for the extraction of the daily dosage. During the data integration process in the CDW, negated, historical and non-patient context data are filtered. For the replication studies, we used a drug list grouped by ATC (Anatomical Therapeutic Chemical Classification System) codes as input for queries to the CDW.

Results
We achieve an F1 score of 0.983 (precision 0.997, recall 0.970) for extracting medication from discharge letters and an F1 score of 0.974 (precision 0.977, recall 0.972) for extracting the dosage. We replicated three published medical trend studies for hypertension, atrial fibrillation and chronic kidney disease. Overall, 93% of the main findings could be replicated, 68% of sub-findings, and 75% of all findings. One study could be completely replicated with all main and sub-findings.

Conclusion
A novel approach for ad hoc IE is presented. It is very suitable for basic medical texts like discharge letters and finding reports. Ad hoc IE is by definition more limited than conventional IE and does not claim to replace it, but it substantially exceeds the search capabilities of many CDWs and it is convenient to conduct replication studies fast and with high quality.
KW  - data warehouse
KW  - medication extraction
KW  - information extraction
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-200409
VL  - 19
ER  - 
TY  - JOUR
A1  - Loda, Sophia
A1  - Krebs, Jonathan
A1  - Danhof, Sophia
A1  - Schreder, Martin
A1  - Solimando, Antonio G.
A1  - Strifler, Susanne
A1  - Rasche, Leo
A1  - Kortüm, Martin
A1  - Kerscher, Alexander
A1  - Knop, Stefan
A1  - Puppe, Frank
A1  - Einsele, Hermann
A1  - Bittrich, Max
T1  - Exploration of artificial intelligence use with ARIES in multiple myeloma research
JF  - Journal of Clinical Medicine
N2  - Background: Natural language processing (NLP) is a powerful tool supporting the generation of Real-World Evidence (RWE). There is no NLP system that enables the extensive querying of parameters specific to multiple myeloma (MM) out of unstructured medical reports. We therefore created a MM-specific ontology to accelerate the information extraction (IE) out of unstructured text. Methods: Our MM ontology consists of extensive MM-specific and hierarchically structured attributes and values. We implemented “A Rule-based Information Extraction System” (ARIES) that uses this ontology. We evaluated ARIES on 200 randomly selected medical reports of patients diagnosed with MM. Results: Our system achieved a high F1-Score of 0.92 on the evaluation dataset with a precision of 0.87 and recall of 0.98. Conclusions: Our rule-based IE system enables the comprehensive querying of medical reports. The IE accelerates the extraction of data and enables clinicians to faster generate RWE on hematological issues. RWE helps clinicians to make decisions in an evidence-based manner. Our tool easily accelerates the integration of research evidence into everyday clinical practice.
KW  - natural language processing
KW  - ontology
KW  - artificial intelligence
KW  - multiple myeloma
KW  - real world evidence
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-197231
SN  - 2077-0383
VL  - 8
IS  - 7
ER  -