TY  - THES
A1  - Wick, Christoph
T1  - Optical Medieval Music Recognition
T1  - Optical Medieval Music Recognition
N2  - In recent years, great progress has been made in the area of Artificial Intelligence (AI) due to the possibilities of Deep Learning which steadily yielded new state-of-the-art results especially in many image recognition tasks.
Currently, in some areas, human performance is achieved or already exceeded.
This great development already had an impact on the area of Optical Music Recognition (OMR) as several novel methods relying on Deep Learning succeeded in specific tasks.

Musicologists are interested in large-scale musical analysis and in publishing digital transcriptions in a collection enabling to develop tools for searching and data retrieving.
The application of OMR promises to simplify and thus speed-up the transcription process by either providing fully-automatic or semi-automatic approaches.
This thesis focuses on the automatic transcription of Medieval music with a focus on square notation which poses a challenging task due to complex layouts, highly varying handwritten notations, and degradation.
However, since handwritten music notations are quite complex to read, even for an experienced musicologist, it is to be expected that even with new techniques of OMR manual corrections are required to obtain the transcriptions.

This thesis presents several new approaches and open source software solutions for layout analysis and Automatic Text Recognition (ATR) for early documents and for OMR of Medieval manuscripts providing state-of-the-art technology.
Fully Convolutional Networks (FCN) are applied for the segmentation of historical manuscripts and early printed books, to detect staff lines, and to recognize neume notations.
The ATR engine Calamari is presented which allows for ATR of early prints and also the recognition of lyrics.
Configurable CNN/LSTM-network architectures which are trained with the segmentation-free CTC-loss are applied to the sequential recognition of text but also monophonic music.
Finally, a syllable-to-neume assignment algorithm is presented which represents the final step to obtain a complete transcription of the music.

The evaluations show that the performances of any algorithm is highly depending on the material at hand and the number of training instances.
The presented staff line detection correctly identifies staff lines and staves with an $F_1$-score of above $99.5\%$.
The symbol recognition yields a diplomatic Symbol Accuracy Rate (dSAR) of above $90\%$ by counting the number of correct predictions in the symbols sequence normalized by its length.
The ATR of lyrics achieved a Character Error Rate (CAR) (equivalently the number of correct predictions normalized by the sentence length) of above $93\%$ trained on 771 lyric lines of Medieval manuscripts and of 99.89\% when training on around 3.5 million lines of contemporary printed fonts.
The assignment of syllables and their corresponding neumes reached $F_1$-scores of up to $99.2\%$.
A direct comparison to previously published performances is difficult due to different materials and metrics.
However, estimations show that the reported values of this thesis exceed the state-of-the-art in the area of square notation.

A further goal of this thesis is to enable musicologists without technical background to apply the developed algorithms in a complete workflow by providing a user-friendly and comfortable Graphical User Interface (GUI) encapsulating the technical details.
For this purpose, this thesis presents the web-application OMMR4all.
Its fully-functional workflow includes the proposed state-of-the-art machine-learning algorithms and optionally allows for a manual intervention at any stage to correct the output preventing error propagation.
To simplify the manual (post-) correction, OMMR4all provides an overlay-editor that superimposes the annotations with a scan of the original manuscripts so that errors can easily be spotted.
The workflow is designed to be iteratively improvable by training better models as soon as new Ground Truth (GT) is available.
N2  - In den letzten Jahre wurden aufgrund der Möglichkeiten durch Deep Learning, was insbesondere in vielen Bildbearbeitungsaufgaben stetig neue Bestwerte erzielte, große Fortschritte im Bereich der künstlichen Intelligenz (KI) gemacht. Derzeit wird in vielen Gebieten menschliche Performanz erreicht oder mittlerweile sogar übertroffen. Diese großen Entwicklungen hatten einen Einfluss auf den Forschungsbereich der optischen Musikerkennung (OMR), da verschiedenste Methodiken, die auf Deep Learning basierten in spezifischen Aufgaben erfolgreich waren.

Musikwissenschaftler sind in großangelegter Musikanalyse und in das Veröffentlichen von digitalen Transkriptionen als Sammlungen interessiert, was eine Entwicklung von Werkzeugen zur Suche und Datenakquise ermöglicht. Die Anwendung von OMR verspricht diesen Transkriptionsprozess zu vereinfachen und zu beschleunigen indem vollautomatische oder semiautomatische Ansätze bereitgestellt werden. Diese Arbeit legt den Schwerpunkt auf die automatische Transkription von mittelalterlicher Musik mit einem Fokus auf Quadratnotation, die eine komplexe Aufgabe aufgrund der komplexen Layouts, der stark variierenden Notationen und der Alterungsprozesse der Originalmanuskripte darstellt. Da jedoch die handgeschriebenen Musiknotationen selbst für erfahrene Musikwissenschaftler aufgrund der Komplexität schwer zu lesen sind, ist davon auszugehen, dass selbst mit den neuesten OMR-Techniken manuelle Korrekturen erforderlich sind, um die Transkription zu erhalten.

Diese Arbeit präsentiert mehrere neue Ansätze und Open-Source-Software-Lösungen zur Layoutanalyse und zur automatischen Texterkennung (ATR) von frühen Dokumenten und für OMR
 von Mittelalterlichen Mauskripten, die auf dem Stand der aktuellen Technik sind. Fully Convolutional Networks (FCN) werden zur Segmentierung der historischen Manuskripte und frühen Buchdrucke, zur Detektion von Notenlinien und zur Erkennung von Neumennotationen eingesetzt. Die ATR-Engine Calamari, die eine ATR von frühen Buchdrucken und ebenso eine Erkennung von Liedtexten ermöglicht wird vorgestellt. Konfigurierbare CNN/LSTM-Netzwerkarchitekturen, die mit dem segmentierungsfreien CTC-loss trainiert werden, werden zur sequentiellen Texterkennung, aber auch einstimmiger Musik, eingesetzt. Abschließend wird ein Silben-zu-Neumen-Algorithmus vorgestellt, der dem letzten Schritt entspricht eine vollständige Transkription der Musik zu erhalten.

Die Evaluationen zeigen, dass die Performanz eines jeden Algorithmus hochgradig abhängig vom vorliegenden Material und der Anzahl der Trainingsbeispiele ist. Die vorgestellte Notenliniendetektion erkennt Notenlinien und -zeilen mit einem $F_1$-Wert von über 99,5%. Die Symbolerkennung erreichte eine diplomatische Symbolerkennungsrate (dSAR), die die Anzahl der korrekten Vorhersagen in der Symbolsequenz zählt und mit der Länge normalisiert, von über 90%. Die ATR von Liedtext erzielte eine Zeichengenauigkeit (CAR) (äquivalent zur Anzahl der korrekten Vorhersagen normalisiert durch die Sequenzlänge) von über 93% bei einem Training auf 771 Liedtextzeilen von mittelalterlichen Manuskripten und von 99,89%, wenn auf 3,5 Millionen Zeilen von moderner gedruckter Schrift trainiert wird. Die Zuordnung von Silben und den zugehörigen Neumen erreicht $F_1$-werte von über 99,2%. Ein direkter Vergleich zu bereits veröffentlichten Performanzen ist hierbei jedoch schwer, da mit verschiedenen Material und Metriken evaluiert wurde. Jedoch zeigen Abschätzungen, dass die Werte dieser Arbeit den aktuellen Stand der Technik darstellen.

Ein weiteres Ziel dieser Arbeit war es, Musikwissenschaftlern ohne technischen Hintergrund das Anwenden der entwickelten Algorithmen in einem vollständigen Workflow zu ermöglichen, indem eine benutzerfreundliche und komfortable graphische Benutzerschnittstelle (GUI) bereitgestellt wird, die die technischen Details kapselt. Zu diesem Zweck präsentiert diese Arbeit die Web-Applikation OMMR4all. Ihr voll funktionsfähiger Workflow inkludiert die vorgestellten Algorithmen gemäß dem aktuellen Stand der Technik und erlaubt optional manuell zu jedem Schritt einzugreifen, um die Ausgabe zur Vermeidung von Folgefehlern zu korrigieren. Zur Vereinfachung der manuellen (Nach-)Korrektur stellt OMMR4all einen Overlay-Editor zur Verfügung, der die Annotationen mit dem Scan des Originalmanuskripts überlagert, wodurch Fehler leicht erkannt werden können. Das Design des Workflows erlaubt iterative Verbesserungen, indem neue performantere Modelle trainiert werden können, sobald neue Ground Truth (GT) verfügbar ist.
KW  - Neumenschrift
KW  - Optische Zeichenerkennung (OCR)
KW  - Deep Learning
KW  - Optical Music Recognition
KW  - Neume Notation
KW  - Automatic Text Reconition
KW  - Optical Character Recognition
KW  - Deep Learning
KW  - Optische Musikerkennung (OMR)
KW  - Neumennotation
KW  - Automatische Texterkennung (ATR)
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-214348
ER  - 
TY  - THES
A1  - Reul, Christian
T1  - An Intelligent Semi-Automatic Workflow for Optical Character Recognition of Historical Printings
T1  - Ein intelligenter semi-automatischer Workflow für die OCR historischer Drucke
N2  - Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years great progress has been made in the area of historical OCR resulting in several powerful open-source tools for preprocessing, layout analysis and segmentation, Automatic Text Recognition (ATR) and postcorrection. Their major drawback is that they only offer limited applicability by non-technical users like humanist scholars, in particular when it comes to the combined use of several tools in a workflow. Furthermore, depending on the material, these tools are usually not able to fully automatically achieve sufficiently low error rates, let alone perfect results, creating a demand for an interactive postcorrection functionality which, however, is generally not incorporated.
This thesis addresses these issues by presenting an open-source OCR software called OCR4all which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. While a variety of materials can already be processed fully automatically, books with more complex layouts require manual intervention by the users. This is mostly due to the fact that the required Ground Truth (GT) for training stronger mixed models (for segmentation as well as text recognition) is not available, yet, neither in the desired quantity nor quality.
To deal with this issue in the short run, OCR4all offers better recognition capabilities in combination with a very comfortable Graphical User Interface (GUI) that allows error corrections not only in the final output, but already in early stages to minimize error propagation. In the long run this constant manual correction produces large quantities of valuable, high quality training material which can be used to improve fully automatic approaches. Further on, extensive configuration capabilities are provided to set the degree of automation of the workflow and to make adaptations to the carefully selected default parameters for specific printings, if necessary. The architecture of OCR4all allows for an easy integration (or substitution) of newly developed tools for its main components by supporting standardized interfaces like PageXML, thus aiming at continual higher automation for historical printings.
In addition to OCR4all, several methodical extensions in the form of accuracy improving techniques for training and recognition are presented. Most notably an effective, sophisticated, and adaptable voting methodology using a single ATR engine, a pretraining procedure, and an Active Learning (AL) component are proposed. Experiments showed that combining pretraining and voting significantly improves the effectiveness of book-specific training, reducing the obtained Character Error Rates (CERs) by more than 50%.
The proposed extensions were further evaluated during two real world case studies: First, the voting and pretraining techniques are transferred to the task of constructing so-called mixed models which are trained on a variety of different fonts. This was done by using 19th century Fraktur script as an example, resulting in a considerable improvement over a variety of existing open-source and commercial engines and models. Second, the extension from ATR on raw text to the adjacent topic of typography recognition was successfully addressed by thoroughly indexing a historical lexicon that heavily relies on different font types in order to encode its complex semantic structure.
During the main experiments on very complex early printed books even users with minimal or no experience were able to not only comfortably deal with the challenges presented by the complex layout, but also to recognize the text with manageable effort and great quality, achieving excellent CERs below 0.5%. Furthermore, the fully automated application on 19th century novels showed that OCR4all (average CER of 0.85%) can considerably outperform the commercial state-of-the-art tool ABBYY Finereader (5.3%) on moderate layouts if suitably pretrained mixed ATR models are available.
N2  - Die Optische Zeichenerkennung (Optical Character Recognition, OCR) auf historischen Drucken stellt nach wie vor eine große Herausforderung dar, hauptsächlich aufgrund des häufig komplexen Layouts und der hoch varianten Typographie. In den letzten Jahre gab es große Fortschritte im Bereich der historischen OCR, die nicht selten auch in Form von Open Source Tools interessierten Nutzenden frei zur Verfügung stehen. Der Nachteil dieser Tools ist, dass sie meist ausschließlich über die Kommandozeile bedient werden können und somit nicht-technische Nutzer schnell überfordern. Außerdem sind die Tools häufig nicht aufeinander abgestimmt und verfügen dementsprechend nicht über gemeinsame Schnittstellen.

Diese Arbeit adressiert diese Problematik mittels des Open Source Tools OCR4all, das verschiedene State-of-the-Art OCR Lösungen zu einem zusammenhängenden Workflow kombiniert und in einer einzigen Anwendung kapselt. Besonderer Wert liegt dabei darauf, auch nicht-technischen Nutzern zu erlauben, selbst die ältesten und anspruchsvollen Drucke selbstständig und mit höchster Qualität zu erfassen. OCR4all ist vollständig über eine komfortable graphische Nutzeroberfläche bedienbar und bietet umfangreiche Möglichkeiten hinsichtlich Konfiguration und interaktiver Nachkorrektur. Zusätzlich zu OCR4all werden mehrere methodische Erweiterungen präsentiert, um die Effektivität und Effizienz der Trainings- und Erkennungsprozesse zur Texterkennung zu optimieren.

Während umfangreicher Evaluationen konnte gezeigt werden, dass selbst Nutzer ohne nennenswerte Vorerfahrung in der Lage waren, OCR4all eigenständig auf komplexe historische Drucke anzuwenden und dort hervorragende Zeichenfehlerraten von durchschnittlich unter 0,5% zu erzielen. Die methodischen Verbesserungen mit Blick auf die Texterkennung reduzierten dabei die Fehlerrate um über 50% im Vergleich zum etablierten Standardansatz.
KW  - Optische Zeichenerkennung
KW  - Optical Character Recognition
KW  - Document Analysis
KW  - Historical Printings
KW  - Alter Druck
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-209239
ER  -