TY  - THES
A1  - Wick, Christoph
T1  - Optical Medieval Music Recognition
N2  - In recent years, great progress has been made in the area of Artificial Intelligence (AI) due to the possibilities of Deep Learning, which has steadily yielded new state-of-the-art results, especially in many image recognition tasks.
Currently, in some areas, human performance is achieved or already exceeded.
This great development already had an impact on the area of Optical Music Recognition (OMR) as several novel methods relying on Deep Learning succeeded in specific tasks.

Musicologists are interested in large-scale musical analysis and in publishing digital transcriptions in a collection, which enables the development of tools for search and data retrieval.
The application of OMR promises to simplify and thus speed up the transcription process by providing either fully automatic or semi-automatic approaches.
This thesis addresses the automatic transcription of Medieval music with a focus on square notation, which poses a challenging task due to complex layouts, highly varying handwritten notations, and degradation.
However, since handwritten music notations are quite complex to read, even for an experienced musicologist, manual corrections are expected to remain necessary to obtain the transcriptions, even with new OMR techniques.

This thesis presents several new approaches and open source software solutions for layout analysis and Automatic Text Recognition (ATR) for early documents and for OMR of Medieval manuscripts providing state-of-the-art technology.
Fully Convolutional Networks (FCN) are applied for the segmentation of historical manuscripts and early printed books, to detect staff lines, and to recognize neume notations.
The ATR engine Calamari is presented, which allows for ATR of early prints as well as the recognition of lyrics.
Configurable CNN/LSTM network architectures, which are trained with the segmentation-free CTC loss, are applied to the sequential recognition of text as well as monophonic music.
Finally, a syllable-to-neume assignment algorithm is presented which represents the final step to obtain a complete transcription of the music.

The evaluations show that the performance of any algorithm depends heavily on the material at hand and the number of training instances.
The presented staff line detection correctly identifies staff lines and staves with an $F_1$-score above $99.5\%$.
The symbol recognition yields a diplomatic Symbol Accuracy Rate (dSAR) above $90\%$, computed as the number of correct predictions in the symbol sequence normalized by its length.
The ATR of lyrics achieved a Character Accuracy Rate (CAR) (equivalently, the number of correct predictions normalized by the sentence length) above $93\%$ when trained on 771 lyric lines of Medieval manuscripts and of $99.89\%$ when trained on around 3.5 million lines of contemporary printed fonts.
The assignment of syllables and their corresponding neumes reached $F_1$-scores of up to $99.2\%$.
A direct comparison to previously published performances is difficult due to different materials and metrics.
However, estimations show that the reported values of this thesis exceed the state-of-the-art in the area of square notation.

A further goal of this thesis is to enable musicologists without technical background to apply the developed algorithms in a complete workflow by providing a user-friendly and comfortable Graphical User Interface (GUI) encapsulating the technical details.
For this purpose, this thesis presents the web-application OMMR4all.
Its fully functional workflow includes the proposed state-of-the-art machine-learning algorithms and optionally allows manual intervention at any stage to correct the output, which prevents error propagation.
To simplify the manual (post-)correction, OMMR4all provides an overlay editor that superimposes the annotations on a scan of the original manuscript so that errors can easily be spotted.
The workflow is designed to be iteratively improvable by training better models as soon as new Ground Truth (GT) is available.
N2  - In recent years, great progress has been made in the field of artificial intelligence (AI) owing to the possibilities of Deep Learning, which has steadily achieved new best results, particularly in many image processing tasks. Currently, human performance is reached or even surpassed in many areas. These major developments have influenced the research field of optical music recognition (OMR), as a wide variety of methods based on Deep Learning have been successful in specific tasks.

Musicologists are interested in large-scale music analysis and in publishing digital transcriptions as collections, which enables the development of tools for search and data acquisition. The application of OMR promises to simplify and accelerate this transcription process by providing fully automatic or semi-automatic approaches. This thesis focuses on the automatic transcription of medieval music, with an emphasis on square notation, which is a difficult task owing to the complex layouts, the strongly varying notations, and the ageing of the original manuscripts. However, since handwritten music notations are hard to read because of their complexity, even for experienced musicologists, it can be assumed that manual corrections will be required to obtain the transcription even with the latest OMR techniques.

This thesis presents several new approaches and open-source software solutions for layout analysis and automatic text recognition (ATR) of early documents and for OMR of medieval manuscripts, all of which are state of the art. Fully Convolutional Networks (FCN) are used to segment historical manuscripts and early printed books, to detect staff lines, and to recognize neume notations. The ATR engine Calamari is presented, which enables ATR of early printed books as well as the recognition of lyrics. Configurable CNN/LSTM network architectures, trained with the segmentation-free CTC loss, are used for the sequential recognition of text and also of monophonic music. Finally, a syllable-to-neume algorithm is presented, which constitutes the last step towards obtaining a complete transcription of the music.

The evaluations show that the performance of every algorithm depends strongly on the material at hand and the number of training examples. The presented staff line detection recognizes staff lines and staves with an $F_1$-score of over 99.5%. The symbol recognition achieved a diplomatic symbol accuracy rate (dSAR), which counts the number of correct predictions in the symbol sequence and normalizes it by the sequence length, of over 90%. The ATR of lyrics achieved a character accuracy rate (CAR) (equivalent to the number of correct predictions normalized by the sequence length) of over 93% when trained on 771 lyric lines from medieval manuscripts and of 99.89% when trained on 3.5 million lines of modern printed type. The assignment of syllables to their corresponding neumes reaches $F_1$-scores of over 99.2%. A direct comparison with previously published results is difficult, however, because different material and metrics were used for evaluation. Nevertheless, estimates show that the values of this thesis represent the current state of the art.

A further goal of this thesis was to enable musicologists without a technical background to apply the developed algorithms in a complete workflow by providing a user-friendly and comfortable graphical user interface (GUI) that encapsulates the technical details. To this end, this thesis presents the web application OMMR4all. Its fully functional workflow includes the presented state-of-the-art algorithms and optionally allows manual intervention at every step to correct the output and avoid follow-up errors. To simplify manual (post-)correction, OMMR4all provides an overlay editor that superimposes the annotations on the scan of the original manuscript, which makes errors easy to spot. The design of the workflow allows iterative improvements, since new, better-performing models can be trained as soon as new ground truth (GT) is available.
KW  - Neume notation
KW  - Optical Character Recognition (OCR)
KW  - Deep Learning
KW  - Optical Music Recognition (OMR)
KW  - Automatic Text Recognition (ATR)
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-214348
ER  - 
TY  - JOUR
A1  - Wick, Christoph
A1  - Hartelt, Alexander
A1  - Puppe, Frank
T1  - Staff, symbol and melody detection of Medieval manuscripts written in square notation using deep Fully Convolutional Networks
JF  - Applied Sciences
N2  - Even today, the automatic digitisation of scanned documents in general, and the automatic optical music recognition (OMR) of historical manuscripts in particular, remains an enormous challenge, since both handwritten musical symbols and text have to be identified. This paper focuses on the Medieval so-called square notation developed in the 11th–12th century, which is already composed of staff lines, staves, clefs, accidentals, and neumes, which are, roughly speaking, connected single notes. The aim is to develop an algorithm that captures the neumes and, in particular, their melody, which can be used to reconstruct the original writing. Our pipeline is similar to the standard OMR approach and comprises a novel staff line and symbol detection algorithm based on deep Fully Convolutional Networks (FCN), which perform pixel-based predictions for either staff lines or symbols and their respective types. The staff line detection then combines the extracted lines into staves and yields an \(F_1\)-score of over 99% for detecting both lines and complete staves. For the music symbol detection, we choose a novel approach that skips the step of identifying neumes and instead directly predicts note components (NCs) and their respective affiliation to a neume. Furthermore, the algorithm detects clefs and accidentals. Our algorithm predicts the symbol sequence of a staff with a diplomatic symbol accuracy rate (dSAR) of about 87%, which includes symbol type and location. If only the NCs (without their respective connection to a neume), clefs, and accidentals are of interest, the algorithm reaches a harmonic symbol accuracy rate (hSAR) of approximately 90%. In general, the algorithm recognises a symbol in the manuscript with an \(F_1\)-score of over 96%.
KW  - optical music recognition
KW  - historical document analysis
KW  - medieval manuscripts
KW  - neume notation
KW  - fully convolutional neural networks
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-197248
SN  - 2076-3417
VL  - 9
IS  - 13
ER  - 
TY  - JOUR
A1  - Reul, Christian
A1  - Christ, Dennis
A1  - Hartelt, Alexander
A1  - Balbach, Nico
A1  - Wehner, Maximilian
A1  - Springmann, Uwe
A1  - Wick, Christoph
A1  - Grundig, Christine
A1  - Büttner, Andreas
A1  - Puppe, Frank
T1  - OCR4all—An open-source tool providing a (semi-)automatic OCR workflow for historical printings
JF  - Applied Sciences
N2  - Optical Character Recognition (OCR) on historical printings is a challenging task, mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years, great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout analysis and segmentation, character recognition, and post-processing. The drawback of these tools is often their limited applicability for non-technical users such as humanist scholars, in particular when several tools have to be combined in a workflow. In this paper, we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. While a variety of materials can already be processed fully automatically, books with more complex layouts require manual intervention by the users. This is mostly because the ground truth required for training stronger mixed models (for segmentation as well as text recognition) is not yet available in either the desired quantity or quality. To deal with this issue in the short run, OCR4all offers a comfortable GUI that allows error corrections not only in the final output but already in early stages, to minimize error propagation. In the long run, this constant manual correction produces large quantities of valuable, high-quality training material, which can be used to improve fully automatic approaches. In addition, extensive configuration capabilities are provided to set the degree of automation of the workflow and to adapt the carefully selected default parameters to specific printings, if necessary. In experiments, the fully automated application to 19th-century novels showed that OCR4all can considerably outperform the commercial state-of-the-art tool ABBYY FineReader on moderate layouts if suitably pretrained mixed OCR models are available. Furthermore, on very complex early printed books, even users with minimal or no experience were able to capture the text with manageable effort and great quality, achieving excellent Character Error Rates (CERs) below 0.5%. The architecture of OCR4all allows the easy integration (or substitution) of newly developed tools for its main components through standardized interfaces like PageXML, thus aiming at continually higher automation for historical printings.
KW  - optical character recognition
KW  - document analysis
KW  - historical printings
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-193103
SN  - 2076-3417
VL  - 9
IS  - 22
ER  -