Line-level layout recognition of historical documents with background knowledge

Please always cite this URN: urn:nbn:de:bvb:20-opus-310938
Digitization and transcription of historic documents offer new research opportunities for humanists and are the topics of many edition projects. However, manual work is still required for the main phases of layout recognition and the subsequent optical character recognition (OCR) of early printed documents. This paper describes and evaluates how deep learning approaches recognize text lines and can be extended to layout recognition using background knowledge. The evaluation was performed on five corpora of early prints from the 15th and 16th centuries, representing a variety of layout features. While the main text with standard layouts could be recognized in the correct reading order with a precision and recall of up to 99.9%, complex layouts were also recognized at a rate as high as 90% by using background knowledge, the full potential of which was revealed if many pages of the same source were transcribed.

Metadata
Author(s): Norbert Fischer, Alexander Hartelt, Frank Puppe
URN: urn:nbn:de:bvb:20-opus-310938
Document type: Journal article
Institutes of the University: Fakultät für Mathematik und Informatik / Institut für Informatik (Faculty of Mathematics and Computer Science / Institute of Computer Science)
Language of publication: English
Title of parent work / journal (English): Algorithms
ISSN: 1999-4893
Year of publication: 2023
Volume: 16
Issue: 3
Article number: 136
Original publication / source: Algorithms (2023) 16:3, 136. https://doi.org/10.3390/a16030136
DOI: https://doi.org/10.3390/a16030136
General subject classification (DDC): 0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Free keyword(s): background knowledge; baseline detection; fully convolutional neural networks; historical document analysis; layout recognition; text line detection
Release date: 7 March 2024
Date of first publication: 3 March 2023
License: CC BY: Creative Commons Attribution 4.0 International