Design and Implementation of Architectures for Interactive Textual Documents Collation Systems

Nassourou, Mohamadou

One of the main purposes of textual document collation is to identify a base text, or the witness closest to the base text, by analyzing and interpreting the differences (also known as types of changes) that might exist between the documents. It is therefore reasonable to argue that explicit identification of types of changes such as deletions, additions, transpositions, and mutations should be part of the collation process. The identification could be carried out by an interpretation module after alignment has taken place. Unfortunately, existing collation software such as CollateX and Juxta's collation engine does not have an interpretation module. Both implement the Gothenburg model [1] of the collation process, which does not include an interpretation unit. Currently neither CollateX nor Juxta's collation engine distinguishes between the types of changes in its critical apparatus, and neither offers statistics about those changes. This paper presents a model for both integrated and distributed collation processes that improves on the Gothenburg model. The model introduces an interpretation component for computing and distinguishing between the types of changes that documents could have undergone. Two architectures that implement the model in order to solve the problem of interactive collation are also discussed.
Each architecture uses the CollateX library and provides, on the one hand, preprocessing functions for transforming input documents into the CollateX input format and, on the other hand, a post-processing module for enabling interactive collation. Finally, simple algorithms for distinguishing between types of changes and for linking collated source documents with the collation results are introduced.
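
To make the idea of an interpretation step concrete, here is a minimal sketch of classifying the differences between a base text and one witness after alignment. It is not the paper's algorithm and it does not use CollateX: alignment is done with `difflib.SequenceMatcher` from the Python standard library, and the function name `classify_changes`, the whitespace tokenization, the treatment of a "mutation" as a replaced token run, and the rule that a deletion paired with an identical addition counts as a transposition are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def classify_changes(base, witness):
    """Return a list of (kind, base_tokens, witness_tokens) tuples.

    Hypothetical helper: alignment comes from difflib, not from CollateX.
    """
    a, b = base.split(), witness.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)

    # First pass: map each non-equal alignment opcode to a change type.
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            changes.append(("deletion", a[i1:i2], []))
        elif tag == "insert":
            changes.append(("addition", [], b[j1:j2]))
        elif tag == "replace":
            # Assumption: a replaced run of tokens is treated as a mutation.
            changes.append(("mutation", a[i1:i2], b[j1:j2]))

    # Second pass (assumption): a deleted run that reappears verbatim as an
    # added run elsewhere is reinterpreted as a single transposition.
    for deletion in [c for c in changes if c[0] == "deletion"]:
        moved = next(
            (c for c in changes if c[0] == "addition" and c[2] == deletion[1]),
            None,
        )
        if moved is not None:
            changes.remove(deletion)
            changes.remove(moved)
            changes.append(("transposition", deletion[1], moved[2]))
    return changes


def change_statistics(changes):
    """Count how often each type of change occurs."""
    return Counter(kind for kind, _, _ in changes)
```

For example, `classify_changes("alpha beta gamma delta", "beta gamma delta alpha")` reports one transposition of `alpha`, and `change_statistics` turns such a list into per-type counts of the kind the abstract says existing tools do not offer.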

Author(s):	Mohamadou Nassourou
URN:	urn:nbn:de:bvb:20-opus-56601
Document type:	Preprint
University institutes:	Philosophische Fakultät (Histor., philolog., Kultur- und geograph. Wissensch.) / Institut für deutsche Philologie
Language of publication:	English
Year of publication:	2011
General subject classification (DDC):	0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing; computer science
Standardized subject headings (GND):	Softwarearchitektur (software architecture); Textvergleich (text comparison)
Free keyword(s):	Gothenburg model of collation process; interactive collation of textual variants; service based software architecture; service brokerage
Release date:	17.05.2011
License (German):	German copyright law
