TY - INPR A1 - Nassourou, Mohamadou T1 - Design and Implementation of Architectures for Interactive Textual Documents Collation Systems N2 - One of the main purposes of textual documents collation is to identify a base text or closest witness to the base text, by analyzing and interpreting differences also known as types of changes that might exist between those documents. Based on this fact, it is reasonable to argue that, explicit identification of types of changes such as deletions, additions, transpositions, and mutations should be part of the collation process. The identification could be carried out by an interpretation module after alignment has taken place. Unfortunately existing collation software such as CollateX1 and Juxta2’s collation engine do not have interpretation modules. In fact they implement the Gothenburg model [1] for collation process which does not include an interpretation unit. Currently both CollateX and Juxta’s collation engine do not distinguish in their critical apparatus between the types of changes, and do not offer statistics about those changes. This paper presents a model for both integrated and distributed collation processes that improves the Gothenburg model. The model introduces an interpretation component for computing and distinguishing between the types of changes that documents could have undergone. Moreover two architectures implementing the model in order to solve the problem of interactive collation are discussed as well. Each architecture uses CollateX library, and provides on the one hand preprocessing functions for transforming input documents into CollateX input format, and on the other hand a post-processing module for enabling interactive collation. Finally simple algorithms for distinguishing between types of changes, and linking collated source documents with the collation results are also introduced. KW - Softwarearchitektur KW - Textvergleich KW - service based software architecture KW - service brokerage KW - interactive collation of textual variants KW - Gothenburg model of collation process Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-56601 ER - TY - INPR A1 - Nassourou, Mohamadou T1 - Computer-based Textual Documents Collation System for Reconstructing the Original Text from Automatically Identified Base Text and Ranked Witnesses N2 - Given a collection of diverging documents about some lost original text, any person interested in the text would try reconstructing it from the diverging documents. Whether it is eclecticism, stemmatics, or copy-text, one is expected to explicitly or indirectly select one of the documents as a starting point or as a base text, which could be emended through comparison with remaining documents, so that a text that could be designated as the original document is generated. Unfortunately the process of giving priority to one of the documents also known as witnesses is a subjective approach. In fact even Cladistics, which could be considered as a computer-based approach of implementing stemmatics, does not present or recommend users to select a certain witness as a starting point for the process of reconstructing the original document. In this study, a computational method using a rule-based Bayesian classifier is used, to assist text scholars in their attempts of reconstructing a non-existing document from some available witnesses. The method developed in this study consists of selecting a base text successively and collating it with remaining documents. Each completed collation cycle stores the selected base text and its closest witness, along with a weighted score of their similarities and differences. At the end of the collation process, a witness selected more often by majority of base texts is considered as the probable base text of the collection. Witnesses’ scores are weighted using a weighting system, based on effects of types of textual modifications on the process of reconstructing original documents. Users have the possibility to select between baseless and base text collation. If a base text is selected, the task is reduced to ranking the witnesses with respect to the base text, otherwise a base text as well as ranking of the witnesses with respect to the base text are computed and displayed on a bar diagram. Additionally this study includes a recursive algorithm for automatically reconstructing the original text from the identified base text and ranked witnesses. KW - Textvergleich KW - Text Mining KW - Textual document collation KW - Base text KW - Reconstruction of original text KW - Gothenburg model KW - Bayesian classifier KW - Textual alterations weighting system Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-65749 ER - TY - INPR A1 - Nassourou, Mohamadou T1 - A Rule-based Statistical Classifier for Determining a Base Text and Ranking Witnesses In Textual Documents Collation Process N2 - Given a collection of diverging documents about some lost original text, any person interested in the text would try reconstructing it from the diverging documents. Whether it is eclecticism, stemmatics, or copy-text, one is expected to explicitly or indirectly select one of the documents as a starting point or as a base text, which could be emended through comparison with remaining documents, so that a text that could be designated as the original document is generated. Unfortunately the process of giving priority to one of the documents also known as witnesses is a subjective approach. In fact even Cladistics, which could be considered as a computer-based approach of implementing stemmatics, does not present or recommend users to select a certain witness as a starting point for the process of reconstructing the original document. In this study, a computational method using a rule-based Bayesian classifier is used, to assist text scholars in their attempts of reconstructing a non-existing document from some available witnesses. The method developed in this study consists of selecting a base text successively and collating it with remaining documents. Each completed collation cycle stores the selected base text and its closest witness, along with a weighted score of their similarities and differences. At the end of the collation process, a witness selected more often by majority of base texts is considered as the probable base text of the collection. Witnesses’ scores are weighted using a weighting system, based on effects of types of textual modifications on the process of reconstructing original documents. Users have the possibility to select between baseless and base text collation. If a base text is selected, the task is reduced to ranking the witnesses with respect to the base text, otherwise a base text as well as ranking of the witnesses with respect to the base text are computed and displayed on a histogram. KW - Textvergleich KW - Text Mining KW - Gothenburg Modell KW - Bayes-Klassifikator KW - Textual document collation KW - Base text KW - Gothenburg model KW - Bayesian classifier KW - Textual alterations weighting system Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-57465 ER -