Jean Pauls Registerbände (Jean Paul's Register Volumes)
(2011)
The dissertation examines Jean Paul's register volumes. These are a condensed version of his extensive excerpt volumes, which run to more than 12,000 pages. Jean Paul arranges his excerpts under previously chosen headings and gathers them in the register volumes.
This publication is an introduction to the syntactic structures of contemporary German and covers the following areas: definition of the sentence, parts of speech, the topology of German sentences, valency-dependent and valency-independent constituents (complements and adjuncts), the function and semantics of dative and genitive constructions, auxiliary, modal, and modality verbs, light verb constructions (Funktionsverbgefüge) and verbal idioms, reflexive constructions, complex sentences and constituents, passive constructions, temporality, and modality.
Given a collection of diverging documents derived from a lost original text, anyone interested in that text would try to reconstruct it from them. Whether the method is eclecticism, stemmatics, or copy-text editing, the scholar is expected to select, explicitly or implicitly, one of the documents as a starting point or base text. The base text is then emended by comparison with the remaining documents until it yields a text that can be designated as the original. Unfortunately, giving priority to one of the documents (also known as witnesses) is a subjective step. Even cladistics, which can be regarded as a computer-based implementation of stemmatics, does not propose or recommend a particular witness as the starting point for reconstructing the original document. This study uses a computational method based on a rule-based Bayesian classifier to assist textual scholars in reconstructing a lost document from the available witnesses. The method selects each document in turn as the base text and collates it with the remaining documents. Each completed collation cycle stores the selected base text and its closest witness, along with a weighted score of their similarities and differences. At the end of the collation process, the witness selected most often as closest by the majority of base texts is taken to be the probable base text of the collection. Witnesses' scores are weighted according to the effect that each type of textual modification has on the reconstruction of original documents. Users can choose between collation with and without a base text. If a base text is selected, the task reduces to ranking the witnesses with respect to it. Otherwise, a base text is computed along with the ranking of the witnesses with respect to it, and both are displayed in a histogram.
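The voting step described above can be illustrated with a minimal sketch. It is not the study's implementation: the rule-based Bayesian classifier and the weighting of modification types are not specified in the abstract, so an unweighted token similarity from Python's standard difflib module stands in for the weighted score. The function names and the sample witnesses are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(text_a, text_b):
    """Unweighted token-level similarity in [0, 1].

    Assumption: this replaces the study's weighted score, whose
    weights per type of modification are not given in the abstract.
    """
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


def closest_witness(base, witnesses):
    """Collate one base text against all others; return (name, score)."""
    scores = {
        name: similarity(witnesses[base], text)
        for name, text in witnesses.items()
        if name != base
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def probable_base_text(witnesses):
    """Let every witness act as base text once and count the votes."""
    votes = Counter()
    for base in witnesses:
        best, _ = closest_witness(base, witnesses)
        votes[best] += 1
    return votes.most_common(1)[0][0], votes
```

Calling probable_base_text with a dictionary that maps witness names to their texts returns the witness that was chosen as closest most often, together with the vote counts that a histogram could display.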
Studying a book generally involves reading it, underlining important words, adding comments, summarizing passages, and marking up text or concepts. Once a deeper understanding has been reached, readers want to organize and manage their knowledge so that it can be easily remembered and efficiently passed on to others. In this paper, books organized into chapters consisting of verses are considered as the source of knowledge to be modeled. The knowledge model consists of verses with their metadata and semantic annotations. The metadata represent the multiple perspectives of knowledge modeling. Verses with their metadata and annotations form a meta-model, which is published as a web mashup. The meta-model, together with the links between its elements, constitutes a knowledge base. An XML-based annotation system that breaks the learning process down into specific tasks helps to construct the desired meta-model. The system consists of user interfaces for creating metadata and for annotating chapter contents according to user-selected semantics, and of templates for publishing the generated knowledge on the Internet. The proposed software system is intended to improve comprehension and retention of the knowledge contained in religious texts through modeling and visualization. The system has been applied to the Quran, and the results show that multiple perspectives of information modeling can be successfully applied to religious texts. It is hoped that this short, ongoing study will motivate others to devise and offer software systems for cross-religious learning.
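As a rough illustration of what a verse record with metadata and semantic annotations might look like in XML, the following sketch uses Python's standard xml.etree.ElementTree module. The element and attribute names (verse, metadata, entry, annotation) are assumptions made for this example; the abstract does not describe the system's actual schema.

```python
import xml.etree.ElementTree as ET


def annotate_verse(chapter, number, text, metadata, annotations):
    """Build one verse element with metadata and semantic annotations.

    All element and attribute names are hypothetical, not the
    schema of the system described in the abstract.
    """
    verse = ET.Element("verse", chapter=str(chapter), number=str(number))
    ET.SubElement(verse, "text").text = text
    meta = ET.SubElement(verse, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, "entry", name=name).text = value
    for kind, span in annotations:
        # Each annotation records a user-selected semantic type and the marked span.
        ET.SubElement(verse, "annotation", type=kind).text = span
    return verse
```

The resulting element can be serialized with ET.tostring and passed to a publishing template.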
Design and Implementation of Architectures for Interactive Textual Documents Collation Systems
(2011)
One of the main purposes of collating textual documents is to identify a base text, or the witness closest to it, by analyzing and interpreting the differences (also known as types of changes) between those documents. It is therefore reasonable to argue that the explicit identification of types of changes such as deletions, additions, transpositions, and mutations should be part of the collation process. This identification could be carried out by an interpretation module after alignment has taken place. Unfortunately, existing collation software such as CollateX and Juxta's collation engine has no interpretation module. Both implement the Gothenburg model [1] of the collation process, which does not include an interpretation unit. At present, neither CollateX nor Juxta's collation engine distinguishes between the types of changes in its critical apparatus, and neither offers statistics about those changes. This paper presents a model for both integrated and distributed collation processes that improves on the Gothenburg model. The model introduces an interpretation component that computes and distinguishes the types of changes that documents may have undergone. Two architectures that implement the model in order to solve the problem of interactive collation are also discussed. Each architecture uses the CollateX library and provides, on the one hand, preprocessing functions for transforming input documents into the CollateX input format and, on the other hand, a post-processing module that enables interactive collation. Finally, simple algorithms are introduced for distinguishing between types of changes and for linking the collated source documents with the collation results.
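An interpretation step of the kind described above might look like the following sketch. It does not use CollateX or Juxta: Python's standard difflib module stands in for the alignment stage, and the rule for transpositions (a deleted token run that reappears as an inserted run elsewhere) is a simplifying assumption, not the paper's algorithm.

```python
from collections import Counter
from difflib import SequenceMatcher


def interpret_changes(base_tokens, witness_tokens):
    """Label each aligned difference as a type of change.

    difflib replaces the alignment stage here; the labels follow
    the four types named in the abstract.
    """
    matcher = SequenceMatcher(None, base_tokens, witness_tokens, autojunk=False)
    ops = matcher.get_opcodes()
    deleted = {tuple(base_tokens[i1:i2]) for tag, i1, i2, _, _ in ops if tag == "delete"}
    inserted = {tuple(witness_tokens[j1:j2]) for tag, _, _, j1, j2 in ops if tag == "insert"}

    changes = []
    for tag, i1, i2, j1, j2 in ops:
        old, new = tuple(base_tokens[i1:i2]), tuple(witness_tokens[j1:j2])
        if tag == "replace":
            changes.append(("mutation", old, new))
        elif tag == "delete":
            # Assumption: a run deleted here and inserted elsewhere was moved.
            kind = "transposition" if old in inserted else "deletion"
            changes.append((kind, old, ()))
        elif tag == "insert" and new not in deleted:
            changes.append(("addition", (), new))
    return changes


def change_statistics(changes):
    """Count how often each type of change occurs."""
    return Counter(kind for kind, _, _ in changes)
```

The counts returned by change_statistics correspond to the per-type statistics that the abstract notes are missing from existing critical apparatuses.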
The Quran is the holy book of Islam and consists of 6236 verses divided into 114 chapters called suras. Many verses are similar or even identical. A search for similar texts (e.g. verses) can return thousands of verses which, when displayed completely or partly as a textual list, make analysis and understanding difficult and confusing. Moreover, such a list gives no immediate visual impression of how the retrieved verses are distributed across the Quran. As a consequence, reading and analyzing the verses is tedious and unintuitive. In this study, a combination of interactive scatter plots and tables has been developed to support the analysis and understanding of search results. Retrieved verses are clustered by chapter, and each cluster is assigned a weight according to the number of verses it contains, so that users can visually identify the most relevant areas and determine the places of revelation of the verses. Users see the complete result, can select a region of the plot to zoom in on, and can click on a marker to display a table of verses with the English translation side by side.
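The clustering and weighting step can be sketched as follows. The abstract says only that the weight depends on the number of verses in a cluster; normalizing it to each chapter's share of all hits is an assumption made here, as are the function name and the (chapter, verse) input format.

```python
from collections import Counter


def cluster_weights(hits):
    """Group retrieved verses by chapter and weight each cluster.

    hits is an iterable of (chapter, verse) pairs. The weight is the
    chapter's share of all hits, which is an assumed normalization.
    """
    counts = Counter(chapter for chapter, _ in hits)
    total = sum(counts.values())
    return {chapter: n / total for chapter, n in counts.items()}
```

A plotting layer could then scale each marker by its cluster weight so that chapters with many matching verses stand out.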
A Knowledge-based Hybrid Statistical Classifier for Reconstructing the Chronology of the Quran
(2011)
Computational categorization of the Quran's chapters has mainly been confined to determining the chapters' places of revelation. This broad classification, however, is not sufficient for understanding and interpreting the Quran effectively and thoroughly. Knowing the chronology of revelation would not only improve comprehension of the philosophy of Islam but also make its laws and recommendations easier to implement and memorize. This paper attempts to estimate the chapters' possible dates of revelation from their lexical frequency profiles. A hybrid statistical classifier has been developed that combines stemming and clustering algorithms to compare the lexical frequency profiles of chapters and to derive dates of revelation. The classifier is trained on chapters with known dates of revelation. It then classifies chapters with uncertain dates of revelation by computing their proximity to the training chapters. The results reported here indicate that the proposed methodology yields usable estimates of the dates of revelation of the Quran's chapters based on their lexical content.
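The idea of comparing lexical frequency profiles can be sketched as follows. This is not the paper's hybrid classifier: stemming and clustering are omitted, and cosine similarity with a nearest-neighbour decision is an assumed proximity measure. The function names and the label-to-texts training format are hypothetical.

```python
import math
from collections import Counter


def profile(text):
    """Relative frequency of each token (no stemming in this sketch)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {word: n / len(tokens) for word, n in counts.items()}


def cosine(p, q):
    """Cosine similarity between two frequency profiles."""
    dot = sum(value * q.get(word, 0.0) for word, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def estimate_period(text, training):
    """Return the label of the training text whose profile is closest.

    training maps a period label to a list of texts with known dates.
    """
    target = profile(text)
    best_label, best_score = None, -1.0
    for label, texts in training.items():
        for known in texts:
            score = cosine(target, profile(known))
            if score > best_score:
                best_label, best_score = label, score
    return best_label
```

With real data, the training dictionary would hold the chapters whose dates of revelation are known, and each uncertain chapter would be passed to estimate_period in turn.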
This paper discusses the categorization of Quranic chapters by the major phases of Prophet Mohammad's messengership using machine learning algorithms. First, the chapters were categorized by place of revelation using Support Vector Machine and naïve Bayesian classifiers separately, and the results were compared with each other as well as with the existing classifications of traditional Islamic scholarship and of western orientalists. The chapters were categorized as Meccan (revealed in Mecca) or Medinan (revealed in Medina). The chapters of each category were then clustered using a kind of fuzzy single-linkage clustering approach so that the clusters correspond to the major phases of Prophet Mohammad's life. These phases were derived manually from the Quranic text and from the secondary Islamic literature, e.g. hadiths and exegesis. Previous studies on computing the places of revelation of Quranic chapters relied heavily on features extracted from existing background knowledge of the chapters. For instance, it is known that Meccan chapters contain mostly verses about faith and related problems, while Medinan ones encompass verses dealing with social issues, battles, etc. These features are by themselves insufficient as a basis for assigning the chapters to their respective places of revelation, because there are exceptions: some chapters contain both Meccan and Medinan features. In this study, the features of each category were created automatically from very few chapters whose places of revelation have been determined by identifying historical facts and events such as battles, the migration to Medina, etc. Chapters with unanimously agreed places of revelation were used as the initial training set, while the remaining chapters formed the testing set. The classification process was made recursive by regularly augmenting the training set with correctly classified chapters until the whole testing set was classified. Each chapter was preprocessed by removing unimportant words, stemming, and representing it in the vector space model. The results show that both classifiers produced usable results, with the support vector machine classifier performing better. This study indicates that the proposed methodology yields encouraging results for arranging Quranic chapters by the phases of Prophet Mohammad's messengership.
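The recursive augmentation of the training set can be illustrated with a small self-training loop around a multinomial naïve Bayes classifier with add-one smoothing. This is a sketch under assumptions: the abstract does not say how a classification is judged correct before a chapter joins the training set, so the sketch adds the most confidently classified chapter in each round. The support vector machine variant, stop-word removal, and stemming are omitted, and all names are hypothetical.

```python
import math
from collections import Counter


def train(docs):
    """docs is a list of (tokens, label); returns word and label counts."""
    word_counts, label_counts = {}, Counter()
    for tokens, label in docs:
        word_counts.setdefault(label, Counter()).update(tokens)
        label_counts[label] += 1
    return word_counts, label_counts


def log_scores(tokens, model):
    """Log posterior (up to a constant) per label, add-one smoothed."""
    word_counts, label_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = math.log(label_counts[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((counts[tok] + 1) / denom)
        scores[label] = score
    return scores


def self_train(seed, unlabeled):
    """Label every unlabeled item, retraining after each assignment.

    Assumption: the item with the largest score margin is treated as
    'correctly classified' and moved into the training set each round.
    Requires at least two labels in the seed set.
    """
    labeled, pending, assigned = list(seed), dict(unlabeled), {}
    while pending:
        model = train(labeled)
        best = None
        for name, tokens in pending.items():
            ranked = sorted(log_scores(tokens, model).items(),
                            key=lambda item: item[1], reverse=True)
            margin = ranked[0][1] - ranked[1][1]
            if best is None or margin > best[0]:
                best = (margin, name, ranked[0][0])
        _, name, label = best
        assigned[name] = label
        labeled.append((pending.pop(name), label))
    return assigned
```

The seed set plays the role of the chapters with unanimously agreed places of revelation, and the unlabeled dictionary plays the role of the testing set.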
The main subject of this work is the spoken use in German of acronymic short forms that are common in chat, such as lol and omg. Since chat communication, despite being realized in writing, shows some features of oral communication, the integration of these initially purely graphic abbreviations into spoken language outside chat does not seem far-fetched. In addition, both a more flexible use of the abbreviations and word-formation processes based on these forms have recently been observed. This is a phenomenon of youth language, particularly in the area of word formation. This work presents and interprets the results of an empirical survey of the relevance of six common abbreviations and their derived forms. It also takes stock of all analyzed forms in standard reference works and in various dictionaries of the sociolects of youth and internet language.
The term Germanen (Germanic peoples) is an exonym used by Greco-Roman authors of antiquity. The groups so designated, however, had no shared Germanic identity. The Germanic peoples were already stylized as powerful adversaries in antiquity, a portrayal that written sources readily took up in the Middle Ages in the course of state formation. Retrospectively, no "proto-language" ("Ursprache") or "original homeland" ("Urheimat") of the Germanic peoples can be reconstructed. In archaeology, however, the finds point to cultural areas of a material culture that are interpreted as Germanic. These should not be confused with a "Germanic ethnic group".