TY  - THES
A1  - Henny-Krahmer, Ulrike
T1  - Genre Analysis and Corpus Design: Nineteenth Century Spanish-American Novels (1830–1910)
T1  - Gattungsanalyse und Korpusaufbau: Hispanoamerikanische Romane im 19. Jahrhundert (1830–1910)
T1  - Análisis de género y diseño de corpus: Novelas hispanoamericanas del siglo XIX (1830–1910)
N2  - This work in the field of digital literary stylistics and computational literary studies addresses theoretical questions of literary genre, the design of a corpus of nineteenth-century Spanish-American novels, and the corpus's empirical analysis in terms of subgenres of the novel. The digital text corpus consists of 256 Argentine, Cuban, and Mexican novels from the period between 1830 and 1910. It has been created with the goal of using computational text categorization methods to analyze thematic subgenres and literary currents that were represented by numerous novels in the nineteenth century. The texts have been gathered from different sources, encoded according to the guidelines of the Text Encoding Initiative (TEI), and enriched with detailed bibliographic and subgenre-related metadata, as well as with structural information.
To categorize the texts, statistical classification and a family resemblance analysis based on network analysis are used to examine to what extent the subgenres, which are understood as communicative, conventional phenomena, can be captured at the stylistic, textual level of the novels that participate in them. The result is that both thematic subgenres and literary currents are textually coherent to a degree of 70–90 %, depending on the individual subgenre constellation; that is, the communicatively established subgenre classifications can be captured as textually defined classes with this degree of accuracy.
Besides its empirical focus, the dissertation also aims to relate genre concepts from literary theory to those used in digital genre stylistics and computational literary studies as subfields of the digital humanities. It is argued that literary text types, conventional literary genres, and textual literary genres should be distinguished on a theoretical level to improve the conceptualization of genre for digital text analysis.
N2  - Diese Arbeit ist in den Forschungsfeldern der digitalen literaturwissenschaftlichen Stilistik und der Computational Literary Studies angesiedelt und setzt sich mit theoretischen Gattungsproblemen, mit der Erstellung eines Korpus von hispanoamerikanischen Romanen des 19. Jahrhunderts und mit ihrer empirischen Analyse nach Untergattungen auseinander. Das digitale Textkorpus umfasst 256 argentinische, kubanische und mexikanische Romane aus der Zeit von 1830 bis 1910 und ist mit dem Ziel erstellt worden, thematische Untergattungen und literarische Strömungen, die im 19. Jahrhundert durch zahlreiche Romane repräsentiert waren, mit Hilfe computergestützter Methoden der Textkategorisierung zu analysieren.
Um die Texte zu kategorisieren, werden Verfahren der statistischen Klassifikation und eine Familienähnlichkeitsanalyse verwendet, die auf einer Netzwerkanalyse basiert. Das Ziel der Analysen ist es zu untersuchen, inwieweit die Untergattungen, die primär als Phänomene der Kommunikation und Konvention verstanden werden, auf der stilistischen, textlichen Ebene der Romane, die an ihnen teilhaben, erfasst werden können. Das Ergebnis ist, dass sowohl die thematischen Untergattungen als auch die literarischen Strömungen zu 70–90 % textlich kohärent sind, abhängig von der gewählten Untergattungskonstellation. Das heißt, dass die kommunikativ etablierten Untergattungsklassifikationen mit diesem Maß an Genauigkeit auch als textlich definierte Klassen erfasst werden können.
Über die empirische Ausrichtung hinaus ist ein weiteres Ziel, literaturtheoretische Gattungskonzepte zu denjenigen in Beziehung zu setzen, die in der digitalen Gattungsstilistik als einer Teildisziplin der Digital Humanities verwendet werden. Es wird argumentiert, dass literarische Texttypen, konventionelle literarische Gattungen und textliche literarische Gattungen auf einer theoretischen Ebene unterschieden werden sollten, um die Konzeption von Gattung für die digitale Textanalyse zu verbessern.
N2  - Este trabajo en el campo de la estilística literaria digital y los estudios literarios computacionales se ocupa de cuestiones teóricas del género literario, del diseño de un corpus de novelas hispanoamericanas del siglo XIX y de su análisis empírico en términos de subgéneros de la novela. El corpus digital de textos consta de 256 novelas argentinas, cubanas y mexicanas del período comprendido entre 1830 y 1910. Ha sido creado con el objetivo de analizar, mediante métodos computacionales de categorización de textos, los subgéneros temáticos y las corrientes literarias que estaban representados en numerosas novelas del siglo XIX.
Para la categorización de los textos se utilizan una clasificación estadística y un análisis de semejanza familiar basado en el análisis de redes, con el fin de examinar en qué medida los subgéneros, entendidos como fenómenos comunicativos y convencionales, pueden captarse en el plano estilístico y textual de las novelas que participan en ellos. El resultado es que tanto los subgéneros temáticos como las corrientes literarias son textualmente coherentes en un grado del 70–90 %, dependiendo de la constelación individual de subgéneros, lo que significa que las clasificaciones de subgéneros establecidas comunicativamente pueden captarse con esta precisión como clases definidas textualmente.
Además del enfoque empírico, la disertación también pretende relacionar los conceptos teóricos de género literario con los utilizados en la estilística de género digital y los estudios literarios computacionales como subcampos de las humanidades digitales. Se argumenta que los tipos de texto literario, los géneros literarios convencionales y los géneros literarios textuales deberían distinguirse a nivel teórico para mejorar la conceptualización del género para el análisis de textos digitales.
KW  - Gattungstheorie
KW  - Roman
KW  - Hispanoamerikanisch
KW  - Digital Humanities
KW  - 19. Jahrhundert
KW  - Nineteenth Century
KW  - Text analysis
KW  - Textanalyse
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-319992
ER  - 
TY  - THES
A1  - Prez, Julia
T1  - Immersion als Textfunktion? Sprachliche Praktiken der Spielerlenkung in der Textgrundlage von Computerspielen
T1  - Immersion as textual function? Linguistic practices of player guidance in computer games texts
T1  - L'immersion comme fonction textuelle? Pratiques linguistiques de l'orientation des joueurs dans la base textuelle des jeux vidéo
T1  - Погружение как текстовая функция? Лингвистические практики руководства игроком в текстовой основе компьютерных игр
N2  - Die Textfunktion beschreibt den vom Emittenten intendierten Effekt eines Textes auf den Rezipienten. Sachbücher etwa sind in erster Linie informativ, Werbeanzeigen appellativ, Testamente deklarativ, Verträge erfüllen eine Obligationsfunktion und Danksagungen eine Kontaktfunktion. Wie sieht es aber mit Computerspielen aus? Können diese als Texte auf ihre Textfunktion untersucht werden? Laut den Game Studies ist Immersion das erklärte Ziel der Spielentwickler, wobei Aufmerksamkeitslenkung eine bedeutende Rolle einnimmt. Ist denn Immersion auch linguistisch als Textfunktion nachweisbar? 
Um dies herauszufinden, werden Computerspiele – gemäß dem Textanalyseschema von Brinker, Cölfen und Pappert (⁸2014) – zunächst als Texte definiert. Im Rahmen dieser Analyse werden auch Kohärenz und Kohäsion untersucht und sprachliche Mittel werden als Indizien betrachtet, die auf die Funktion hinweisen. Im Fokus stehen dabei Mündlichkeit und Schriftlichkeit, emotionale Sprache, die Kodierung von Regeln und Herausforderungen sowie Referenzen auf das Interface.
Im Speziellen werden Adventure und Role Playing Games (im Offline- und Single Player Modus) als Textsorten untersucht, weil diese Spiele üblicherweise viel Text enthalten. Zur Textsortenabgrenzung wird zunächst ein Spiel genauestens mittels AntConc untersucht, um anschließend das gesamte Korpus (23 Spiele, 70.060 Types, 1.183.536 Tokens) unter Verwendung von LancsBox vergleichend zu analysieren. 
Zusammenfassend kann diese Masterarbeit als eine der ersten Studien eines vernachlässigten, aber gegenwärtigen und an Bedeutung gewinnenden Bereichs linguistischer Forschung betrachtet werden, der Linguistik, Computerspiele und Immersion zu verbinden versucht. Die Hypothese, dass es gewisse sprachliche Praktiken in Computerspiel-Texten gibt, anhand derer der Rezipient beeinflusst und gelenkt wird, um in das Spiel hineinzutauchen, konnte auf Basis des Korpus bestätigt werden.
N2  - The textual function describes the effect that the sender of a text intends it to have on the recipient. A non-fiction book, for example, is primarily informative, an advertisement appellative, and a testament declarative; contracts fulfill an obligation function and acknowledgments a contact function. What about computer games? Can they, as texts, be examined for their textual function? According to Game Studies, immersion is the declared goal of game developers, with attention guidance playing an important role. Can immersion also be demonstrated linguistically as a textual function?
To find out, computer games are first defined as texts, following the text analysis scheme of Brinker, Cölfen and Pappert (2014, 8th ed.). In the context of this analysis, coherence and cohesion are also examined, and linguistic devices are treated as indications of the textual function. The focus is on orality and writtenness, emotional language, the encoding of rules and challenges, and references to the interface.
Specifically, Adventure Games and Role Playing Games (offline and single player mode) are examined as text types because these games usually contain a significant amount of text. To delineate text types, one game is examined in detail using AntConc. Subsequently, the entire corpus (23 games, 70,060 types, 1,183,536 tokens) is analyzed comparatively using LancsBox.
In summary, this master's thesis can be viewed as one of the first studies in a neglected but current and increasingly important area of linguistic research that seeks to combine linguistics, computer games and immersion. The hypothesis that computer game texts contain certain linguistic practices by which the recipient is influenced and guided to immerse themselves in the game was confirmed on the basis of the corpus.
T3  - WespA. Würzburger elektronische sprachwissenschaftliche Arbeiten - 22 
KW  - Linguistik
KW  - Korpus <Linguistik>
KW  - Germanistik
KW  - Textlinguistik
KW  - Computerspiel
KW  - Immersion
KW  - Textfunktion
KW  - Textsorte
KW  - Textanalyse
KW  - Text
KW  - video game
KW  - Corpus (Linguistics)
KW  - Text (Linguistics)
KW  - Game Studies
KW  - ludology
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-243775
SN  - 978-3-945459-35-5
SN  - 1864-9238
ER  - 
TY  - JOUR
A1  - Aurast, Anna
A1  - Gradl, Tobias
A1  - Pernes, Stefan
A1  - Pielström, Steffen
T1  - Big Data und Smart Data in den Geisteswissenschaften
JF  - Bibliothek Forschung und Praxis
N2  - No abstract available
KW  - Textanalyse
KW  - unstrukturierte Daten
KW  - Natural Language Processing
KW  - Text analysis
KW  - unstructured data
KW  - natural language processing
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-195237
SN  - 1865-7648
SN  - 0341-4183
N1  - This article is freely accessible with the consent of the rights holder under a (DFG-funded) Alliance or National License.
VL  - 40
IS  - 2
ER  - 
TY  - THES
A1  - Krug, Markus
T1  - Techniques for the Automatic Extraction of Character Networks in German Historic Novels
T1  - Techniken zur automatischen Extraktion von Figurennetzwerken aus deutschen Romanen
N2  - Recent advances in Natural Language Processing (NLP) allow for a fully automatic extraction of character networks from an input text. These networks serve as a compact and easy-to-grasp representation of literary fiction. They offer an aggregated view of the text, which can be used in distant reading approaches for the analysis of literary hypotheses. At their core, the networks consist of nodes, which represent literary characters, and edges, which represent relations between characters. For an automatic extraction of such a network, the first step is the detection of the references to all fictional entities that are of importance for a text. References to the fictional entities appear in the form of names, noun phrases and pronouns, and prior to this work, no components capable of automatically detecting character references were available. Existing tools are only capable of detecting proper nouns, a subset of all character references. When evaluated on the task of detecting proper nouns in the domain of literary fiction, they achieve an F1-score of only about 50%. This thesis uses techniques from the field of semi-supervised learning, such as Distant Supervision and Generalized Expectations, and improves the results of an existing tool to about 82% when evaluated on all three categories in literary fiction, without the need for annotated data in the target domain. However, since this quality is still not sufficient, it was decided to annotate DROC, a corpus comprising 90 fragments of German novels. This resulted in a new general-purpose annotation environment called ATHEN, as well as annotated data spanning about 500,000 tokens in total. Using this data, a combination of supervised algorithms and a tailored rule-based algorithm, which together exploit both local and global consistencies, yields an F1-score of about 93%. 
This component is referred to as the Kallimachos tagger.

However, a character network cannot display references directly; instead, they need to be clustered so that all references that belong to the same real-world or fictional entity are grouped together. This process, widely known as coreference resolution, is a hard problem that has been a focus of research for more than half a century. This work experimented with adaptations of classical feature-based machine learning, with a dedicated rule-based algorithm and with modern techniques of Deep Learning, but no approach surpasses a B-Cubed F1 of 55% when evaluated on DROC. Due to this barrier, many researchers do not use a fully-fledged coreference resolution when they extract character networks, but only focus on a more forgiving subset: the names. For novels such as Alice's Adventures in Wonderland by Lewis Carroll, however, this would result in a network in which many important characters are missing. In order to integrate important characters that are not named by the author into the network, this work makes use of automatic detection of speakers and addressees for direct speech utterances (all entities involved in a dialog are considered to be of importance). This is not an easy task in itself, but the most successful system analysed in this thesis correctly determines the speaker for about 85% of the utterances and the addressee for about 65%. This speaker information can not only help to identify the most dominant characters, but also serves as a way to model the relations between entities.

During this work, components have been developed to model relations between characters using speaker attribution, co-occurrences, and true interactions; for the latter, another dataset was annotated using ATHEN. Furthermore, since relations between characters are usually typed, a component for the extraction of typed relations was developed. Similar to the experiments for character reference detection, a combination of a rule-based and a Maximum Entropy classifier yielded the best overall results, with family relations extracted at a score of about 80% and love relations at a score of about 50%. For family relations, a kernel for a Support Vector Machine was developed that even exceeded the scores of the combined approach, but it falls behind on the other labels.

In addition, this work presents new ways to evaluate automatically extracted networks without the need for domain experts, relying instead on expert summaries. Rather than using social network analysis for the evaluation, it presents ranked evaluations using Precision@k and the Spearman rank correlation coefficient for the nodes and edges of the network. An analysis using these metrics showed that the central characters of a novel are included with high probability, but the quality drops rather quickly if more than five entities are analyzed. The quality of the edges is largely determined by the quality of the coreference resolution, and the correlation coefficient between gold edges and system edges therefore varies between 30 and 60%. 

All developed components are aggregated alongside a large set of other preprocessing modules in the Kallimachos pipeline and can be reused without any restrictions.
KW  - Textanalyse
KW  - Character Networks
KW  - Coreference
KW  - Character Reference Detection
KW  - Relation Detection
KW  - Quotation Attribution
KW  - Netzwerkanalyse <Soziologie>
KW  - Digital Humanities
KW  - Netzwerk
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-209186
ER  - 
TY  - JOUR
A1  - Schöch, Christof
T1  - Ein digitales Textformat für die Literaturwissenschaften. Die Richtlinien der Text Encoding Initiative und ihr Nutzen für Textedition und Textanalyse
JF  - Romanische Studien
N2  - The steadily advancing digitization of literary texts from a wide range of languages, periods, and genres repeatedly confronts literary studies with the question of how they can help shape this development and use it to their advantage. Not all digital text is alike: there is a wide variety of very different digital representations of text. Only a few of these representations actually meet the requirements of literary studies, among them the one that follows the Guidelines of the Text Encoding Initiative. This article first compares several digital representations of text that are currently in common use. A representation following the Guidelines of the Text Encoding Initiative proves particularly suitable for literary research. The article therefore goes on to describe its benefits for work in literary studies, both in scholarly text editing and in the analysis and interpretation of texts. Only if literary studies as a whole recognize the value of open, expressive, flexible, standardized formats that remain usable in the long term can they advocate for their adoption with the necessary emphasis and benefit, in their own research and teaching, from the growing availability of texts in such formats.
KW  - Digital Humanities
KW  - Text Encoding Initiative
KW  - Textedition
KW  - Textanalyse
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-171351
VL  - 4
ER  - 
TY  - THES
A1  - Schönherr, Monika
T1  - Modalität und Modalitätsausdrücke in althochdeutschen Bibeltexten. Eine korpusgestützte Analyse
T1  - Modality and Expressions of Modality in Old High German Bible Translations. A Corpus Based Analysis
N2  - Die vorliegende Abhandlung leistet einen Beitrag zu der noch längst nicht genügend untersuchten Modalitätsproblematik in althochdeutschen Bibeltexten. Bei den bisherigen historischen Untersuchungen wurden die kommunikativen Bedingungen für das Vorkommen der Modalitätsausdrücke nicht oder nicht hinreichend beachtet. Daher hat sich die Arbeit zum Ziel gesetzt, die sprachlichen Ausdrucksformen der Modalität unter dem kommunikativen Gesichtspunkt zu behandeln und neue Punkte zum Thema "Modalität im Althochdeutschen" zur Diskussion zu stellen. Die Analyse ist diskursiv angelegt: Die jeweiligen Modalitätsformen werden vor dem Hintergrund des gesamten Textes dargestellt und aus dem jeweiligen Kontext heraus erklärt. Als Textgrundlage fungiert das selbst erstellte historische Korpus der zwei biblischen Evangelienharmonien aus dem 9. Jahrhundert (der ahd. 'Tatian' und 'Otfrids Evangelienbuch').
N2  - This work addresses the problem of modality in Old High German Bible texts. Previous historical research has not taken into account, or has not sufficiently taken into account, the communicative conditions under which expressions of modality occur. The aim of this work is therefore to examine linguistic expressions of modality from a communicative point of view and to open up the discussion of some new issues within the field of "Modality in Old High German". The analysis takes a discursive approach: each form of modality is described against the background of the text as a whole and explained from its individual context. The texts examined form a self-compiled historical corpus of two Gospel harmonies from the 9th century, namely the Old High German 'Tatian' (Diatessaron) and 'Otfrids Evangelienbuch' (Otfrid von Weißenburg's Gospel Book).
T3  - WespA. Würzburger elektronische sprachwissenschaftliche Arbeiten - 7 
KW  - Modalität <Linguistik>
KW  - Subjektivität
KW  - Gefühl
KW  - Textanalyse
KW  - Funktionale Grammatik
KW  - Korpus <Linguistik>
KW  - Althochdeutsch
KW  - Bibel
KW  - modality <linguistics>
KW  - subjectivity
KW  - emotionality
KW  - text analysis
KW  - functional grammar
KW  - corpus <linguistics>
KW  - Old High German
KW  - bible
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-46904
SN  - 978-3-923959-62-4
ER  - 
TY  - THES
A1  - Binder, Kristina
T1  - Das Starinterview : eine vergleichende Textanalyse von Presse-, Hörfunk-, Fernseh- und Chatinterview
T1  - Star interviews : a comparative text-linguistic analysis of press, radio, TV and chat interviews
N2  - No abstract available
KW  - Interview
KW  - Massenmedien
KW  - Textanalyse
KW  - Textlinguistik
KW  - Medien
KW  - Medium
KW  - interview
KW  - media
KW  - textlinguistics
Y1  - 2005
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-13255
ER  - 
TY  - THES
A1  - Beckmann, Pia
T1  - Schwangerschaftsabbruch als sprachliches Problem : eine linguistische Textanalyse ausgewählter Gesetzentwürfe zur Reform des § 218 StGB
N2  - The thesis presents a linguistic text analysis of draft bills from 1991 for the reform of § 218 StGB. The participants (actants) in the respective action frames of the draft bills are the woman as the decision-maker for the action "abortion", the physician who performs the abortion, and the embryo that is aborted. Drawing on prototype semantics, the action and the actants are analyzed in the textual context of each individual draft bill (including a frequency analysis). The subsequent argumentation analysis follows Toulmin's model of everyday argumentation, taking into account Kienpointner's argumentation patterns.
KW  - Deutsch
KW  - Gesetzesvorlage
KW  - Textlinguistik
KW  - Schwangerschaftsabbruch
KW  - Gesetzentwürfe
KW  - Textanalyse
KW  - semantische Analyse
KW  - Argumentationsanalyse
KW  - abortion
KW  - text analysis
Y1  - 2004
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-9989
ER  -