TY - JOUR
A1 - Buchheim, Mark A.
A1 - Keller, Alexander
A1 - Koetschan, Christian
A1 - Förster, Frank
A1 - Merget, Benjamin
A1 - Wolf, Matthias
T1 - Internal Transcribed Spacer 2 (nu ITS2 rRNA) Sequence-Structure Phylogenetics: Towards an Automated Reconstruction of the Green Algal Tree of Life
JF - PLoS ONE
N2 - Background: Chloroplast-encoded genes (matK and rbcL) have been formally proposed for use in DNA barcoding efforts targeting embryophytes. Extending such a protocol to chlorophytan green algae, though, is fraught with problems including non-homology (matK) and heterogeneity that prevents the creation of a universal PCR toolkit (rbcL). Some have advocated the use of the nuclear-encoded internal transcribed spacer two (ITS2) as an alternative to the traditional chloroplast markers. However, the ITS2 is broadly perceived to be insufficiently conserved or to be confounded by introgression or biparental inheritance patterns, precluding its broad use in phylogenetic reconstruction or as a DNA barcode. A growing body of evidence has shown that simultaneous analysis of nucleotide data with secondary structure information can overcome at least some of the limitations of ITS2. The goal of this investigation was to assess the feasibility of an automated, sequence-structure approach for analysis of ITS2 data from a large sampling of phylum Chlorophyta. Methodology/Principal Findings: Sequences and secondary structures from 591 chlorophycean, 741 trebouxiophycean and 938 ulvophycean algae, all obtained from the ITS2 Database, were aligned using a sequence structure-specific scoring matrix. Phylogenetic relationships were reconstructed by Profile Neighbor-Joining coupled with a sequence structure-specific, general time reversible substitution model. Results from analyses of the ITS2 data were robust at multiple nodes and showed considerable congruence with results from published phylogenetic analyses.
Conclusions/Significance: Our observations on the power of automated, sequence-structure analyses of ITS2 to reconstruct phylum-level phylogenies of the green algae validate this approach to assessing diversity for large sets of chlorophytan taxa. Moreover, our results indicate that objections to the use of ITS2 for DNA barcoding should be weighed against the utility of an automated data analysis approach with demonstrated power to reconstruct evolutionary patterns for highly divergent lineages.
KW - RBCL Gene-sequences
KW - Colonial volvocales chlorophyta
KW - 26S RDNA Data
KW - Land plants
KW - Molecular systematics
KW - Secondary structure
KW - Nuclear RDNA
KW - DNA
KW - Barcodes
KW - Dasycladales chlorophyta
KW - Profile distances
Y1 - 2011
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-140866
VL - 6
IS - 2
ER -

TY - JOUR
A1 - Koetschan, Christian
A1 - Foerster, Frank
A1 - Keller, Alexander
A1 - Schleicher, Tina
A1 - Ruderisch, Benjamin
A1 - Schwarz, Roland
A1 - Mueller, Tobias
A1 - Wolf, Matthias
A1 - Schultz, Joerg
T1 - The ITS2 Database III-sequences and structures for phylogeny
N2 - The internal transcribed spacer 2 (ITS2) is a widely used phylogenetic marker. In the past, it has mainly been used for species-level classifications. Nowadays, a wider applicability is becoming apparent. Here, the conserved structure of the RNA molecule plays a vital role. We have developed the ITS2 Database (http://its2.bioapps.biozentrum.uni-wuerzburg.de), which holds information about sequence, structure and taxonomic classification of all ITS2 in GenBank. In the new version, we use Hidden Markov models (HMMs) for the identification and delineation of the ITS2, resulting in a major redesign of the annotation pipeline. This allowed the identification of more than 160 000 correct full-length and more than 50 000 partial structures.
In the web interface, these can now be searched with a modified BLAST considering both sequence and structure, enabling rapid taxon sampling. Novel sequences can be annotated using the HMM-based approach and modelled according to multiple template structures. Sequences can be searched for known and newly identified motifs. Together, the database and the web server form an exhaustive resource for ITS2-based phylogenetic analyses.
KW - Biologie
Y1 - 2010
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-68390
ER -

TY - THES
A1 - Koetschan, Christian
T1 - The Eukaryotic ITS2 Database - A workbench for modelling RNA sequence-structure evolution
T1 - Die Eukaryotische ITS2 Datenbank - Eine Plattform zur Modellierung von RNA Sequenzstruktur Evolution
N2 - In recent years, the internal transcribed spacer 2 (ITS2) has become established as a frequently used tool in the molecular phylogenetics of eukaryotes. Its rapidly evolving sequence is ideally suited for use at lower phylogenetic levels. However, the ITS2 also folds into a highly conserved secondary structure, which makes it possible to distinguish distantly related species. Combining both in a sequence-structure analysis improves the resolution of the marker and enables the reconstruction of more robust trees across a broader taxonomic range. However, carrying out such an analysis, which required a wide variety of programs and databases, was not straightforward for the typical biologist. To overcome this hurdle, I developed the ITS2 Workbench, a web-based platform for automated sequence-structure-based phylogenetic analysis using the ITS2 (http://its2.bioapps.biozentrum.uni-wuerzburg.de).
Development began with the length optimization of different Hidden Markov Model (HMM) topologies, which were successfully applied to a model for predicting the sequence structure of the ITS2. Here, the structure is predicted by analyzing sequence components in combination with the length distribution of different helix regions. Next, I was able to use HMMs in sequence-structure generation as well, to locate the ITS2 within a given sequence. This newly implemented procedure doubled the number of predicted structures and reduced the runtime to a few days. Together with further optimizations of the homology modelling process, I can now exhaustively predict secondary structures over several iterations. These optimizations currently yield 380,000 annotated sequences, including 288,000 structure predictions. To use these structures for computing alignments and phylogenetic trees, I developed the R package "treeforge". It enables the generation of sequence-structure alignments on up to four differently encoded alphabets. For the first time, structural base pairings can thus be included in the alignment calculation, which required the estimation of new scoring matrices. The R package additionally enables the reconstruction of Maximum Parsimony, Maximum Likelihood and Neighbour Joining trees on all four alphabets with just a few lines of code. The package was used to reconstruct the still-disputed phylogeny of the Chlorophyceae and could be used in future versions of the ITS2 Workbench. The ITS2 platform is based on a modern and very extensive Web 2.0 interface and incorporates the latest AJAX and web-service technologies. It comprises HMM-based sequence annotation, structure prediction by energy minimization or
homology modelling, alignment calculation and tree reconstruction, based on a flexible data pool that automatically propagates changes to the dataset. In addition, it supports the detection of sequence motifs, which can serve to check annotation and structure prediction. A BLAST-based search at the sequence and structure level further simplifies taxon sampling. All functions, as well as the use of the ITS2 website, are presented in a short video tutorial. However, the platform only admits datasets up to a certain size, mainly because of the considerable computing power these calculations require. To demonstrate that the approach also works on large amounts of data, a fully automated reconstruction of the green algal tree (Chlorophyta) was carried out. This successful study, based on the ITS2 marker, argues for applying sequence-structure analysis to further data in phylogenetics. Here, the ITS2 Workbench offers the ideal starting point.
N2 - Over the past years, the internal transcribed spacer 2 (ITS2) has become established as a commonly used molecular phylogenetic marker for the eukaryotes. Its fast-evolving sequence is well suited to low-level phylogenetics. However, the ITS2 also folds into a highly conserved secondary structure, which enables discrimination between more distantly related species. Combining both in a sequence-structure based analysis increases the resolution of the marker and enables more robust tree reconstructions over a broader taxonomic range. However, performing such an analysis required several different programs and databases, making the use of the ITS2 nontrivial for the typical biologist. To overcome this obstacle, I have developed the ITS2 Workbench, a completely web-based tool for automated phylogenetic sequence-structure analyses using the ITS2 (http://its2.bioapps.biozentrum.uni-wuerzburg.de).
The development started with an optimization of length modelling topologies for Hidden Markov Models (HMMs), which were successfully applied to a secondary structure prediction model of the ITS2 marker. Here, structure is predicted by considering sequence composition in combination with the length distribution of different helical regions. Next, I integrated HMMs into the sequence-structure generation process for the delineation of the ITS2 within a given sequence. This re-implemented pipeline more than doubled the number of structure predictions and reduced the runtime to a few days. Together with further optimizations of the homology modelling process, I can now exhaustively predict secondary structures in several iterations. These modifications currently provide 380,000 annotated sequences including 288,000 structure predictions. To include these structures in the calculation of alignments and phylogenetic trees, I developed the R package "treeforge". It generates sequence-structure alignments on up to four different coding alphabets. For the first time, structural bonds were also considered in alignments, which required the estimation of new scoring matrices. Now, the reconstruction of Maximum Parsimony, Maximum Likelihood and Neighbour Joining trees on all four alphabets requires just a few lines of code. The package was used to resolve the controversial chlorophycean dataset and could be integrated into future versions of the ITS2 Workbench. The platform is based on a modern, feature-rich Web 2.0 user interface equipped with the latest AJAX and web-service technologies. It performs HMM-based sequence annotation, structure prediction by energy minimization or homology modelling, alignment calculation and tree reconstruction on a flexible data pool that repeats calculations according to data changes.
Further, it provides sequence motif detection to check annotation and structure prediction, and a sequence-structure based BLAST search, which facilitates the taxon sampling process. All features and the usage of the ITS2 Workbench are explained in a video tutorial. However, the workbench has some limitations regarding the size of datasets. This is mainly due to the immense computational power needed for such extensive calculations. To demonstrate the validity of the approach for large-scale analyses as well, a fully automated reconstruction of the Chlorophyta (Green Algal) Tree of Life was performed. The successful application of the marker even on large datasets underlines the capabilities of ITS2 sequence-structure analysis and suggests its use on further datasets. The ITS2 Workbench provides an excellent starting point for such endeavours.
KW - Ribosomale RNA
KW - Datenbank
KW - Marker
KW - Phylogenie
KW - Evolution
KW - Sequenz
KW - Struktur
KW - Hidden Markov Model
KW - Evolution
KW - ribosomal RNA
KW - workbench
KW - sequence-structure
Y1 - 2012
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-73128
ER -

TY - JOUR
A1 - Merget, Benjamin
A1 - Koetschan, Christian
A1 - Hackl, Thomas
A1 - Förster, Frank
A1 - Dandekar, Thomas
A1 - Müller, Tobias
A1 - Schultz, Jörg
A1 - Wolf, Matthias
T1 - The ITS2 Database
JF - Journal of Visualized Experiments
N2 - The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. As ITS2 research mainly focused on the very variable ITS2 sequence, it confined this marker to low-level phylogenetics only. However, the combination of the ITS2 sequence and its highly conserved secondary structure improves the phylogenetic resolution and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation. The ITS2 Database presents an exhaustive dataset of accurately reannotated internal transcribed spacer 2 sequences from NCBI GenBank.
Following annotation by profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, it is tested whether a minimum-energy fold (direct fold) results in a correct four-helix conformation. If this is not the case, the structure is predicted by homology modeling. In homology modeling, an already known secondary structure is transferred to another ITS2 sequence that did not fold correctly in a direct fold. The ITS2 Database is not only a database for storage and retrieval of ITS2 sequence-structures. It also provides several tools to process your own ITS2 sequences, including annotation, structural prediction, motif detection and BLAST search on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE and ProfDistS for multiple sequence-structure alignment calculation and Neighbor Joining tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure. In a nutshell, this workbench reduces initial phylogenetic analyses to only a few mouse clicks, while additionally providing tools and data for comprehensive large-scale analyses.
KW - homology modeling
KW - molecular systematics
KW - internal transcribed spacer 2
KW - alignment
KW - genetics
KW - secondary structure
KW - ribosomal RNA
KW - phylogenetic tree
KW - phylogeny
Y1 - 2012
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-124600
VL - 61
IS - e3806
ER -

TY - JOUR
A1 - Koetschan, Christian
A1 - Kittelmann, Sandra
A1 - Lu, Jingli
A1 - Al-Halbouni, Djamila
A1 - Jarvis, Graeme N.
A1 - Müller, Tobias
A1 - Wolf, Matthias
A1 - Janssen, Peter H.
T1 - Internal Transcribed Spacer 1 Secondary Structure Analysis Reveals a Common Core throughout the Anaerobic Fungi (Neocallimastigomycota)
JF - PLOS ONE
N2 - The internal transcribed spacer (ITS) is a popular barcode marker for fungi and in particular the ITS1 has been widely used for the anaerobic fungi (phylum Neocallimastigomycota). A good number of validated reference sequences of isolates as well as a large number of environmental sequences are available in public databases. Its highly variable nature predisposes the ITS1 for low level phylogenetics; however, it complicates the establishment of reproducible alignments and the reconstruction of stable phylogenetic trees at higher taxonomic levels (genus and above). Here, we overcame these problems by proposing a common core secondary structure of the ITS1 of the anaerobic fungi employing a Hidden Markov Model-based ITS1 sequence annotation and a helix-wise folding approach. We integrated the additional structural information into phylogenetic analyses and present for the first time an automated sequence-structure-based taxonomy of the ITS1 of the anaerobic fungi. The methodology developed is transferable to the ITS1 of other fungal groups, and the robust taxonomy will facilitate and improve high-throughput anaerobic fungal community structure analysis of samples from various environments.
KW - profile distances
KW - ITS2
KW - phylogenetic trees
KW - RNA sequence
KW - reconstruction
KW - diversity
KW - populations
KW - tool
KW - systematics
KW - herbivores
Y1 - 2014
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-117058
VL - 9
IS - 3
ER -