OPUS Würzburg

The Eukaryotic ITS2 Database - A workbench for modelling RNA sequence-structure evolution (2012)

In recent years, the internal transcribed spacer 2 (ITS2) has become established as a commonly used molecular phylogenetic marker for eukaryotes. Its fast-evolving sequence is well suited to low-level phylogenetics. However, the ITS2 also has a highly conserved secondary structure, which enables discrimination between more distantly related species. Combining both in a sequence-structure based analysis increases the resolution of the marker and enables even more robust tree reconstructions across a broader taxonomic range. Until now, however, performing such an analysis has required several different programs and databases, making the use of the ITS2 non-trivial for the typical biologist. To overcome this obstacle, I have developed the ITS2 Workbench, a completely web-based tool for automated phylogenetic sequence-structure analyses using the ITS2 (http://its2.bioapps.biozentrum.uni-wuerzburg.de).

The development started with an optimization of length modelling topologies for Hidden Markov Models (HMMs), which were successfully applied to a secondary structure prediction model of the ITS2 marker. Here, structure is predicted by considering the sequence composition in combination with the length distribution of the different helical regions. Next, I integrated HMMs into the sequence-structure generation process to delineate the ITS2 within a given sequence. This re-implemented pipeline more than doubled the number of structure predictions and reduced the runtime to a few days. Together with further optimizations of the homology modelling process, this allows secondary structures to be predicted exhaustively in several iterations. These modifications currently provide 380,000 annotated sequences, including 288,000 structure predictions.

To include these structures in the calculation of alignments and phylogenetic trees, I developed the R package "treeforge". It generates sequence-structure alignments on up to four different coding alphabets. For the first time, structural bonds were also considered in alignments, which required the estimation of new scoring matrices. The reconstruction of Maximum Parsimony, Maximum Likelihood and Neighbour Joining trees on all four alphabets now requires just a few lines of code. The package was used to resolve the controversial chlorophycean dataset and could be integrated into future versions of the ITS2 Workbench.

The platform is based on a modern, feature-rich Web 2.0 user interface that uses AJAX and web-service technologies. It performs HMM-based sequence annotation, structure prediction by energy minimization or homology modelling, alignment calculation and tree reconstruction on a flexible data pool that repeats calculations whenever the data change. In addition, it provides sequence motif detection to check annotation and structure prediction, and a sequence-structure based BLAST search, which facilitates taxon sampling. All features and the usage of the ITS2 Workbench are explained in a video tutorial. However, the workbench has some limitations regarding dataset size, mainly because of the immense computational power needed for such extensive calculations. To demonstrate that the approach is also valid for large-scale analyses, a fully automated reconstruction of the Chlorophyta (green algal) Tree of Life was performed. The successful application of the marker even to large datasets underlines the capabilities of ITS2 sequence-structure analysis and suggests its use on further datasets. The ITS2 Workbench provides an excellent starting point for such endeavours.
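
The abstract mentions sequence-structure alignments over several coding alphabets but does not define them. As an illustration only, the following Python sketch shows one common way such an alphabet can be built: each nucleotide is paired with its structural state from a dot-bracket string, giving 4 x 3 = 12 combined symbols. The function names, the letter codes `A` to `L`, and the choice of a 12-symbol alphabet are assumptions made for this example; they are not taken from treeforge or the ITS2 Workbench.

```python
NUCLEOTIDES = "ACGU"
STATES = ".()"  # unpaired, opening bracket, closing bracket
# Hypothetical one-letter codes for the 4 x 3 = 12 combined states.
SYMBOLS = "ABCDEFGHIJKL"


def encode(sequence: str, structure: str) -> str:
    """Merge a sequence and its dot-bracket structure into one string."""
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    out = []
    for nuc, state in zip(seq, structure):
        if nuc not in NUCLEOTIDES or state not in STATES:
            raise ValueError(f"unsupported symbol pair: {nuc!r}, {state!r}")
        # Each (nucleotide, state) pair maps to exactly one combined symbol.
        out.append(SYMBOLS[NUCLEOTIDES.index(nuc) * 3 + STATES.index(state)])
    return "".join(out)


def decode(encoded: str) -> tuple[str, str]:
    """Recover the sequence and the dot-bracket structure."""
    seq, struct = [], []
    for symbol in encoded:
        idx = SYMBOLS.index(symbol)
        seq.append(NUCLEOTIDES[idx // 3])
        struct.append(STATES[idx % 3])
    return "".join(seq), "".join(struct)
```

Because every combined symbol carries both the base and its pairing state, a standard alignment routine can in principle be run on the encoded strings, provided it is given a scoring matrix estimated for that alphabet.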
