TY - THES A1 - Bischler, Thorsten David T1 - Data mining and software development for RNA-seq-based approaches in bacteria T1 - Data-Mining und Softwareentwicklung für RNA-seq-basierte Methoden bei Bakterien N2 - RNA sequencing (RNA-seq) has in recent years become the preferred method for gene expression analysis and whole transcriptome annotation. While initial RNA-seq experiments focused on eukaryotic messenger RNAs (mRNAs), which can be purified from the cellular ribonucleic acid (RNA) pool with relative ease, more advanced protocols had to be developed for sequencing of microbial transcriptomes. The resulting RNA-seq data revealed an unexpected complexity of bacterial transcriptomes and the requirement for specific analysis methods, which in many cases is not covered by tools developed for processing of eukaryotic data. The aim of this thesis was the development and application of specific data analysis methods for different RNA-seq-based approaches used to gain insights into transcription and gene regulatory processes in prokaryotes. The differential RNA sequencing (dRNA-seq) approach allows for transcriptional start site (TSS) annotation by differentiating between primary transcripts with a 5’-triphosphate (5’-PPP) and processed transcripts with a 5’-monophosphate (5’-P). This method was applied in combination with an automated TSS annotation tool to generate global transcriptome maps for Escherichia coli (E. coli) and Helicobacter pylori (H. pylori). In the E. coli study we conducted different downstream analyses to gain a deeper understanding of the nature and properties of transcripts in our TSS map. Here, we focused especially on putative antisense RNAs (asRNAs), an RNA class transcribed from the opposite strand of known protein-coding genes with the potential to regulate corresponding sense transcripts. 
Besides providing a set of putative asRNAs and experimental validation of candidates via Northern analysis, we analyzed and discussed different sources of variation in RNA-seq data. The aim of the H. pylori study was to provide a detailed description of the dRNA-seq approach and its application to a bacterial model organism. It includes information on experimental protocols and requirements for data analysis to generate a genome-wide TSS map. We show how the included TSS can be used to identify and analyze transcriptome and regulatory features and discuss challenges in terms of library preparation protocols, sequencing platforms, and data analysis including manual and automated TSS annotation. The TSS maps and associated transcriptome data from both H. pylori and E. coli were made available for visualization in an easily accessible online browser. Furthermore, a modified version of dRNA-seq was used to identify transcriptome targets of the RNA pyrophosphohydrolase (RppH) in H. pylori. RppH initiates 5’-end-dependent degradation of transcripts by converting the 5’-PPP of primary transcripts to a 5’-P. I developed an analysis method, which uses data from complementary DNA (cDNA) libraries specific for transcripts carrying a 5’-PPP, 5’-P or both, to specifically identify transcripts modified by RppH. For this, the method assessed the 5’-phosphorylation state and cellular concentration of transcripts in an rppH deletion mutant in comparison to strains with the intact gene. Several of the identified potential RppH targets were further validated via half-life measurements and quantification of their 5’-phosphorylation state in wild-type and mutant cells. Our findings suggest an important role for RppH in post-transcriptional gene regulation in H. pylori and related organisms. 
In addition, we applied two RNA-seq-based approaches, RNA immunoprecipitation followed by sequencing (RIP-seq) and cross-linking immunoprecipitation followed by sequencing (CLIP-seq), to identify transcripts bound by Hfq and CsrA, two RNA-binding proteins (RBPs) with an important role in post-transcriptional regulation. For RIP-seq-based identification of CsrA binding regions in Campylobacter jejuni (C. jejuni), we used annotation-based analysis and, in addition, a self-developed peak calling method based on a sliding window approach. Both methods revealed flaA mRNA, encoding the major flagellin, as the main target and functional analysis of identified targets showed a significant enrichment of genes involved in flagella biosynthesis. Further experimental analysis revealed the role of flaA mRNA in post-transcriptional regulation. In comparison to RIP-seq, CLIP-seq allows mapping of RBP binding sites with a higher resolution. To identify these sites an approach called “block-based peak calling” was developed and resulting peaks were used to identify sequence and structural constraints required for interaction of Hfq and CsrA with Salmonella transcripts. Overall, the different RNA-seq-based approaches described in this thesis together with their associated analysis pipelines extended our knowledge on the transcriptional repertoire and modes of post-transcriptional regulation in bacteria. The global TSS maps, including further characterized asRNA candidates, putative RppH targets, and identified RBP interactomes will likely trigger similar global studies in the same or different organisms or will be used as a resource for closer examination of these features. N2 - RNA-Sequenzierung (RNA-seq) entwickelte sich in den letzten Jahren zur bevorzugten Methode für Genexpressionsanalysen und die Annotation ganzer Transkriptome. 
Nachdem sich erste RNA-seq-Experimente hauptsächlich mit eukaryotischen Boten-RNAs (mRNAs) beschäftigt hatten, da diese sich relativ einfach aus dem zellulären RNA-Gemisch aufreinigen lassen, war die Entwicklung von fortschrittlicheren Methoden nötig, um mikrobielle Transkriptome zu sequenzieren. Die sich daraus ergebenden RNA-seq-Daten enthüllten eine unerwartete Komplexität bakterieller Transkriptome und die Notwendigkeit der Anwendung spezifischer Analyseverfahren, welche von Tools zur Prozessierung eukaryotischer Daten häufig nicht zur Verfügung gestellt werden. Das Ziel dieser Doktorarbeit war die Entwicklung und Anwendung spezifischer Verfahren zur Datenanalyse für verschiedene RNA-seq-basierte Methoden, um Erkenntnisse bezüglich Transkription und genregulatorischer Vorgänge bei Prokaryoten zu erlangen. Die Differentielle-RNA-Sequenzierungsmethode (dRNA-seq) ermöglicht die Annotation von Transkriptionsstartpunkten (TSS), indem sie Primärtranskripte mit einem 5'-Triphosphat (5'-PPP) von prozessierten Transkripten mit einem 5'-Monophosphat (5'-P) unterscheidet. Diese Methode wurde in Kombination mit einem automatisierten TSS-Annotationstool zur Erstellung globaler Transkriptomkarten für Escherichia coli (E. coli) und Helicobacter pylori (H. pylori) verwendet. In der E. coli-Studie haben wir verschiedene Folgeanalysen durchgeführt, um ein tieferes Verständnis für die Natur und Eigenschaften der in unserer Transkriptomkarte enthaltenen Transkripte zu erlangen. Das Hauptaugenmerk lag dabei auf mutmaßlichen Antisense-RNAs (asRNAs). Diese stellen eine RNA-Klasse dar, welche vom entgegengesetzten Strang von bekannten proteinkodierenden Genen transkribiert wird, und die das Potenzial hat, entsprechende Sense-Transkripte zu regulieren. Wir stellen nicht nur eine Liste mutmaßlicher asRNAs zur Verfügung, von der einige Kandidaten durch Northern Blots validiert wurden, sondern diskutierten auch von uns untersuchte Gründe für auftretende Variation bei RNA-seq-Daten. 
Das Ziel der H. pylori-Studie war es, eine detaillierte Beschreibung der dRNA-seq-Methode und deren Anwendung auf einen bakteriellen Modellorganismus zur Verfügung zu stellen. Sie enthält Informationen bezüglich experimenteller Protokolle und für die Datenanalyse notwendige Schritte, zur Erstellung einer genomweiten TSS-Karte. Wir zeigen, wie die enthaltenen TSS verwendet werden können, um verschiedene Transkriptomelemente, einschließlich solcher mit regulatorischen Eigenschaften, zu identifizieren und zu analysieren. Zusätzlich diskutieren wir Probleme, welche bei der Erstellung von Sequenzierlibraries, der Verwendung von Sequenzierplattformen und bei der Datenanalyse, einschließlich manueller und automatisierter TSS-Annotation, auftreten können. Die TSS-Karten für H. pylori und E. coli, einschließlich der damit verbundenen Transkriptomdaten, haben wir in Form eines leicht zugänglichen Online-Browsers verfügbar gemacht. Des Weiteren wurde eine modifizierte Version der dRNA-seq-Methode verwendet, um Transkripte zu identifizieren, welche von der RNA Pyrophosphohydrolase (RppH) in H. pylori gespalten werden. RppH initiiert den vom 5'-Ende abhängigen RNA-Abbau, indem sie das 5'-PPP von Primärtranskripten in ein 5'-P umwandelt. Ich habe eine Analysemethode entwickelt, welche Daten basierend auf unterschiedlichen Komplementär-DNA (cDNA)-Libraries verwendet, welche entweder spezifisch für Transkripte mit einem 5'-PPP oder einem 5'-P sind, oder beides enthalten, um spezifisch Transkripte zu identifizieren, die durch RppH modifiziert werden. Um dies zu erreichen wurden der 5'-Phosphorylierungsstatus und die zelluläre Konzentration der Transkripte zwischen einer rppH-Deletionsmutante und Stämmen mit intaktem Gen verglichen. Weiterhin wurden mehrere der identifizierten, von RppH gespaltenen Transkripte durch Messung ihrer Halbwertszeit und Quantifizierung ihres 5'-Phosphorylierungsstatus bei Wildtyp- und mutierten Zellen validiert. 
Unsere Ergebnisse lassen auf eine wichtige Rolle von RppH bei der Genregulation in H. pylori und verwandten Organismen schließen. Zusätzlich haben wir zwei weitere RNA-seq-basierte Methoden namens RNA-Immunpräzipitation gefolgt von RNA-Sequenzierung (RIP-seq) und Quervernetzung und Immunpräzipitation gefolgt von RNA-Sequenzierung (CLIP-seq) verwendet, um Transkripte zu identifizieren, welche von Hfq und CsrA gebunden werden, zwei RNA-Bindeproteinen (RBPs), die eine wichtige Rolle bei posttranskriptionaler Regulation spielen. Zur RIP-seq-basierten Identifikation von CsrA-Binderegionen bei Campylobacter jejuni (C. jejuni) haben wir eine annotationsbasierte Analyse und zusätzlich eine eigens entwickelte Peak-Bestimmungsmethode verwendet. Beide Methoden haben die flaA mRNA, welche das Hauptflagellin kodiert, als stärksten Bindepartner identifiziert. Die Funktionale-Anreicherungsanalyse hat außerdem eine Anreicherung von Genen ergeben, welche für die Flagellenbiosynthese von Bedeutung sind. Im Vergleich zu RIP-seq ermöglicht CLIP-seq eine höhere Auflösung bei der Kartografierung von Bindestellen. Um diese Stellen zu identifizieren wurde eine Methode mit der Bezeichnung “block-based peak calling” entwickelt, und die daraus resultierenden Peaks wurden verwendet, um sequenz- und strukturabhängige Bedingungen zu bestimmen, die bei Salmonella für die Interaktion von Transkripten mit Hfq und CsrA notwendig sind. Insgesamt betrachtet haben die verschiedenen RNA-seq-basierten Methoden, welche in dieser Doktorarbeit beschrieben wurden, in Kombination mit den damit verbundenen Analysepipelines, unser Verständnis des transkriptionellen Repertoires und der Art und Weise, wie posttranskriptionelle Regulation bei Bakterien abläuft, erweitert. 
Die globalen TSS-Karten, einschließlich der charakterisierten asRNA-Kandidaten, die mutmaßlich von RppH gespaltenen Transkripte und die identifizierten RBP-Interaktome werden höchstwahrscheinlich zur Durchführung ähnlicher Studien bei den gleichen oder anderen Organismen führen, oder können als Grundlage für eine detailliertere Untersuchung dieser Elemente verwendet werden. KW - Bakterien KW - RNA sequencing KW - Bioinformatics KW - Bacteria KW - Transcriptome KW - Post-transcriptional regulation KW - RNA-binding proteins KW - Sequenzanalyse KW - RNS Y1 - 2018 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-166108 ER - TY - JOUR A1 - Conrad, Thomas A1 - Albrecht, Anne-Susann A1 - Rodrigues de Melo Costa, Veronica A1 - Sauer, Sascha A1 - Meierhofer, David A1 - Andersson Ørom, Ulf T1 - Serial interactome capture of the human cell nucleus JF - Nature Communications N2 - Novel RNA-guided cellular functions are paralleled by an increasing number of RNA-binding proteins (RBPs). Here we present ‘serial RNA interactome capture’ (serIC), a multiple purification procedure of ultraviolet-crosslinked poly(A)–RNA–protein complexes that enables global RBP detection with high specificity. We apply serIC to the nuclei of proliferating K562 cells to obtain the first human nuclear RNA interactome. The domain composition of the 382 identified nuclear RBPs markedly differs from previous IC experiments, including few factors without known RNA-binding domains that are in good agreement with computationally predicted RNA binding. serIC extends the number of DNA–RNA-binding proteins (DRBPs), and reveals a network of RBPs involved in p53 signalling and double-strand break repair. serIC is an effective tool to couple global RBP capture with additional selection or labelling steps for specific detection of highly purified RBPs. 
KW - human cell nucleus KW - serial RNA interactome capture KW - RNA-binding proteins Y1 - 2016 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-166172 VL - 7 IS - 11212 ER - TY - JOUR A1 - Gerova, Milan A1 - Wicke, Laura A1 - Chihara, Kotaro A1 - Schneider, Cornelius A1 - Lavigne, Rob A1 - Vogel, Jörg T1 - A grad-seq view of RNA and protein complexes in Pseudomonas aeruginosa under standard and bacteriophage predation conditions JF - mBio N2 - The Gram-negative rod-shaped bacterium Pseudomonas aeruginosa is not only a major cause of nosocomial infections but also serves as a model species of bacterial RNA biology. While its transcriptome architecture and posttranscriptional regulation through the RNA-binding proteins Hfq, RsmA, and RsmN have been studied in detail, global information about stable RNA-protein complexes in this human pathogen is currently lacking. Here, we implement gradient profiling by sequencing (Grad-seq) in exponentially growing P. aeruginosa cells to comprehensively predict RNA and protein complexes, based on glycerol gradient sedimentation profiles of >73% of all transcripts and ∼40% of all proteins. As to benchmarking, our global profiles readily reported complexes of stable RNAs of P. aeruginosa, including 6S RNA with RNA polymerase and associated product RNAs (pRNAs). We observe specific clusters of noncoding RNAs, which correlate with Hfq and RsmA/N, and provide a first hint that P. aeruginosa expresses a ProQ-like FinO domain-containing RNA-binding protein. To understand how biological stress may perturb cellular RNA/protein complexes, we performed Grad-seq after infection by the bacteriophage ΦKZ. This model phage, which has a well-defined transcription profile during host takeover, displayed efficient translational utilization of phage mRNAs and tRNAs, as evident from their increased cosedimentation with ribosomal subunits. 
Additionally, Grad-seq experimentally determines previously overlooked phage-encoded noncoding RNAs. Taken together, the Pseudomonas protein and RNA complex data provided here will pave the way to a better understanding of RNA-protein interactions during viral predation of the bacterial cell. IMPORTANCE Stable complexes by cellular proteins and RNA molecules lie at the heart of gene regulation and physiology in any bacterium of interest. It is therefore crucial to globally determine these complexes in order to identify and characterize new molecular players and regulation mechanisms. Pseudomonads harbor some of the largest genomes known in bacteria, encoding ∼5,500 different proteins. Here, we provide a first glimpse on which proteins and cellular transcripts form stable complexes in the human pathogen Pseudomonas aeruginosa. We additionally performed this analysis with bacteria subjected to the important and frequently encountered biological stress of a bacteriophage infection. We identified several molecules with established roles in a variety of cellular pathways, which were affected by the phage and can now be explored for their role during phage infection. Most importantly, we observed strong colocalization of phage transcripts and host ribosomes, indicating the existence of specialized translation mechanisms during phage infection. All data are publicly available in an interactive and easy to use browser. KW - Grad-seq KW - Pseudomonas KW - ΦKZ KW - bacteriophage KW - infection KW - Pseudomonas aeruginosa KW - RNA-binding proteins KW - noncoding RNA KW - phage Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-259054 VL - 12 IS - 1 ER - TY - JOUR A1 - Prezza, Gianluca A1 - Ryan, Daniel A1 - Mädler, Gohar A1 - Reichardt, Sarah A1 - Barquist, Lars A1 - Westermann, Alexander J. 
T1 - Comparative genomics provides structural and functional insights into Bacteroides RNA biology JF - Molecular Microbiology N2 - Bacteria employ noncoding RNA molecules for a wide range of biological processes, including scaffolding large molecular complexes, catalyzing chemical reactions, defending against phages, and controlling gene expression. Secondary structures, binding partners, and molecular mechanisms have been determined for numerous small noncoding RNAs (sRNAs) in model aerobic bacteria. However, technical hurdles have largely prevented analogous analyses in the anaerobic gut microbiota. While experimental techniques are being developed to investigate the sRNAs of gut commensals, computational tools and comparative genomics can provide immediate functional insight. Here, using Bacteroides thetaiotaomicron as a representative microbiota member, we illustrate how comparative genomics improves our understanding of RNA biology in an understudied gut bacterium. We investigate putative RNA-binding proteins and predict a Bacteroides cold-shock protein homolog to have an RNA-related function. We apply an in silico protocol incorporating both sequence and structural analysis to determine the consensus structures and conservation of nine Bacteroides noncoding RNA families. Using structure probing, we validate and refine these predictions and deposit them in the Rfam database. Through synteny analyses, we illustrate how genomic coconservation can serve as a predictor of sRNA function. Altogether, this work showcases the power of RNA informatics for investigating the RNA biology of anaerobic microbiota members. 
KW - BT_1884 KW - cold-shock protein KW - GibS KW - RNA-binding proteins KW - secondary structure KW - 6S RNA Y1 - 2022 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-259594 VL - 117 IS - 1 ER - TY - JOUR A1 - Salehi, Saeede A1 - Zare, Abdolhossein A1 - Prezza, Gianluca A1 - Bader, Jakob A1 - Schneider, Cornelius A1 - Fischer, Utz A1 - Meissner, Felix A1 - Mann, Matthias A1 - Briese, Michael A1 - Sendtner, Michael T1 - Cytosolic Ptbp2 modulates axon growth in motoneurons through axonal localization and translation of Hnrnpr JF - Nature Communications N2 - The neuronal RNA-binding protein Ptbp2 regulates neuronal differentiation by modulating alternative splicing programs in the nucleus. Such programs contribute to axonogenesis by adjusting the levels of protein isoforms involved in axon growth and branching. While its functions in alternative splicing have been described in detail, cytosolic roles of Ptbp2 for axon growth have remained elusive. Here, we show that Ptbp2 is located in the cytosol including axons and growth cones of motoneurons, and that depletion of cytosolic Ptbp2 affects axon growth. We identify Ptbp2 as a major interactor of the 3’ UTR of Hnrnpr mRNA encoding the RNA-binding protein hnRNP R. Axonal localization of Hnrnpr mRNA and local synthesis of hnRNP R protein are strongly reduced when Ptbp2 is depleted, leading to defective axon growth. Ptbp2 regulates hnRNP R translation by mediating the association of Hnrnpr with ribosomes in a manner dependent on the translation factor eIF5A2. Our data thus suggest a mechanism whereby cytosolic Ptbp2 modulates axon growth by fine-tuning the mRNA transport and local synthesis of an RNA-binding protein. KW - molecular neuroscience KW - RNA-binding proteins KW - RNA transport Y1 - 2023 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-357639 VL - 14 ER - TY - JOUR A1 - Sharan, Malvika A1 - Förstner, Konrad U. 
A1 - Eulalio, Ana A1 - Vogel, Jörg T1 - APRICOT: an integrated computational pipeline for the sequence-based identification and characterization of RNA-binding proteins JF - Nucleic Acids Research N2 - RNA-binding proteins (RBPs) have been established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in the eukaryotic species such as human and yeast in large-scale. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is poorer due to the technical challenges associated with the existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains and statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. APRICOT showed better performance on various datasets compared to other existing tools for the sequence-based prediction of RBPs by achieving an average sensitivity and specificity of 0.90 and 0.91 respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot. KW - RNA-binding proteins KW - identification KW - characterization Y1 - 2017 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-157963 VL - 45 IS - 11 ER -