The infection of a eukaryotic host cell by a bacterial pathogen is one of the most intimate examples of cross-kingdom interactions in biology. Infection processes are highly relevant from both a basic research and a clinical point of view. Pathogens have evolved sophisticated mechanisms to manipulate the host response and, conversely, host cells have developed a wide range of antimicrobial defense strategies to combat bacterial invasion and clear infections. However, it is this diversity and complexity that makes infection research so technically challenging, as common approaches have been optimized for either bacterial or eukaryotic organisms. Instead, methods are required that can deal with the often dramatic discrepancy between host and pathogen with respect to various cellular properties and processes. One class of cellular macromolecules that exemplifies this host-pathogen heterogeneity is the transcriptome: bacterial transcripts differ from their eukaryotic counterparts in many quantitative and qualitative traits. The entirety of RNA transcripts present in a cell is of paramount interest as it reflects the cell’s physiological state under a given condition. Genome-wide transcriptomic techniques such as RNA-seq have therefore been used for single-organism analyses for several years, but their applicability to infection studies has been limited.
The present work describes the establishment of a novel transcriptomic approach for infection biology which we have termed “Dual RNA-seq”. This technology was intended to shed light particularly on the contribution of non-protein-encoding transcripts to virulence, as these classes have mostly evaded previous infection studies due to the lack of suitable methods. The performance of Dual RNA-seq was evaluated in an in vitro infection model based on the important facultative intracellular pathogen Salmonella enterica serovar Typhimurium and different human cell lines. Dual RNA-seq was found to be capable of capturing all major bacterial and human transcript classes and proved reproducible. During the course of these experiments, a previously largely uncharacterized bacterial small non-coding RNA (sRNA), referred to as STnc440, was identified as one of the most strongly induced genes in intracellular Salmonella. Interestingly, while inhibition of STnc440 expression has previously been shown to cause a virulence defect in different animal models of salmonellosis, the underlying molecular mechanisms have remained obscure. Here, classical genetics, transcriptomics and biochemical assays suggested a complex model of Salmonella gene expression control that is orchestrated by this sRNA. In particular, STnc440 was found to regulate multiple bacterial target mRNAs by direct base-pair interactions, with consequences for Salmonella virulence and implications for the host’s immune response. These findings exemplify the scope of Dual RNA-seq for the identification and characterization of novel bacterial virulence factors during host infection.
RNA-binding proteins (RBPs) have been extensively studied in eukaryotes, where they post-transcriptionally regulate many cellular events including RNA transport, translation, and stability. Experimental techniques such as cross-linking and co-purification followed by either mass spectrometry or RNA sequencing have enabled the identification and characterization of RBPs, their conserved RNA-binding domains (RBDs), and the regulatory roles of these proteins on a genome-wide scale. These developments in quantitative, high-resolution, and high-throughput screening techniques have greatly expanded our understanding of RBPs in human and yeast cells. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is comparatively poor, in part due to the technical challenges associated with existing global screening approaches developed in eukaryotes.
Genome- and proteome-wide screening approaches performed in silico may circumvent these technical issues to obtain a broad picture of the RNA interactome of bacteria and identify strong RBP candidates for more detailed experimental study. Here, I report APRICOT (“Analyzing Protein RNA Interaction by Combined Output Technique”), a computational pipeline for the sequence-based identification and characterization of candidate RNA-binding proteins encoded in the genomes of all domains of life using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences of an input proteome using position-specific scoring matrices and hidden Markov models of all conserved domains available in the databases and then statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them according to functionally relevant structural properties. APRICOT performed better than other existing tools for sequence-based prediction on the known RBP data sets. The applications and adaptability of the software were demonstrated on several large bacterial RBP data sets including the complete proteome of Salmonella Typhimurium strain SL1344. APRICOT reported 1068 Salmonella proteins as RBP candidates, which were subsequently categorized using the RBDs that have been reported in both eukaryotic and bacterial proteins. A set of 131 strong RBP candidates was selected for experimental confirmation and characterization of RNA-binding activity using RNA co-immunoprecipitation followed by high-throughput sequencing (RIP-Seq) experiments. Based on the relative abundance of transcripts across the RIP-Seq libraries, a catalogue of enriched genes was established for each candidate, which shows the RNA-binding potential of 90% of these proteins. Furthermore, the direct targets of a few of these putative RBPs were validated by means of cross-linking and co-immunoprecipitation (CLIP) experiments.
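The idea of scanning a protein sequence with a position-specific scoring matrix can be illustrated with a minimal sketch. This is not APRICOT's code: the function names, the toy aligned sites, the uniform background and the pseudocount are all assumptions made for illustration, and real domain models are derived from curated alignments rather than three short strings.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_sites, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length aligned sites.

    Assumes a uniform background distribution (a simplification).
    """
    length = len(aligned_sites[0])
    background = 1.0 / len(alphabet)
    pssm = []
    for pos in range(length):
        column = [site[pos] for site in aligned_sites]
        total = len(column) + pseudocount * len(alphabet)
        scores = {}
        for aa in alphabet:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def best_window(sequence, pssm):
    """Slide the PSSM along the sequence; return (score, start) of the best window.

    Returns None if the sequence is shorter than the matrix.
    """
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Hypothetical toy motif and query sequence, for illustration only.
toy_pssm = build_pssm(["RGG", "RGG", "RGA"])
toy_hit = best_window("MKTRGGLL", toy_pssm)
```

A pipeline of this kind would then apply a score cutoff and further sequence-based features to each hit; those steps are left out here because the abstract does not specify them.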
This thesis presents the computational pipeline APRICOT for the global screening of protein primary sequences for potential RBPs in bacteria using RBD information from all kingdoms of life. Furthermore, it provides the first bio-computational resource of putative RBPs in Salmonella, which could now be further studied for their biological and regulatory roles. The command line tool and its documentation are available at https://malvikasharan.github.io/APRICOT/.
Complex formation between macromolecules constitutes the foundation of most cellular processes. Most known complexes are made up of two or more proteins that interact to build a functional entity, thereby enabling activities that the single proteins could not otherwise fulfill. With increasing knowledge about noncoding RNAs (ncRNAs), it has become evident that, similar to proteins, many of them also need to form a complex to be functional. This functionalization is usually executed by specific or global RNA-binding proteins (RBPs) that are specialized binders of a certain class of ncRNAs. For instance, the enterobacterial global RBPs Hfq and ProQ together bind >80 % of the known small regulatory RNAs (sRNAs), a class of ncRNAs involved in post-transcriptional regulation of gene expression. However, RNA-protein interactions have so far been identified individually using low-throughput biochemical methods, which has hindered the discovery of such interactions, especially in less studied organisms such as Gram-positive bacteria. Using gradient profiling by sequencing (Grad-seq), the present thesis aimed to establish high-throughput, global RNA/protein complexome resources for Escherichia coli and Streptococcus pneumoniae in order to provide a new way to investigate RNA-protein as well as protein-protein interactions in these two important model organisms.
In E. coli, Grad-seq revealed the sedimentation profiles of 4,095 (∼85 % of total) transcripts and 2,145 (∼49 % of total) proteins and thereby reproduced the major ribonucleoprotein particles of the organism. Detailed analysis of the in-gradient distribution of the RNA and protein content uncovered two molecules of unknown function, the ncRNA RyeG and the small protein YggL, to be ribosome-associated. Characterization of RyeG revealed that it encodes a 48 aa long, toxic protein that drastically increases lag times when overexpressed. YggL was shown to be bound by the 50S subunit of the 70S ribosome, possibly indicating involvement of YggL in ribosome biogenesis or translation of specific mRNAs.
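One simple way to ask whether a molecule co-sediments with a known particle is to correlate their in-gradient profiles. The sketch below is an illustration under stated assumptions, not the analysis used in the thesis: the fraction values are invented, the profile names are hypothetical, and Pearson correlation is just one plausible similarity measure.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def rank_by_similarity(reference, profiles):
    """Return molecule names sorted by descending correlation with the reference."""
    scored = [(pearson(reference, profile), name) for name, profile in profiles.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical abundances across eight gradient fractions (top to bottom).
reference_particle = [0, 1, 2, 10, 40, 10, 2, 0]
example_profiles = {
    "candidate_a": [0, 2, 4, 20, 80, 20, 4, 0],   # same shape, higher abundance
    "candidate_b": [50, 20, 5, 1, 0, 0, 0, 0],    # stays near the top of the gradient
}
example_ranking = rank_by_similarity(reference_particle, example_profiles)
```

Because Pearson correlation ignores overall scale, a low-abundance molecule with the same sedimentation shape as the reference still ranks highly, which is the behaviour one would want when comparing a small protein with a ribosomal subunit.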
S. pneumoniae Grad-seq detected 2,240 (∼88 % of total) transcripts and 1,301 (∼62 % of total) proteins, whose gradient migration patterns were successfully reconstructed, and thereby represents the first RNA/protein complexome resource of a Gram-positive organism. The dataset readily verified many conserved major complexes for the first time in S. pneumoniae and led to the discovery of a specific interaction between the 3’→5’ exonuclease Cbf1 and the competence-regulating cia-dependent sRNAs (csRNAs). Unexpectedly, trimming of the csRNAs by Cbf1 stabilized them, thereby promoting their inhibitory function. cbf1 was further shown to be part of the late competence genes and as such to act as a negative regulator of competence.
The anaerobe Fusobacterium nucleatum (F. nucleatum) is an important member of the oral microbiome but can also colonize different tissues of the human body. In particular, its association with multiple human cancers has drawn much attention.
This association has prompted growing interest in the interaction of F. nucleatum with cancer, with studies focusing primarily on the host cells. At the same time, F. nucleatum itself remains poorly understood, which includes its transcriptomic architecture but also gene regulation such as global stress responses that typically enable survival of bacteria in new environments. An important aspect of such regulatory networks is post-transcriptional regulation, which is entirely unknown in F. nucleatum. This paucity extends to any knowledge of small regulatory RNAs (sRNAs), despite their important role as post-transcriptional regulators of bacterial physiology.
Investigating these aspects is further complicated by the fact that F. nucleatum is phylogenetically distant from all other bacteria, displays very limited genetic tractability and lacks genetic tools for dissecting gene function.
This leaves many open questions on basic gene regulation in F. nucleatum, such as whether the bacterium combines transcriptional and post-transcriptional regulation in its adaptation to a changing environment.
To begin answering this question, this work elucidated the transcriptomic landscape of F. nucleatum by performing differential RNA-seq (dRNA-seq). Conducted for five representative strains of all F. nucleatum subspecies and the closely related F. periodonticum, the analysis globally uncovered transcriptional start sites (TSS) and 5' untranslated regions (UTRs) and improved the existing annotation. Importantly, the dRNA-seq analysis also identified a conserved suite of sRNAs specific to Fusobacterium.
The development of five genetic tools enabled further investigations of gene functions in F. nucleatum. These include vectors that enable the expression of different fluorescent proteins, inducible gene expression and scarless gene deletion in addition to transcriptional and translational reporter systems.
These tools enabled the dissection of a Sigma E response and uncovered several commonalities with its counterpart in the phylogenetically distant Proteobacteria. The similarities include the upregulation of genes involved in membrane homeostasis but also a Sigma E-dependent regulatory sRNA. Surprisingly, oxygen was found to activate Sigma E in F. nucleatum, contrasting with the typical role of the factor in envelope stress.
The non-coding Sigma E-dependent sRNA, named FoxI, was shown to repress the translation of several envelope proteins, which represents yet another parallel to the envelope stress response in Proteobacteria.
Overall, this work sheds light on the RNA landscape of the cancer-associated bacterium, leading to the discovery of a conserved global stress response consisting of a coding and a non-coding arm. The development of new genetic tools not only aided the latter discovery but also provides the means for further dissecting the molecular and infection biology of this enigmatic bacterium.
Small proteins, often defined as shorter than 50 amino acids, have been implicated in fundamental cellular processes. Despite this, they have been largely understudied throughout all domains of life, since their size often makes their identification and characterization challenging.
This work addressed the knowledge gap surrounding small proteins with a focus on the model bacterial pathogen Salmonella Typhimurium. In a first step, new small proteins were identified with a combination of computational and experimental approaches. Infection-relevant datasets were then investigated with the updated Salmonella annotation to prioritize promising candidates involved in virulence.
To implement the annotation of new small proteins, predictions from the algorithm sPepFinder were merged with those derived from Ribo-seq. These were added to the Salmonella annotation and used to (re)analyse different datasets. Information regarding expression during infection (dual RNA-seq) and requirement for virulence (TraDIS) was collected for each given coding sequence. In parallel, Grad-seq data were mined to identify small proteins engaged in intermolecular interactions.
The combination of dual RNA-seq and TraDIS led to the identification of small proteins with features of virulence factors, namely high intracellular induction and a virulence phenotype upon transposon insertion. As a proof of principle of the power of this approach in highlighting high-confidence candidates, two small proteins were characterized in the context of Salmonella infection.
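The two selection criteria named above, high intracellular induction and a virulence phenotype upon transposon insertion, amount to a filter over per-gene values from the two datasets. The following sketch shows one way such a filter might look; the field names, thresholds and candidate values are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    induction_log2fc: float  # dual RNA-seq: intracellular vs. reference condition (assumed metric)
    fitness_log2fc: float    # TraDIS: change in mutant abundance during infection (assumed metric)

def prioritize(candidates, min_induction=2.0, min_abs_fitness=1.0):
    """Keep candidates that are strongly induced AND show a fitness change.

    The absolute fitness value is used so that both attenuated and
    hypervirulent mutants pass. Both thresholds are arbitrary placeholders.
    """
    hits = [
        c for c in candidates
        if c.induction_log2fc >= min_induction
        and abs(c.fitness_log2fc) >= min_abs_fitness
    ]
    return sorted(hits, key=lambda c: c.induction_log2fc, reverse=True)

# Invented example values for four hypothetical small proteins.
example_candidates = [
    Candidate("sp_a", 3.5, -2.0),  # induced, attenuated mutant
    Candidate("sp_b", 0.5, -3.0),  # phenotype but not induced
    Candidate("sp_c", 4.0, 0.1),   # induced but no phenotype
    Candidate("sp_d", 2.5, 1.5),   # induced, mutant enriched
]
example_hits = [c.name for c in prioritize(example_candidates)]
```

Requiring both criteria at once trades sensitivity for confidence: a gene with only one of the two signals is dropped, which matches the stated aim of highlighting a short list of high-confidence candidates.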
MgrB, a known regulator of the PhoPQ two-component system, was shown to be essential for the infection of epithelial cells and macrophages, possibly via its stabilizing effect on flagella or by interacting with other sensor kinases of two-component systems. YjiS, so far uncharacterized in Salmonella, had an opposite role in infection, with its deletion rendering Salmonella hypervirulent. The underlying mechanism, though still obscure, likely relies on interactions with inner-membrane proteins.
Overall, this work provides a global description of Salmonella small proteins in the context of infection with a combinatorial approach that expedites the identification of interesting candidates. Different high-throughput datasets available for a broad range of organisms can be analysed in a similar manner with a focus on small proteins. This will lead to the identification of key factors in the regulation of various processes, thus for example providing targets for the treatment of bacterial infections or, in the case of commensal bacteria, for the modulation of the microbiota composition.