The human gut is home to thousands of microbes that are important for human life. As most of these cannot be cultivated, metagenomics is a key means of understanding this community. To perform comparative metagenomic analysis of the human gut microbiome, I have developed SMASH (Simple metagenomic analysis shell), a computational pipeline. SMASH can also be used to assemble and analyze single genomes, and has been successfully applied to the bacterium Mycoplasma pneumoniae and the fungus Chaetomium thermophilum. In the context of the MetaHIT (Metagenomics of the human intestinal tract) consortium our group is participating in, I used SMASH to validate the assembly and to estimate the assembly error rate of 576.7 Gb metagenome sequence obtained using Illumina Solexa technology from fecal DNA of 124 European individuals. I also estimated the completeness of the gene catalogue containing 3.3 million open reading frames obtained from these metagenomes. Finally, I used SMASH to analyze human gut metagenomes of 39 individuals from 6 countries encompassing a wide range of host properties such as age, body mass index and disease states. We find that the variation in the gut microbiome is not continuous but stratified into enterotypes. Enterotypes are complex host-microbial symbiotic states that are not explained by host properties, nutritional habits or possible technical biases. The concept of enterotypes might have far-reaching implications, for example, to explain different responses to diet or drug intake. We also find several functional markers in the human gut microbiome that correlate with a number of host properties such as body mass index, highlighting the need for functional analysis and raising hopes for the application of microbial markers as diagnostic or even prognostic tools for microbiota-associated human disorders.
Postencephalitic parkinsonism (PEP) is a disease of unknown etiology and pathophysiology following encephalitis lethargica (EL), an acute-onset polioencephalitis of cryptic cause in the 1920s. PEP is a tauopathy with multisystem neuronal loss and gliosis, clinically characterized by bradykinesia, rigidity, rest tremor, and oculogyric crises. Though a viral cause of EL is likely, past polymerase chain reaction-based investigations into the etiology of both PEP and EL were negative. PEP might be caused directly by an unknown viral pathogen or be the consequence of a post-infectious immunopathology. The development of metagenomic next-generation sequencing in conjunction with bioinformatic techniques has generated a broad-range tool for the detection of unknown pathogens in the recent past. Retrospective identification and characterization of pathogens responsible for past infectious diseases can be successfully performed with formalin-fixed paraffin-embedded (FFPE) tissue samples. In this study, we analyzed 24 FFPE brain samples from six patients with PEP by unbiased metagenomic next-generation sequencing. We found no evidence for the presence of a specific or putative (novel) viral pathogen, suggesting a likely post-infectious immune-mediated etiology of PEP.
The vast microbial diversity on the planet represents an invaluable source for identifying novel activities with potential industrial and therapeutic application. In this regard, metagenomics has emerged as a group of strategies that have significantly facilitated the analysis of DNA from multiple environments and has expanded the limits of known microbial diversity. However, the functional characterization of enzymes, metabolites, and products encoded by diverse microbial genomes is limited by the inefficient heterologous expression of foreign genes. We have implemented a pipeline that combines NGS and Sanger sequencing as a way to identify fosmids within metagenomic libraries. This strategy facilitated the identification of putative proteins, subcloning of targeted genes and preliminary characterization of selected proteins. Overall, the in silico approach followed by the experimental validation allowed us to efficiently recover the activity of previously hidden enzymes derived from agricultural soil samples. Therefore, the methodology workflow described herein can be applied to recover activities encoded by environmental DNA from multiple sources.
Background
Shotgun metagenomes contain a sample of all the genomic material in an environment, allowing for the characterization of a microbial community. In order to understand these communities, bioinformatics methods are crucial. A common first step in processing metagenomes is to compute abundance estimates of different taxonomic or functional groups from the raw sequencing data.
Given the breadth of the field, computational solutions need to be flexible and extensible, enabling the combination of different tools into a larger pipeline.
Results
We present NGLess and NG-meta-profiler. NGLess is a domain-specific language for describing next-generation sequence processing pipelines. It was developed with the goal of enabling user-friendly computational reproducibility. It provides built-in support for many common operations on sequencing data and can be extended with external tools through configuration files.
Using this framework, we developed NG-meta-profiler, a fast profiler for metagenomes which performs sequence preprocessing, mapping to bundled databases, filtering of the mapping results, and profiling (taxonomic and functional). It is significantly faster than either MOCAT2 or htseq-count and (as it builds on NGLess) its results are perfectly reproducible.
Conclusions
NG-meta-profiler is a high-performance solution for metagenomics processing built on NGLess. It can be used as-is to execute standard analyses or serve as the starting point for customization in a perfectly reproducible fashion.
NGLess and NG-meta-profiler are open source software (under the liberal MIT license) and can be downloaded from https://ngless.embl.de or installed through bioconda.
Background
Gut microbes influence their hosts in many ways, in particular by modulating the impact of diet. These effects have been studied most extensively in humans and mice. In this work, we used whole genome metagenomics to investigate the relationship between the gut metagenomes of dogs, humans, mice, and pigs.
Results
We present a dog gut microbiome gene catalog containing 1,247,405 genes (based on 129 metagenomes and a total of 1.9 terabasepairs of sequencing data). Based on this catalog and taxonomic abundance profiling, we show that the dog microbiome is closer to the human microbiome than the microbiome of either pigs or mice. To investigate this similarity in terms of response to dietary changes, we report on a randomized intervention with two diets (high-protein/low-carbohydrate vs. lower protein/higher carbohydrate). We show that diet has a large and reproducible effect on the dog microbiome, independent of breed or sex. Moreover, the responses were in agreement with those observed in previous human studies.
Conclusions
We conclude that findings in dogs may be predictive of human microbiome results. In particular, a novel finding is that overweight or obese dogs experience larger compositional shifts than lean dogs in response to a high-protein diet.
Population genomics of prokaryotes has been studied in depth in only a small number of primarily pathogenic bacteria, as genome sequences of isolates of diverse origin are lacking for most species. Here, we conducted a large‐scale survey of population structure in prevalent human gut microbial species, sampled from their natural environment, with a culture‐independent metagenomic approach. We examined the variation landscape of 71 species in 2,144 human fecal metagenomes and found that in 44 of these, accounting for 72% of the total assigned microbial abundance, single‐nucleotide variation clearly indicates the existence of sub‐populations (here termed subspecies). A single subspecies (per species) usually dominates within each host, as expected from ecological theory. At the global scale, geographic distributions of subspecies differ between phyla, with Firmicutes subspecies being significantly more geographically restricted. To investigate the functional significance of the delineated subspecies, we identified genes that consistently distinguish them in a manner that is independent of reference genomes. We further associated these subspecies‐specific genes with properties of the microbial community and the host. For example, two of the three Eubacterium rectale subspecies consistently harbor an accessory pro‐inflammatory flagellum operon that is associated with lower gut community diversity, higher host BMI, and higher blood fasting insulin levels. Using an additional 676 human oral samples, we further demonstrate the existence of niche specialized subspecies in the different parts of the oral cavity. Taken together, we provide evidence for subspecies in the majority of abundant gut prokaryotes, leading to a better functional and ecological understanding of the human gut microbiome in conjunction with its host.
The microbial communities that live inside the human gastrointestinal tract (the human gut
microbiome) are important for host health and wellbeing. Characterizing this new “organ”,
made up of as many cells as the human body itself, has recently become possible through
technological advances. Metagenomics, the high-throughput sequencing of DNA directly from
microbial communities, enables us to take genomic snapshots of thousands of microbes living
together in this complex ecosystem, without the need for isolating and growing them.
Quantifying the composition of the human gut microbiome allows us to investigate its
properties and connect it to host physiology and disease. The wealth of such connections was
unexpected and is probably still underestimated. Due to the fact that most of our dietary as well
as medicinal intake affects the microbiome and that the microbiome itself interacts with our
immune system through a multitude of pathways, many mechanisms have been proposed to
explain the observed correlations, though most have yet to be understood in depth.
An obvious prerequisite to characterizing the microbiome and its interactions with the host is
the accurate quantification of its composition, i.e. determining which microbes are present and
in what numbers they occur. Historically, standard practices have existed for sample handling,
DNA extraction and data analysis for many years. However, these were generally developed for
single microbe cultures and it is not always feasible to implement them in large scale
metagenomic studies. Partly because of this and partly because of the excitement that new
technology brings about, the first metagenomic studies each took the liberty to define their own
approach and protocols. From early meta-analysis of these studies it became clear that the
differences in sample handling, as well as differences in computational approaches, made
comparisons across studies very difficult. This restricts our ability to cross-validate findings of
individual studies and to pool samples from larger cohorts. To address the pressing need for
standardization, we undertook an extensive comparison of 21 different DNA extraction methods
as well as a series of other sample manipulations that affect quantification. We developed a
number of criteria for determining the measurement quality in the absence of a mock
community and used these to propose best practices for sampling, DNA extraction and library
preparation. If these were to be accepted as standards in the field, it would greatly improve
comparability across studies, which would dramatically increase the power of our inferences
and our ability to draw general conclusions about the microbiome.
Most metagenomics studies involve comparisons between microbial communities, for example
between fecal samples from cases and controls. A multitude of approaches have been proposed
to calculate community dissimilarities (beta diversity) and they are often combined with
various preprocessing techniques. Direct metagenomics quantification usually counts
sequencing reads mapped to specific taxonomic units, which can be species, genera, etc. Due to
technology-inherent differences in sampling depth, normalizing counts is necessary, for
instance by dividing each count by the sum of all counts in a sample (i.e. total sum scaling), or by
subsampling. To derive a single value for community (dis-)similarity, multiple distance
measures have been proposed. Although it is theoretically difficult to benchmark these
approaches, we developed a biologically motivated framework in which distance measures can
be evaluated. This highlights the importance of data transformations and their impact on the
measured distances.
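
As a minimal sketch of the normalization and distance steps described above, the following Python example applies total sum scaling to raw read counts and then computes one commonly used dissimilarity (Bray-Curtis). The taxon names, the counts and the choice of Bray-Curtis are illustrative assumptions, not the specific measures evaluated in the thesis.

```python
def total_sum_scale(counts):
    """Divide each count by the sum of all counts in the sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Hypothetical read counts; the two samples differ tenfold in sequencing depth.
case = {"Bacteroides": 600, "Prevotella": 300, "Ruminococcus": 100}
control = {"Bacteroides": 50, "Prevotella": 40, "Faecalibacterium": 10}

case_rel = total_sum_scale(case)
control_rel = total_sum_scale(control)
distance = bray_curtis(case_rel, control_rel)
print(round(distance, 3))
```

Without the scaling step, the tenfold difference in depth would dominate the distance, which is one reason the choice of data transformation matters as much as the choice of distance measure.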
Building on our experience with accurate abundance estimation and data preprocessing
techniques, we can now try and understand some of the basic properties of microbial
communities. In 2011, it was proposed that the space of genus level variation of the human gut
microbial community is structured into three basic types, termed enterotypes. These were
described in a multi-country cohort, so as to be independent of geography, age and other host
properties. Operationally defined through a clustering approach, they are “densely populated
areas in a multidimensional space of community composition”(source) and were proposed as a
general stratifier for the human population. Later studies that applied this concept to other
datasets raised concerns about the optimum number of clusters and robustness of the
clustering approach. This heralded a long standing debate about the existence of structure and
the best ways to determine and capture it. Here, we reconsider the concept of enterotypes, in
the context of the vastly increased amounts of available data. We propose a refined framework
in which the different types should be thought of as weak attractors in compositional space and
we try to implement an approach to determining which attractor a sample is closest to. To this
end, we train a classifier on a reference dataset to assign membership to new samples. This way,
enterotype assignment is no longer dataset dependent and effects due to biased sampling are
minimized. Using a model in which we assume the existence of three enterotypes characterized
by the same driver genera, as originally postulated, we show the relevance of this stratification
and propose it to be used in a clinical setting as a potential marker for disease development.
Moreover, we believe that these attractors underline different rules of community assembly and
we recommend they be accounted for when analyzing gut microbiome samples.
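
The reference-based assignment idea can be sketched as follows, assuming a simple nearest-centroid rule. The text does not state which classifier was used, so the rule, the labels (type_1 to type_3), the genus names and all abundance values below are hypothetical placeholders.

```python
import math


def centroid(profiles):
    """Mean relative abundance per genus across a list of profiles."""
    genera = {g for p in profiles for g in p}
    return {g: sum(p.get(g, 0.0) for p in profiles) / len(profiles) for g in genera}


def euclidean(a, b):
    """Euclidean distance between two profiles stored as dicts."""
    genera = set(a) | set(b)
    return math.sqrt(sum((a.get(g, 0.0) - b.get(g, 0.0)) ** 2 for g in genera))


def train(reference):
    """Build one centroid per label from a labelled reference dataset."""
    return {label: centroid(profiles) for label, profiles in reference.items()}


def assign(model, sample):
    """Return the label whose centroid lies closest to the sample."""
    return min(model, key=lambda label: euclidean(model[label], sample))


# Invented reference data: three types, each driven by a different genus.
reference = {
    "type_1": [{"GenusA": 0.7, "GenusB": 0.1, "GenusC": 0.2},
               {"GenusA": 0.6, "GenusB": 0.2, "GenusC": 0.2}],
    "type_2": [{"GenusA": 0.1, "GenusB": 0.7, "GenusC": 0.2},
               {"GenusA": 0.2, "GenusB": 0.6, "GenusC": 0.2}],
    "type_3": [{"GenusA": 0.2, "GenusB": 0.1, "GenusC": 0.7},
               {"GenusA": 0.2, "GenusB": 0.2, "GenusC": 0.6}],
}

model = train(reference)
new_sample = {"GenusA": 0.15, "GenusB": 0.65, "GenusC": 0.2}
label = assign(model, new_sample)
print(label)
```

Because the centroids are fixed once by the reference dataset, a new sample receives the same label regardless of which other samples happen to be in its own study, which is the property the reference-based approach is meant to provide.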
While enterotypes describe structure in the community at genus level, metagenomic sequencing
can in principle achieve single-nucleotide resolution, allowing us to identify single nucleotide
polymorphisms (SNPs) and other genomic variants in the gut microbiome. Analysis
methodology for this level of resolution has only recently been developed and little exploration
has been done to date. Assessing SNPs in a large, multinational cohort, we discovered that the
landscape of genomic variation seems highly structured even beyond species resolution,
indicating that clearly distinguishable subspecies are prevalent among gut microbes. In several
cases, these subspecies exhibit geo-stratification, with some subspecies only found in the
Chinese population. Generally, however, they present only minor dispersion limitations and are
seen across most of our study populations. Within one individual, one subspecies is commonly
found to dominate and only rarely are several subspecies observed to co-occur in the same
ecosystem. Analysis of longitudinal data indicates that the dominant subspecies remains stable
over periods of more than three years. When interrogating their functional properties we find
many differences, with specific ones appearing relevant to the host. For example, we identify a
subspecies of E. rectale that is lacking the flagellum operon and find its presence to be
significantly associated with lower body mass index and lower insulin resistance of their hosts;
it also correlates with higher microbial community diversity. These associations could not be
seen at the species level (where multiple subspecies are convoluted), which illustrates the
importance of this increased resolution for a more comprehensive understanding of microbial
interactions within the microbiome and with the host.
Taken together, our results provide a rigorous basis for performing comparative metagenomics
of the human gut, encompassing recommendations for both experimental sample processing
and computational analysis. We furthermore refine the concept of community stratification into
enterotypes, develop a reference-based approach for enterotype assignment and provide
compelling evidence for their relevance. Lastly, by harnessing the full resolution of
metagenomics, we discover a highly structured genomic variation landscape below the
microbial species level and identify common subspecies of the human gut microbiome. By
developing these high-precision metagenomics analysis tools, we thus hope to contribute to a
greatly improved understanding of the properties and dynamics of the human gut microbiome.
With the technological advances of the last decade, it is now feasible to analyze microbiome samples, such as human stool specimens, using multi-omic techniques. Given the inherent sample complexity, there exists a need for sampling methods that preserve as much information as possible about the biological system at the time of sampling. Here, we analyzed human stool samples preserved and stored using different methods, applying metagenomics as well as metaproteomics. Our results demonstrate that sample preservation and storage have a significant effect on the taxonomic composition of identified proteins. The overall identification rates, as well as the proportion of proteins from Actinobacteria, were much higher when samples were flash frozen. Preservation in RNAlater overall led to fewer protein identifications and a considerable increase in the share of Bacteroidetes, as well as Proteobacteria. Additionally, a decrease in the share of metabolism-related proteins and an increase in the relative amount of proteins involved in the processing of genetic information were observed for RNAlater-stored samples. This suggests that great care should be taken in choosing methods for the preservation and storage of microbiome samples, as well as in comparing the results of analyses using different sampling and storage methods. Flash freezing and subsequent storage at −80 °C should be chosen wherever possible.
Microalgae are of high relevance for global carbon cycling, and it is well known that they are associated with a microbiota. However, it remains unclear whether the associated microbiota, often found in phycosphere biofilms, is specific to the microalgal strains and which role individual bacterial taxa play. Here we provide experimental evidence that Chlorella saccharophila, Scenedesmus quadricauda, and Micrasterias crux-melitensis, maintained in strain collections, are associated with unique and specific microbial populations. Deep metagenome sequencing, binning approaches and secretome analyses, in combination with RNA-Seq data, implied fundamental differences in the gene expression profiles of the microbiota associated with the different microalgae. Our metatranscriptome analyses indicate that the transcriptionally most active bacteria with respect to key genes commonly involved in plant–microbe interactions in the Chlorella (Trebouxiophyceae) and Scenedesmus (Chlorophyceae) strains belong to the phylum of the α-Proteobacteria. In contrast, in the Micrasterias (Zygnematophyceae) phycosphere biofilm, bacteria affiliated with the phylum of the Bacteroidetes showed the highest gene expression rates. We furthermore show that effector molecules known from plant–microbe interactions as inducers of innate immunity are already of relevance at this evolutionarily early plant-microbiome level.
The biosphere harbors a large quantity and diversity of microbial organisms that can thrive in all environments. Estimates of the total number of microbial species reach up to 10^12, of which fewer than 15,000 have been characterized to date. It has been challenging to delineate phenotypically, evolutionarily and ecologically meaningful lineages such as species, subspecies and strains. Even within recognized species, gene content can vary considerably between sublineages (for example strains), a problem that can be addressed by analyzing pangenomes, defined as the non-redundant set of genes within a phylogenetic clade, as evolutionary units.
Species are considered to be ecologically and evolutionarily coherent units; however, to date it is still not fully understood what the primary habitats and ecological niches of many prokaryotic species are and how environmental preferences drive their genomic diversity. The majority of comparative genomics studies have focused on a single prokaryotic species in the context of clinical relevance and ecology. With the accumulation of sequencing data from genomics and metagenomics, it is now possible to investigate trends across many species, which will facilitate understanding of pangenome evolution and of species and subspecies delineation.
The major aims of this thesis were 1) to annotate habitat preferences of prokaryotic species and strains; 2) to investigate to what extent these environmental preferences drive genomic diversity of prokaryotes and to what extent phylogenetic constraints limit this diversification; 3) to explore natural nucleotide identity thresholds to delineate bacterial species in metagenomics gene catalogs; 4) to explore species delineation for applications in subspecies and strain delineation in metagenomics.
The first part of the thesis describes methods to infer environmental preferences of microbial species. This data is a prerequisite for the analyses performed in the second part of the thesis, which explores how the structure of bacterial pangenomes is predetermined by past evolutionary history and how it is linked to the environmental preferences of the species. The main finding in this subchapter is that habitat preferences explained up to 49% of the variance in pangenome structure, compared to 18% explained by phylogenetic inertia. In general, this trend indicates that phylogenetic inertia does not limit the evolution of pangenome size and diversity, but that convergent evolution may overcome phylogenetic constraints. In this project we show that core genome size is associated with higher environmental ubiquity of species. This is likely because, to be highly prevalent, species need more versatile genomes in which most necessary genes are present in the majority of genomes of that species. Taken together, these findings may be useful for future predictive analyses of ecological niches in newly discovered species.
The third part of the thesis explores data-driven, operational species boundaries. I show that homologous genes from different genomes of the same species tend to share at least 95% nucleotide identity, while different species within the same genus have lower nucleotide identity. This is in line with other studies showing that the genome-wide natural species boundary might lie in the range of 90-95% nucleotide identity. Finally, the fourth part of the thesis discusses how challenges in species delineation are relevant for the identification of meaningful within-species groups, followed by a discussion on how advancements in species delineation can be applied to the classification of within-species genomic diversity in the age of metagenomics.
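
The 95% operational boundary can be illustrated with a minimal sketch that computes nucleotide identity between two pre-aligned homologous gene sequences and applies the threshold. The function names, the toy sequences and the assumption that the sequences are already aligned (with "-" marking gaps) are illustrative choices, not the method used in the thesis.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def within_species_boundary(seq_a, seq_b, threshold=95.0):
    """True if the pair meets the operational identity threshold."""
    return percent_identity(seq_a, seq_b) >= threshold


# Toy 20-nucleotide sequences (hypothetical): one and three mismatches to gene_a.
gene_a = "ATGGCTAGCTAGGATCCGTA"
gene_b = "ATGGCTAGCTAGGATCCGTT"
gene_c = "ATGGTTAGCAAGGATCGGTA"

print(percent_identity(gene_a, gene_b), within_species_boundary(gene_a, gene_b))
print(percent_identity(gene_a, gene_c), within_species_boundary(gene_a, gene_c))
```

In practice the identities would come from an alignment tool run over many gene pairs, and the threshold would be read off the resulting distribution rather than fixed in advance.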
Outdoor dust carries a wide range of microbial agents originating from land, from transportation, from human microbial flora (including pathogens and commensals), and from airborne environmental sources. Dust aerosols are rich in bacterial communities that have a major impact on human health and living environments. In this study, 18 outdoor samples from roadside barricades, safety walls, and fences were collected in Abu Dhabi, UAE, and bacterial diversity was assessed through a 16S rRNA amplicon next-generation sequencing approach. Clean data from HiSeq produced 1,099,892 total read pairs for the 18 samples. For all samples, taxonomic classifications were assigned to the representative sequences of the OTUs (operational taxonomic units) using the Ribosomal Database Project database. Analyses such as alpha diversity, beta diversity, differential species analysis, and species relative abundance were performed to cluster the samples, and a functional profile heat map was obtained from the OTUs using bioinformatics tools. A total of 2814 OTUs were identified from those samples with a coverage of more than 99%. At the phylum level, all 18 samples contained most of the major bacterial groups, such as Actinobacteria, Proteobacteria, Firmicutes, and Bacteroidetes. Twelve samples contained Propionibacterium acnes, which was mainly found in RD16 and RD3. Major bacterial species such as Propionibacterium acnes, Bacillus persicus, and Staphylococcus capitis were found in all samples. Most of the samples contained Streptococcus mitis, Staphylococcus capitis, and Nafulsella turpanensis; Enhydrobacter aerosaccus is part of the normal microbiota of the skin. Salinimicrobium sp., Bacillus alkalisediminis, and Bacillus persicus are halophilic bacteria found in sediments. The heat map clustered the samples and species along its vertical and horizontal axes, representing the relationship between the samples and bacterial diversity. The functional-profile heat map showed high representation of amino acid, carbohydrate, and cofactor and vitamin metabolism across all bacterial species from all samples. Taken together, our analyses are very relevant from the perspective of outdoor air quality, airborne diseases, and epidemics, with broader implications for health safety and monitoring.
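
As one illustration of the alpha diversity analysis mentioned above, the Shannon index can be computed directly from per-sample OTU counts. The abstract does not state which alpha diversity metrics were used, so the choice of the Shannon index is an assumption, and the sample names and counts below are invented.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    if total <= 0:
        raise ValueError("sample contains no reads")
    return -sum((n / total) * math.log(n / total) for n in otu_counts if n > 0)


# Hypothetical OTU count vectors for two samples.
even_sample = [25, 25, 25, 25]   # four equally abundant OTUs
skewed_sample = [97, 1, 1, 1]    # one dominant OTU

h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
print(round(h_even, 3), round(h_skewed, 3))
```

An even community reaches the maximum value ln(S) for S OTUs, whereas a community dominated by a single OTU scores close to zero even when the number of OTUs is the same.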
Indoor house dust is a blend of organic and inorganic materials, upon which diverse microbial communities such as viruses, bacteria and fungi reside. Adequate moisture in the indoor environment helps microbial communities multiply quickly. The outdoor air and materials that are brought into buildings by airflow, sandstorms, animals, pets and house occupants endow the indoor dust particles with extra features that impact human health. Assessment of the health effects of indoor dust particles, the type of indoor microbial inoculants and the enzymes secreted by indoor insects as allergens merits detailed investigation. Here, we discuss the applications of next-generation sequencing (NGS) technology, which is used to assess the microbial diversity and abundance of indoor dust environments. Likewise, the applications of NGS are discussed for monitoring the gene expression profiles of indoor human occupants or their surrogate cellular models when exposed to aqueous solutions of collected indoor dust samples. We also highlight the detection methods for dust allergens and analytical procedures that quantify the chemical nature of indoor particulate matter with a potential impact on human health. Our review is thus unique in advocating the application of interdisciplinary approaches that comprehensively assess the health effects of bad air quality in built environments.
Diversity of Nonribosomal Peptide Synthetase Genes in the Microbial Metagenomes of Marine Sponges
(2012)
Genomic mining revealed one major nonribosomal peptide synthetase (NRPS) phylogenetic cluster in 12 marine sponge species, one ascidian, an actinobacterial isolate and seawater. Phylogenetic analysis predicts its taxonomic affiliation to the actinomycetes and hydroxy-phenyl-glycine as a likely substrate. Additionally, a phylogenetically distinct NRPS gene cluster was discovered in the microbial metagenome of the sponge Aplysina aerophoba, which shows highest similarities to NRPS genes that were previously assigned, by way of single-cell genomics, to a Chloroflexi sponge symbiont. Genomic mining studies such as the one presented here for NRPS genes contribute to ongoing efforts to characterize the genomic potential of sponge-associated microbiota for secondary metabolite biosynthesis.
The gastrointestinal tract is abundantly colonized by microbes, yet the translocation of oral species to the intestine is considered a rare aberrant event, and a hallmark of disease. By studying salivary and fecal microbial strain populations of 310 species in 470 individuals from five countries, we found that transmission to, and subsequent colonization of, the large intestine by oral microbes is common and extensive among healthy individuals. We found evidence for a vast majority of oral species to be transferable, with increased levels of transmission in colorectal cancer and rheumatoid arthritis patients and, more generally, for species described as opportunistic pathogens. This establishes the oral cavity as an endogenous reservoir for gut microbial strains, and oral-fecal transmission as an important process that shapes the gastrointestinal microbiome in health and disease.
Sponges (phylum Porifera) are evolutionarily ancient, sessile filter-feeders that harbor a highly diverse microbial community within their internal mesohyl matrix. Throughout this thesis project, I aimed at exploring the adaptations of these symbionts to life within their sponge host by sequencing and analyzing the genomes of a variety of bacteria from the microbiome of the Mediterranean sponge Aplysina aerophoba. The methods employed were fluorescence-activated cell sorting with subsequent multiple displacement amplification and single-cell / ‘mini-metagenome’ sequencing, and metagenomic sequencing followed by differential coverage binning. These two main approaches both aimed at obtaining genome sequences of bacterial symbionts of A. aerophoba, which were then compared to each other and to references from other environments, to gain information on adaptations to the host sponge environment and on possible interactions with the host and within the microbial community.
Cyanobacteria are frequent members of the sponge microbial community. My ‘mini-metagenome’ sequencing project delivered three draft genomes of “Candidatus Synechococcus spongiarum,” the cyanobacterial symbiont of A. aerophoba and many more sponges inhabiting the photic zone. The most complete of these genomes was compared to other clades of this symbiont and to closely related free-living cyanobacterial references in a collaborative project published in Burgsdorf I*, Slaby BM* et al. (2015; *shared first authorship). Although the four clades of “Ca. Synechococcus spongiarum” from the four sponge species A. aerophoba, Ircinia variabilis, Theonella swinhoei, and Carteriospongia foliascens were approximately 99% identical on the level of 16S rRNA gene sequences, they greatly differed on the genomic level. Not only the genome sizes were different from clade to clade, but also the gene content and a number of features including proteins containing the eukaryotic-type domains leucine-rich repeats or tetratricopeptide repeats. On the other hand, the four clades shared a number of features such as ankyrin repeat domain-containing proteins that seemed to be conserved also among other microbial phyla in different sponge hosts and from different geographic locations. A possible novel mechanism for host phagocytosis evasion and phage resistance by means of an altered O antigen of the lipopolysaccharide was identified.
To test previous hypotheses on adaptations of sponge-associated bacteria on a broader spectrum of the microbiome of A. aerophoba while also taking a step forward in methodology, I developed a bioinformatic pipeline to combine metagenomic Illumina short-read sequencing data with PacBio long-read data. At the beginning of this project, no pipelines to combine short-read and long-read data for metagenomics were published, and at time of writing, there are still no projects published with a comparable aim of un-targeted assembly, binning and analysis of a metagenome. I tried a variety of assembly programs and settings on a simulated test dataset reflecting the properties of the real metagenomic data. The developed assembly pipeline improved not only the overall assembly statistics, but also the quality of the binned genomes, which was evaluated by comparison to the originally published genome assemblies.
The microbiome of A. aerophoba has been studied from various angles in recent years, but only genomes of the candidate phylum Poribacteria and the cyanobacterial sequences from my above-described project have been published to date. By applying my newly developed assembly pipeline to a metagenomic dataset of A. aerophoba consisting of a PacBio long-read dataset and six Illumina short-read datasets optimized for subsequent differential coverage binning, I aimed at sequencing a larger number and greater diversity of symbionts. The results of this project are currently in review by The ISME Journal. The complementation of Illumina short-read with PacBio long-read sequencing data for binning of this highly complex metagenome greatly improved the overall assembly statistics and improved the quality of the binned genomes. Thirty-seven genomes from 13 bacterial phyla and candidate phyla were binned, representing the most prominent members of the microbiome of A. aerophoba. A statistical comparison revealed an enrichment of genes involved in restriction modification and toxin-antitoxin systems in most symbiont genomes over selected reference genomes. Both are defense features against incoming foreign DNA, which may be important for sponge symbionts due to the sponge’s filtration and phagocytosis activity that exposes the symbionts to high levels of free DNA. Host colonization and matrix utilization features were also significantly enriched. Due to the diversity of the binned symbiont genomes, a within-symbionts genome comparison was possible, which revealed three guilds of symbionts characterized by i) nutritional specialization on the metabolization of carnitine, ii) specialization on sulfated polysaccharides, and iii) apparent nutritional generalism. Both carnitine and sulfated polysaccharides are abundant in the sponge extracellular matrix and therefore available to the sponge symbionts as substrates. In summary, the genomes of the diverse community of symbionts in A. aerophoba were united in their defense features, but specialized regarding their nutritional preferences.