TY  - THES
A1  - Arumugam, Manimozhiyan
T1  - Comparative metagenomic analysis of the human intestinal microbiota
T1  - Vergleichende metagenomische Analyse der menschlichen Darmflora
N2  - The human gut is home to thousands of microbes that are important for human life. As most of these cannot be cultivated, metagenomics is a key means of understanding this community. To perform comparative metagenomic analysis of the human gut microbiome, I have developed SMASH (Simple metagenomic analysis shell), a computational pipeline. SMASH can also be used to assemble and analyze single genomes, and has been successfully applied to the bacterium Mycoplasma pneumoniae and the fungus Chaetomium thermophilum. In the context of the MetaHIT (Metagenomics of the human intestinal tract) consortium our group is participating in, I used SMASH to validate the assembly and to estimate the assembly error rate of 576.7 Gb metagenome sequence obtained using Illumina Solexa technology from fecal DNA of 124 European individuals. I also estimated the completeness of the gene catalogue containing 3.3 million open reading frames obtained from these metagenomes. Finally, I used SMASH to analyze human gut metagenomes of 39 individuals from 6 countries encompassing a wide range of host properties such as age, body mass index and disease states. We find that the variation in the gut microbiome is not continuous but stratified into enterotypes. Enterotypes are complex host-microbial symbiotic states that are not explained by host properties, nutritional habits or possible technical biases. The concept of enterotypes might have far-reaching implications, for example, to explain different responses to diet or drug intake. We also find several functional markers in the human gut microbiome that correlate with a number of host properties such as body mass index, highlighting the need for functional analysis and raising hopes for the application of microbial markers as diagnostic or even prognostic tools for microbiota-associated human disorders.
N2  - Der menschliche Darm beheimatet tausende Mikroben, die für das menschliche Leben wichtig sind. Da die meisten dieser Mikroben nicht kultivierbar sind, ist „Metagenomics“ ein wichtiges Werkzeug zum Verständnis dieser wichtigen mikrobiellen Gemeinschaft. Um vergleichende Metagenomanalysen durchführen zu können, habe ich das Computerprogramm SMASH (Simple metagenomic analysis shell) entwickelt. SMASH kann auch zur Assemblierung und Analyse von Einzelgenomen benutzt werden und wurde erfolgreich auf das Bakterium Mycoplasma pneumoniae und den Pilz Chaetomium thermophilum angewandt. Im Zusammenhang mit der Beteiligung unserer Arbeitsgruppe am MetaHIT-Konsortium (Metagenomics of the human intestinal tract) habe ich SMASH benutzt, um die Assemblierung zu validieren und die Fehlerrate der Assemblierung von 576.7 Gb Metagenomsequenzen, die mit der Illumina Solexa Technologie aus der fäkalen DNS von 124 europäischen Personen gewonnen wurden, zu bestimmen. Des Weiteren habe ich die Vollständigkeit des Genkatalogs dieser Metagenome, der 3.3 Millionen offene Leserahmen enthält, geschätzt. Zuletzt habe ich SMASH benutzt, um die Darmmetagenome von 39 Personen aus 6 Ländern zu analysieren. Hauptergebnis dieser Analyse war, dass die Variation der Darmmikrobiota nicht kontinuierlich ist. Anstatt dessen fanden wir so genannte Enterotypen. Enterotypen sind komplexe Zustände der Symbiose zwischen Wirt und Mikroben, die sich nicht durch Wirtseigenschaften, wie Alter, Body-Mass-Index, Erkrankungen und Ernährungseigenschaften oder ein mögliches technisches Bias erklären lassen. Das Konzept der Enterotypen könnte weitgehende Folgen haben. Diese könnten zum Beispiel die unterschiedlichen Reaktionen auf Diäten oder Medikamenteneinnahmen erklären. Weiterhin konnten wir eine Anzahl an Markern im menschlichen Darmmikrobiom finden, die mit unterschiedlichen Wirtseigenschaften wie dem Body-Mass-Index korrelieren. Dies hebt die Wichtigkeit dieser Analysemethode hervor und erweckt Hoffnungen auf Anwendung mikrobieller Marker als diagnostisches oder sogar prognostisches Werkzeug für menschliche Erkrankungen, in denen das Mikrobiom eine Rolle spielt.
KW  - Darmflora
KW  - Metagenom
KW  - Bioinformatik
KW  - human gut microbiome
KW  - metagenomics
KW  - comparative metagenomics
KW  - computational analysis
Y1  - 2010
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-55903
ER  - 
TY  - THES
A1  - Costea, Paul Igor
T1  - Stratification and variation of the human gut microbiota
T1  - Stratifikation und Variation des menschlichen Darmmikrobioms
N2  - The microbial communities that live inside the human gastrointestinal tract (the human gut
microbiome) are important for host health and wellbeing. Characterizing this new “organ”,
made up of as many cells as the human body itself, has recently become possible through
technological advances. Metagenomics, the high-throughput sequencing of DNA directly from
microbial communities, enables us to take genomic snapshots of thousands of microbes living
together in this complex ecosystem, without the need for isolating and growing them.
Quantifying the composition of the human gut microbiome allows us to investigate its
properties and connect it to host physiology and disease. The wealth of such connections was
unexpected and is probably still underestimated. Due to the fact that most of our dietary as well
as medicinal intake affects the microbiome and that the microbiome itself interacts with our
immune system through a multitude of pathways, many mechanisms have been proposed to
explain the observed correlations, though most have yet to be understood in depth.
An obvious prerequisite to characterizing the microbiome and its interactions with the host is
the accurate quantification of its composition, i.e. determining which microbes are present and
in what numbers they occur. Standard practices for sample handling, DNA extraction and
data analysis have existed for many years. However, these were generally developed for
single microbe cultures and it is not always feasible to implement them in large scale
metagenomic studies. Partly because of this and partly because of the excitement that new
technology brings about, the first metagenomic studies each took the liberty to define their own
approach and protocols. From early meta-analysis of these studies it became clear that the
differences in sample handling, as well as differences in computational approaches, made
comparisons across studies very difficult. This restricts our ability to cross-validate findings of
individual studies and to pool samples from larger cohorts. To address the pressing need for
standardization, we undertook an extensive comparison of 21 different DNA extraction methods
as well as a series of other sample manipulations that affect quantification. We developed a
number of criteria for determining the measurement quality in the absence of a mock
community and used these to propose best practices for sampling, DNA extraction and library
preparation. If these were to be accepted as standards in the field, it would greatly improve
comparability across studies, which would dramatically increase the power of our inferences
and our ability to draw general conclusions about the microbiome.
Most metagenomics studies involve comparisons between microbial communities, for example
between fecal samples from cases and controls. A multitude of approaches have been proposed
to calculate community dissimilarities (beta diversity) and they are often combined with
various preprocessing techniques. Direct metagenomics quantification usually counts
sequencing reads mapped to specific taxonomic units, which can be species, genera, etc. Due to
technology-inherent differences in sampling depth, normalizing counts is necessary, for
instance by dividing each count by the sum of all counts in a sample (i.e. total sum scaling), or by
subsampling. To derive a single value for community (dis-)similarity, multiple distance
measures have been proposed. Although it is theoretically difficult to benchmark these
approaches, we developed a biologically motivated framework in which distance measures can
be evaluated. This highlights the importance of data transformations and their impact on the
measured distances.
Building on our experience with accurate abundance estimation and data preprocessing
techniques, we can now try and understand some of the basic properties of microbial
communities. In 2011, it was proposed that the space of genus level variation of the human gut
microbial community is structured into three basic types, termed enterotypes. These were
described in a multi-country cohort, so as to be independent of geography, age and other host
properties. Operationally defined through a clustering approach, they are “densely populated
areas in a multidimensional space of community composition” (source) and were proposed as a
general stratifier for the human population. Later studies that applied this concept to other
datasets raised concerns about the optimum number of clusters and robustness of the
clustering approach. This heralded a long standing debate about the existence of structure and
the best ways to determine and capture it. Here, we reconsider the concept of enterotypes, in
the context of the vastly increased amounts of available data. We propose a refined framework
in which the different types should be thought of as weak attractors in compositional space and
we try to implement an approach to determining which attractor a sample is closest to. To this
end, we train a classifier on a reference dataset to assign membership to new samples. This way,
enterotype assignment is no longer dataset dependent and effects due to biased sampling are
minimized. Using a model in which we assume the existence of three enterotypes characterized
by the same driver genera, as originally postulated, we show the relevance of this stratification
and propose it to be used in a clinical setting as a potential marker for disease development.
Moreover, we believe that these attractors underline different rules of community assembly and
we recommend they be accounted for when analyzing gut microbiome samples.
While enterotypes describe structure in the community at genus level, metagenomic sequencing
can in principle achieve single-nucleotide resolution, allowing us to identify single nucleotide
polymorphisms (SNPs) and other genomic variants in the gut microbiome. Analysis
methodology for this level of resolution has only recently been developed and little exploration
has been done to date. Assessing SNPs in a large, multinational cohort, we discovered that the
landscape of genomic variation seems highly structured even beyond species resolution,
indicating that clearly distinguishable subspecies are prevalent among gut microbes. In several
cases, these subspecies exhibit geo-stratification, with some subspecies only found in the
Chinese population. Generally, however, they present only minor dispersion limitations and are
seen across most of our study populations. Within one individual, one subspecies is commonly
found to dominate and only rarely are several subspecies observed to co-occur in the same
ecosystem. Analysis of longitudinal data indicates that the dominant subspecies remains stable
over periods of more than three years. When interrogating their functional properties we find
many differences, with specific ones appearing relevant to the host. For example, we identify a
subspecies of E. rectale that is lacking the flagellum operon and find its presence to be
significantly associated with lower body mass index and lower insulin resistance of its hosts;
it also correlates with higher microbial community diversity. These associations could not be
seen at the species level (where multiple subspecies are convoluted), which illustrates the
importance of this increased resolution for a more comprehensive understanding of microbial
interactions within the microbiome and with the host.
Taken together, our results provide a rigorous basis for performing comparative metagenomics
of the human gut, encompassing recommendations for both experimental sample processing
and computational analysis. We furthermore refine the concept of community stratification into
enterotypes, develop a reference-based approach for enterotype assignment and provide
compelling evidence for their relevance. Lastly, by harnessing the full resolution of
metagenomics, we discover a highly structured genomic variation landscape below the
microbial species level and identify common subspecies of the human gut microbiome. By
developing these high-precision metagenomics analysis tools, we thus hope to contribute to a
greatly improved understanding of the properties and dynamics of the human gut microbiome.
N2  - Die mikrobiellen Gemeinschaften innerhalb des menschlichen Darmtrakts – das menschliche Darm-Mikrobiom – sind wichtig für das Wohlbefinden und die Gesundheit des Wirts. Die Charakterisierung dieses neuen “Organs”, welches aus ähnlich vielen Zellen besteht wie der menschliche Körper, ist in jüngster Zeit durch technologische Fortschritte möglich geworden. Die Metagenomik, die direkte Hochdurchsatz-Sequenzierung mikrobieller DNA, ermöglicht die Aufnahme “genomischer Schnappschüsse” tausender verschiedener, in einem komplexen Ökosystem zusammenlebender Bakterien, ohne dafür auf deren Isolierung und Wachstum angewiesen zu sein. Die Quantifizierung des menschlichen Mikrobioms erlaubt es uns, seine Eigenschaften zu untersuchen und Verbindungen zu Wirtsphysiologie und -krankheiten zu knüpfen. Der Reichtum dieser Informationen ist unerwartet hoch und wahrscheinlich noch immer unterbewertet. Aufgrund der Tatsache, dass der Großteil unserer Ernährung und unseres Medikamentenkonsums unser Mikrobiom, welches wiederum selbst über verschiedene Arten mit unserem Immunsystem interagiert, beeinflusst, wurden viele Mechanismen vorgeschlagen, um die beobachteten Korrelationen zu erklären. Die meisten davon sind jedoch noch nicht vollständig verstanden.

Eine offensichtliche Komponente zur Charakterisierung des Mikrobioms und dessen Interaktionen mit dem Wirt ist eine akkurate Quantifizierung seiner genauen Zusammensetzung, womit sowohl die Anwesenheit von bestimmten Bakterien als auch deren Anzahl gemeint ist. Obwohl etablierte Standardprozeduren zur Probenbehandlung, DNA-Extrahierung und Datenanalyse existieren, sind sie nicht immer für metagenomische Studien anwendbar, da sie für isolierte Bakterienkulturen entwickelt wurden. Deswegen und auch wegen der Begeisterung, die neuartige Technologien mit sich bringen, nahmen sich die ersten metagenomischen Studien jeweils die Freiheit, ihre eigenen Protokolle und Herangehensweisen zu definieren. Die Metaanalyse dieser Studien zeigte, dass Unterschiede sowohl in der Probenbehandlung als auch in der statistischen Auswertung den Vergleich zwischen Studien sehr schwierig machen. Das wiederum beschneidet unsere Fähigkeit, Entdeckungen zu bestätigen und Daten über Studien hinweg zu kombinieren. Um die zwingend notwendige Standardisierung voranzutreiben, haben wir einen umfassenden Vergleich von 21 verschiedenen DNA-Extraktionsmethoden sowie verschiedener weiterer Probenbehandlungen, welche Quantifizierungen beeinflussen, vorgenommen. Wir haben eine Reihe von Kriterien entwickelt, um die Messqualität in Abwesenheit von Mock-Kontrollen zu bestimmen und schlagen anhand dieser Methoden für Probenbeschaffung, DNA-Extraktion und Library-Generierung optimale Verfahren vor. Wenn diese als Standard akzeptiert werden, würde das eine stark verbesserte Vergleichbarkeit zwischen Studien ermöglichen und damit sowohl einen extremen Zuwachs an statistischer Power als auch unserer Fähigkeit, generelle Schlüsse über das Mikrobiom zu ziehen, zur Folge haben.

Die meisten metagenomischen Studien teilen ihre Datensätze auf, um Vergleiche anzustellen, z.B. zwischen Stuhlproben gesunder und erkrankter Menschen. Eine Vielzahl verschiedener Ansätze, welche wiederum oft mit verschiedenen Datenvorbehandlungen kombiniert werden, wurden vorgeschlagen, um Dissimilarität zwischen Gemeinschaften (Beta-Diversität) zu berechnen. Um metagenomische Daten auf Spezies-, Genus- und höheren Ebenen zu quantifizieren, werden üblicherweise reads auf Referenzgenome bestimmter taxonomischer Einheiten aligniert und gezählt. Aufgrund technologieabhängiger Unterschiede in Sequenziertiefe müssen reads normalisiert werden, z.B. indem man alle counts durch die Gesamtanzahl der counts einer Sequenzierung teilt (total sum scaling), oder durch subsampling. Für die Messung der Gemeinschafts(dis)similarität wurden viele Distanzmaße vorgeschlagen. Da es schwierig ist, diese Ansätze theoretisch zu vergleichen, haben wir ein biologisch motiviertes Konzept entwickelt, mit dem man Distanzmaße evaluieren kann. Dies unterstreicht die Wichtigkeit der Datentransformation und deren Einwirkung auf Distanzmaße.

Aufbauend auf unserer Erfahrung mit Häufigkeitsabschätzungen und Techniken zur Datenvorbehandlung können wir nun versuchen, grundlegende Eigenschaften mikrobieller Gemeinschaften zu verstehen. 2011 wurde vorgeschlagen, dass sich die Variation auf Genusebene im menschlichen Darm auf drei grundlegende Typen beschränkt, welche Enterotypen getauft wurden. Diese wurden in Datensätzen verschiedener Länder als unabhängig von Herkunft, Alter und anderen Wirtseigenschaften beschrieben. Die Enterotypen sind durch einen Cluster-Ansatz als „dicht besiedelte Bereiche in einem multidimensionalen Raum der Gemeinschaftszusammensetzung“ definiert und wurden als grundlegende Stratifikatoren für die menschliche Population vorgeschlagen. Spätere Studien, welche dieses Konzept auf andere Datensätze anwandten, erhoben Zweifel bezüglich der optimalen Anzahl an Clustern und an der generellen Robustheit des Ansatzes. Dies leitete erneut eine langanhaltende Debatte über die Existenz von Strukturen und die besten Wege, diese zu bestimmen und einzufangen, ein. Hier überdenken wir, in Anbetracht der stark gestiegenen Anzahl an verfügbaren Daten, das Enterotypen-Konzept. Wir schlagen ein überarbeitetes Konzept vor, in welchem die verschiedenen Enterotypen als schwache Attraktoren im multidimensionalen Raum verstanden werden und implementieren einen Ansatz zur Berechnung des Attraktors, der dem Datensatz am ähnlichsten ist. Dafür trainieren wir einen Klassifizierer auf einem Referenz-Datensatz, um neue Datensätze zuzuordnen. Damit ist Enterotypisierung nicht mehr datensatzabhängig und der Effekt von sampling bias ist minimiert. Indem wir ein Modell nutzen, für das wir die Existenz dreier Enterotypen (definiert durch die selben Genera wie ursprünglich postuliert) annehmen, zeigen wir die Relevanz dieser Stratifikation und schlagen es in einem klinischen Zusammenhang als potentiellen Marker für Krankheitsfortschritt vor. Außerdem glauben wir, dass diese Attraktoren verschiedene Regeln mikrobieller Zusammensetzung widerspiegeln und schlagen vor, sie bei der Analyse von mikrobiellen Daten zu berücksichtigen.

Während Enterotypen Struktur in der Gemeinschaft auf Genusebene beschreiben, kann metagenomische Sequenzierung prinzipiell Auflösung auf Nukleotidebene erreichen, womit single nucleotide polymorphisms (SNPs) und andere genomische Variationen im Darm-Mikrobiom identifiziert werden können. Analysemethoden für dieses Auflösungsniveau wurden erst kürzlich entwickelt und bis heute wurden diese erst wenig erforscht. Wir zeigen, dass die Landschaft an genomischer Variation von SNPs in einer großen, multinationalen Kohorte sogar über die Speziesebene hinaus geht und hochgradig strukturiert ist, was das Vorkommen klar abgrenzbarer Subspezies unter Darmmikroben suggeriert. In mehreren Fällen zeigen diese Subspezies geographische Stratifikation, wobei einige Subspezies nur in chinesischen Populationen vorkommen. Im Allgemeinen zeigen sie jedoch nur eine geringfügige Beschränkung der Dispersion und sind in der Mehrzahl der Populationen vorhanden. Innerhalb eines Individuums dominiert häufig eine bestimmte Subspezies, nur selten dominieren verschiedene gemeinsam im gleichen Ökosystem. Eine Analyse von Zeitreihenexperimenten deutet darauf hin, dass die dominante Subspezies über Zeiträume von mehr als drei Jahren stabil bleibt. Wenn man ihre funktionalen Eigenschaften untersucht, findet man viele Unterschiede, von denen bestimmte relevant für den Wirt erscheinen. Zum Beispiel identifizieren wir eine Subspezies von E. rectale, welcher das Flagellum-Operon fehlt, die signifikant assoziiert ist mit geringerem BMI und geringerer Insulinresistenz ihres Wirts; sie korreliert zudem mit höherer mikrobieller Diversität. Diese Assoziationen konnten auf Speziesebene nicht gesehen werden (auf der mehrere Subspezies überlagert sind), was die Wichtigkeit dieser erhöhten Auflösung für ein umfassenderes Verständnis mikrobieller Interaktionen innerhalb des Mikrobioms und mit dem Wirt illustriert.
 
Zusammenfassend bieten unsere Ergebnisse eine präzise Grundlage für vergleichende Metagenomik des menschlichen Darms, einschließlich Empfehlungen über experimentelles Sampling und statistische Analysen. Weiterhin verfeinern wir das Konzept der Enterotypen-Stratifikation in Gemeinschaften, entwickeln referenzbasierte Ansätze für Enterotypen-Zuordnung und bieten überzeugende Beweise für ihre Relevanz. Indem wir die volle Auflösung metagenomischer Sequenzierungen nutzen, entdecken wir eine Landschaft hochgradig strukturierter genomischer Variation unterhalb der Speziesebene und identifizieren gemeinsame Subspezies des menschlichen Darm-Mikrobioms. Durch die Entwicklung dieser hochpräzisen metagenomischen Untersuchungsansätze tragen wir zu einem verbesserten
KW  - metagenomics
KW  - microbiology
KW  - Mensch
KW  - Darmflora
KW  - Metagenom
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-139649
ER  - 
TY  - JOUR
A1  - Costea, Paul I.
A1  - Coelho, Luis Pedro
A1  - Sunagawa, Shinichi
A1  - Munch, Robin
A1  - Huerta-Cepas, Jaime
A1  - Forslund, Kristoffer
A1  - Hildebrand, Falk
A1  - Kushugulova, Almagul
A1  - Zeller, Georg
A1  - Bork, Peer
T1  - Subspecies in the global human gut microbiome
JF  - Molecular Systems Biology
N2  - Population genomics of prokaryotes has been studied in depth in only a small number of primarily pathogenic bacteria, as genome sequences of isolates of diverse origin are lacking for most species. Here, we conducted a large‐scale survey of population structure in prevalent human gut microbial species, sampled from their natural environment, with a culture‐independent metagenomic approach. We examined the variation landscape of 71 species in 2,144 human fecal metagenomes and found that in 44 of these, accounting for 72% of the total assigned microbial abundance, single‐nucleotide variation clearly indicates the existence of sub‐populations (here termed subspecies). A single subspecies (per species) usually dominates within each host, as expected from ecological theory. At the global scale, geographic distributions of subspecies differ between phyla, with Firmicutes subspecies being significantly more geographically restricted. To investigate the functional significance of the delineated subspecies, we identified genes that consistently distinguish them in a manner that is independent of reference genomes. We further associated these subspecies‐specific genes with properties of the microbial community and the host. For example, two of the three Eubacterium rectale subspecies consistently harbor an accessory pro‐inflammatory flagellum operon that is associated with lower gut community diversity, higher host BMI, and higher blood fasting insulin levels. Using an additional 676 human oral samples, we further demonstrate the existence of niche specialized subspecies in the different parts of the oral cavity. Taken together, we provide evidence for subspecies in the majority of abundant gut prokaryotes, leading to a better functional and ecological understanding of the human gut microbiome in conjunction with its host.
KW  - biology
KW  - genetic variation
KW  - metagenomics
KW  - microbiome
KW  - population structure
KW  - prokaryotic subspecies
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-172674
VL  - 13
IS  - 12
ER  - 
TY  - JOUR
A1  - Coelho, Luis Pedro
A1  - Kultima, Jens Roat
A1  - Costea, Paul Igor
A1  - Fournier, Coralie
A1  - Pan, Yuanlong
A1  - Czarnecki-Maulden, Gail
A1  - Hayward, Matthew Robert
A1  - Forslund, Sofia K.
A1  - Schmidt, Thomas Sebastian Benedikt
A1  - Descombes, Patrick
A1  - Jackson, Janet R.
A1  - Li, Qinghong
A1  - Bork, Peer
T1  - Similarity of the dog and human gut microbiomes in gene content and response to diet
JF  - Microbiome
N2  - Background
Gut microbes influence their hosts in many ways, in particular by modulating the impact of diet. These effects have been studied most extensively in humans and mice. In this work, we used whole genome metagenomics to investigate the relationship between the gut metagenomes of dogs, humans, mice, and pigs.

Results
We present a dog gut microbiome gene catalog containing 1,247,405 genes (based on 129 metagenomes and a total of 1.9 terabasepairs of sequencing data). Based on this catalog and taxonomic abundance profiling, we show that the dog microbiome is closer to the human microbiome than the microbiome of either pigs or mice. To investigate this similarity in terms of response to dietary changes, we report on a randomized intervention with two diets (high-protein/low-carbohydrate vs. lower protein/higher carbohydrate). We show that diet has a large and reproducible effect on the dog microbiome, independent of breed or sex. Moreover, the responses were in agreement with those observed in previous human studies.

Conclusions
We conclude that findings in dogs may be predictive of human microbiome results. In particular, a novel finding is that overweight or obese dogs experience larger compositional shifts than lean dogs in response to a high-protein diet.
KW  - microbiome
KW  - diet
KW  - metagenomics
KW  - dog microbiome
KW  - human microbiome
KW  - mouse microbiome
KW  - pig microbiome
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-223177
VL  - 6
ER  - 
TY  - JOUR
A1  - Schmidt, Thomas S. B.
A1  - Hayward, Matthew R.
A1  - Coelho, Luis P.
A1  - Li, Simone S.
A1  - Costea, Paul I.
A1  - Voigt, Anita Y.
A1  - Wirbel, Jakob
A1  - Maistrenko, Oleksandr M.
A1  - Alves, Renato J. C.
A1  - Bergsten, Emma
A1  - de Beaufort, Carine
A1  - Sobhani, Iradj
A1  - Heintz-Buschart, Anna
A1  - Sunagawa, Shinichi
A1  - Zeller, Georg
A1  - Wilmes, Paul
A1  - Bork, Peer
T1  - Extensive transmission of microbes along the gastrointestinal tract
JF  - eLife
N2  - The gastrointestinal tract is abundantly colonized by microbes, yet the translocation of oral species to the intestine is considered a rare aberrant event, and a hallmark of disease. By studying salivary and fecal microbial strain populations of 310 species in 470 individuals from five countries, we found that transmission to, and subsequent colonization of, the large intestine by oral microbes is common and extensive among healthy individuals. We found evidence for a vast majority of oral species to be transferable, with increased levels of transmission in colorectal cancer and rheumatoid arthritis patients and, more generally, for species described as opportunistic pathogens. This establishes the oral cavity as an endogenous reservoir for gut microbial strains, and oral-fecal transmission as an important process that shapes the gastrointestinal microbiome in health and disease.
KW  - Colonization
KW  - Annotation
KW  - Dynamics
KW  - Accurate
KW  - Strains
KW  - Barrier
KW  - Health
KW  - Acids
KW  - Research Article
KW  - Computational and Systems Biology
KW  - Microbiology and Infectious Disease
KW  - microbiome
KW  - gastrointestinal tract
KW  - colorectal cancer
KW  - rheumatoid arthritis
KW  - metagenomics
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-228954
VL  - 8
ER  - 
TY  - JOUR
A1  - Coelho, Luis Pedro
A1  - Alves, Renato
A1  - Monteiro, Paulo
A1  - Huerta-Cepas, Jaime
A1  - Freitas, Ana Teresa
A1  - Bork, Peer
T1  - NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language
JF  - Microbiome
N2  - Background
Shotgun metagenomes contain a sample of all the genomic material in an environment, allowing for the characterization of a microbial community. In order to understand these communities, bioinformatics methods are crucial. A common first step in processing metagenomes is to compute abundance estimates of different taxonomic or functional groups from the raw sequencing data.

Given the breadth of the field, computational solutions need to be flexible and extensible, enabling the combination of different tools into a larger pipeline.

Results
We present NGLess and NG-meta-profiler. NGLess is a domain specific language for describing next-generation sequence processing pipelines. It was developed with the goal of enabling user-friendly computational reproducibility. It provides built-in support for many common operations on sequencing data and is extensible with external tools with configuration files.

Using this framework, we developed NG-meta-profiler, a fast profiler for metagenomes which performs sequence preprocessing, mapping to bundled databases, filtering of the mapping results, and profiling (taxonomic and functional). It is significantly faster than either MOCAT2 or htseq-count and (as it builds on NGLess) its results are perfectly reproducible.

Conclusions
NG-meta-profiler is a high-performance solution for metagenomics processing built on NGLess. It can be used as-is to execute standard analyses or serve as the starting point for customization in a perfectly reproducible fashion.

NGLess and NG-meta-profiler are open source software (under the liberal MIT license) and can be downloaded from https://ngless.embl.de or installed through bioconda.
KW  - metagenomics
KW  - next-generation sequencing
KW  - domain-specific language
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-223161
VL  - 7
IS  - 84
ER  - 
TY  - THES
A1  - Maistrenko, Oleksandr
T1  - Pangenome analysis of bacteria and its application in metagenomics
T1  - Bakterielle Pan-Genome und ihre Anwendungen in der Metagenomik
N2  - The biosphere harbors a large quantity and diversity of microbial organisms that can thrive in all environments. Estimates of the total number of microbial species reach up to 10^12, of which fewer than 15,000 have been characterized to date. It has been challenging to delineate phenotypically, evolutionarily and ecologically meaningful lineages such as species, subspecies and strains. Even within recognized species, gene content can vary considerably between sublineages (for example strains), a problem that can be addressed by analyzing pangenomes, defined as the non-redundant set of genes within a phylogenetic clade, as evolutionary units.
Species are considered to be ecologically and evolutionarily coherent units; however, to date it is still not fully understood what the primary habitats and ecological niches of many prokaryotic species are and how environmental preferences drive their genomic diversity. The majority of comparative genomics studies have focused on a single prokaryotic species in the context of clinical relevance and ecology. With the accumulation of sequencing data from genomics and metagenomics, it is now possible to investigate trends across many species, which will facilitate understanding of pangenome evolution, species and subspecies delineation.
The major aims of this thesis were 1) to annotate habitat preferences of prokaryotic species and strains; 2) to investigate to what extent these environmental preferences drive genomic diversity of prokaryotes and to what extent phylogenetic constraints limit this diversification; 3) to explore natural nucleotide identity thresholds for delineating bacterial species in metagenomic gene catalogs; 4) to explore species delineation for applications in subspecies and strain delineation in metagenomics.
The first part of the thesis describes methods to infer environmental preferences of microbial species. These data are a prerequisite for the analyses performed in the second part of the thesis, which explores how the structure of bacterial pangenomes is predetermined by past evolutionary history and how it is linked to the environmental preferences of the species. The main finding of this part is that habitat preferences explained up to 49% of the variance in pangenome structure, compared to 18% explained by phylogenetic inertia. In general, this trend indicates that phylogenetic inertia does not limit the evolution of pangenome size and diversity, and that convergent evolution may overcome phylogenetic constraints. In this project we show that larger core genome size is associated with higher environmental ubiquity of species. This is likely because, to be highly prevalent, a species needs a more versatile genome, with most necessary genes present in the majority of its genomes. Taken together, these findings may be useful for future predictive analyses of ecological niches in newly discovered species.
The third part of the thesis explores data-driven, operational species boundaries. I show that homologous genes from different genomes of the same species tend to share at least 95% nucleotide identity, while different species within the same genus have lower nucleotide identity. This is in line with other studies showing that the genome-wide natural species boundary might be in the range of 90-95% nucleotide identity. Finally, the fourth part of the thesis discusses how challenges in species delineation are relevant for the identification of meaningful within-species groups, followed by a discussion of how advancements in species delineation can be applied to the classification of within-species genomic diversity in the age of metagenomics.
N2  - Die Biosphäre beherbergt eine große Zahl verschiedener Mikroorganismen, die fast alle bekannten Lebensräume besiedeln können. Die Gesamtzahl mikrobieller Spezies liegt Schätzungen zufolge bei bis zu 10^12, von denen jedoch bis heute erst 15.000 beschrieben worden sind. Die Beschreibung von phänotypisch, evolutionsbiologisch und ökologisch kohärenten Spezies, Sub-Spezies oder Stämmen stellt Forscher vor konzeptionelle Herausforderungen. Selbst innerhalb anerkannter Spezies kann die Kombination einzelner Gene oft stark variieren. Diese Beobachtung ist die Grundlage der Analyse von Pan-Genomen, also der Konstellation originärer Gene innerhalb einer Abstammungslinie, als evolutionsbiologische Einheiten.
Spezies entsprechen prinzipiell ökologisch und evolutionär kohärenten Einheiten, jedoch sind die primären Habitate und ökologischen Nischen vieler prokaryotischer Spezies bis heute nur unzureichend beschrieben, insbesondere mit Blick auf den Einfluss ökologischer Präferenzen auf die Evolution von Genomen. Die Mehrheit vergleichender genomischer Studien untersucht einzelne prokaryotische Spezies mit Bezug auf deren klinische oder ökologische Relevanz. Aufgrund der wachsenden Verfügbarkeit genomischer Daten ist es nun jedoch möglich, vergleichende Studien über Speziesgrenzen hinweg durchzuführen, um allgemeine Prinzipien der Evolution von Pan-Genomen, Spezies und Sub-Spezies zu untersuchen.
Die wesentlichen Ziele der vorliegenden Arbeit waren 1) die Annotation von Habitatpräferenzen prokaryotischer Spezies und Stämme; 2) die Quantifizierung des Einflusses von Umwelt und Evolutionsgeschichte (Phylogenie) auf die genomische Diversität von Prokaryoten; 3) die Bestimmung natürlicher Schwellenwerte der Genomsequenzähnlichkeit zwischen Spezies, auch anhand von Genkatalogen; 4) die Untersuchung der Abgrenzung zwischen Spezies, Sub-Spezies und Stämmen mithilfe metagenomischer Daten.
Im ersten Teil der Arbeit werden Methoden zur Bestimmung ökologischer Präferenzen mikrobieller Spezies beschrieben. Die so gewonnenen Daten dienen in der Folge als Grundlage für die Quantifizierung von Umwelt- und evolutionsgeschichtlichen Einflüssen auf die Struktur und Evolution bakterieller Pan-Genome im zweiten Teil der Arbeit. Ein zentrales Ergebnis dieser Untersuchung war, dass bis zu 49% der strukturellen Varianz in Pan-Genomen durch Habitatpräferenzen erklärt werden kann, im Gegensatz zu lediglich 18% durch phylogenetische Trägheitseffekte. Dies zeigt, dass die Größe und Diversität von Pan-Genomen nicht phylogenetisch limitiert ist, insbesondere in Fällen von konvergenter Evolution. Große Kern-Genome sind ferner mit einer weiten ökologischen Verbreitung von Spezies assoziiert; eine mögliche Erklärung ist, dass weit verbreitete Spezies vielseitigere Genome mit mehr notwendigen Genen besitzen, die ein Überleben in vielfältigen Umgebungen ermöglichen. Die vorgelegte Arbeit kann weiterhin einen Beitrag zur Vorhersage ökologischer Profile neu beschriebener Spezies leisten.
Im dritten Teil der Arbeit werden datenbezogene, operationelle Definitionen von Spezies-Grenzen untersucht. Es konnte gezeigt werden, dass Gene verschiedener Genome innerhalb derselben Spezies normalerweise mindestens 95% Ähnlichkeit der Nukleotidsequenz aufweisen, während die Ähnlichkeit zwischen Spezies desselben Genus geringer ausfällt. Dieser Wert liegt im Rahmen früherer Schätzungen. Der vierte Teil der Arbeit beschreibt abschließend die Herausforderungen bei der Bestimmung von evolutionären Linien innerhalb von Spezies und diskutiert anschließend, wie konzeptionelle Entwicklungen in dieser Frage für die Klassifizierung und Quantifizierung von Diversität anhand metagenomischer Daten genutzt werden können.
KW  - Pangenom
KW  - phylogenetische Trägheit
KW  - Lebensraum
KW  - Stammvielfalt
KW  - mikrobielle Ökologie und Evolution
KW  - pangenome
KW  - phylogenetic inertia
KW  - habitat
KW  - strain diversity
KW  - microbial ecology and evolution
KW  - metagenomics
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-214996
ER  - 
TY  - JOUR
A1  - Nazzal, Yousef
A1  - Howari, Fares M.
A1  - Yaslam, Aya
A1  - Iqbal, Jibran
A1  - Maloukh, Lina
A1  - Ambika, Lakshmi Kesari
A1  - Al-Taani, Ahmed A.
A1  - Ali, Ijaz
A1  - Othman, Eman M.
A1  - Jamal, Arshad
A1  - Naseem, Muhammad
T1  - A methodological review of tools that assess dust microbiomes, metatranscriptomes and the particulate chemistry of indoor dust
JF  - Atmosphere
N2  - Indoor house dust is a blend of organic and inorganic materials, upon which diverse microbial communities such as viruses, bacteria and fungi reside. Adequate moisture in the indoor environment helps microbial communities multiply quickly. Outdoor air and materials that are brought into buildings by airflow, sandstorms, animals, pets and house occupants endow indoor dust particles with extra features that impact human health. The health effects of indoor dust particles, the types of indoor microbial inoculants and the enzymes secreted by indoor insects as allergens merit detailed investigation. Here, we discuss the applications of next generation sequencing (NGS) technology to assess the microbial diversity and abundance of indoor dust environments. Likewise, we discuss the applications of NGS to monitor the gene expression profiles of indoor human occupants or their surrogate cellular models when exposed to aqueous solutions of collected indoor dust samples. We also highlight detection methods for dust allergens and analytical procedures that quantify the chemical nature of indoor particulate matter with a potential impact on human health. Our review is thus unique in advocating the application of interdisciplinary approaches that comprehensively assess the health effects of poor air quality in built environments.
KW  - indoor dust
KW  - allergens
KW  - metagenomics
KW  - particulate matter
KW  - microbiomes
KW  - transcriptomes
KW  - health effects
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-285957
SN  - 2073-4433
VL  - 13
IS  - 8
ER  - 
TY  - JOUR
A1  - Maloukh, Lina
A1  - Nazzal, Yousef
A1  - Kumarappan, Alagappan
A1  - Howari, Fares
A1  - Ambika, Lakshmi Kesari
A1  - Yahmadi, Rihab
A1  - Sharma, Manish
A1  - Iqbal, Jibran
A1  - Al-Taani, Ahmed A.
A1  - Salem, Imen Ben
A1  - Xavier, Cijo M.
A1  - Naseem, Muhamad
T1  - Metagenomic analysis of the outdoor dust microbiomes: a case study from Abu Dhabi, UAE
JF  - Atmosphere
N2  - Outdoor dust carries a wide range of microbial agents originating from land, transportation, human microbial flora (including pathogens and commensals), and airborne sources in the environment. Dust aerosols are rich in bacterial communities that have a major impact on human health and living environments. In this study, outdoor samples from roadside barricades, safety walls, and fences (18 samples) were collected from Abu Dhabi, UAE, and bacterial diversity was assessed through a 16S rRNA amplicon next generation sequencing approach. Clean data from HiSeq produced 1,099,892 total read pairs for the 18 samples. For all samples, taxonomic classifications were assigned to the representative sequences of the OTUs (operational taxonomic units) using the Ribosomal Database Project database. Analyses such as alpha diversity, beta diversity, differential species analysis, and species relative abundance were performed to cluster the samples, and a functional profile heat map was obtained from the OTUs using bioinformatics tools. A total of 2814 OTUs were identified from those samples with a coverage of more than 99%. At the phylum level, all 18 samples contained most of the major bacterial groups, such as Actinobacteria, Proteobacteria, Firmicutes, and Bacteroidetes. Twelve samples had Propionibacterium acnes, which was mainly found in RD16 and RD3. Major bacterial species such as Propionibacterium acnes, Bacillus persicus, and Staphylococcus capitis were found in all samples. Most of the samples had Streptococcus mitis, Staphylococcus capitis, and Nafulsella turpanensis, and Enhydrobacter aerosaccus was part of the normal microbes of the skin. Salinimicrobium sp., Bacillus alkalisediminis, and Bacillus persicus are halophilic bacteria found in sediments. The heat map clustered the samples and species in vertical and horizontal classification, which represents the relationship between the samples and bacterial diversity.
The functional profile heat map showed high representation of amino acid, carbohydrate, and cofactor and vitamin metabolism across the bacterial species from all samples. Taken together, our analyses are highly relevant from the perspective of outdoor air quality, airborne diseases, and epidemics, with broader implications for health safety and monitoring.
KW  - dust microbiomes
KW  - metagenomics
KW  - microbial diversity
KW  - pollution
KW  - GIS
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-304391
SN  - 2073-4433
VL  - 14
IS  - 2
ER  -