The microbial communities that live inside the human gastrointestinal tract, collectively known as the human gut
microbiome, are important for host health and wellbeing. Characterizing this new “organ”,
made up of as many cells as the human body itself, has recently become possible through
technological advances. Metagenomics, the high-throughput sequencing of DNA directly from
microbial communities, enables us to take genomic snapshots of thousands of microbes living
together in this complex ecosystem, without the need for isolating and growing them.
Quantifying the composition of the human gut microbiome allows us to investigate its
properties and connect it to host physiology and disease. The wealth of such connections was
unexpected and is probably still underestimated. Because most of our dietary as well
as medicinal intake affects the microbiome, and because the microbiome itself interacts with our
immune system through a multitude of pathways, many mechanisms have been proposed to
explain the observed correlations, though most have yet to be understood in depth.
An obvious prerequisite to characterizing the microbiome and its interactions with the host is
the accurate quantification of its composition, i.e. determining which microbes are present and
in what numbers they occur. Standard practices for sample handling, DNA extraction and data
analysis have existed for many years. However, these were generally developed for
single-microbe cultures and it is not always feasible to implement them in large-scale
metagenomic studies. Partly because of this and partly because of the excitement that new
technology brings about, the first metagenomic studies each defined their own
approach and protocols. Early meta-analyses of these studies made it clear that the
differences in sample handling, as well as differences in computational approaches, made
comparisons across studies very difficult. This restricts our ability to cross-validate findings of
individual studies and to pool samples from larger cohorts. To address the pressing need for
standardization, we undertook an extensive comparison of 21 different DNA extraction methods
as well as a series of other sample manipulations that affect quantification. We developed a
number of criteria for determining the measurement quality in the absence of a mock
community and used these to propose best practices for sampling, DNA extraction and library
preparation. If these were to be accepted as standards in the field, it would greatly improve
comparability across studies, which would dramatically increase the power of our inferences
and our ability to draw general conclusions about the microbiome.
Most metagenomics studies involve comparisons between microbial communities, for example
between fecal samples from cases and controls. A multitude of approaches have been proposed
to calculate community dissimilarities (beta diversity) and they are often combined with
various preprocessing techniques. Direct metagenomics quantification usually counts
sequencing reads mapped to specific taxonomic units, which can be species, genera, etc. Due to
technology-inherent differences in sampling depth, normalizing counts is necessary, for
instance by dividing each count by the sum of all counts in a sample (i.e. total sum scaling), or by
subsampling. To derive a single value for community (dis-)similarity, multiple distance
measures have been proposed. Although it is theoretically difficult to benchmark these
approaches, we developed a biologically motivated framework in which distance measures can
be evaluated. This highlights the importance of data transformations and their impact on the
measured distances.
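The normalization and distance steps described above can be sketched as follows. This is a minimal illustration, not the evaluation framework of the thesis: it applies total sum scaling to raw counts and then computes Bray-Curtis dissimilarity, one commonly used distance measure chosen here as an assumption, with and without a square-root transformation. The taxon names and read counts are placeholders invented for the example.

```python
import math


def total_sum_scaling(counts):
    """Convert raw read counts to relative abundances that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b, transform=lambda x: x):
    """Bray-Curtis dissimilarity between two abundance profiles.

    `transform` is applied to every abundance first, so the effect of a
    data transformation (e.g. square root) can be compared directly.
    """
    taxa = set(a) | set(b)
    xa = {t: transform(a.get(t, 0.0)) for t in taxa}
    xb = {t: transform(b.get(t, 0.0)) for t in taxa}
    num = sum(abs(xa[t] - xb[t]) for t in taxa)
    den = sum(xa[t] + xb[t] for t in taxa)
    return num / den if den else 0.0


# Toy read counts (illustrative only), sequenced at different depths.
sample_1 = {"genus_A": 800, "genus_B": 100, "genus_C": 100}
sample_2 = {"genus_A": 1000, "genus_B": 3000, "genus_C": 1000}

p1 = total_sum_scaling(sample_1)
p2 = total_sum_scaling(sample_2)

d_plain = bray_curtis(p1, p2)
d_sqrt = bray_curtis(p1, p2, transform=math.sqrt)
print(f"Bray-Curtis on relative abundances: {d_plain:.3f}")
print(f"Bray-Curtis after square-root transform: {d_sqrt:.3f}")
```

In this toy case the square-root transformation damps the influence of the dominant taxon and yields a smaller dissimilarity, which is one simple way to see why the choice of transformation affects the measured distances.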
Building on our experience with accurate abundance estimation and data preprocessing
techniques, we can now try to understand some of the basic properties of microbial
communities. In 2011, it was proposed that the space of genus level variation of the human gut
microbial community is structured into three basic types, termed enterotypes. These were
described in a multi-country cohort, so as to be independent of geography, age and other host
properties. Operationally defined through a clustering approach, they are “densely populated
areas in a multidimensional space of community composition” (source) and were proposed as a
general stratifier for the human population. Later studies that applied this concept to other
datasets raised concerns about the optimum number of clusters and robustness of the
clustering approach. This started a long-standing debate about the existence of structure and
the best ways to determine and capture it. Here, we reconsider the concept of enterotypes, in
the context of the vastly increased amounts of available data. We propose a refined framework
in which the different types should be thought of as weak attractors in compositional space and
we try to implement an approach to determining which attractor a sample is closest to. To this
end, we train a classifier on a reference dataset to assign membership to new samples. This way,
enterotype assignment is no longer dataset-dependent and effects due to biased sampling are
minimized. Using a model in which we assume the existence of three enterotypes characterized
by the same driver genera, as originally postulated, we show the relevance of this stratification
and propose it to be used in a clinical setting as a potential marker for disease development.
Moreover, we believe that these attractors underline different rules of community assembly and
we recommend they be accounted for when analyzing gut microbiome samples.
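The idea of reference-based assignment can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-centroid rule, which is an assumption made for illustration and not necessarily the classifier used in the thesis. The reference profiles, the labels `type_1` to `type_3` and the three placeholder genera are hypothetical.

```python
import math


def relative_abundance(counts):
    """Scale a list of raw counts so that the values sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def train_centroids(reference):
    """Average the profiles of each labelled group in the reference set."""
    centroids = {}
    for label, profiles in reference.items():
        n = len(profiles)
        centroids[label] = [sum(col) / n for col in zip(*profiles)]
    return centroids


def assign(profile, centroids):
    """Return the label of the closest centroid and the distance to it."""
    label = min(centroids, key=lambda k: math.dist(profile, centroids[k]))
    return label, math.dist(profile, centroids[label])


# Hypothetical reference set: relative abundances of three placeholder
# driver genera (genus_A, genus_B, genus_C) for already labelled samples.
reference = {
    "type_1": [[0.70, 0.10, 0.20], [0.60, 0.15, 0.25]],
    "type_2": [[0.15, 0.70, 0.15], [0.20, 0.60, 0.20]],
    "type_3": [[0.20, 0.10, 0.70], [0.25, 0.15, 0.60]],
}
centroids = train_centroids(reference)

# A new sample is assigned without re-clustering the reference data.
new_sample = relative_abundance([120, 900, 180])
label, distance = assign(new_sample, centroids)
print(label, round(distance, 3))
```

Because the centroids are fixed by the reference set, the label of a new sample does not depend on which other samples happen to be in the same study. Returning the distance alongside the label keeps the notion of a weak attractor visible: a sample far from every centroid can be flagged instead of being forced into a type.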
While enterotypes describe structure in the community at genus level, metagenomic sequencing
can in principle achieve single-nucleotide resolution, allowing us to identify single nucleotide
polymorphisms (SNPs) and other genomic variants in the gut microbiome. Analysis
methodology for this level of resolution has only recently been developed and little exploration
has been done to date. Assessing SNPs in a large, multinational cohort, we discovered that the
landscape of genomic variation seems highly structured even beyond species resolution,
indicating that clearly distinguishable subspecies are prevalent among gut microbes. In several
cases, these subspecies exhibit geo-stratification, with some subspecies only found in the
Chinese population. Generally, however, they present only minor dispersion limitations and are
seen across most of our study populations. Within one individual, one subspecies is commonly
found to dominate and only rarely are several subspecies observed to co-occur in the same
ecosystem. Analysis of longitudinal data indicates that the dominant subspecies remains stable
over periods of more than three years. When interrogating their functional properties we find
many differences, with specific ones appearing relevant to the host. For example, we identify a
subspecies of E. rectale that is lacking the flagellum operon and find its presence to be
significantly associated with lower body mass index and lower insulin resistance of its hosts;
it also correlates with higher microbial community diversity. These associations could not be
seen at the species level (where multiple subspecies are convoluted), which illustrates the
importance of this increased resolution for a more comprehensive understanding of microbial
interactions within the microbiome and with the host.
Taken together, our results provide a rigorous basis for performing comparative metagenomics
of the human gut, encompassing recommendations for both experimental sample processing
and computational analysis. We furthermore refine the concept of community stratification into
enterotypes, develop a reference-based approach for enterotype assignment and provide
compelling evidence for their relevance. Lastly, by harnessing the full resolution of
metagenomics, we discover a highly structured genomic variation landscape below the
microbial species level and identify common subspecies of the human gut microbiome. By
developing these high-precision metagenomics analysis tools, we thus hope to contribute to a
greatly improved understanding of the properties and dynamics of the human gut microbiome.
The first goal of this study was to develop cell lines with stable expression of bio-fluorescent topo II and topo I. This was achieved using a bicistronic vector system. Control experiments showed that proteins of the expected size were expressed, and that GFP-tagged topos I, IIα, and IIβ were active in the cells and fully integrated into the endogenous pools of the enzymes. These cell lines provide a novel tool for investigating the cell biology of human DNA topoisomerases. Our most important finding was that both types of mammalian topoisomerases are entirely mobile proteins that are in continuous and rapid flux between all compartments of the nucleus, and between the cytosol and the chromosomes of mitotic cells. This was particularly surprising with regard to topo II, which is considered to be a structural component of the nuclear matrix and the chromosome scaffold. We must conclude that, if this is the case, these architectural structures are much more dynamic than believed until now. In this context it should also be mentioned that the alignment of topo II with the central axes of the chromosome arms, which has until now been considered a hallmark of the enzyme’s association with the chromosomal scaffold, is not seen in vivo and can be shown to be, to some extent, an artefact of immunohistochemistry. Furthermore, we show that the two isoforms of topo II (α and β) localise differently during mitotic cell division, supporting the general concept that topo II functions at mitosis are exclusively assigned to the α-form, whereas at interphase the two isoenzymes work in concert. Despite unrestricted mobility within the entire nuclear space, topoisomerases I and II appear as mostly nucleolar proteins. We show that this is because they move more slowly in the nucleoli than in the nucleoplasm.
The decreased nucleolar mobility cannot be due to DNA interactions, because compounds that fix topoisomerases to the DNA deplete them from the nucleoli. Interestingly, the subnucleolar distributions of topoisomerases I and II were complementary. The type II enzyme filled the entire nucleolar space but was excluded from the fibrillar centers, whereas topo I accumulated at the fibrillar centers, an allocation directed by the enzyme’s N-terminus. During mitosis, the N-terminus also mediates association with the nucleolar organising regions of the acrocentric chromosomes. Thus, topo I stays associated with the rDNA during the entire cell cycle and consistently colocalizes there with RNA polymerase I. Finally, we show that certain cancer drugs believed to act by stabilising covalent catalytic DNA intermediates of topoisomerases do indeed immobilize the enzymes in living cells. Interestingly, these drugs do not target topoisomerases in the nucleoli but only in the nucleoplasm.
This thesis investigates cellular networks with the aim of using the insights gained for medical or biotechnological purposes. To this end, protein domains and important regulatory RNA elements must first be identified. For regulatory elements in nucleic acids, this is done using the example of iron-responsive elements (IREs) in Staphylococcus aureus, where such elements can be found in promising proximity to expressed sequences (T. Dandekar, F. Du, H. Bertram (2001) Nonlinear Analysis 47(1): 225-34). Even more significant as targets for drug development against parasites are domain differences in the structure and sequence of proteins (T. Dandekar, F. Du, H. Bertram (2001) Nonlinear Analysis 47(1): 225-34). Their identification is illustrated using the example of a potential transport protein in Plasmodium falciparum. The interplay of regulatory elements and domains in networks is then considered (including experimental data). On the one hand, this can lead to more general conclusions about network behaviour; on the other hand, it can be used for concrete applications. As an example, we chose redox networks and the control of plasmodia as the causative agents of malaria. Because the entire redox network of a living cell can be captured only inadequately by pH measurement methods, microcrystals of glutathione reductase are used experimentally, after digital amplification, as an indicator system and alternative measurement method for this network (H. Bertram, M. A. Keese, C. Boulin, R. H. Schirmer, R. Pepperkok, T. Dandekar (2002) Chemical Nanotechnology Talks III - Nano for Life Sciences). To also model complex redox networks bioinformatically, methods of metabolic flux analysis are presented and improved, in particular to do better justice to the interlinking of these networks and to describe them accurately with as few elementary flux modes as possible.
Here, the number of elementary modes in very large metabolic networks of a cell is reduced using several different methods, which leads to a simplified representation of complex metabolic pathways. Each of these methods is based on a biochemically meaningful definition of external metabolites (T. Dandekar, F. Moldenhauer, S. Bulik, H. Bertram, S. Schuster (2003) Biosystems 70(3): 255-70). More generally, methods of protein domain classification and new strategies against microbial pathogens are considered. With regard to the automated partitioning of proteins into domains, a new system by Taylor (2002b) is compared with known systems that require human intervention to varying degrees (H. Bertram, T. Dandekar (2002) Chemtracts 15: 735-9). In addition to a paper on the various methods for obtaining information about the metabolic network of the cell from genome data (H. Bertram, T. Dandekar (2004) it 46(1): 5-11), an overview of the focal areas of bioinformatics in Würzburg was compiled (H. Bertram, S. Balthasar, T. Dandekar (2003) Bioforum 1-2: 26-7). Finally, it is described how the pathogenomics and virulence of bacteria can be made accessible to bioinformatic analysis (H. Bertram, S. Balthasar, T. Dandekar (2003) Bioforum Eur. 3: 157-9). The last part presents metabolic flux analysis as a means of identifying new strategies to combat plasmodia: the comparison of the glutathione and thioredoxin pathways in Plasmodium falciparum, Anopheles and humans aims at triggering targeted disruptions in the metabolism of the malaria pathogen while sparing the host. This yields several interesting starting points whose medical use can be pursued experimentally.
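The role of external metabolites in flux analysis can be illustrated with a small steady-state check. The sketch below is a toy example under stated assumptions, not the method of the thesis: the four-reaction network, the metabolite names A to D and the helper `is_steady_state` are hypothetical. A flux vector is admissible when every internal metabolite is balanced (S v = 0 on the internal rows), so the choice of which metabolites count as external decides which routes through the network are admissible.

```python
def is_steady_state(stoich, internal, flux, tol=1e-9):
    """True if `flux` leaves every internal metabolite balanced (S v = 0)."""
    for metabolite in internal:
        net = sum(coef * v for coef, v in zip(stoich[metabolite], flux))
        if abs(net) > tol:
            return False
    return True


# Hypothetical toy network, one column per reaction R1..R4:
#   R1: A -> B    R2: B -> C    R3: C -> D    R4: B -> D
stoich = {
    "A": [-1, 0, 0, 0],
    "B": [1, -1, 0, -1],
    "C": [0, 1, -1, 0],
    "D": [0, 0, 1, 1],
}

via_c = [1, 1, 1, 0]       # route A -> B -> C -> D
direct = [1, 0, 0, 1]      # route A -> B -> D
stops_at_c = [1, 1, 0, 0]  # route A -> B -> C, which accumulates C

# With A and D declared external, only B and C have to be balanced.
print(is_steady_state(stoich, ["B", "C"], via_c))       # True
print(is_steady_state(stoich, ["B", "C"], direct))      # True
print(is_steady_state(stoich, ["B", "C"], stops_at_c))  # False

# Declaring C external as well removes its balance constraint.
print(is_steady_state(stoich, ["B"], stops_at_c))       # True
```

In large networks, declaring well-connected metabolites external splits the system into smaller subnetworks that can be analysed separately, which is why a biochemically sensible choice of external metabolites matters for keeping the number of modes manageable.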
In the catabolism of methyl-branched fatty acids, alpha-methylacyl-CoA racemase plays an important role by racemizing the (R)- and (S)-isomers of alpha-methyl-branched fatty acids as coenzyme A thioesters. Methyl-branched fatty acids arise from the degradation of isoprenoids and are also synthesized by many organisms, such as mycobacteria. The main function of the racemase, however, presumably lies in the biosynthesis of bile acids. The aim of the present work was to purify and characterize alpha-methylacyl-CoA racemase from human tissue and to investigate its physiological role in the catabolism of branched-chain fatty acids and in bile acid biosynthesis. Alpha-methylacyl-CoA racemase was purified to homogeneity from human tissue, comprehensively characterized biochemically, and cloned in E. coli for detailed molecular biological analysis. Racemase activity was determined from the release of [³H]H2O from [alpha-³H]-alpha-methylacyl-CoAs. In its active form, the human racemase is a monomeric protein consisting of 382 amino acids. The enzyme accepts a broad spectrum of alpha-methylacyl-CoAs as substrates. In addition to the coenzyme A thioesters of alpha-methyl-branched fatty acids such as pristanic acid, it also converts CoA esters of steroid derivatives, e.g. the bile acid intermediate trihydroxycoprostanic acid, and of aromatic phenylpropionic acids, such as the analgesic ibuprofen. Free fatty acids and straight-chain or beta-methyl-branched acyl-CoAs are not racemized. In humans, about 80 percent of alpha-methylacyl-CoA racemase is found in peroxisomes and about 20 percent in mitochondria, with the corresponding peroxisomal (PTS 1) and mitochondrial (MTS) targeting signals determining the localization. The complete cDNA sequence of human alpha-methylacyl-CoA racemase has a total length of 2039 base pairs, with an open reading frame spanning positions 89 to 1237.
The ATG start codon is embedded in a classical Kozak sequence for translation initiation. The protein ends at its C-terminus with the sequence motif –KASL, which corresponds to the peroxisomal targeting signal (PTS 1) of some mammalian catalases. Owing to alternative polyadenylation, transcripts of 1.6 kb and 2.0 kb are found in all human tissues examined. Polyadenylation is not tissue-dependent, but the racemase is expressed in a tissue-specific manner (particularly strongly in liver and kidney). The human racemase gene is located on the short arm of chromosome 5 near the centromere (5p1.3), in the interval between D5S651 (46.6 cM) and D5S634 (59.9 cM).
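The figures quoted for the human racemase cDNA (an open reading frame from position 89 to 1237, a cDNA of 2039 base pairs and a protein of 382 amino acids) can be cross-checked with a little arithmetic. The sketch below assumes 1-based inclusive coordinates and an open reading frame that includes the stop codon. Neither assumption is stated in the abstract, and the helper name `check_orf` is made up for the example.

```python
def check_orf(orf_start, orf_end, cdna_length, protein_length):
    """Cross-check ORF coordinates against the stated protein length.

    Assumes 1-based inclusive coordinates and that the ORF includes the
    stop codon; both are assumptions, not statements from the abstract.
    """
    orf_nt = orf_end - orf_start + 1
    codons, remainder = divmod(orf_nt, 3)
    return {
        "orf_nt": orf_nt,
        "in_frame": remainder == 0,
        "codons": codons,
        # One codon is the stop codon and encodes no amino acid.
        "matches_protein": codons - 1 == protein_length,
        "utr5_nt": orf_start - 1,
        "utr3_nt": cdna_length - orf_end,
    }


result = check_orf(orf_start=89, orf_end=1237, cdna_length=2039,
                   protein_length=382)
print(result)
```

Under these assumptions the open reading frame spans 1149 nucleotides, i.e. 383 codons, which corresponds to 382 amino acids plus one stop codon and agrees with the stated protein length.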