The human genome sequence has been available since 2001. Most proteins have now been characterized, and every day more bioinformatic predictions are experimentally verified. A project is underway to sequence a thousand humans. Still, little is known about the evolution of the human proteome itself. Domains and their combinations have been analysed in detail, but never all human domain architectures at once. As never before, large datasets of high-quality human protein-protein interactions and complexes are available, which allow us to characterize the human proteome with unmatched accuracy. Advanced clustering algorithms and computing power enable us to gain new information about protein interactions without touching a pipette. In this work, the human proteome is analysed at three different levels. First, the origin of the different types of proteins was analysed based on their domain architectures. The second part focuses on the protein-protein interactions. Finally, in the third part, proteins are clustered based on their interactions and non-interactions. Most proteins are built of domains, and their function is the sum of their domain functions. Proteins that share the same domain architecture, that is, the same linear order of domains, are homologues and should have originated from one common ancestral protein. This ancestor was calculated for roughly 750 000 proteins from 1313 species. The relations between the species are based on the NCBI Taxonomy and additional molecular data. The resulting data set of 5817 domains and 32868 domain architectures was used to estimate the origin of these proteins based on their architectures. Only a small fraction of new domain architectures was found to be composed of domains that arose at the same taxon.
It was also found that domain architectures increase in length and complexity in the course of evolution, and that different organisms such as worm and human share nearly the same number of proteins but differ in their number of distinct domain architectures. The second part of this thesis focuses on protein-protein interactions. This chapter addresses the question of how newly evolved proteins form connections within the existing network. The network built of protein-protein interactions was shown to be scale-free. Scale-free networks, like the internet, consist of few hubs with many connections and many nodes with few connections. They are thought to arise by two mechanisms. First, newly emerged proteins interact with proteins of the network. Second, according to the theory of preferential attachment, new proteins have a higher chance of interacting with proteins that are already rich in interactions. The Human Protein Reference Database provides a human network based on in vivo interaction data. With the data obtained in chapter one, proteins were marked with their taxon of origin based on their domain architectures. The ratio of interactions between proteins of the same taxon to all interactions was calculated, and it was higher than in the random model for nearly every taxon. On the other hand, no enrichment in node degree was found for proteins that originated at the taxon of cellular organisms. The node degree is the number of links of a node. According to the theory of preferential attachment, the oldest nodes should have the most interactions, and newly arisen proteins should attach preferably to them rather than to each other. Neither could be shown in this analysis; preferential attachment therefore cannot be the only explanation for the formation of the human protein interaction network. Finally, in part three, proteins and all their interactions in the network are analysed.
Protein networks can be divided into smaller, highly interacting parts that carry out specific functions. This can be done with high statistical significance, but statistical significance alone does not reflect biological significance. Proteins were clustered based on their interactions and non-interactions with other proteins. A version with eleven clusters showed high Gene Ontology based ratings and clusters related to specific cell parts. One cluster consists of proteins that have very few interactions among themselves but many with proteins of two other clusters. This first cluster is significantly enriched with transport proteins, and the two others are enriched with extracellular and cytoplasm/membrane located proteins. The algorithm therefore seems well suited to reflect the biological importance behind functional modules. Although we are still far from understanding the origin of species, this work has significantly contributed to a better understanding of evolution at the protein level and has, in particular, shown the relation of protein domains and protein architectures and their preferences for binding partners within interaction networks.
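The node degree and the same-taxon interaction ratio described in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the protein names, the taxon labels and the toy edge list are hypothetical, and shuffling taxon labels across proteins is only one possible random model, since the abstract does not say which one the thesis used.

```python
import random
from collections import Counter


def node_degrees(edges):
    """Count the number of links of each node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def same_taxon_ratio(edges, taxon_of):
    """Fraction of interactions whose two partners share a taxon of origin."""
    same = sum(1 for a, b in edges if taxon_of[a] == taxon_of[b])
    return same / len(edges)


def shuffled_ratio(edges, taxon_of, rounds=1000, seed=0):
    """Mean same-taxon ratio after randomly reassigning taxon labels.

    Assumed random model: the network stays fixed and only the labels
    are permuted among the proteins.
    """
    rng = random.Random(seed)
    proteins = list(taxon_of)
    labels = [taxon_of[p] for p in proteins]
    total = 0.0
    for _ in range(rounds):
        rng.shuffle(labels)
        total += same_taxon_ratio(edges, dict(zip(proteins, labels)))
    return total / rounds


# Hypothetical toy network; not data from the thesis.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
taxon_of = {
    "P1": "Eukaryota",
    "P2": "Eukaryota",
    "P3": "cellular organisms",
    "P4": "Metazoa",
    "P5": "Metazoa",
}

if __name__ == "__main__":
    print(node_degrees(edges))
    print(same_taxon_ratio(edges, taxon_of))
    print(shuffled_ratio(edges, taxon_of))
```

An observed ratio clearly above the shuffled mean would correspond to the enrichment of same-taxon interactions reported in the abstract. A real analysis would also need a significance estimate, which this sketch leaves out.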
Insights into the evolution of protein domains give rise to improvements of function prediction
(2005)
The growing number of uncharacterised sequences in public databases has turned the prediction of protein function into a challenging research field. Traditional annotation methods are often error-prone due to the small subset of proteins with experimentally verified function. The goal of this thesis was to analyse the function and evolution of protein domains in order to understand molecular processes in the cell. The focus was on signalling domains of little understood function, as well as on functional sites of protein domains in general. Glucosaminidases (GlcNAcases) represent key enzymes in signal transduction pathways. Together with glucosamine transferases, they serve as molecular switches, similar to kinases and phosphatases. Little was known about the molecular function and structure of the GlcNAcases. In this thesis, the GlcNAcases were identified as remote homologues of N-acetyltransferases. By comparing the homologous sequences, I was able to predict functional sites of the GlcNAcase family and to identify the GlcNAcases as the first member of the acetyltransferase superfamily with a distinct catalytic mechanism that does not involve the transfer of acetyl groups. In a similar approach, the sensor domain of a plant hormone receptor was studied. I was able to predict putative ligand-binding sites by comparing evolutionary constraints in functionally diverged subfamilies. Most of the putative ligand-binding sites have been experimentally confirmed in the meantime. Given the importance of enzymes involved in cellular signalling, it would seem impossible to find substitutions of catalytic amino acids that render them catalytically inactive. Nevertheless, by scanning catalytic positions of the protein tyrosine phosphatase families, I found many inactive domains among single domain and tandem domain phosphatases in metazoan proteomes.
In addition, I found that inactive phosphatases are conserved throughout evolution, which led to the question about the function of these catalytically inactive phosphatase domains. An analysis of evolutionary site rates of amino acid substitutions revealed a cluster of conserved residues in the apparently redundant domain of tandem phosphatases. This putative regulatory center might be responsible for the experimentally verified dimerization of the active and inactive domain in order to control the catalytic activity of the active phosphatase domain. Moreover, I detected a subgroup of inactive phosphatases, which presumably functions in substrate recognition, based on different evolutionary site rates within the phosphatase family. The characterization of these new regulatory modules in the phosphatase family raised the question whether inactivation of enzymes is a more general evolutionary mechanism to enlarge signalling pathways and whether inactive domains are also found in other enzyme families. A large-scale analysis of substitutions at catalytic positions of enzymatic domains was performed in this work. I identified many domains with inactivating substitutions in various enzyme families. Signalling domains harbour a particularly high occurrence of catalytically inactive domains, indicating that these domains have evolved to modulate existing regulatory pathways. Furthermore, it was shown that inactivation of enzymes by single substitutions happened multiple times independently in evolution. The surprising variability of amino acids at catalytic positions was decisive for a subsequent analysis of the diversity of functional sites in general. Using functional residues extracted from structural complexes, I could show that functional sites of protein domains vary not only in their type of amino acid but also in their structural location within the domain.
In the process of evolution, protein domains have arisen from duplication events and subsequently adapted to new binding partners and developed new functions, which is reflected in the high variability of functional sites. However, great differences exist between domain families. The analysis demonstrated that functional sites of nuclear domains are more conserved than functional sites of extracellular domains. Furthermore, the type of ligand influences the degree of conservation; for example, ion-binding sites are more conserved than peptide-binding sites. The work presented in this thesis has led to the detection of functional sites in various protein domains involved in signalling pathways, and it has resulted in insights into the molecular function of those domains. In addition, properties of functional sites of protein domains were revealed. This knowledge can be used in the future to improve the prediction of protein function and to identify functional sites of proteins.
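The scan of catalytic positions for inactivating substitutions described in this abstract can be sketched as follows. This is a toy illustration and not the method of the thesis. The aligned sequences, the domain names and the column indices are hypothetical, and the sketch simply assumes that each catalytic alignment column has a known set of residues compatible with activity. The example columns follow the cysteine and arginine of the C(X)5R signature motif of protein tyrosine phosphatases.

```python
def inactivating_substitutions(aligned_seq, catalytic_sites):
    """Return {column: observed residue} for catalytic columns that deviate.

    aligned_seq     -- one row of a multiple sequence alignment (string)
    catalytic_sites -- {column index: set of residues compatible with activity}
    """
    return {
        col: aligned_seq[col]
        for col, allowed in catalytic_sites.items()
        if aligned_seq[col] not in allowed
    }


def scan_alignment(alignment, catalytic_sites):
    """Map each domain name to its substitutions at catalytic columns."""
    return {
        name: inactivating_substitutions(seq, catalytic_sites)
        for name, seq in alignment.items()
    }


# Hypothetical columns: 2 is the catalytic cysteine, 8 the arginine of C(X)5R.
catalytic_sites = {2: {"C"}, 8: {"R"}}

# Hypothetical aligned domain fragments; not sequences from the thesis.
alignment = {
    "domain_A": "VHCSAGVGRTG",  # both catalytic residues present
    "domain_B": "VHSSAGVGRTG",  # cysteine replaced by serine
    "domain_C": "VHCSAG-GQTG",  # arginine replaced by glutamine
}

if __name__ == "__main__":
    for name, subs in scan_alignment(alignment, catalytic_sites).items():
        status = "putatively inactive" if subs else "catalytic residues intact"
        print(name, status, subs)
```

A domain flagged this way is only a candidate. Whether a given substitution really abolishes activity depends on the enzyme family and needs experimental or structural support.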
Due to the Earth's rotation about its own axis and its orbit around the sun, rhythmic daily and seasonal changes in illumination, temperature and many other environmental factors occur. Adaptation to these environmental rhythms presents a considerable advantage for survival. Thus, almost all living beings have developed a mechanism to time their behavior accordingly. This mechanism is the endogenous clock. If it fulfills the criteria of (1) entraining to zeitgebers, (2) free-running behavior with a period of ~24 hours and (3) temperature compensation, it is also referred to as a "circadian clock". Well-timed behavior is crucial for eusocial insects, which divide their tasks among different behavioral castes and need to respond to changes in the environment quickly and in an orchestrated fashion. Circadian rhythms have thus been studied and observed in many eusocial species, from ants to bees. The underlying mechanism of this clock is a molecular feedback loop that generates rhythmic changes in gene expression and protein levels with a period of approximately 24 hours. The properties of this feedback loop are well characterized in many insects, from the fruit fly Drosophila melanogaster to the honeybee Apis mellifera. Though the basic principles and components of this loop seem similar at first glance, there are important differences between the Drosophila feedback loop and that of hymenopteran insects, whose loop resembles the mammalian clock loop. The protein PERIOD (PER) is thought to be a part of the negative limb of the hymenopteran clock, partnering with CRYPTOCHROME (CRY). The anatomical location of the clock-related neurons and the PDF-network (a putative in- and output mediator of the clock) is also well characterized in Drosophila, the eusocial honeybee as well as the nocturnal cockroach Leucophaea maderae.
The circadian behavior, anatomy of the clock and its molecular underpinnings were studied in the carpenter ant Camponotus floridanus, a eusocial insect. Locomotor activity recordings in social isolation proved that the majority of ants could entrain to different LD cycles, free-ran in constant darkness and had a temperature-compensated clock with a period slightly shorter than 24 hours. Most individuals proved to be nocturnal, but different types of activity like diurnality, crepuscularity, rhythmic activity during both phases of the LD, or arrhythmicity were also observed. The LD cycle had a slight influence on the distribution of these activities among individuals, with more diurnal ants at shorter light phases. The PDF-network of C. floridanus was revealed with the anti-PDH antibody, and partly resembled that of other eusocial or nocturnal insects. A comparison of minor and major worker brains revealed only slight differences in the number of somata and fibers crossing the posterior midline. All in all, most PDF-structures that are conserved in other insects were found, with numerous fibers in the optic lobes, a putative accessory medulla, somata located near the proximal medulla and many fibers in the protocerebrum. A putative connection between the mushroom bodies, the optic lobes and the antennal lobes was found, indicating an influence of the clock on olfactory learning. Lastly, the location and intensity of PER-positive cell bodies at different times of a 24-hour day were established with an antibody raised against Apis mellifera PER. Four distinct clusters, which resemble those found in A. mellifera, were detected. The clusters could be grouped into dorsal and lateral neurons, and the PER-levels cycled in all examined clusters with peaks around lights on and lowest levels after lights off.
In summary, first data on circadian behavior and the anatomy and workings of the clock of C. floridanus were obtained. Firstly, its behavior fulfills all criteria for the presence of a circadian clock. Secondly, the PDF-network is very similar to those of other insects. Lastly, the location of the PER cell bodies seems conserved among Hymenoptera. Cycling of PER levels within 24 hours confirms the suspicion of its role in the circadian feedback loop.