OPUS Würzburg

The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible (2017)

Szklarczyk, Damian ; Morris, John H. ; Cook, Helen ; Kuhn, Michael ; Wyder, Stefan ; Simonovic, Milan ; Santos, Alberto ; Doncheva, Nadezhda T. ; Roth, Alexander ; Bork, Peer ; Jensen, Lars J. ; von Mering, Christian

A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein–protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein–protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.

Coupling proteomics and metabolomics for the unsupervised identification of protein–metabolite interactions in Chaetomium thermophilum (2021)

Li, Yuanyue ; Kuhn, Michael ; Zukowska-Kasprzyk, Joanna ; Hennrich, Marco L. ; Kastritis, Panagiotis L. ; O'Reilly, Francis J. ; Phapale, Prasad ; Beck, Martin ; Gavin, Anne-Claude ; Bork, Peer

Protein–metabolite interactions play an important role in the cell’s metabolism and many methods have been developed to screen them in vitro. However, few methods can be applied at a large scale without altering the biological state. Here we describe a proteometabolomic approach, using chromatography to generate cell fractions which are then analyzed with mass spectrometry for both protein and metabolite identification. Integrating the proteomic and metabolomic analyses makes it possible to identify protein-bound metabolites. Applying the concept to the thermophilic fungus Chaetomium thermophilum, we predict 461 likely protein-metabolite interactions, most of them novel. As a proof of principle, we experimentally validate a predicted interaction between the ribosome and isopentenyl adenine.

Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation (2021)

Letunic, Ivica ; Bork, Peer

The Interactive Tree Of Life (https://itol.embl.de) is an online tool for the display, manipulation and annotation of phylogenetic and other trees. It is freely available and open to everyone. iTOL version 5 introduces a completely new tree display engine, together with numerous new features. For example, a new dataset type has been added (MEME motifs), while annotation options have been expanded for several existing ones. Node metadata display options have been extended and now also support non-numerical categorical values, as well as multiple values per node. Direct manual annotation is now available, providing a set of basic drawing and labeling tools, allowing users to draw shapes, labels and other features by hand directly onto the trees. Support for tree and dataset scales has been extended, providing fine control over line and label styles. Unrooted tree displays can now use the equal-daylight algorithm, providing much greater display clarity. The user account system has been streamlined and expanded with new navigation options and currently handles >1 million trees from >70 000 individual users.

Newly designed 16S rRNA metabarcoding primers amplify diverse and novel archaeal taxa from the environment (2019)

Bahram, Mohammad ; Anslan, Sten ; Hildebrand, Falk ; Bork, Peer ; Tedersoo, Leho

High-throughput studies of microbial communities suggest that Archaea are a widespread component of microbial diversity in various ecosystems. However, proper quantification of archaeal diversity and community ecology remains limited, as sequence coverage of Archaea is usually low owing to the inability of available prokaryotic primers to efficiently amplify archaeal compared to bacterial rRNA genes. To improve identification and quantification of Archaea, we designed and validated the utility of several primer pairs to efficiently amplify archaeal 16S rRNA genes based on up-to-date reference genes. We demonstrate that several of these primer pairs amplify phylogenetically diverse Archaea with high sequencing coverage, outperforming commonly used primers. Based on comparing the resulting long 16S rRNA gene fragments with public databases from all habitats, we found several novel family- to phylum-level archaeal taxa from topsoil and surface water. Our results suggest that archaeal diversity has been largely overlooked due to the limitations of available primers, and that improved primer pairs enable more accurate estimation of archaeal diversity.

OGEE v2: an update of the online gene essentiality database with special focus on differentially essential genes in human cancer cell lines (2017)

Chen, Wei-Hua ; Lu, Guanting ; Chen, Xiao ; Zhao, Xing-Ming ; Bork, Peer

OGEE is an Online GEne Essentiality database. To enhance our understanding of the essentiality of genes, in OGEE we collected experimentally tested essential and non-essential genes, as well as associated gene properties known to contribute to gene essentiality. We focus on large-scale experiments, and complement our data with text-mining results. We organized tested genes into data sets according to their sources, and tagged those with variable essentiality statuses across data sets as conditionally essential genes, intending to highlight the complex interplay between gene functions and environments/experimental perturbations. Developments since the last public release include increased number of species and gene essentiality data sets, inclusion of non-coding essential sequences and genes with intermediate essentiality statuses. In addition, we included 16 essentiality data sets from cancer cell lines, corresponding to 9 human cancers; with OGEE, users can easily explore the shared and differentially essential genes within and between cancer types. These genes, especially those derived from cell lines that are similar to tumor samples, could reveal the oncogenic drivers, paralogous gene expression pattern and chromosomal structure of the corresponding cancer types, and can be further screened to identify targets for cancer therapy and/or new drug development. OGEE is freely available at http://ogee.medgenius.info.

Pervasive Protein Thermal Stability Variation during the Cell Cycle (2018)

Becher, Isabelle ; Andrés-Pons, Amparo ; Romanov, Natalie ; Stein, Frank ; Schramm, Maike ; Baudin, Florence ; Helm, Dominic ; Kurzawa, Nils ; Mateus, André ; Mackmull, Marie-Therese ; Typas, Athanasios ; Müller, Christoph W. ; Bork, Peer ; Beck, Martin ; Savitski, Mikhail M.

Quantitative mass spectrometry has established proteome-wide regulation of protein abundance and post-translational modifications in various biological processes. Here, we used quantitative mass spectrometry to systematically analyze the thermal stability and solubility of proteins on a proteome-wide scale during the eukaryotic cell cycle. We demonstrate pervasive variation of these biophysical parameters with most changes occurring in mitosis and G1. Various cellular pathways and components vary in thermal stability, such as cell-cycle factors, polymerases, and chromatin remodelers. We demonstrate that protein thermal stability serves as a proxy for enzyme activity, DNA binding, and complex formation in situ. Strikingly, a large cohort of intrinsically disordered and mitotically phosphorylated proteins is stabilized and solubilized in mitosis, suggesting a fundamental remodeling of the biophysical environment of the mitotic cell. Our data represent a rich resource for cell, structural, and systems biologists interested in proteome regulation during biological transitions.

Microbial abundance, activity and population genomic profiling with mOTUs2 (2019)

Milanese, Alessio ; Mende, Daniel R ; Paoli, Lucas ; Salazar, Guillem ; Ruscheweyh, Hans-Joachim ; Cuenca, Miguelangel ; Hingamp, Pascal ; Alves, Renato ; Costea, Paul I ; Coelho, Luis Pedro ; Schmidt, Thomas S. B. ; Almeida, Alexandre ; Mitchell, Alex L ; Finn, Robert D. ; Huerta-Cepas, Jaime ; Bork, Peer ; Zeller, Georg ; Sunagawa, Shinichi

Metagenomic sequencing has greatly improved our ability to profile the composition of environmental and host-associated microbial communities. However, the dependency of most methods on reference genomes, which are currently unavailable for a substantial fraction of microbial species, introduces estimation biases. We present an updated and functionally extended tool based on universal (i.e., reference-independent), phylogenetic marker gene (MG)-based operational taxonomic units (mOTUs) enabling the profiling of >7700 microbial species. As more than 30% of them could not previously be quantified at this taxonomic resolution, relative abundance estimates based on mOTUs are more accurate compared to other methods. As a new feature, we show that mOTUs, which are based on essential housekeeping genes, are demonstrably well-suited for quantification of basal transcriptional activity of community members. Furthermore, single nucleotide variation profiles estimated using mOTUs reflect those from whole genomes, which allows for comparing microbial strain populations (e.g., across different human body sites).

Cell-specific proteome analyses of human bone marrow reveal molecular features of age-dependent functional decline (2018)

Hennrich, Marco L. ; Romanov, Natalie ; Horn, Patrick ; Jaeger, Samira ; Eckstein, Volker ; Steeples, Violetta ; Ye, Fei ; Ding, Ximing ; Poisa-Beiro, Laura ; Mang, Ching Lai ; Lang, Benjamin ; Boultwood, Jacqueline ; Luft, Thomas ; Zaugg, Judith B. ; Pellagatti, Andrea ; Bork, Peer ; Aloy, Patrick ; Gavin, Anne-Claude ; Ho, Anthony D.

Diminishing potential to replace damaged tissues is a hallmark for ageing of somatic stem cells, but the mechanisms remain elusive. Here, we present proteome-wide atlases of age-associated alterations in human haematopoietic stem and progenitor cells (HPCs) and five other cell populations that constitute the bone marrow niche. For each, the abundance of a large fraction of the ~12,000 proteins identified is assessed in 59 human subjects from different ages. As the HPCs become older, pathways in central carbon metabolism exhibit features reminiscent of the Warburg effect, where glycolytic intermediates are rerouted towards anabolism. Simultaneously, altered abundance of early regulators of HPC differentiation reveals a reduced functionality and a bias towards myeloid differentiation. Ageing causes alterations in the bone marrow niche too, and diminishes the functionality of the pathways involved in HPC homing. The data represent a valuable resource for further analyses, and for validation of knowledge gained from animal models.

SMART: recent updates, new developments and status in 2020 (2021)

Letunic, Ivica ; Khedkar, Supriya ; Bork, Peer

SMART (Simple Modular Architecture Research Tool) is a web resource (https://smart.embl.de) for the identification and annotation of protein domains and the analysis of protein domain architectures. SMART version 9 contains manually curated models for more than 1300 protein domains, with a topical set of 68 new models added since our last update article (1). All the new models are for diverse recombinase families and subfamilies and as a set they provide a comprehensive overview of mobile element recombinases, namely transposase, integrase, relaxase, resolvase, cas1 casposase and Xer-like cellular recombinase. Further updates include the synchronization of the underlying protein databases with UniProt (2), Ensembl (3) and STRING (4), greatly increasing the total number of annotated domains and other protein features available in architecture analysis mode. Furthermore, SMART's vector-based protein display engine has been extended and updated to use the latest web technologies and the domain architecture analysis components have been optimized to handle the increased number of protein features available.

A global ocean atlas of eukaryotic genes (2018)

While our knowledge about the roles of microbes and viruses in the ocean has increased tremendously due to recent advances in genomics and metagenomics, research on marine microbial eukaryotes and zooplankton has benefited much less from these new technologies because of their larger genomes, their enormous diversity, and largely unexplored physiologies. Here, we use a metatranscriptomics approach to capture expressed genes in open ocean Tara Oceans stations across four organismal size fractions. The individual sequence reads cluster into 116 million unigenes representing the largest reference collection of eukaryotic transcripts from any single biome. The catalog is used to unveil functions expressed by eukaryotic marine plankton, and to assess their functional biogeography. Almost half of the sequences have no similarity with known proteins, and a great number belong to new gene families with a restricted distribution in the ocean. Overall, the resource provides the foundations for exploring the roles of marine eukaryotes in ocean ecology and biogeochemistry.

19 search hits