TY - JOUR
A1 - Caliskan, Aylin
A1 - Dangwal, Seema
A1 - Dandekar, Thomas
T1 - Metadata integrity in bioinformatics: bridging the gap between data and knowledge
JF - Computational and Structural Biotechnology Journal
N2 - In the fast-evolving landscape of biomedical research, the emergence of big data has presented researchers with extraordinary opportunities to explore biological complexities. In biomedical research, big data also imply a big responsibility. This is not only due to genomics data being sensitive information but also due to genomics data being shared and re-analysed among the scientific community. This saves valuable resources and can even help to find new insights in silico. To fully use these opportunities, detailed and correct metadata are imperative. This includes not only the availability of metadata but also their correctness. Metadata integrity serves as a fundamental determinant of research credibility, supporting the reliability and reproducibility of data-driven findings. Ensuring metadata availability, curation, and accuracy is therefore essential for bioinformatic research. Not only must metadata be readily available, but they must also be meticulously curated and ideally error-free. Motivated by an accidental discovery of a critical metadata error in patient data published in two high-impact journals, we aim to raise awareness of the need for correct, complete, and curated metadata. We describe how the metadata error was found and addressed, and present examples of metadata-related challenges in omics research, along with supporting measures, including tools for checking metadata and software to facilitate various steps from data analysis to published research. Highlights • Data awareness and data integrity underpin the trustworthiness of results and subsequent further analysis. • Big data and bioinformatics enable efficient resource use by repurposing publicly available RNA-Sequencing data. • Manual checks of data quality and integrity are insufficient due to the overwhelming volume and rapid growth of data. • Automation and artificial intelligence provide cost-effective and efficient solutions for data integrity and quality checks. • FAIR data management, various software solutions and analysis tools assist metadata maintenance.
KW - meta-data
KW - error
KW - annotation
KW - error-transfer
KW - wrong labelling
KW - patient data
KW - control group
KW - tools overview
Y1 - 2023
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-349990
SN - 2001-0370
VL - 21
ER -

TY - JOUR
A1 - Gupta, Shishir K.
A1 - Srivastava, Mugdha
A1 - Osmanoglu, Oezge
A1 - Dandekar, Thomas
T1 - Genome-wide inference of the Camponotus floridanus protein-protein interaction network using homologous mapping and interacting domain profile pairs
JF - Scientific Reports
N2 - Apart from some model organisms, the interactome of most organisms is largely unidentified. High-throughput experimental techniques to determine protein-protein interactions (PPIs) are resource intensive and highly susceptible to noise. Computational methods of PPI determination can accelerate biological discovery by identifying the most promising interacting pairs of proteins and by assessing the reliability of identified PPIs. Here we present a first in-depth study describing a global view of the ant Camponotus floridanus interactome. Although several ant genomes have been sequenced in the last eight years, studies exploring and investigating PPIs in ants are lacking. Our study attempts to fill this gap and the presented interactome will also serve as a template for determining PPIs in other ants in future. Our C. floridanus interactome covers 51,866 non-redundant PPIs among 6,274 proteins, including 20,544 interactions supported by domain-domain interactions (DDIs), 13,640 interactions supported by DDIs and subcellular localization, and 10,834 high confidence interactions mediated by 3,289 proteins. These interactions involve and cover 30.6% of the entire C. floridanus proteome.
KW - interaction map
KW - drosophila
KW - identification
KW - evolutionary
KW - reliability
KW - annotation
KW - database
KW - target
KW - cycle
Y1 - 2020
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-229406
VL - 10
IS - 1
ER -

TY - JOUR
A1 - Horn, Hannes
A1 - Keller, Alexander
A1 - Hildebrandt, Ulrich
A1 - Kämpfer, Peter
A1 - Riederer, Markus
A1 - Hentschel, Ute
T1 - Draft genome of the \(Arabidopsis\) \(thaliana\) phyllosphere bacterium, \(Williamsia\) sp. ARP1
JF - Standards in Genomic Sciences
N2 - The Gram-positive actinomycete \(Williamsia\) sp. ARP1 was originally isolated from the \(Arabidopsis\) \(thaliana\) phyllosphere. Here we describe the general physiological features of this microorganism together with the draft genome sequence and annotation. The 4,745,080 bp long genome contains 4434 protein-coding genes and 70 RNA genes. To our knowledge, this is only the second reported genome from the genus \(Williamsia\) and the first sequenced strain from the phyllosphere. The presented genomic information is interpreted in the context of an adaptation to the phyllosphere habitat.
KW - arabidopsis thaliana
KW - whole genome sequencing
KW - adaption
KW - Williamsia sp. ARP1
KW - phyllosphere
KW - draft genome
KW - next generation sequencing
KW - assembly
KW - annotation
Y1 - 2016
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-146008
VL - 11
IS - 8
ER -