Although the concept of botanical carnivory has been known since Darwin's time, the molecular mechanisms that allow animal feeding remain unknown, primarily due to a complete lack of genomic information. Here, we show that the transcriptomic landscape of the Dionaea trap is dramatically shifted toward signal transduction and nutrient transport upon insect feeding, with touch hormone signaling and protein secretion prevailing. At the same time, a massive induction of general defense responses is accompanied by the repression of cell death-related genes/processes. We hypothesize that the carnivory syndrome of Dionaea evolved by exaptation of ancient defense pathways, replacing cell death with nutrient acquisition.
Automatic image reconstruction is critical to cope with steadily increasing data from advanced microscopy. We describe here the Fiji macro 3D ART VeSElecT which we developed to study synaptic vesicles in electron tomograms. We apply this tool to quantify vesicle properties (i) in embryonic Danio rerio 4 and 8 days post fertilization (dpf) and (ii) to compare Caenorhabditis elegans N2 neuromuscular junctions (NMJ) wild-type and its septin mutant (unc-59(e261)). We demonstrate development-specific and mutant-specific changes in synaptic vesicle pools in both models. We confirm the functionality of our macro by applying our 3D ART VeSElecT on zebrafish NMJ showing smaller vesicles in 8 dpf embryos than in 4 dpf embryos, which was validated by manual reconstruction of the vesicle pool. Furthermore, we analyze the impact of C. elegans septin mutant unc-59(e261) on vesicle pool formation and vesicle size. Automated vesicle registration and characterization were implemented in Fiji as two macros (registration and measurement). This flexible arrangement allows in particular reducing false positives by an optional manual revision step. Preprocessing and contrast enhancement work on image stacks of 1 nm/pixel in x and y direction. Semi-automated cell selection was integrated. 3D ART VeSElecT removes interfering components, detects vesicles by 3D segmentation and calculates vesicle volume and diameter (spherical approximation, inner/outer diameter). Results are collected in color using the RoiManager plugin including the possibility of manual removal of non-matching confounder vesicles. Detailed evaluation considered performance (detected vesicles) and specificity (true vesicles) as well as precision and recall. We furthermore show gain in segmentation and morphological filtering compared to learning-based methods and a large time gain compared to manual segmentation. 
3D ART VeSElecT shows small error rates and can be up to 68 times faster than manual annotation. Both automatic and semi-automatic modes are explained, including a tutorial.
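The spherical approximation mentioned above can be illustrated with a minimal sketch. It is not the macro's code: the function names are hypothetical, and it only assumes that a segmented vesicle volume V is converted to the diameter of a sphere of equal volume, d = (6V/π)^(1/3).

```python
import math


def sphere_volume_from_diameter(diameter_nm: float) -> float:
    """Volume (nm^3) of a sphere with the given diameter (nm)."""
    return math.pi / 6.0 * diameter_nm ** 3


def sphere_diameter_from_volume(volume_nm3: float) -> float:
    """Diameter (nm) of the sphere that has the given volume (nm^3).

    Hypothetical helper: inverts V = pi/6 * d^3, i.e. the
    equal-volume sphere used as a spherical approximation.
    """
    if volume_nm3 < 0:
        raise ValueError("volume must be non-negative")
    return (6.0 * volume_nm3 / math.pi) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Round trip for a vesicle of 30 nm diameter (illustrative value only).
    v = sphere_volume_from_diameter(30.0)
    print(round(sphere_diameter_from_volume(v), 6))
```

If the volume comes from a voxel count, it would first have to be scaled by the voxel size; the 1 nm/pixel figure above refers to x and y only, so the z spacing is an additional input.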
Experimental high-throughput analysis of molecular networks is a central approach to characterize the adaptation of plant metabolism to the environment. However, recent studies have demonstrated that it is hardly possible to predict in situ metabolic phenotypes from experiments under controlled conditions, such as growth chambers or greenhouses. This is particularly due to the high molecular variance of in situ samples induced by environmental fluctuations. An approach of functional metabolome interpretation of field samples would be desirable in order to be able to identify and trace back the impact of environmental changes on plant metabolism. To test the applicability of metabolomics studies for a characterization of plant populations in the field, we have identified and analyzed in situ samples of nearby grown natural populations of Arabidopsis thaliana in Austria. A. thaliana is the primary molecular biological model system in plant biology with one of the best functionally annotated genomes representing a reference system for all other plant genome projects. The genomes of these novel natural populations were sequenced and phylogenetically compared to a comprehensive genome database of A. thaliana ecotypes. Experimental results on primary and secondary metabolite profiling and genotypic variation were functionally integrated by a data mining strategy, which combines statistical output of metabolomics data with genome-derived biochemical pathway reconstruction and metabolic modeling. Correlations of biochemical model predictions and population-specific genetic variation indicated varying strategies of metabolic regulation on a population level which enabled the direct comparison, differentiation, and prediction of metabolic adaptation of the same species to different habitats. 
These differences were most pronounced at organic and amino acid metabolism as well as at the interface of primary and secondary metabolism and allowed for the direct classification of population-specific metabolic phenotypes within geographically contiguous sampling sites.
Although many genes have been identified using high throughput technologies in endometriosis (ES), only a small number of individual genes have been analyzed functionally. This is due to the complexity of the disease that has different stages and is affected by various genetic and environmental factors. Many genes are upregulated or downregulated at each stage of the disease, thus making it difficult to identify key genes. In addition, little is known about the differences between the different stages of the disease. We assumed that the study of the identified genes in ES at the systems level can help to better understand the molecular mechanism of the disease at different stages of its development. We used publicly available microarray data containing archived endometrial samples from women with minimal/mild endometriosis (MMES), mild/severe endometriosis (MSES) and without endometriosis. Using weighted gene co-expression analysis (WGCNA), functional modules were derived from normal endometrium (NEM) as the reference sample. Subsequently, we tested whether the topology or connectivity pattern of the modules was preserved in MMES and/or MSES. Common and specific hub genes were identified in non-preserved modules. Accordingly, hub genes were detected in the non-preserved modules at each stage. We identified sixteen co-expression modules. Of the 16 modules, nine were non-preserved in both MMES and MSES whereas five were preserved in NEM, MMES, and MSES. Importantly, two non-preserved modules were found in either MMES or MSES, highlighting differences between the two stages of the disease. Analyzing the hub genes in the non-preserved modules showed that they mostly lost or gained their centrality in NEM after the disease developed into MMES and MSES. The same scenario was observed when the severity of the disease switched from MMES to MSES. 
Interestingly, the expression analysis of the newly selected gene candidates including CC2D2A, AEBP1, HOXB6, IER3, and STX18 as well as IGF-1, CYP11A1 and MMP-2 could validate such shifts between different stages. The overrepresented gene ontology (GO) terms were enriched in specific modules, such as genetic disposition, estrogen dependence, progesterone resistance and inflammation, which are known as endometriosis hallmarks. Some modules uncovered novel co-expressed gene clusters that were not previously discovered.
The abundance of high-quality genotype and phenotype data for the model organism Arabidopsis thaliana enables scientists to study the genetic architecture of many complex traits at an unprecedented level of detail using genome-wide association studies (GWAS). GWAS have been a great success in A. thaliana and many SNP-trait associations have been published. With the AraGWAS Catalog (https://aragwas.1001genomes.org) we provide a publicly available, manually curated and standardized GWAS catalog for all publicly available phenotypes from the central A. thaliana phenotype repository, AraPheno. All GWAS have been recomputed on the latest imputed genotype release of the 1001 Genomes Consortium using a standardized GWAS pipeline to ensure comparability between results. The catalog currently includes 167 phenotypes and more than 222 000 SNP-trait associations with P < 10\(^{-4}\), of which 3887 are significantly associated using permutation-based thresholds. The AraGWAS Catalog can be accessed via a modern web interface and provides various features to easily access, download and visualize the results and summary statistics across GWAS.
Synaptic vesicles (SVs) are a key component of neuronal signaling and fulfil different roles depending on their composition. In electron micrographs of neurites, two types of vesicles can be distinguished by morphological criteria, the classical “clear core” vesicles (CCV) and the typically larger “dense core” vesicles (DCV), with differences in electron density due to their diverse cargos. Compared to CCVs, the precise function of DCVs is less defined. DCVs are known to store neuropeptides, which function as neuronal messengers and modulators [1]. In C. elegans, they play a role in locomotion, dauer formation, egg-laying, and mechano- and chemosensation [2]. Another type of DCV, also referred to as granulated vesicles, is known to transport Bassoon, Piccolo and further constituents of the presynaptic density in the center of the active zone (AZ), and is therefore important for synaptogenesis [3].
To better understand the role of different types of SVs, we present here a new automated approach to classify vesicles. We combine machine learning with an extension of our previously developed vesicle segmentation workflow, the ImageJ macro 3D ART VeSElecT. With that we reliably distinguish CCVs and DCVs in electron tomograms of C. elegans NMJs using image-based features. Analysis of the underlying ground truth data shows an increased fraction of DCVs as well as a higher mean distance between DCVs and AZs in dauer larvae compared to young adult hermaphrodites. Our machine learning based tools are adaptable and can be applied to study properties of different synaptic vesicle pools in electron tomograms of diverse model organisms.
Understanding extinction debts: spatio-temporal scales, mechanisms and a roadmap for future research
(2019)
Extinction debt refers to delayed species extinctions expected as a consequence of ecosystem perturbation. Quantifying such extinctions and investigating long‐term consequences of perturbations has proven challenging, because perturbations are not isolated and occur across various spatial and temporal scales, from local habitat losses to global warming. Additionally, the relative importance of eco‐evolutionary processes varies across scales, because levels of ecological organization, i.e. individuals, (meta)populations and (meta)communities, respond hierarchically to perturbations. To summarize our current knowledge of the scales and mechanisms influencing extinction debts, we reviewed recent empirical, theoretical and methodological studies addressing either the spatio–temporal scales of extinction debts or the eco‐evolutionary mechanisms delaying extinctions. Extinction debts were detected across a range of ecosystems and taxonomic groups, with estimates ranging from 9 to 90% of current species richness. The duration over which debts have been sustained varies from 5 to 570 yr, and projections of the total period required to settle a debt can extend to 1000 yr. Reported causes of delayed extinctions are 1) life‐history traits that prolong individual survival, and 2) population and metapopulation dynamics that maintain populations under deteriorated conditions. Other potential factors that may extend survival time such as microevolutionary dynamics, or delayed extinctions of interaction partners, have rarely been analyzed. Therefore, we propose a roadmap for future research with three key avenues: 1) the microevolutionary dynamics of extinction processes, 2) the disjunctive loss of interacting species and 3) the impact of multiple regimes of perturbation on the payment of debts. 
For their ability to integrate processes occurring at different levels of ecological organization, we highlight mechanistic simulation models as tools to address these knowledge gaps and to deepen our understanding of extinction dynamics.
The prediction of breeding values and phenotypes is of central importance for both livestock and crop breeding. In this study, we analyze the use of artificial neural networks (ANN) and, in particular, local convolutional neural networks (LCNN) for genomic prediction, as a region-specific filter corresponds much better with our prior knowledge of the genetic architecture of traits than traditional convolutional neural networks. Model performances are evaluated on a simulated maize data panel (n = 10,000; p = 34,595) and real Arabidopsis data (n = 2,039; p = 180,000) for a variety of traits based on their predictive ability. The baseline LCNN, containing one local convolutional layer (kernel size: 10) and two fully connected layers with 64 nodes each, outperforms commonly proposed ANNs (multilayer perceptrons and convolutional neural networks) for basically all considered traits. For traits with high heritability and a large training population, as present in the simulated data, LCNNs even outperform state-of-the-art methods like genomic best linear unbiased prediction (GBLUP), Bayesian models and extended GBLUP, indicated by an increase in predictive ability of up to 24%. However, for small training populations, these state-of-the-art methods outperform all considered ANNs. Nevertheless, the LCNN still outperforms all other considered ANNs by around 10%. Minor improvements to the tested baseline network architecture of the LCNN were obtained by increasing the kernel size and reducing the stride, whereas the number of subsequent fully connected layers and their node sizes had negligible impact. Although gains in predictive ability were obtained for large-scale data sets by using LCNNs, the practical use of ANNs comes with additional problems, such as the need to genotype all considered individuals and the lack of estimates of heritability and reliability. Furthermore, breeding values are additive by design, whereas ANN-based estimates are not. 
However, ANNs also come with new opportunities, as networks can easily be extended to account for additional inputs (omics, weather etc.) and outputs (multi-trait models), and computing time increases linearly with the number of individuals. With advances in high-throughput phenotyping and cheaper genotyping, ANNs can become a valid alternative for genomic prediction.
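The idea of a region-specific filter can be sketched as a locally connected 1D layer, in which every marker window has its own weights instead of sharing one kernel across the genome. This is a plain-Python illustration under stated assumptions, not the authors' implementation: the function name, the ReLU activation and the choice of stride equal to kernel size in the example are assumptions made here.

```python
def local_conv1d(markers, weights, biases, kernel_size, stride):
    """Forward pass of a locally connected 1D layer (hypothetical sketch).

    markers : list of numeric marker codes (e.g. 0/1/2 genotypes)
    weights : one weight list of length kernel_size PER window
              (an ordinary convolution would reuse a single list)
    biases  : one bias per window
    """
    n_windows = (len(markers) - kernel_size) // stride + 1
    if len(weights) != n_windows or len(biases) != n_windows:
        raise ValueError("need exactly one filter and bias per window")
    outputs = []
    for w in range(n_windows):
        start = w * stride
        window = markers[start:start + kernel_size]
        z = sum(wi * xi for wi, xi in zip(weights[w], window)) + biases[w]
        outputs.append(max(0.0, z))  # ReLU activation (assumed)
    return outputs


if __name__ == "__main__":
    # 20 markers, kernel size 10 as in the baseline above, stride 10 (assumed).
    genotype = [1] * 20
    filters = [[1.0] * 10, [2.0] * 10]  # region-specific weights
    print(local_conv1d(genotype, filters, [0.0, 0.0], kernel_size=10, stride=10))
```

In a full network, the outputs of this layer would feed the two fully connected layers described above; with p markers the layer holds roughly p/stride × kernel_size weights, which is why kernel size and stride matter for model size.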
White Blood Cell (WBC) Leukemia is caused by excessive production of leukocytes in the bone marrow, and image-based detection of malignant WBCs is important for its diagnosis. Convolutional Neural Networks (CNNs) present the current state-of-the-art for this type of image classification, but their computational cost for training and deployment can be high. We here present an improved hybrid approach for efficient classification of WBC Leukemia. We first extract features from WBC images using VGGNet, a powerful CNN architecture, pre-trained on ImageNet. The extracted features are then filtered using a statistically enhanced Salp Swarm Algorithm (SESSA). This bio-inspired optimization algorithm selects the most relevant features and removes highly correlated and noisy features. We applied the proposed approach to two public WBC Leukemia reference datasets and achieved both high accuracy and reduced computational complexity. The SESSA optimization selected only 1 K out of 25 K features extracted with VGGNet, while improving accuracy at the same time. The results are among the best achieved on these datasets and outperform several convolutional network models. We expect that the combination of CNN feature extraction and SESSA feature optimization could be useful for many other image classification tasks.
Osmotic adaptation and accumulation of compatible solutes is a key process for life at high osmotic pressure and elevated salt concentrations. The most important solutes that can protect cell structures and metabolic processes at high salt concentrations are glycine betaine and ectoine. The genome analysis of more than 130 phototrophic bacteria shows that biosynthesis of glycine betaine is common among marine and halophilic phototrophic Proteobacteria and their chemotrophic relatives, as well as in representatives of Pirellulaceae and Actinobacteria, and is also found in halophilic Cyanobacteria and Chloroherpeton thalassium. This ability correlates well with the successful toleration of extreme salt concentrations. Freshwater bacteria in general lack the ability to synthesize and often also to take up these compounds. The biosynthesis of ectoine is found in the phylogenetic lines of phototrophic Alpha- and Gammaproteobacteria, most prominently in the Halorhodospira species and a number of Rhodobacteraceae. It is also common among Streptomycetes and Bacilli. The phylogeny of glycine-sarcosine methyltransferase (GMT) and diaminobutyrate-pyruvate aminotransferase (EctB) sequences correlates well with otherwise established phylogenetic groups. Most significantly, GMT sequences of cyanobacteria form two major phylogenetic branches and the branch of Halorhodospira species is distinct from all other Ectothiorhodospiraceae. A variety of transport systems for osmolytes are present in the studied bacteria.
Bees need food of appropriate nutritional quality to maintain their metabolic functions. They largely obtain all required nutrients from floral resources, i.e., pollen and nectar. However, the diversity, composition and nutritional quality of floral resources vary with the surrounding environment and can be strongly altered in human-impacted habitats. We investigated whether differences in plant species richness as found in the surrounding environment correlated with variation in the floral diversity and nutritional quality of larval provisions (i.e., mixtures of pollen, nectar and salivary secretions) composed by the mass-provisioning stingless bee Tetragonula carbonaria (Apidae: Meliponini). We found that the floral diversity of larval provisions increased with increasing plant species richness. The sucrose and fat (total fatty acid) content and the proportion and concentration of the omega-6 fatty acid linoleic acid decreased, whereas the proportion of the omega-3 fatty acid linolenic acid increased with increasing plant species richness. Protein (total amino acid) content and amino acid composition did not change. The protein to fat (P:F) ratio, known to affect bee foraging, increased on average by more than 40% from plantations to forests and gardens, while the omega-6:3 ratio, known to negatively affect cognitive performance, decreased with increasing plant species richness. Our results suggest that plant species richness may support T. carbonaria colonies by providing not only a continuous resource supply (as shown in a previous study), but also floral resources of high nutritional quality.
Solitary bees are subject to a variety of pressures that cause severe population declines. Currently, habitat loss, temperature shifts, agrochemical exposure, and new parasites are identified as major threats. However, knowledge about detrimental bacteria is scarce, although they may disrupt natural microbiomes, disturb nest environments, or harm the larvae directly. To address this gap, we investigated 12 Osmia bicornis nests with deceased larvae and 31 nests with healthy larvae from the same localities in a 16S ribosomal RNA (rRNA) gene metabarcoding study. We sampled larvae, pollen provisions, and nest material and then contrasted bacterial community composition and diversity in healthy and deceased nests. Microbiomes of pollen provisions and larvae showed similarities for healthy larvae, whilst this was not the case for deceased individuals. We identified three bacterial taxa assigned to Paenibacillus sp. (closely related to P. pabuli/amylolyticus/xylanexedens), Sporosarcina sp., and Bacillus sp. as indicative of bacterial communities of deceased larvae, as well as Lactobacillus for corresponding pollen provisions. Furthermore, we performed a provisioning experiment, where we fed larvae with untreated and sterilized pollens, as well as sterilized pollens inoculated with a Bacillus sp. isolate from a deceased larva. Microbiomes of larvae fed untreated pollen were consistent with those of the pollen provided. Sterilized pollen alone did not lead to acute mortality, while no microbiome was recoverable from the larvae. In the inoculation treatment, we observed that larval microbiomes were dominated by the seeded bacterium, which resulted in enhanced mortality. These results support the view that larval microbiomes are strongly determined by the pollen provisions. Further, they underline the need for further investigation of the impact of detrimental bacteria acquired via pollens and potential buffering by a diverse pollen provision microbiome in solitary bees.
Spatial biological networks are abundant on all scales of life, from single cells to ecosystems, and perform various important functions including signal transmission and nutrient transport. These biological functions depend on the architecture of the network, which emerges as the result of a dynamic, feedback-driven developmental process. While cell behavior during growth can be genetically encoded, the resulting network structure depends on spatial constraints and tissue architecture. Since network growth is often difficult to observe experimentally, computer simulations can help to understand how local cell behavior determines the resulting network architecture. We present here a computational framework based on directional statistics to model network formation in space and time under arbitrary spatial constraints. Growth is described as a biased correlated random walk where direction and branching depend on the local environmental conditions and constraints, which are represented as a 3D multilayer grid. To demonstrate the application of our tool, we perform growth simulations of a dense network between cells and compare the results to experimental data from osteocyte networks in bone. Our generic framework might help to better understand how network patterns depend on spatial constraints, or to identify the biological cause of deviations from healthy network function.
Author summary
We present a novel modeling approach and computational implementation to better understand the development of spatial biological networks under the influence of external signals. Our tool allows us to study the relationship between local biological growth parameters and the emerging macroscopic network function using simulations. This computational approach can generate plausible network graphs that take local feedback into account and provide a basis for comparative studies using graph-based methods.
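A biased correlated random walk of the kind described above can be sketched in a few lines. This is a simplified 2D illustration, not the published framework: it omits branching and the 3D multilayer constraint grid, and the function name, the weighting scheme and the use of a von Mises distribution for the turning noise are assumptions made here.

```python
import math
import random


def biased_correlated_walk(n_steps, bias_angle, bias_weight, kappa,
                           step=1.0, seed=None):
    """Simulate one growing tip in 2D (hypothetical sketch).

    bias_angle  : preferred growth direction in radians (the bias)
    bias_weight : 0 = purely correlated (persistent), 1 = purely biased
    kappa       : von Mises concentration; larger means straighter paths
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    heading = bias_angle
    path = [(x, y)]
    for _ in range(n_steps):
        # Circular weighted mean of the bias direction and the previous
        # heading; the previous heading provides the correlation.
        sx = bias_weight * math.cos(bias_angle) + (1 - bias_weight) * math.cos(heading)
        sy = bias_weight * math.sin(bias_angle) + (1 - bias_weight) * math.sin(heading)
        mu = math.atan2(sy, sx) % (2 * math.pi)
        # Draw the new heading around that mean direction.
        heading = rng.vonmisesvariate(mu, kappa)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        path.append((x, y))
    return path


if __name__ == "__main__":
    path = biased_correlated_walk(100, bias_angle=0.0, bias_weight=0.3,
                                  kappa=8.0, seed=42)
    print(len(path), path[-1])
```

In a fuller model, `bias_angle` and `kappa` would be looked up from the local grid cell at each step rather than held constant.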
In vitro rearing of honeybee larvae is an established method that enables exact control and monitoring of developmental factors and allows controlled application of pesticides or pathogens. However, only a few studies have investigated how the rearing method itself affects the behavior of the resulting adult honeybees. We raised honeybees in vitro according to a standardized protocol: marking the emerging honeybees individually and inserting them into established colonies. Subsequently, we investigated the behavioral performance of nurse bees and foragers and quantified the physiological factors underlying the social organization. Adult honeybees raised in vitro differed from naturally reared honeybees in their probability of performing social tasks. Further, in vitro-reared bees foraged for a shorter duration in their life and performed fewer foraging trips. Nursing behavior appeared to be unaffected by rearing condition. Weight was also unaffected by rearing condition. Interestingly, juvenile hormone titers, which normally increase strongly around the time when a honeybee becomes a forager, were significantly lower in three- and four-week-old in vitro bees. The effects of the rearing environment on individual sucrose responsiveness and lipid levels were rather minor. These data suggest that larval rearing conditions can affect the task performance and physiology of adult bees despite equal weight, pointing to an important role of the colony environment for these factors. Our observations of behavior and metabolic pathways offer important novel insight into how the rearing environment affects adult honeybees.
Investigating diversity gradients helps to understand biodiversity drivers and threats. However, one diversity gradient is rarely assessed, namely how plant species distribute along the depth gradient of lakes. Here, we provide the first comprehensive characterization of the depth diversity gradient (DDG) of alpha, beta, and gamma species richness of submerged macrophytes across multiple lakes. We characterize the DDG for additive richness components (alpha, beta, gamma), assess environmental drivers, and address temporal change over recent years. We take advantage of the largest dataset to date of macrophyte occurrence along lake depth (274 depth transects across 28 deep lakes) as well as of physicochemical measurements (12 deep lakes from 2006 to 2017 across Bavaria), provided publicly online by the Bavarian State Office for the Environment. We found a high variability in DDG shapes across the study lakes. The DDGs for alpha and gamma richness are predominantly hump-shaped, while beta richness shows a decreasing DDG. Generalized additive mixed-effect models indicate that the depth of the maximum richness (Dmax) is influenced by light quality, light quantity, and layering depth, whereas the respective maximum alpha richness within the depth gradient (Rmax) is significantly influenced by lake area only. Most observed DDGs seem generally stable over recent years. However, for single lakes we found significant linear trends for Rmax and Dmax going in different directions. The observed hump-shaped DDGs agree with three competing hypotheses: the mid-domain effect, the mean–disturbance hypothesis, and the mean–productivity hypothesis. The DDG amplitude seems driven by lake area (thus following known species–area relationships), whereas skewness depends on physicochemical factors, mainly water transparency and layering depth. Our results provide insights for conservation strategies and for mechanistic frameworks to disentangle competing explanatory hypotheses for the DDG.
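The additive richness components can be made concrete with a short sketch. It assumes the standard additive partitioning, gamma = mean alpha + beta, applied to the species sets recorded on each transect at one depth; the function name and the example species labels are hypothetical and the study's exact computation may differ.

```python
def additive_partition(transects):
    """Return (alpha, beta, gamma) richness for one depth level.

    transects : list of sets, each holding the species found on one
                transect at that depth (hypothetical input layout).
    alpha = mean richness per transect, gamma = pooled richness,
    beta  = gamma - alpha (additive partitioning).
    """
    if not transects:
        raise ValueError("need at least one transect")
    gamma = len(set().union(*transects))
    alpha = sum(len(species) for species in transects) / len(transects)
    return alpha, gamma - alpha, gamma


if __name__ == "__main__":
    # Three transects at the same depth with placeholder species labels.
    depth_samples = [{"sp_a", "sp_b"}, {"sp_b", "sp_c"}, {"sp_a"}]
    print(additive_partition(depth_samples))
```

Repeating the calculation for each depth level gives one value of each component per depth, which is the raw material for a depth diversity gradient.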
Single-molecule super-resolution microscopy (SMLM) techniques like dSTORM can reveal biological structures down to the nanometer scale. The achievable resolution is not only defined by the localization precision of individual fluorescent molecules, but also by their density, which becomes a limiting factor e.g., in expansion microscopy. Artificial deep neural networks can learn to reconstruct dense super-resolved structures such as microtubules from a sparse, noisy set of data points. This approach requires a robust method to assess the quality of a predicted density image and to quantitatively compare it to a ground truth image. Such a quality measure needs to be differentiable to be applied as loss function in deep learning. We developed a new trainable quality measure based on Fourier Ring Correlation (FRC) and used it to train deep neural networks to map a small number of sampling points to an underlying density. Smooth ground truth images of microtubules were generated from localization coordinates using an anisotropic Gaussian kernel density estimator. We show that the FRC criterion ideally complements the existing state-of-the-art multiscale structural similarity index, since both are interpretable and there is no trade-off between them during optimization. The TensorFlow implementation of our FRC metric can easily be integrated into existing deep learning workflows.
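For orientation, Fourier Ring Correlation compares two images ring by ring in frequency space: for each ring r, FRC(r) = Re Σ F1·F2* / sqrt(Σ|F1|² · Σ|F2|²), with sums over the frequencies in that ring. The sketch below evaluates this with a naive discrete Fourier transform from the Python standard library so that it stays self-contained. It is not the TensorFlow metric described above and is not differentiable; the function names and the integer ring binning are assumptions made here, and real workflows would use an FFT library.

```python
import cmath
import math


def dft2(img):
    """Naive 2D discrete Fourier transform of a square image (list of lists)."""
    n = len(img)
    w = [[cmath.exp(-2j * math.pi * u * x / n) for x in range(n)] for u in range(n)]
    # Transform along x for every row, then along y for every column.
    rows = [[sum(img[y][x] * w[u][x] for x in range(n)) for u in range(n)]
            for y in range(n)]
    return [[sum(rows[y][u] * w[v][y] for y in range(n)) for u in range(n)]
            for v in range(n)]


def frc(img1, img2):
    """Fourier Ring Correlation per integer-radius ring (hypothetical sketch)."""
    n = len(img1)
    f1, f2 = dft2(img1), dft2(img2)
    n_rings = n // 2
    num = [0.0] * n_rings
    pow1 = [0.0] * n_rings
    pow2 = [0.0] * n_rings
    for v in range(n):
        for u in range(n):
            # Map DFT indices to signed frequencies before taking the radius.
            fu = u if u <= n // 2 else u - n
            fv = v if v <= n // 2 else v - n
            r = int(round(math.hypot(fu, fv)))
            if r >= n_rings:
                continue
            a, b = f1[v][u], f2[v][u]
            num[r] += (a * b.conjugate()).real
            pow1[r] += abs(a) ** 2
            pow2[r] += abs(b) ** 2
    return [num[r] / math.sqrt(pow1[r] * pow2[r]) if pow1[r] * pow2[r] > 0 else 0.0
            for r in range(n_rings)]


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    image = [[rng.random() for _ in range(8)] for _ in range(8)]
    print([round(value, 3) for value in frc(image, image)])
```

A loss for training could, for example, average 1 − FRC(r) over rings, but how the published metric aggregates the rings is not stated above.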
Propagule pressure and an invasion syndrome determine invasion success in a plant community model
(2021)
The success of species invasions depends on multiple factors, including propagule pressure, disturbance, productivity, and the traits of native and non-native species. While the importance of many of these determinants has already been investigated in relative isolation, they are rarely studied in combination. Here, we address this shortcoming by exploring the effect of the above-listed factors on the success of invasions using an individual-based mechanistic model. This approach enables us to explicitly control environmental factors (temperature as surrogate for productivity, disturbance, and propagule pressure) as well as to monitor whole-community trait distributions of environmental adaptation, mass, and dispersal abilities. We simulated introductions of plant individuals to an oceanic island to assess which factors and species traits contribute to invasion success. We found that the most influential factors were higher propagule pressure and a particular set of traits. This invasion trait syndrome was characterized by a relative similarity in functional traits of invasive to native species, while invasive species had on average higher environmental adaptation, higher body mass, and increased dispersal distances, that is, had greater competitive and dispersive abilities. Our results highlight the importance in management practice of reducing the import of alien species, especially those that display this trait syndrome and come from similar habitats as those being managed.
Neurotransmitter release is stabilized by homeostatic plasticity. Presynaptic homeostatic potentiation (PHP) operates on timescales ranging from minute- to life-long adaptations and likely involves reorganization of presynaptic active zones (AZs). At Drosophila melanogaster neuromuscular junctions, earlier work ascribed AZ enlargement by incorporating more Bruchpilot (Brp) scaffold protein a role in PHP. We use localization microscopy (direct stochastic optical reconstruction microscopy [dSTORM]) and hierarchical density-based spatial clustering of applications with noise (HDBSCAN) to study AZ plasticity during PHP at the synaptic mesoscale. We find compaction of individual AZs in acute philanthotoxin-induced and chronic genetically induced PHP but unchanged copy numbers of AZ proteins. Compaction even occurs at the level of Brp subclusters, which move toward AZ centers, and in Rab3 interacting molecule (RIM)-binding protein (RBP) subclusters. Furthermore, correlative confocal and dSTORM imaging reveals how AZ compaction in PHP translates into apparent increases in AZ area and Brp protein content, as implied earlier.
Understanding the genetic architecture of complex traits is a major objective in biology. The standard approach for doing so is genome-wide association studies (GWAS), which aim to identify genetic polymorphisms responsible for variation in traits of interest. In human genetics, consistency across studies is commonly used as an indicator of reliability. However, if traits are involved in adaptation to the local environment, we do not necessarily expect reproducibility. On the contrary, results may depend on where you sample, and sampling across a wide range of environments may decrease the power of GWAS because of increased genetic heterogeneity. In this study, we examine how sampling affects GWAS in the model plant species Arabidopsis thaliana. We show that traits like flowering time are indeed influenced by distinct genetic effects in local populations. Furthermore, using gene expression as a molecular phenotype, we show that some genes are globally affected by shared variants, whereas others are affected by variants specific to subpopulations. Remarkably, the former are essentially all cis-regulated, whereas the latter are predominantly affected by trans-acting variants. Our results illustrate that conclusions about genetic architecture can be extremely sensitive to sampling and population structure.
Background: Renal cell carcinoma (RCC) is divided into three major histopathologic groups—clear cell (ccRCC), papillary (pRCC) and chromophobe RCC (chRCC). We performed a comprehensive re-analysis of publicly available RCC datasets from the TCGA (The Cancer Genome Atlas) database, thereby combining samples from all three subgroups, for an exploratory transcriptome profiling of RCC subgroups.
Materials and Methods: We used FPKM (fragments per kilobase per million) files derived from the ccRCC, pRCC and chRCC cohorts of the TCGA database, representing transcriptomic data of 891 patients. Using principal component analysis, we visualized datasets as a t-SNE plot for cluster detection. Clusters were characterized by machine learning, and the resulting gene signatures were validated by correlation analyses in the TCGA dataset and three external datasets (ICGC RECA-EU, CPTAC-3-Kidney, and GSE157256).
Results: Many RCC samples co-clustered according to histopathology. However, a substantial number of samples clustered independently of histopathologic origin (mixed subgroup), demonstrating divergence between histopathology and transcriptomic data. Further analyses of the mixed subgroup via machine learning revealed a predominant mitochondrial gene signature, a trait previously known for chRCC, across all histopathologic subgroups. Additionally, ccRCC samples from the mixed subgroup presented an inverse correlation of mitochondrial and angiogenesis-related genes in the TCGA and in three external validation cohorts. Moreover, mixed subgroup affiliation was associated with a highly significantly shorter overall survival for patients with ccRCC and a highly significantly longer overall survival for chRCC patients.
Conclusions: Pan-RCC clustering according to RNA-sequencing data revealed a distinct histology-independent subgroup characterized by strengthened mitochondrial and weakened angiogenesis-related gene signatures. Moreover, affiliation with the mixed subgroup was associated with a significantly shorter overall survival for ccRCC and a longer overall survival for chRCC patients. Further research could offer a therapy stratification by specifically addressing the mitochondrial metabolism of such tumors and its microenvironment.