Natural genetic variation makes it possible to discover evolutionary changes that have been maintained in a population because they are advantageous. To understand genotype–phenotype relationships and to investigate trait architecture, both high-resolution genotypic and phenotypic data are necessary. Arabidopsis thaliana is a prime model for these purposes. This herb naturally occurs across much of the Eurasian continent and North America. Thus, it is exposed to a wide range of environmental factors and has been subject to natural selection under distinct conditions. Full genome sequencing data for more than 1000 different natural inbred lines are available, and this has encouraged the distributed generation of many types of phenotypic data. To leverage these data for meta-analyses, AraPheno (https://arapheno.1001genomes.org) provides a central repository of population-scale phenotypes for A. thaliana inbred lines. AraPheno includes various features to easily access, download and visualize the phenotypic data. This will facilitate a comparative analysis of the many different types of phenotypic data, which is the basis for further enhancing our understanding of the genotype–phenotype map.
Understanding the causal relationship between genotype and phenotype is a major objective in biology. Genome-wide association studies (GWAS) correlate genetic polymorphisms with trait variation and have already identified causative variants for various traits in many different organisms, from humans to plants. Importantly, many adaptive traits, like the regulation of flowering time in plants, are not regulated by distinct genetic effects, but by more sophisticated gene regulatory networks.
Although many genes have been identified using high-throughput technologies in endometriosis (ES), only a small number of individual genes have been analyzed functionally. This is due to the complexity of the disease, which has different stages and is affected by various genetic and environmental factors. Many genes are upregulated or downregulated at each stage of the disease, thus making it difficult to identify key genes. In addition, little is known about the differences between the different stages of the disease. We assumed that studying the genes identified in ES at the system level can help to better understand the molecular mechanism of the disease at different stages of its development. We used publicly available microarray data containing archived endometrial samples from women with minimal/mild endometriosis (MMES), mild/severe endometriosis (MSES) and without endometriosis. Using weighted gene co-expression analysis (WGCNA), functional modules were derived from normal endometrium (NEM) as the reference sample. Subsequently, we tested whether the topology or connectivity pattern of the modules was preserved in MMES and/or MSES. Common and specific hub genes were identified in non-preserved modules. Accordingly, hub genes were detected in the non-preserved modules at each stage. We identified sixteen co-expression modules. Of the 16 modules, nine were non-preserved in both MMES and MSES whereas five were preserved in NEM, MMES, and MSES. Importantly, two non-preserved modules were found in either MMES or MSES, highlighting differences between the two stages of the disease. Analyzing the hub genes in the non-preserved modules showed that, relative to NEM, they mostly lost or gained centrality as the disease developed into MMES and MSES. The same scenario was observed when the severity of the disease switched from MMES to MSES.
Interestingly, the expression analysis of the newly selected gene candidates, including CC2D2A, AEBP1, HOXB6, IER3, and STX18 as well as IGF-1, CYP11A1 and MMP-2, could validate such shifts between different stages. The overrepresented gene ontology (GO) terms were enriched in specific modules, such as genetic disposition, estrogen dependence, progesterone resistance and inflammation, which are known as endometriosis hallmarks. Some modules uncovered novel co-expressed gene clusters that were not previously discovered.
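As a rough illustration of the network quantities behind the hub-gene analysis described above, the sketch below builds a soft-thresholded co-expression adjacency, a_ij = |cor(x_i, x_j)|^beta, and ranks genes by connectivity (the sum of their adjacencies to all other genes), which is one common way to nominate hub genes. The toy expression values, the gene names and beta = 6 are assumptions for illustration only; this is not the pipeline used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def connectivity(expr, beta=6):
    """Return {gene: sum of soft-thresholded adjacencies to all other genes}."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k


# Hypothetical expression profiles (one value per sample) in two conditions.
reference = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.1],
    "geneC": [2.0, 4.1, 6.0, 8.2, 9.9],
    "geneD": [5.0, 1.0, 4.0, 2.0, 3.0],
}
disease = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [3.0, 1.0, 4.0, 1.5, 3.5],
    "geneC": [4.0, 2.0, 5.0, 1.0, 3.0],
    "geneD": [1.2, 2.1, 3.1, 3.9, 5.2],
}

k_ref = connectivity(reference)
k_dis = connectivity(disease)
hub_ref = max(k_ref, key=k_ref.get)  # most connected gene in the reference
hub_dis = max(k_dis, key=k_dis.get)  # most connected gene in the second condition
```

In this toy data, geneD is almost unconnected in the reference condition but strongly connected in the second one, while geneB loses connectivity. That is the kind of gain or loss of centrality the abstract reports for hub genes in non-preserved modules.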
The abundance of high-quality genotype and phenotype data for the model organism Arabidopsis thaliana enables scientists to study the genetic architecture of many complex traits at an unprecedented level of detail using genome-wide association studies (GWAS). GWAS have been a great success in A. thaliana and many SNP-trait associations have been published. With the AraGWAS Catalog (https://aragwas.1001genomes.org) we provide a publicly available, manually curated and standardized GWAS catalog for all publicly available phenotypes from the central A. thaliana phenotype repository, AraPheno. All GWAS have been recomputed on the latest imputed genotype release of the 1001 Genomes Consortium using a standardized GWAS pipeline to ensure comparability between results. The catalog currently includes 167 phenotypes and more than 222 000 SNP-trait associations with \(P < 10^{-4}\), of which 3887 are significantly associated using permutation-based thresholds. The AraGWAS Catalog can be accessed via a modern web interface and provides various features to easily access, download and visualize the results and summary statistics across GWAS.
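To make the idea of a permutation-based threshold concrete, here is a minimal sketch: phenotypes are shuffled across accessions, the strongest association score across all markers is recorded for each shuffle, and the significance threshold is taken as an upper quantile of those maxima. The squared-correlation score, the toy data, the parameter defaults and the function names are all assumptions for illustration. The AraGWAS pipeline itself uses its own standardized association model, which is not reproduced here.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov * cov / (vx * vy)


def scan(genotypes, phenotype):
    """Score every marker (one 0/1 allele per accession) against the trait."""
    return [r_squared(marker, phenotype) for marker in genotypes]


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Upper-alpha quantile of the maximum score over phenotype permutations."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        maxima.append(max(scan(genotypes, shuffled)))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]


# Hypothetical toy data: 40 accessions, 30 biallelic markers, one causal marker.
rng = random.Random(1)
n_acc, n_markers, causal = 40, 30, 7
genotypes = [[rng.randint(0, 1) for _ in range(n_acc)] for _ in range(n_markers)]
phenotype = [3.0 * genotypes[causal][i] + rng.gauss(0, 0.5) for i in range(n_acc)]

scores = scan(genotypes, phenotype)
threshold = permutation_threshold(genotypes, phenotype)
significant = [m for m, s in enumerate(scores) if s > threshold]
```

Because the threshold is derived from the maximum score per permutation, it accounts for testing many markers at once without assuming a particular null distribution for the score.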
The prediction of breeding values and phenotypes is of central importance for both livestock and crop breeding. In this study, we analyze the use of artificial neural networks (ANN) and, in particular, local convolutional neural networks (LCNN) for genomic prediction, as a region-specific filter corresponds much better to our prior knowledge of the genetic architecture of traits than traditional convolutional neural networks. Model performances are evaluated on a simulated maize data panel (n = 10,000; p = 34,595) and real Arabidopsis data (n = 2,039; p = 180,000) for a variety of traits based on their predictive ability. The baseline LCNN, containing one local convolutional layer (kernel size: 10) and two fully connected layers with 64 nodes each, outperforms commonly proposed ANNs (multilayer perceptrons and convolutional neural networks) for basically all considered traits. For traits with high heritability and a large training population, as present in the simulated data, LCNNs even outperform state-of-the-art methods like genomic best linear unbiased prediction (GBLUP), Bayesian models and extended GBLUP, indicated by an increase in predictive ability of up to 24%. However, for small training populations, these state-of-the-art methods outperform all considered ANNs. Nevertheless, the LCNN still outperforms all other considered ANNs by around 10%. Minor improvements to the tested baseline network architecture of the LCNN were obtained by increasing the kernel size and reducing the stride, whereas the number of subsequent fully connected layers and their node sizes had negligible impact. Although gains in predictive ability were obtained for large-scale data sets by using LCNNs, the practical use of ANNs comes with additional problems, such as the need to genotype all considered individuals and the lack of estimates of heritability and reliability. Furthermore, breeding values are additive by design, whereas ANN-based estimates are not.
However, ANNs also come with new opportunities, as networks can easily be extended to account for additional inputs (omics, weather etc.) and outputs (multi-trait models), and computing time increases linearly with the number of individuals. With advances in high-throughput phenotyping and cheaper genotyping, ANNs can become a valid alternative for genomic prediction.
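The difference between a conventional and a local convolutional layer can be sketched in a few lines. Both slide a window over the marker vector, but the local variant keeps a separate weight vector for each window position instead of sharing one kernel along the genome. The sketch below is a plain-Python forward pass with a kernel size of 10, as in the baseline described above. The stride of 10, the random weights, the ReLU activation, the 0/1/2 genotype coding and the function names are assumptions for illustration, not the authors' implementation.

```python
import random


def windows(n_markers, kernel_size, stride):
    """Start indices of all full windows over a marker vector."""
    return range(0, n_markers - kernel_size + 1, stride)


def conv1d(x, kernel, stride):
    """Conventional 1D convolution: one kernel shared by every window."""
    k = len(kernel)
    return [max(0.0, sum(w * v for w, v in zip(kernel, x[s:s + k])))
            for s in windows(len(x), k, stride)]


def local_conv1d(x, kernels, stride):
    """Local convolution: window j is filtered by its own kernel kernels[j]."""
    k = len(kernels[0])
    starts = list(windows(len(x), k, stride))
    assert len(starts) == len(kernels), "need one kernel per window"
    return [max(0.0, sum(w * v for w, v in zip(kernels[j], x[s:s + k])))
            for j, s in enumerate(starts)]


rng = random.Random(0)
kernel_size, stride, n_markers = 10, 10, 50

# Hypothetical genotype vector coded as 0/1/2 allele counts.
x = [rng.choice([0, 1, 2]) for _ in range(n_markers)]

n_windows = len(windows(n_markers, kernel_size, stride))
shared = [rng.uniform(-1, 1) for _ in range(kernel_size)]
region_specific = [[rng.uniform(-1, 1) for _ in range(kernel_size)]
                   for _ in range(n_windows)]

shared_out = conv1d(x, shared, stride)
local_out = local_conv1d(x, region_specific, stride)

# Weight counts (biases omitted): the local layer grows with the marker count.
n_params_shared = kernel_size
n_params_local = n_windows * kernel_size
```

If every window is given the same kernel, `local_conv1d` reduces to `conv1d`. The price of region-specific filters is a weight count that scales with the number of markers, which fits the abstract's observation that LCNNs pay off mainly with large training populations.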
Experimental high-throughput analysis of molecular networks is a central approach to characterize the adaptation of plant metabolism to the environment. However, recent studies have demonstrated that it is hardly possible to predict in situ metabolic phenotypes from experiments under controlled conditions, such as growth chambers or greenhouses. This is due in particular to the high molecular variance of in situ samples induced by environmental fluctuations. An approach for the functional metabolome interpretation of field samples would be desirable in order to identify and trace back the impact of environmental changes on plant metabolism. To test the applicability of metabolomics studies for a characterization of plant populations in the field, we have identified and analyzed in situ samples of natural populations of Arabidopsis thaliana growing in close proximity in Austria. A. thaliana is the primary molecular biological model system in plant biology with one of the best functionally annotated genomes, representing a reference system for all other plant genome projects. The genomes of these novel natural populations were sequenced and phylogenetically compared to a comprehensive genome database of A. thaliana ecotypes. Experimental results on primary and secondary metabolite profiling and genotypic variation were functionally integrated by a data mining strategy, which combines statistical output of metabolomics data with genome-derived biochemical pathway reconstruction and metabolic modeling. Correlations of biochemical model predictions and population-specific genetic variation indicated varying strategies of metabolic regulation on a population level, which enabled the direct comparison, differentiation, and prediction of metabolic adaptation of the same species to different habitats.
These differences were most pronounced in organic and amino acid metabolism as well as at the interface of primary and secondary metabolism and allowed for the direct classification of population-specific metabolic phenotypes within geographically contiguous sampling sites.
Understanding the genetic architecture of complex traits is a major objective in biology. The standard approach for doing so is genome-wide association studies (GWAS), which aim to identify genetic polymorphisms responsible for variation in traits of interest. In human genetics, consistency across studies is commonly used as an indicator of reliability. However, if traits are involved in adaptation to the local environment, we do not necessarily expect reproducibility. On the contrary, results may depend on where you sample, and sampling across a wide range of environments may decrease the power of GWAS because of increased genetic heterogeneity. In this study, we examine how sampling affects GWAS in the model plant species Arabidopsis thaliana. We show that traits like flowering time are indeed influenced by distinct genetic effects in local populations. Furthermore, using gene expression as a molecular phenotype, we show that some genes are globally affected by shared variants, whereas others are affected by variants specific to subpopulations. Remarkably, the former are essentially all cis-regulated, whereas the latter are predominantly affected by trans-acting variants. Our results illustrate that conclusions about genetic architecture can be extremely sensitive to sampling and population structure.
Stomata control gas exchange between the plant and the atmosphere. How natural variation in stomata size and density contributes to resolving trade-offs between carbon uptake and water loss in response to local climatic variation is not yet understood. We developed an automated confocal microscopy approach to characterize natural genetic variation in stomatal patterning in 330 fully sequenced Arabidopsis thaliana accessions collected throughout the European range of the species. We compared this to variation in water-use efficiency, measured as carbon isotope discrimination (δ13C). We detect substantial genetic variation for stomata size and density segregating within Arabidopsis thaliana. A positive correlation between stomata size and δ13C further suggests that this variation has consequences for water-use efficiency. Genome-wide association analyses indicate a complex genetic architecture underlying not only variation in stomatal patterning but also its covariation with carbon uptake parameters. Yet, we report two novel QTL affecting δ13C independently of stomatal patterning. This suggests that, in A. thaliana, both morphological and physiological variants contribute to genetic variance in water-use efficiency. Patterns of regional differentiation and covariation with climatic parameters indicate that natural selection has contributed to shaping some of this variation, especially in Southern Sweden, where water availability is more limited in spring relative to summer. These conditions are expected to favour the evolution of drought avoidance mechanisms over drought escape strategies.