Polygonum cuspidatum (Japanese knotweed, also known as Huzhang in Chinese), a plant that produces bioactive components such as stilbenes and quinones, has long been recognized as important in traditional Chinese herbal medicine. To better understand the biological features of this plant and to gain genetic insight into the biosynthesis of its natural products, we assembled a draft genome of P. cuspidatum using Illumina sequencing technology. The draft genome is ca. 2.56 Gb long, with 71.54% of the genome annotated as transposable elements. Integrated gene prediction suggested that the P. cuspidatum genome encodes 55,075 functional genes, including 6,776 gene families that are conserved in the five eudicot species examined and 2,386 that are unique to P. cuspidatum. Among the functional genes identified, 4,753 are predicted to encode transcription factors. We traced the gene duplication history of P. cuspidatum and determined that it has undergone two whole-genome duplication events about 65 and 6.6 million years ago. Roots are considered the primary medicinal tissue, and transcriptome analysis identified 2,173 genes that were expressed at higher levels in roots compared to aboveground tissues. Detailed phylogenetic analysis demonstrated expansion of the gene family encoding stilbene synthase and chalcone synthase enzymes in the phenylpropanoid metabolic pathway, which is associated with the biosynthesis of resveratrol, a pharmacologically important stilbene. Analysis of the draft genome identified 7 abscisic acid and water deficit stress-induced protein-coding genes and 14 cysteine-rich transmembrane module genes predicted to be involved in stress responses. The draft de novo genome assembly produced in this study represents a valuable resource for the molecular characterization of medicinal compounds in P. cuspidatum, the improvement of this important medicinal plant, and the exploration of its abiotic stress resistance.
An expanded evaluation of protein function prediction methods shows an improvement in accuracy
(2016)
Background
A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.
Results
We conducted the second critical assessment of functional annotation (CAFA2), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.
Conclusions
The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific and that different performance metrics can be used to probe the nature of accurate predictions. It further revealed the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
Fanconi anemia (FA) is a genetically heterogeneous disorder with 22 disease-causing genes reported to date. In some FA genes, monoallelic mutations have been found to be associated with breast cancer risk, while the risk associations of others remain unknown. The gene for FA type C, FANCC, has been proposed as a breast cancer susceptibility gene based on epidemiological and sequencing studies. We used the Oncoarray project to genotype two truncating FANCC variants (p.R185X and p.R548X) in 64,760 breast cancer cases and 49,793 controls of European descent. FANCC mutations were observed in 25 cases (14 with p.R185X, 11 with p.R548X) and 26 controls (18 with p.R185X, 8 with p.R548X). There was no evidence of an association with the risk of breast cancer, either overall (odds ratio 0.77, 95% CI 0.44–1.33, p = 0.4) or by histology, hormone receptor status, age or family history. We conclude that the breast cancer risk association of these two FANCC variants, if any, is much smaller than for BRCA1, BRCA2 or PALB2 mutations. If this applies to all truncating variants in FANCC, it would suggest that FA genes differ in their roles in breast cancer risk. The result also demonstrates the merit of large consortia for clarifying risk associations of rare variants.
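As a rough check on the carrier counts reported above, the following minimal sketch computes a crude (unadjusted) odds ratio with a Woolf-type 95% confidence interval from the 2x2 table of carriers and non-carriers. The function name `crude_odds_ratio` and its parameters are hypothetical, chosen for illustration only; the abstract does not describe how its estimate was computed, and the published odds ratio of 0.77 presumably comes from a model with adjustments that this sketch does not attempt to reproduce.

```python
import math


def crude_odds_ratio(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Crude odds ratio and Woolf (log-scale) confidence interval.

    Illustrative helper, not the method used in the study.
    z = 1.96 gives an approximate 95% interval.
    """
    a = case_carriers                  # carriers among cases
    b = n_cases - case_carriers        # non-carriers among cases
    c = control_carriers               # carriers among controls
    d = n_controls - control_carriers  # non-carriers among controls

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci_low = math.exp(log_or - z * se_log_or)
    ci_high = math.exp(log_or + z * se_log_or)
    return odds_ratio, ci_low, ci_high


# Counts taken from the abstract: 25 carriers among 64,760 cases,
# 26 carriers among 49,793 controls.
odds_ratio, ci_low, ci_high = crude_odds_ratio(25, 64760, 26, 49793)
print(f"crude OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

The crude estimate comes out at roughly 0.74, close to but not identical with the reported 0.77, and its interval likewise spans 1, which is consistent with the conclusion of no detectable association.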
Genome-wide association studies (GWAS) have identified more than 170 breast cancer susceptibility loci. Here we hypothesize that some risk-associated variants might act in non-breast tissues, specifically adipose tissue and immune cells from blood and spleen. Using expression quantitative trait loci (eQTL) reported in these tissues, we identify 26 previously unreported, likely target genes of overall breast cancer risk variants, and 17 for estrogen receptor (ER)-negative breast cancer, several with a known immune function. We determine the directional effect of gene expression on disease risk based on single and multiple eQTL. In addition, using a gene-based test of association that considers eQTL from multiple tissues, we identify seven regions with variants associated with overall breast cancer risk and four with ER-negative breast cancer risk, none of which were reported in previous GWAS. Further investigation of the function of the implicated genes in breast and immune cells may provide insights into the etiology of breast cancer.