An expanded evaluation of protein function prediction methods shows an improvement in accuracy
(2016)
Background
A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.
Results
We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 with those of CAFA2.
Conclusions
The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and that predictions were relatively diverse in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and the usefulness of individual methods remain context dependent.
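CAFA's protein-centric evaluation is commonly summarized by the Fmax metric: the maximum harmonic mean of average precision and average recall over all prediction-score thresholds. A minimal sketch of that computation follows; the dictionary layout, GO term identifiers, and threshold grid are illustrative assumptions, not the challenge's reference implementation.

```python
def fmax(predictions, truth, thresholds=None):
    """Compute Fmax: the maximum protein-centric F-measure over score thresholds.

    predictions: {protein: {go_term: score in [0, 1]}}  (hypothetical layout)
    truth:       {protein: set of experimentally annotated go_terms}
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 100)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            pred = predictions.get(protein, {})
            pred_terms = {g for g, s in pred.items() if s >= t}
            if pred_terms:
                # Precision is averaged only over proteins with predictions at t.
                precisions.append(len(pred_terms & true_terms) / len(pred_terms))
            # Recall is averaged over all benchmark proteins.
            recalls.append(len(pred_terms & true_terms) / len(true_terms))
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best

# Toy example: one protein, one correct high-score term, one missed term.
preds = {"P1": {"GO:0001": 0.9, "GO:0002": 0.4}}
annots = {"P1": {"GO:0001", "GO:0003"}}
print(fmax(preds, annots))
```

At thresholds above 0.4, only the correct term survives (precision 1, recall 0.5), which is where the maximum F-measure is reached in this toy case.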
Multiple sclerosis (MS) is a prevalent neurological disease of complex etiology. Here, we describe the characterization of a multi-incident MS family that nominated a rare missense variant (p.G420D) in plasminogen (PLG) as a putative genetic risk factor for MS. Genotyping of PLG p.G420D (rs139071351) in 2160 MS patients and 886 controls from Canada identified 10 additional probands, two sporadic patients, and one control with the variant. Segregation analysis in families harboring the rs139071351 variant identified p.G420D in 26 of 30 family members diagnosed with MS, in 14 unaffected parents, and in 12 of 30 family members not diagnosed with disease. Despite considerably reduced penetrance, linkage analysis supports cosegregation of PLG p.G420D and disease. Genotyping of PLG p.G420D in 14,446 patients and 8797 controls from Canada, France, Spain, Germany, Belgium, and Austria failed to identify a significant association with disease (P = 0.117), despite an overall higher prevalence in patients (OR = 1.32; 95% CI = 0.93–1.87). To assess whether additional rare variants have an effect on MS risk, we sequenced PLG in 293 probands and genotyped all rare variants in cases and controls. This analysis identified nine rare missense variants, and although three of them were exclusively observed in MS patients, segregation does not support pathogenicity. PLG is a plausible biological candidate for MS owing to its involvement in immune system response, blood-brain barrier permeability, and myelin degradation. Moreover, components of its activation cascade have been shown to have increased activity or expression in MS patients compared with controls; further studies are needed to clarify whether PLG is involved in MS susceptibility.
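The reported association summary (OR = 1.32; 95% CI = 0.93–1.87) is the standard odds ratio with a Wald-type confidence interval from a 2×2 carrier-by-status table. A minimal sketch of that calculation follows; the cell counts are hypothetical, since the abstract does not reproduce the underlying table.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a 95% Wald confidence interval on the log scale.

    a: carrier cases,  b: non-carrier cases
    c: carrier controls, d: non-carrier controls
    (counts below are hypothetical illustrations)
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}; 95% CI = {lo:.2f}-{hi:.2f}")
```

When the interval spans 1.0, as in the study's result, the elevated point estimate is not statistically significant at the conventional level.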
We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = 0.05 to p = 0.005, is deleterious to new discoveries and the progress of science. Given that both blanket and variable alpha levels are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample size much more directly than significance testing does, but none of these statistical tools should be taken as a new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, and implications for applications. To boil all this down to a binary decision based on a p-value threshold of 0.05, 0.01, 0.005, or anything else is not acceptable.
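The dichotomy the authors criticize is easy to illustrate: two nearly identical test statistics can fall on opposite sides of an arbitrary alpha and receive opposite verdicts, even though they carry essentially the same evidence. A minimal sketch, with hypothetical z values:

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Two nearly identical results straddling alpha = 0.05 (z = 1.96 is the cutoff).
for z in (1.95, 1.97):
    p = two_sided_p_from_z(z)
    verdict = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"z = {z}: p = {p:.4f} -> {verdict}")
```

The two p-values differ by about 0.002, yet a threshold-based decision rule treats them as categorically different outcomes, which is the core of the authors' objection.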