An expanded evaluation of protein function prediction methods shows an improvement in accuracy
(2016)
Background
A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.
Results
We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.
Conclusions
The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific and that different performance metrics can be used to probe the nature of accurate predictions, and it highlighted the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
Motivation
The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. These data enable users to calculate temporal trends in biodiversity within and amongst assemblages using a broad range of metrics. BioTIME is being developed as a community-led open-source database of biodiversity time series. Our goal is to accelerate and facilitate quantitative analysis of temporal patterns of biodiversity in the Anthropocene.
Main types of variables included
The database contains 8,777,413 species abundance records, from assemblages consistently sampled for a minimum of 2 years, which need not necessarily be consecutive. In addition, the database contains metadata relating to sampling methodology and contextual information about each record.
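As an illustration of how abundance records of this kind can feed a temporal biodiversity metric, the following minimal sketch computes species richness per year and a least-squares trend. The record layout `(year, species, abundance)` and the function names are assumptions made for this example and do not reflect the actual BioTIME schema; richness is only one of many possible metrics.

```python
from collections import defaultdict


def richness_by_year(records):
    """Count distinct species with non-zero abundance in each year.

    `records` is an iterable of (year, species, abundance) tuples;
    this layout is hypothetical, not the BioTIME table structure.
    """
    species_seen = defaultdict(set)
    for year, species, abundance in records:
        if abundance > 0:
            species_seen[year].add(species)
    return {year: len(names) for year, names in sorted(species_seen.items())}


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy records; sampled years need not be consecutive.
records = [
    (2000, "sp_a", 3), (2000, "sp_b", 1),
    (2001, "sp_a", 2),
    (2002, "sp_a", 1), (2002, "sp_b", 4), (2002, "sp_c", 0), (2002, "sp_d", 2),
]
richness = richness_by_year(records)
trend = linear_slope(list(richness), list(richness.values()))
```

A real analysis would typically standardise sampling effort within each assemblage before comparing years; that step is omitted here.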
Spatial location and grain
BioTIME is a global database of 547,161 unique sampling locations spanning the marine, freshwater and terrestrial realms. Grain size varies across datasets from 0.0000000158 km\(^2\) (158 cm\(^2\)) to 100 km\(^2\) (1,000,000,000,000 cm\(^2\)).
Time period and grain
BioTIME records span from 1874 to 2016. The minimal temporal grain across all datasets in BioTIME is a year.
Major taxa and level of measurement
BioTIME includes data from 44,440 species across the plant and animal kingdoms, ranging from plants, plankton and terrestrial invertebrates to small and large vertebrates.
Software format
.csv and .SQL.
Background
We aimed to accurately estimate the frequency of a hexanucleotide repeat expansion in C9orf72 that has been associated with a large proportion of cases of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD).
Methods
We screened 4448 patients diagnosed with ALS (El Escorial criteria) and 1425 patients with FTD (Lund-Manchester criteria) from 17 regions worldwide for the GGGGCC hexanucleotide expansion using a repeat-primed PCR assay. We assessed familial disease status on the basis of self-reported family history of similar neurodegenerative diseases at the time of sample collection. We compared haplotype data for 262 patients carrying the expansion with the known Finnish founder risk haplotype across the chromosomal locus. We calculated age-related penetrance using the Kaplan-Meier method with data for 603 individuals with the expansion.
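The Kaplan-Meier approach to age-related penetrance can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: each carrier contributes either an age at onset (event observed) or an age at last follow-up (censored), and the function name and input layout are hypothetical.

```python
from collections import Counter


def kaplan_meier_penetrance(ages, onset_observed):
    """Return [(age, cumulative penetrance)] at each age with an onset.

    ages[i] is the age at onset if onset_observed[i] is True, otherwise
    the age at last follow-up (a censored observation).
    """
    onsets = Counter(a for a, e in zip(ages, onset_observed) if e)
    exits = Counter(ages)  # everyone leaves the risk set at their recorded age
    at_risk = len(ages)
    unaffected = 1.0  # Kaplan-Meier estimate of remaining disease-free
    curve = []
    for age in sorted(exits):
        d = onsets.get(age, 0)
        if d:
            unaffected *= 1 - d / at_risk
            curve.append((age, 1 - unaffected))
        at_risk -= exits[age]
    return curve


# Toy data: five carriers, two of them censored (still unaffected).
ages = [40, 50, 50, 60, 70]
onset_observed = [True, True, False, True, False]
curve = kaplan_meier_penetrance(ages, onset_observed)
```

Penetrance at a given age is one minus the estimated probability of remaining unaffected. Censored carriers reduce the number at risk without counting as onsets.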
Findings
In patients with sporadic ALS, we identified the repeat expansion in 236 (7·0%) of 3377 white individuals from the USA, Europe, and Australia, two (4·1%) of 49 black individuals from the USA, and six (8·3%) of 72 Hispanic individuals from the USA. The mutation was present in 217 (39·3%) of 552 white individuals with familial ALS from Europe and the USA. 59 (6·0%) of 981 white Europeans with sporadic FTD had the mutation, as did 99 (24·8%) of 400 white Europeans with familial FTD. Data for other ethnic groups were sparse, but we identified one Asian patient with familial ALS (from 20 assessed) and two with familial FTD (from three assessed) who carried the mutation. The mutation was not carried by the three Native Americans or 360 patients from Asia or the Pacific Islands with sporadic ALS who were tested, or by 41 Asian patients with sporadic FTD. All patients with the repeat expansion had (partly or fully) the founder haplotype, suggesting a one-off expansion occurring about 1500 years ago. The pathogenic expansion was non-penetrant in individuals younger than 35 years, 50% penetrant by 58 years, and almost fully penetrant by 80 years.
Interpretation
A common Mendelian genetic lesion in C9orf72 is implicated in many cases of sporadic and familial ALS and FTD. Testing for this pathogenic expansion should be considered in the management and genetic counselling of patients with these fatal neurodegenerative diseases.
Diabetic kidney disease (DKD) is the most common etiology of chronic kidney disease (CKD) in the industrialized world and accounts for much of the excess mortality in patients with diabetes mellitus. Approximately 45% of U.S. patients with incident end-stage kidney disease (ESKD) have DKD. Independent of glycemic control, DKD aggregates in families and has higher incidence rates in African, Mexican, and American Indian ancestral groups relative to European populations. The Family Investigation of Nephropathy and Diabetes (FIND) performed a genome-wide association study (GWAS) contrasting 6,197 unrelated individuals with advanced DKD with healthy and diabetic individuals lacking nephropathy of European American, African American, Mexican American, or American Indian ancestry. A large-scale replication and trans-ethnic meta-analysis included 7,539 additional European American, African American and American Indian DKD cases and non-nephropathy controls. Within-ethnic-group meta-analysis of discovery GWAS and replication set results identified genome-wide significant evidence for association between DKD and rs12523822 on chromosome 6q25.2 in American Indians (P = 5.74 × 10\(^{−9}\)). The strongest signal of association in the trans-ethnic meta-analysis was with a SNP in strong linkage disequilibrium with rs12523822 (rs955333; P = 1.31 × 10\(^{−8}\)), with directionally consistent results across ethnic groups. These 6q25.2 SNPs are located between the SCAF8 and CNKSR3 genes, a region with DKD-relevant changes in gene expression and an eQTL with IPCEF1, a gene co-translated with CNKSR3. Several other SNPs demonstrated suggestive evidence of association with DKD, within and across populations. These data identify a novel DKD susceptibility locus with consistent directions of effect across diverse ancestral groups and provide insight into the genetic architecture of DKD.
Understanding the pathways involved in the formation and stability of the core and shell regions of a platelet-rich arterial thrombus may result in new ways to treat arterial thrombosis. The distinguishing feature between these two regions is the absence of fibrin in the shell, which indicates that in vitro flow-based assays over thrombogenic surfaces, in the absence of coagulation, can be used to model this region. In this study, we have investigated the contribution of Syk tyrosine kinase to the stability of platelet aggregates (or thrombi) formed on collagen or atherosclerotic plaque homogenate at arterial shear (1000 s\(^{−1}\)). We show that post-perfusion of the Syk inhibitor PRT-060318 over preformed thrombi on both surfaces enhances thrombus breakdown and platelet detachment. The resulting loss of thrombus stability led to a reduction in thrombus contractile score, which could be detected as early as 3 min after perfusion of the Syk inhibitor. A similar loss of thrombus stability was observed with ticagrelor and indomethacin, inhibitors of platelet adenosine diphosphate (ADP) receptor and thromboxane A\(_2\) (TxA\(_2\)), respectively, and in the presence of the Src inhibitor, dasatinib. In contrast, the Btk inhibitor, ibrutinib, caused only a minor decrease in thrombus contractile score. Weak thrombus breakdown was also seen with the blocking GPVI nanobody, Nb21, which indicates, at best, a minor contribution of collagen to the stability of the platelet aggregate. These results show that Syk regulates thrombus stability in the absence of fibrin in human platelets under flow and provide evidence that this involves pathways additional to activation of GPVI by collagen.