TY  - JOUR
A1  - Caliskan, Aylin
A1  - Caliskan, Deniz
A1  - Rasbach, Lauritz
A1  - Yu, Weimeng
A1  - Dandekar, Thomas
A1  - Breitenbach, Tim
T1  - Optimized cell type signatures revealed from single-cell data by combining principal feature analysis, mutual information, and machine learning
JF  - Computational and Structural Biotechnology Journal
N2  - Machine learning techniques are excellent for analyzing expression data from single cells. These techniques impact all fields ranging from cell annotation and clustering to signature identification. The presented framework evaluates how well gene selection sets separate defined phenotypes or cell groups. This innovation overcomes the present limitation of objectively and correctly identifying a small gene set of high information content regarding the separation of phenotypes; corresponding code scripts are provided. The small but meaningful subset of the original genes (or feature space) facilitates human interpretability of the differences between the phenotypes, including those found by machine learning, and may even turn correlations between genes and phenotypes into a causal explanation. For the feature selection task, principal feature analysis (PFA) is used, which reduces redundant information while selecting genes that carry the information for separating the phenotypes. In this context, the presented framework provides explainability of unsupervised learning, as it reveals cell-type-specific signatures. Apart from a Seurat preprocessing tool and the PFA script, the pipeline uses mutual information to balance the accuracy and size of the gene set if desired. A validation part that evaluates the selected genes for their information content regarding the separation of the phenotypes is provided as well; binary and multiclass classification of 3 or 4 groups are studied. Results from different single-cell datasets are presented. In each, only about ten out of more than 30,000 genes are identified as carrying the relevant information. The code is provided in a GitHub repository at https://github.com/AC-PHD/Seurat_PFA_pipeline.
KW  - single cell analysis
KW  - machine learning
KW  - explainability of machine learning
KW  - principal feature analysis
KW  - model reduction
KW  - feature selection
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-349989
SN  - 2001-0370
VL  - 21
ER  - 
TY  - JOUR
A1  - Dhillon, Maninder Singh
A1  - Dahms, Thorsten
A1  - Kuebert-Flock, Carina
A1  - Rummler, Thomas
A1  - Arnault, Joel
A1  - Steffan-Dewenter, Ingolf
A1  - Ullmann, Tobias
T1  - Integrating random forest and crop modeling improves the crop yield prediction of winter wheat and oil seed rape
JF  - Frontiers in Remote Sensing
N2  - Fast and accurate yield estimation, supported by the increasing availability and variety of global satellite products and the rapid development of new algorithms, remains a goal for precision agriculture and food security. However, the consistency and reliability of suitable methodologies that provide accurate crop yield outcomes still need to be explored. The study investigates the coupling of crop modeling and machine learning (ML) to improve the yield prediction of winter wheat (WW) and oil seed rape (OSR) and provides examples for the Free State of Bavaria (70,550 km²), Germany, in 2019. The main objective is to determine whether a coupling approach [Light Use Efficiency (LUE) + Random Forest (RF)] results in better and more accurate yield predictions than other models not using the LUE. Four different RF models [RF1 (input: Normalized Difference Vegetation Index (NDVI)), RF2 (input: climate variables), RF3 (input: NDVI + climate variables), RF4 (input: LUE-generated biomass + climate variables)] and one semi-empirical LUE model were designed with different input requirements to find the best predictors for crop monitoring. The results indicate that the individual use of the NDVI (in RF1) or the climate variables (in RF2) did not provide the most accurate, reliable, and precise solution for crop monitoring; however, their combined use (in RF3) resulted in higher accuracies. Notably, the study suggests that coupling the LUE model variables to the RF4 model can reduce the relative root mean square error (RRMSE) by 8% (WW) and 1.6% (OSR) and increase the R² by 14.3% (for both WW and OSR), compared to results relying only on LUE. Moreover, the research compares the models' yield outputs for three different spatial inputs: Sentinel-2(S)-MOD13Q1 (10 m), Landsat (L)-MOD13Q1 (30 m), and MOD13Q1 (MODIS) (250 m). The S-MOD13Q1 data improved model performance, with a higher mean R² [0.80 (WW), 0.69 (OSR)] and a lower RRMSE (9.18% and 10.21%, respectively) compared to L-MOD13Q1 (30 m) and MOD13Q1 (250 m). Satellite-based crop biomass, solar radiation, and temperature were found to be the most influential variables in the yield prediction of both crops.
KW  - crop modeling
KW  - random forest
KW  - machine learning
KW  - NDVI
KW  - satellite
KW  - landsat
KW  - sentinel-2
KW  - winter wheat
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-301462
SN  - 2673-6187
VL  - 3
ER  - 
TY  - JOUR
A1  - Vey, Johannes
A1  - Kapsner, Lorenz A.
A1  - Fuchs, Maximilian
A1  - Unberath, Philipp
A1  - Veronesi, Giulia
A1  - Kunz, Meik
T1  - A toolbox for functional analysis and the systematic identification of diagnostic and prognostic gene expression signatures combining meta-analysis and machine learning
JF  - Cancers
N2  - The identification of biomarker signatures is important for cancer diagnosis and prognosis. However, the detection of clinically reliable signatures is influenced by limited data availability, which may restrict statistical power. Moreover, methods for integrating large sample cohorts and identifying signatures are limited. We present a step-by-step computational protocol for functional gene expression analysis and the identification of diagnostic and prognostic signatures by combining meta-analysis with machine learning and survival analysis. The novelty of the toolbox lies in its all-in-one functionality, generic design, and modularity. It is exemplified for lung cancer, including a comprehensive evaluation using different validation strategies. However, the protocol is not restricted to specific disease types and can therefore be used by a broad community. The accompanying R package vignette runs in ~1 h and describes the workflow in detail for use by researchers with limited bioinformatics training.
KW  - bioinformatics tool
KW  - R package
KW  - machine learning
KW  - meta-analysis
KW  - biomarker signature
KW  - gene expression analysis
KW  - survival analysis
KW  - functional analysis
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-193240
SN  - 2072-6694
VL  - 11
IS  - 10
ER  - 
TY  - JOUR
A1  - Kaltdorf, Kristin Verena
A1  - Theiss, Maria
A1  - Markert, Sebastian Matthias
A1  - Zhen, Mei
A1  - Dandekar, Thomas
A1  - Stigloher, Christian
A1  - Kollmannsberger, Philipp
T1  - Automated classification of synaptic vesicles in electron tomograms of C. elegans using machine learning
JF  - PLoS ONE
N2  - Synaptic vesicles (SVs) are a key component of neuronal signaling and fulfil different roles depending on their composition. In electron micrographs of neurites, two types of vesicles can be distinguished by morphological criteria: the classical “clear core” vesicles (CCVs) and the typically larger “dense core” vesicles (DCVs), which differ in electron density due to their diverse cargos. Compared to CCVs, the precise function of DCVs is less well defined. DCVs are known to store neuropeptides, which function as neuronal messengers and modulators [1]. In C. elegans, they play a role in locomotion, dauer formation, egg-laying, and mechano- and chemosensation [2]. Another type of DCV, also referred to as granulated vesicles, is known to transport Bassoon, Piccolo, and further constituents of the presynaptic density in the center of the active zone (AZ), and is therefore important for synaptogenesis [3].
To better understand the role of different types of SVs, we present here a new automated approach to classify vesicles. We combine machine learning with an extension of our previously developed vesicle segmentation workflow, the ImageJ macro 3D ART VeSElecT. With this, we reliably distinguish CCVs and DCVs in electron tomograms of C. elegans neuromuscular junctions (NMJs) using image-based features. Analysis of the underlying ground truth data shows an increased fraction of DCVs as well as a higher mean distance between DCVs and AZs in dauer larvae compared to young adult hermaphrodites. Our machine-learning-based tools are adaptable and can be applied to study properties of different synaptic vesicle pools in electron tomograms of diverse model organisms.
KW  - synaptic vesicles
KW  - Caenorhabditis elegans
KW  - machine learning
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-176831
VL  - 13
IS  - 10
ER  -