Optimized cell type signatures revealed from single-cell data by combining principal feature analysis, mutual information, and machine learning

Please always cite using this URN: urn:nbn:de:bvb:20-opus-349989
Machine learning techniques are well suited to analyzing expression data from single cells. They affect all areas of the field, from cell annotation and clustering to signature identification. The presented framework evaluates how well gene selection sets separate defined phenotypes or cell groups. This overcomes the current difficulty of objectively and correctly identifying a small gene set with high information content for separating phenotypes; corresponding code scripts are provided. The small but meaningful subset of the original genes (or feature space) makes the differences between phenotypes easier for humans to interpret, including differences found by machine learning, and may even turn correlations between genes and phenotypes into a causal explanation. For the feature selection task, principal feature analysis (PFA) is used, which reduces redundant information while selecting genes that carry the information for separating the phenotypes. In this context, the presented framework demonstrates explainability of unsupervised learning, as it reveals cell-type-specific signatures. In addition to a Seurat preprocessing tool and the PFA script, the pipeline can optionally use mutual information to balance accuracy against the size of the gene set.
A validation part is also provided to evaluate the information content of the selected genes with respect to separating the phenotypes; binary classification and multiclass classification of 3 or 4 groups are studied. Results from different single-cell data sets are presented. In each, only about ten out of more than 30000 genes are identified as carrying the relevant information. The code is provided in a GitHub repository at https://github.com/AC-PHD/Seurat_PFA_pipeline.
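The idea of scoring genes by their mutual information with the phenotype labels can be illustrated with a minimal sketch. It is not the authors' pipeline (their scripts are in the GitHub repository named above) and it does not implement principal feature analysis. The function names, the binarization threshold, the gene names, and the toy data are all hypothetical, chosen only to show how a small, informative gene subset might be ranked and cut off.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_genes_by_mi(expression, labels, threshold=0.0):
    """Rank genes by mutual information with the phenotype labels.

    expression: dict mapping gene name -> list of per-cell expression values.
    threshold:  hypothetical cut-off used to binarize expression (expressed or not).
    """
    scores = {}
    for gene, values in expression.items():
        binarized = [v > threshold for v in values]
        scores[gene] = mutual_information(binarized, labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (assumption): 8 cells from two phenotypes, three made-up genes.
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
expression = {
    "gene_marker": [5, 3, 4, 6, 0, 0, 0, 0],   # expressed only in phenotype A
    "gene_partial": [2, 0, 1, 3, 0, 0, 1, 0],  # enriched in A, but not exclusive
    "gene_noise": [1, 0, 1, 0, 1, 0, 1, 0],    # unrelated to the phenotype
}

ranking = rank_genes_by_mi(expression, labels)
top_genes = [gene for gene, _ in ranking[:2]]  # keep a small gene set

for gene, score in ranking:
    print(f"{gene}: {score:.3f} bits")
```

Cutting the ranked list at a smaller or larger number of genes is one simple way to trade gene-set size against the information retained; the published pipeline may balance the two differently.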

Metadata
Author(s): Aylin Caliskan, Deniz Caliskan, Lauritz Rasbach, Weimeng Yu, Thomas Dandekar, Tim Breitenbach
URN: urn:nbn:de:bvb:20-opus-349989
Document type: Journal article
Institutes of the university: Medizinische Fakultät / Theodor-Boveri-Institut für Biowissenschaften
Language of publication: English
Title of parent work / journal (English): Computational and Structural Biotechnology Journal
ISSN: 2001-0370
Year of publication: 2023
Volume: 21
Pages: 3293-3314
Original publication / source: Computational and Structural Biotechnology Journal (2023) 21:3293-3314. DOI: 10.1016/j.csbj.2023.06.002
DOI: https://doi.org/10.1016/j.csbj.2023.06.002
General subject classification (DDC): 5 Natural sciences and mathematics / 57 Life sciences; biology / 570 Life sciences; biology
Keywords: explainability of machine learning; feature analysis; feature selection; machine learning; model reduction; principal; single cell analysis
Release date: 28.03.2024
License: CC BY-NC-ND: Creative Commons license: Attribution, NonCommercial, NoDerivatives 4.0 International