Refine
Has Fulltext
- yes (5)
Is part of the Bibliography
- yes (5)
Document Type
- Doctoral Thesis (5) (remove)
Language
- English (5) (remove)
Keywords
- Maschinelles Lernen (5) (remove)
Institute
- Graduate School of Life Sciences (5) (remove)
Sonstige beteiligte Institutionen
Among the defense strategies developed in microbes over millions of years, the innate adaptive CRISPR-Cas immune systems have spread across most of bacteria and archaea. The flexibility, simplicity, and specificity of CRISPR-Cas systems have laid the foundation for CRISPR-based genetic tools. Yet, the efficient administration of CRISPR-based tools demands rational designs to maximize the on-target efficiency and off-target specificity. Specifically, the selection of guide RNAs (gRNAs), which play a crucial role in the target recognition of CRISPR-Cas systems, is non-trivial. Despite the fact that the emerging machine learning techniques provide a solution to aid in gRNA design with prediction algorithms, design rules for many CRISPR-Cas systems are ill-defined, hindering their broader applications.
CRISPR interference (CRISPRi), an alternative gene silencing technique using a catalytically dead Cas protein to interfere with transcription, is a leading technique in bacteria for functional interrogation, pathway manipulation, and genome-wide screens. Although the application is promising, it also is hindered by under-investigated design rules. Therefore, in this work, I develop a state-of-art predictive machine learning model for guide silencing efficiency in bacteria leveraging the advantages of feature engineering, data integration, interpretable AI, and automated machine learning. I first systematically investigate the influential factors that attribute to the extent of depletion in multiple CRISPRi genome-wide essentiality screens in Escherichia coli and demonstrate the surprising dominant contribution of gene-specific effects, such as gene expression level. These observations allowed me to segregate the confounding gene-specific effects using a mixed-effect random forest (MERF) model to provide a better estimate of guide efficiency, together with the improvement led by integrating multiple screens. The MERF model outperformed existing tools in an independent high-throughput saturating screen. I next interpret the predictive model to extract the design rules for robust gene silencing, such as the preference for cytosine and disfavoring for guanine and thymine within and around the protospacer adjacent motif (PAM) sequence. I further incorporated the MERF model in a web-based tool that is freely accessible at www.ciao.helmholtz-hiri.de.
When comparing the MERF model with existing tools, the performance of the alternative gRNA design tool optimized for CRISPRi in eukaryotes when applied to bacteria was far from satisfying, questioning the robustness of prediction algorithms across organisms. In addition, the CRISPR-Cas systems exhibit diverse mechanisms albeit with some similarities. The captured predictive patterns from one dataset thereby are at risk of poor generalization when applied across organisms and CRISPR-Cas techniques. To fill the gap, the machine learning approach I present here for CRISPRi could serve as a blueprint for the effective development of prediction algorithms for specific organisms or CRISPR-Cas systems of interest. The explicit workflow includes three principle steps: 1) accommodating the feature set for the CRISPR-Cas system or technique; 2) optimizing a machine learning model using automated machine learning; 3) explaining the model using interpretable AI. To illustrate the applicability of the workflow and diversity of results when applied across different bacteria and CRISPR-Cas systems, I have applied this workflow to analyze three distinct CRISPR-Cas genome-wide screens. From the CRISPR base editor essentiality screen in E. coli, I have determined the PAM preference and sequence context in the editing window for efficient editing, such as A at the 2nd position of PAM, A/TT/TG downstream of PAM, and TC at the 4th to 5th position of gRNAs. From the CRISPR-Cas13a screen in E. coli, in addition to the strong correlation with the guide depletion, the target expression level is the strongest predictor in the model, supporting it as a main determinant of the activation of Cas13-induced immunity and better characterizing the CRISPR-Cas13 system. From the CRISPR-Cas12a screen in Klebsiella pneumoniae, I have extracted the design rules for robust antimicrobial activity across K. pneumoniae strains and provided a predictive algorithm for gRNA design, facilitating CRISPR-Cas12a as an alternative technique to tackle antibiotic resistance.
Overall, this thesis presents an accurate prediction algorithm for CRISPRi guide efficiency in bacteria, providing insights into the determinants of efficient silencing and guide designs. The systematic exploration has led to a robust machine learning approach for effective model development in other bacteria and CRISPR-Cas systems. Applying the approach in the analysis of independent CRISPR-Cas screens not only sheds light on the design rules but also the mechanisms of the CRISPR-Cas systems. Together, I demonstrate that applied machine learning paves the way to a deeper understanding and a broader application of CRISPR-Cas systems.
Introduction.
Mobile health (mHealth) integrates mobile devices into healthcare, enabling remote monitoring, data collection, and personalized interventions. Machine Learning (ML), a subfield of Artificial Intelligence (AI), can use mHealth data to confirm or extend domain knowledge by finding associations within the data, i.e., with the goal of improving healthcare decisions. In this work, two data collection techniques were used for mHealth data fed into ML systems: Mobile Crowdsensing (MCS), which is a collaborative data gathering approach, and Ecological Momentary Assessments (EMA), which capture real-time individual experiences within the individual’s common environments using questionnaires and sensors. We collected EMA and MCS data on tinnitus and COVID-19. About 15 % of the world’s population suffers from tinnitus.
Materials & Methods.
This thesis investigates the challenges of ML systems when using MCS and EMA data. It asks: How can ML confirm or broad domain knowledge? Domain knowledge refers to expertise and understanding in a specific field, gained through experience and education. Are ML systems always superior to simple heuristics and if yes, how can one reach explainable AI (XAI) in the presence of mHealth data? An XAI method enables a human to understand why a model makes certain predictions. Finally, which guidelines can be beneficial for the use of ML within the mHealth domain? In tinnitus research, ML discerns gender, temperature, and season-related variations among patients. In the realm of COVID-19, we collaboratively designed a COVID-19 check app for public education, incorporating EMA data to offer informative feedback on COVID-19-related matters. This thesis uses seven EMA datasets with more than 250,000 assessments. Our analyses revealed a set of challenges: App user over-representation, time gaps, identity ambiguity, and operating system specific rounding errors, among others. Our systematic review of 450 medical studies assessed prior utilization of XAI methods.
Results.
ML models predict gender and tinnitus perception, validating gender-linked tinnitus disparities. Using season and temperature to predict tinnitus shows the association of these variables with tinnitus. Multiple assessments of one app user can constitute a group. Neglecting these groups in data sets leads to model overfitting. In select instances, heuristics outperform ML models, highlighting the need for domain expert consultation to unveil hidden groups or find simple heuristics.
Conclusion.
This thesis suggests guidelines for mHealth related data analyses and improves estimates for ML performance. Close communication with medical domain experts to identify latent user subsets and incremental benefits of ML is essential.
Biofabrication technologies must address numerous parameters and conditions to reconstruct tissue complexity in vitro. A critical challenge is vascularization, especially for large constructs exceeding diffusion limits. This requires the creation of artificial vascular structures, a task demanding the convergence and integration of multiple engineering approaches. This doctoral dissertation aims to achieve two primary objectives: firstly, to implement and refine engineering methods for creating artificial microvascular structures using Melt Electrowriting (MEW)-assisted sacrificial templating, and secondly, to deepen the understanding of the critical factors influencing the printability of bioink formulations in 3D extrusion bioprinting.
In the first part of this dissertation, two innovative sacrificial templating techniques using MEW are explored. Utilizing a carbohydrate glass as a fugitive material, a pioneering advancement in the processing of sugars with MEW with a resolution under 100 microns was made. Furthermore, by introducing the “print-and-fuse” strategy as a groundbreaking method, biomimetic branching microchannels embedded in hydrogel matrices were fabricated, which can then be endothelialized to mirror in vivo vascular conditions.
The second part of the dissertation explores extrusion bioprinting. By introducing a simple binary bioink formulation, the correlation between physical properties and printability was showcased. In the next step, employing state-of-the-art machine-learning approaches revealed a deeper understanding of the correlations between bioink properties and printability in an extended library of hydrogel formulations.
This dissertation offers in-depth insights into two key biofabrication technologies. Future work could merge these into hybrid methods for the fabrication of vascularized constructs, combining MEW's precision with fine-tuned bioink properties in automated extrusion bioprinting.
Acceleration is a central aim of clinical and technical research in magnetic resonance imaging (MRI) today, with the potential to increase robustness, accessibility and patient comfort, reduce cost, and enable entirely new kinds of examinations. A key component in this endeavor is image reconstruction, as most modern approaches build on advanced signal and image processing. Here, deep learning (DL)-based methods have recently shown considerable potential, with numerous publications demonstrating benefits for MRI reconstruction. However, these methods often come at the cost of an increased risk for subtle yet critical errors. Therefore, the aim of this thesis is to advance DL-based MRI reconstruction, while ensuring high quality and fidelity with measured data. A network architecture specifically suited for this purpose is the variational network (VN). To investigate the benefits these can bring to non-Cartesian cardiac imaging, the first part presents an application of VNs, which were specifically adapted to the reconstruction of accelerated spiral acquisitions. The proposed method is compared to a segmented exam, a U-Net and a compressed sensing (CS) model using qualitative and quantitative measures. While the U-Net performed poorly, the VN as well as the CS reconstruction showed good output quality. In functional cardiac imaging, the proposed real-time method with VN reconstruction substantially accelerates examinations over the gold-standard, from over 10 to just 1 minute. Clinical parameters agreed on average.
Generally in MRI reconstruction, the assessment of image quality is complex, in particular for modern non-linear methods. Therefore, advanced techniques for precise evaluation of quality were subsequently demonstrated.
With two distinct methods, resolution and amplification or suppression of noise are quantified locally in each pixel of a reconstruction. Using these, local maps of resolution and noise in parallel imaging (GRAPPA), CS, U-Net and VN reconstructions were determined for MR images of the brain. In the tested images, GRAPPA delivers uniform and ideal resolution, but amplifies noise noticeably. The other methods adapt their behavior to image structure, where different levels of local blurring were observed at edges compared to homogeneous areas, and noise was suppressed except at edges. Overall, VNs were found to combine a number of advantageous properties, including a good trade-off between resolution and noise, fast reconstruction times, and high overall image quality and fidelity of the produced output. Therefore, this network architecture seems highly promising for MRI reconstruction.
Machine-Learning-Based Identification of Tumor Entities, Tumor Subgroups, and Therapy Options
(2023)
Molecular genetic analyses, such as mutation analyses, are becoming increasingly important in the tumor field, especially in the context of therapy stratification. The identification of the underlying tumor entity is crucial, but can sometimes be difficult, for example in the case of metastases or the so-called Cancer of Unknown Primary (CUP) syndrome. In recent years, methylome and transcriptome utilizing machine learning (ML) approaches have been developed to enable fast and reliable tumor and tumor subtype identification. However, so far only methylome analysis have become widely used in routine diagnostics.
The present work addresses the utility of publicly available RNA-sequencing data to determine the underlying tumor entity, possible subgroups, and potential therapy options. Identification of these by ML - in particular random forest (RF) models - was the first task. The results with test accuracies of up to 99% provided new, previously unknown insights into the trained models and the corresponding entity prediction. Reducing the input data to the top 100 mRNA transcripts resulted in a minimal loss of prediction quality and could potentially enable application in clinical or real-world settings.
By introducing the ratios of these top 100 genes to each other as a new database for RF models, a novel method was developed enabling the use of trained RF models on data from other sources.
Further analysis of the transcriptomic differences of metastatic samples by visual clustering showed that there were no differences specific for the site of metastasis. Similarly, no distinct clusters were detectable when investigating primary tumors and metastases of cutaneous skin melanoma (SKCM).
Subsequently, more than half of the validation datasets had a prediction accuracy of at least 80%, with many datasets even achieving a prediction accuracy of – or close to – 100%.
To investigate the applicability of the used methods for subgroup identification, the TCGA-KIPAN dataset, consisting of the three major kidney cancer subgroups, was used. The results revealed a new, previously unknown subgroup consisting of all histopathological groups with clinically relevant characteristics, such as significantly different survival. Based on significant differences in gene expression, potential therapeutic options of the identified subgroup could be proposed.
Concludingly, in exploring the potential applicability of RNA-sequencing data as a basis for therapy prediction, it was shown that this type of data is suitable to predict entities as well as subgroups with high accuracy. Clinical relevance was also demonstrated for a novel subgroup in renal cell carcinoma. The reduction of the number of genes required for entity prediction to 100 genes, enables panel sequencing and thus demonstrates potential applicability in a real-life setting.