The fusion of methods from several disciplines is a crucial component of scientific development. Artificial Neural Networks, based on the principle of biological neuronal networks, demonstrate how nature provides the best templates for technological advancement. These innovations can then be employed to solve the remaining mysteries of biology, including, in particular, processes that take place on microscopic scales and can only be studied with sophisticated techniques. For instance, direct Stochastic Optical Reconstruction Microscopy combines tools from chemistry, physics, and computer science to visualize biological processes at the molecular level. One of the key components is the computer-aided reconstruction of super-resolved images. Improving the corresponding algorithms increases the quality of the generated data, providing further insights into our biology. It is important, however, to ensure that the heavily processed images are still a reflection of reality and do not originate in random artefacts.
Expansion microscopy physically enlarges the sample by embedding it in a swellable hydrogel. The method can be combined with other super-resolution techniques to gain additional resolution. We tested this approach on microtubules, a well-known filamentous reference structure, to evaluate the performance of different protocols and labelling techniques.
We developed LineProfiler, an objective tool for data collection. Instead of collecting perpendicular profiles in small areas, the software gathers line profiles from filamentous structures across the entire image. This improves both the quantity and the quality of the data and prevents a biased choice of the evaluated regions. On the basis of the collected data, we deployed theoretical models of the expected intensity distribution across the filaments. This led to the conclusion that post-expansion labelling significantly reduces the labelling error and thus improves the data quality. The software was further used to determine the expansion factor and arrangement in synaptonemal complex data.
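To illustrate what a theoretical model of the intensity distribution across a filament can look like, the following minimal sketch projects a uniformly labelled hollow cylinder onto the axis perpendicular to the filament and blurs it with a Gaussian. It is an assumption-laden illustration, not the model used in LineProfiler: the function names, the radii and the blur width are hypothetical values chosen only to produce the characteristic two-peak profile.

```python
import numpy as np

def shell_projection(x, r_inner, r_outer):
    """Projection of a uniformly labelled cylindrical shell onto the
    axis perpendicular to the filament (chord length through the shell)."""
    outer = np.sqrt(np.clip(r_outer**2 - x**2, 0.0, None))
    inner = np.sqrt(np.clip(r_inner**2 - x**2, 0.0, None))
    return 2.0 * (outer - inner)

def expected_profile(x, r_inner, r_outer, sigma):
    """Shell projection blurred by a Gaussian of standard deviation sigma,
    normalized to a maximum of 1. x must be evenly spaced."""
    dx = x[1] - x[0]
    half = int(np.ceil(4 * sigma / dx))
    k = np.arange(-half, half + 1) * dx
    kernel = np.exp(-0.5 * (k / sigma) ** 2)
    kernel /= kernel.sum()
    profile = np.convolve(shell_projection(x, r_inner, r_outer), kernel, mode="same")
    return profile / profile.max()

def peak_to_peak(x, profile):
    """Distance between the maxima of the left and right halves of the profile."""
    mid = len(x) // 2
    left = x[np.argmax(profile[:mid])]
    right = x[mid + np.argmax(profile[mid:])]
    return right - left

# Hypothetical geometry in nanometres: 12.5 nm inner radius, 30 nm outer
# radius including the label layer, 5 nm Gaussian blur.
x = np.linspace(-100.0, 100.0, 2001)
profile = expected_profile(x, r_inner=12.5, r_outer=30.0, sigma=5.0)
distance = peak_to_peak(x, profile)
print(f"peak-to-peak distance: {distance:.1f} nm")
```

In such a model, a thinner label layer (smaller `r_outer`) moves the two peaks closer to the true filament wall, which is one way to reason about how the labelling error shows up in the measured profiles.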
Automated Simple Elastix uses state-of-the-art image alignment to compare pre- and post-expansion images. It corrects linear distortions occurring under isotropic expansion, calculates a structural expansion factor and highlights structural mismatches in a distortion map. We used the software to evaluate expanded fungi and NK cells. We found that the expansion factor differs for the two structures and is lower than the overall expansion of the hydrogel.
Assessing the fluorescence lifetime of emitters used for direct Stochastic Optical Reconstruction Microscopy can reveal additional information about the molecular environment or distinguish dyes emitting at similar wavelengths. The corresponding measurements require confocal scanning of the sample in combination with the fluorescent switching of the underlying emitters. This leads to non-linear, interrupted Point Spread Functions. The software ReCSAI targets this problem by combining the classical algorithm of compressed sensing with modern methods of artificial intelligence. We evaluated several different approaches to combining these components and found that unrolling compressed sensing into the network architecture yields the best performance in terms of reconstruction speed and accuracy.
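For readers unfamiliar with the classical side of this combination, the sketch below shows a standard compressed sensing solver, the iterative shrinkage-thresholding algorithm (ISTA), on a synthetic sparse recovery problem. It is not the ReCSAI architecture. "Unrolling" generally means fixing the number of such iterations and treating quantities like the step size and threshold as learnable layer parameters; here they are plain constants, and the measurement matrix, sparsity pattern and parameter values are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink every entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.01, n_iter=2000):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterating a
    gradient step on the data term followed by soft thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        gradient = A.T @ (A @ x - y)
        x = soft_threshold(x - step * gradient, step * lam)
    return x

# Synthetic example: 60 measurements of a 100-dimensional signal
# with three non-zero entries (all values hypothetical).
rng = np.random.default_rng(0)
A = rng.normal(size=(60, 100)) / np.sqrt(60)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.5, -1.0, 0.8]
y = A @ x_true

x_hat = ista(A, y)
support = sorted(int(i) for i in np.argsort(np.abs(x_hat))[-3:])
print("largest recovered entries at indices:", support)
```

Each pass through the loop has the shape of a network layer (a linear map followed by a pointwise non-linearity), which is what makes this family of algorithms a natural candidate for unrolling.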
In addition to gaining deep insight into the functioning and learning of artificial intelligence in combination with classical algorithms, we were able to reconstruct the described non-linearities with significantly improved resolution compared with other state-of-the-art architectures.
Small-angle X-ray scattering (SAXS) is a universal low-resolution method to study proteins in solution and to analyze structural changes in response to variations of conditions (pH, temperature, ionic strength, etc.). SAXS is hardly limited by particle size, being applicable to the smallest proteins as well as to huge macromolecular machines like ribosomes and viruses. SAXS experiments are usually fast and require a moderate amount of purified material. Traditionally, SAXS is employed to study the size and shape of globular proteins, but recent developments have made it possible to quantitatively characterize the structure and structural transitions of metastable systems, e.g. partially or completely unfolded proteins. In the absence of complementary information, low-resolution macromolecular shapes can be reconstructed ab initio and overall characteristics of the systems can be extracted. If a high- or low-resolution structure or a predicted model is available, it can be validated against the experimental SAXS data. If the measured sample is polydisperse, the oligomeric state and/or oligomeric composition in solution can be determined. One of the most important approaches for macromolecular complexes is combined ab initio/rigid body modeling, which applies when the structures (either complete or partial) of individual subunits are available and SAXS data is employed to build the entire complex. Moreover, this method can be effectively combined with information from other structural, computational and biochemical methods. All the above approaches are covered by ATSAS, a comprehensive program suite for SAXS data analysis that has been developed at EMBL-Hamburg. In order to meet the growing demands of the structural biology community, methods for SAXS data analysis must be further developed. This thesis describes the development of two new modules, RANLOGS and EM2DAM, which became part of the ATSAS suite.
The former program can be employed for constructing libraries of linkers and loops de novo and became a part of the combined ab initio/rigid body modeling program CORAL. EM2DAM can be employed to convert electron microscopy maps to bead models, which can be used for modeling or structure validation. Moreover, the programs CRYSOL and CRYSON, for computing X-ray and neutron scattering patterns from atomic models, respectively, were refurbished to work faster, and new options were added to them. Two programs, to be contributed to future releases of the ATSAS package, were also developed. The first program generates a large pool of possible models using the rigid body modeling program SASREF, then selects the models with the lowest discrepancy to the experimental SAXS data and refines them using the docking program HADDOCK. The second program refines binary protein-protein complexes using the SAXS data and the high-resolution models of unbound subunits. Some results and conclusions from this work are presented here. The approaches developed in this thesis, together with existing ATSAS modules, were additionally employed in a number of collaborative projects. New insights into the “structural memory” of natively unfolded tau protein were gained, and the supramodular structure of a RhoA-specific guanine nucleotide exchange factor was reconstructed. Moreover, high-resolution structures of several hematopoietic cytokine-receptor complexes were validated and re-modeled using the SAXS data. Important information about the oligomeric state of yeast frataxin in solution was derived from the scattering patterns recorded under different conditions, and its flexibility was quantitatively characterized using the Ensemble Optimization Method (EOM).
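Selecting models by their discrepancy to experimental SAXS data, as described above, can be sketched with a reduced chi-square score and a least-squares scale factor, which is a common way to define such a discrepancy. The code below is a minimal illustration under stated assumptions and does not reproduce any ATSAS program: the function names are hypothetical, and the candidate curves are synthetic Guinier-type curves standing in for intensities that would in practice be computed from atomic models.

```python
import numpy as np

def discrepancy(i_exp, sigma, i_calc):
    """Reduced chi-square between experimental and computed intensities.
    The computed curve is first scaled to the data by weighted least squares."""
    w = 1.0 / sigma**2
    scale = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)
    chi2 = np.sum(w * (i_exp - scale * i_calc) ** 2) / (len(i_exp) - 1)
    return chi2, scale

def rank_models(i_exp, sigma, candidates):
    """Return candidate indices sorted from lowest to highest discrepancy."""
    scores = [discrepancy(i_exp, sigma, c)[0] for c in candidates]
    return np.argsort(scores)

def guinier(q, rg):
    """Synthetic stand-in curve: Guinier approximation I(q) = exp(-(q * Rg)^2 / 3)."""
    return np.exp(-((q * rg) ** 2) / 3.0)

# Hypothetical data: three candidate "models" with radii of gyration of
# 20, 25 and 30 Angstrom; the "experiment" is the second one, scaled and noisy.
q = np.linspace(0.01, 0.1, 50)
candidates = [guinier(q, rg) for rg in (20.0, 25.0, 30.0)]
rng = np.random.default_rng(0)
sigma = 0.01 * 3.0 * candidates[1]
i_exp = 3.0 * candidates[1] + rng.normal(0.0, sigma)

order = rank_models(i_exp, sigma, candidates)
best_chi2, best_scale = discrepancy(i_exp, sigma, candidates[order[0]])
print("best candidate index:", int(order[0]))
```

A pool of rigid body models could be filtered in the same way before the more expensive refinement step, keeping only the few candidates with the lowest score.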
In this thesis, the development of a phylogenetic DNA microarray, the analysis of several gene expression microarray datasets and new approaches for improved data analysis and interpretation are described. In the first publication, the development and analysis of a phylogenetic microarray is presented. I could show that species detection with phylogenetic DNA microarrays can be significantly improved when the microarray data is analyzed with a linear regression modeling approach. Standard methods have so far relied on pure signal intensities of the array spots, and a simple cutoff criterion was applied to call a species present or absent. This procedure is not applicable to very closely related species with high sequence similarity because cross-hybridization of non-target DNA renders species detection impossible based on signal intensities alone. By modeling hybridization and cross-hybridization with linear regression, as I have presented in this thesis, even species with a sequence similarity of 97% in the marker gene can be detected and distinguished from related species. Another advantage of the modeling approach over existing methods is that the model also performs well on mixtures of different species. In principle, quantitative predictions can also be made. To make better use of the large amounts of microarray data stored in public databases, meta-analysis approaches need to be developed. In the second publication, an explorative meta-analysis exemplified on Arabidopsis thaliana gene expression datasets is presented. By integrating datasets that study effects such as the influence of plant hormones, pathogens and different mutations on gene expression levels, clusters of similarly treated datasets could be found. From the clusters of pathogen-treated and indole-3-acetic acid (IAA) treated datasets, representative genes were selected which pointed to functions that had previously been associated with pathogen attack or IAA effects.
Additionally, hypotheses about the functions of so far uncharacterized genes could be formulated. Thus, this kind of meta-analysis could be used to propose gene functions and their regulation under different conditions. This work also presents primary data analysis of Arabidopsis thaliana datasets. In the third publication, an experiment is described that was conducted to find out whether microwave irradiation has an effect on the gene expression of a plant cell culture. During the first steps, the data analysis was carried out blinded, and exploratory analysis methods were applied to find out whether the irradiation had an effect on gene expression of plant cells. Small but statistically significant changes in a few genes were found and could be experimentally confirmed. From the functions of the regulated genes and a meta-analysis with publicly available microarray data, it could be suspected that the plant cell culture somehow perceived the irradiation as energy, similar to perceiving light rays. The fourth publication describes the functional analysis of another Arabidopsis thaliana gene expression dataset. The gene expression data of the plant tumor dataset pointed to a switch from a mainly aerobic, auxotrophic to an anaerobic and heterotrophic metabolism in the plant tumor. Genes involved in photosynthesis were found to be repressed in tumors; genes of amino acid and lipid metabolism, cell wall and solute transporters were regulated in a way that sustains tumor growth and development. Furthermore, in the fifth publication, GEPAT (Genome Expression Pathway Analysis Tool), a tool for the analysis and integration of microarray data with other data types, is described. It consists of a web application and a database, which together allow convenient data upload and data analysis. In later chapters of this thesis (publication 6 and publication 7), GEPAT is used to analyze human microarray datasets and to integrate results from gene expression analysis with other data types.
Gene expression and comparative genomic hybridization (CGH) data from 71 Mantle Cell Lymphoma (MCL) patients were analyzed, which made it possible to propose a seven-gene predictor that facilitates survival prediction compared with existing predictors. In this study, it was shown that CGH data can be used for survival predictions. For the dataset of Diffuse Large B-cell lymphoma (DLBCL) patients, an improved survival predictor could be found based on the gene expression data. From the genes differentially expressed between long- and short-surviving MCL patients, as well as from the regulated genes of DLBCL patients, interaction networks could be set up. They point to differences in the regulation of cell cycle and proliferation genes between patients with good and bad prognosis.
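The regression-based species detection from the first publication of this abstract can be illustrated with a small synthetic example. The sketch below is not the model from the thesis. It assumes a hypothetical affinity matrix `H` in which every probe binds its target species strongly and related species more weakly, simulates a mixture of two of four species, and compares a naive per-species signal average with a least-squares fit. All names, sizes and thresholds are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_probes, n_species = 30, 4

# Hypothetical affinity matrix: cross-hybridization strengths between 0.3
# and 0.6, and full strength (1.0) for the species each probe targets.
H = rng.uniform(0.3, 0.6, size=(n_probes, n_species))
for p in range(n_probes):
    H[p, p % n_species] = 1.0

# Simulated sample: species 0 and 2 present, species 1 and 3 absent.
true_abundance = np.array([1.0, 0.0, 0.5, 0.0])
signal = H @ true_abundance + rng.normal(0.0, 0.02, n_probes)

# Naive approach: average the signal of the probes targeting each species.
# Cross-hybridization keeps these averages high even for absent species.
naive = np.array([signal[s::n_species].mean() for s in range(n_species)])

# Regression approach: explain all probe signals jointly as a linear
# combination of the per-species hybridization patterns.
coef, *_ = np.linalg.lstsq(H, signal, rcond=None)
present = coef > 0.2  # illustrative presence threshold

print("naive averages:    ", np.round(naive, 2))
print("fitted abundances: ", np.round(coef, 2))
print("called present:    ", present.tolist())
```

Because the fitted coefficients estimate the contribution of each species, the same fit also hints at how quantitative predictions for mixtures could be obtained.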