Refine
Has Fulltext
- yes (4)
Is part of the Bibliography
- yes (4)
Document Type
- Journal article (3)
- Doctoral Thesis (1)
Language
- English (4)
Keywords
- algorithm (2)
- Bioinformatik (1)
- Distributed computing (1)
- Humangenetik (1)
- LGMDR5 (1)
- OMICS (1)
- User interfaces (1)
- Verteiltes Datenbanksystem (1)
- alignment (1)
- bioinformatics (1)
Institute
Sonstige beteiligte Institutionen
Over recent years next generation sequencing (NGS) technologies evolved from costly tools used by very few, to a much more accessible and economically viable technology. Through this recently gained popularity, its use-cases expanded from research environments into clinical settings. But the technical know-how and infrastructure required to analyze the data remain an obstacle for a wider adoption of this technology, especially in smaller laboratories. We present GensearchNGS, a commercial DNAseq software suite distributed by Phenosystems SA. The focus of GensearchNGS is the optimal usage of already existing infrastructure, while keeping its use simple. This is achieved through the integration of existing tools in a comprehensive software environment, as well as custom algorithms developed with the restrictions of limited infrastructures in mind. This includes the possibility to connect multiple computers to speed up computing intensive parts of the analysis such as sequence alignments. We present a typical DNAseq workflow for NGS data analysis and the approach GensearchNGS takes to implement it. The presented workflow goes from raw data quality control to the final variant report. This includes features such as gene panels and the integration of online databases, like Ensembl for annotations or Cafe Variome for variant sharing.
Lung cancer is currently the leading cause of cancer related mortality due to late diagnosis and limited treatment intervention. Non-coding RNAs are not translated into proteins and have emerged as fundamental regulators of gene expression. Recent studies reported that microRNAs and long non-coding RNAs are involved in lung cancer development and progression. Moreover, they appear as new promising non-invasive biomarkers for early lung cancer diagnosis. Here, we highlight their potential as biomarker in lung cancer and present how bioinformatics can contribute to the development of non-invasive diagnostic tools. For this, we discuss several bioinformatics algorithms and software tools for a comprehensive understanding and functional characterization of microRNAs and long non-coding RNAs.
The field of genetics faces a lot of challenges and opportunities in both research and diagnostics due to the rise of next generation sequencing (NGS), a technology that allows to sequence DNA increasingly fast and cheap.
NGS is not only used to analyze DNA, but also RNA, which is a very similar molecule also present in the cell, in both cases producing large amounts of data.
The big amount of data raises both infrastructure and usability problems, as powerful computing infrastructures are required and there are many manual steps in the data analysis which are complicated to execute.
Both of those problems limit the use of NGS in the clinic and research, by producing a bottleneck both computationally and in terms of manpower, as for many analyses geneticists lack the required computing skills.
Over the course of this thesis we investigated how computer science can help to improve this situation to reduce the complexity of this type of analysis.
We looked at how to make the analysis more accessible to increase the number of people that can perform OMICS data analysis (OMICS groups various genomics data-sources).
To approach this problem, we developed a graphical NGS data analysis pipeline aimed at a diagnostics environment while still being useful in research in close collaboration with the Human Genetics Department at the University of Würzburg.
The pipeline has been used in various research papers on covering subjects, including works with direct author participation in genomics, transcriptomics as well as epigenomics.
To further validate the graphical pipeline, a user survey was carried out which confirmed that it lowers the complexity of OMICS data analysis.
We also studied how the data analysis can be improved in terms of computing infrastructure by improving the performance of certain analysis steps.
We did this both in terms of speed improvements on a single computer (with notably variant calling being faster by up to 18 times), as well as with distributed computing to better use an existing infrastructure.
The improvements were integrated into the previously described graphical pipeline, which itself also was focused on low resource usage.
As a major contribution and to help with future development of parallel and distributed applications, for the usage in genetics or otherwise, we also looked at how to make it easier to develop such applications.
Based on the parallel object programming model (POP), we created a Java language extension called POP-Java, which allows for easy and transparent distribution of objects.
Through this development, we brought the POP model to the cloud, Hadoop clusters and present a new collaborative distributed computing model called FriendComputing.
The advances made in the different domains of this thesis have been published in various works specified in this document.
New techniques in molecular genetic diagnostics now allow for accurate diagnosis in a large proportion of patients with muscular diseases. Nevertheless, many patients remain unsolved, although the clinical history and/or the muscle biopsy give a clear indication of the involved genes. In many cases, there is a strong suspicion that the cause must lie in unexplored gene areas, such as deep-intronic or other non-coding regions. In order to find these changes, next-generation sequencing (NGS) methods are constantly evolving, making it possible to sequence entire genomes to reveal these previously uninvestigated regions. Here, we present a young woman who was strongly suspected of having a so far genetically unsolved sarcoglycanopathy based on her clinical history and muscle biopsy. Using short read whole genome sequencing (WGS), a homozygous inversion on chromosome 13 involving SGCG and LINC00621 was detected. The breakpoint in intron 2 of SGCG led to the absence of γ-sarcoglycan, resulting in the manifestation of autosomal recessive limb-girdle muscular dystrophy 5 (LGMDR5) in the young woman.