TY  - JOUR
A1  - Wolf, Beat
A1  - Kuonen, Pierre
A1  - Dandekar, Thomas
A1  - Atlan, David
T1  - DNAseq workflow in a diagnostic context and an example of a user friendly implementation
JF  - BioMed Research International
N2  - Over recent years, next generation sequencing (NGS) technologies have evolved from costly tools used by very few into a much more accessible and economically viable technology. With this recently gained popularity, their use cases have expanded from research environments into clinical settings. However, the technical know-how and infrastructure required to analyze the data remain an obstacle to wider adoption of this technology, especially in smaller laboratories. We present GensearchNGS, a commercial DNAseq software suite distributed by Phenosystems SA. GensearchNGS focuses on making optimal use of already existing infrastructure while keeping its use simple. This is achieved by integrating existing tools into a comprehensive software environment, as well as through custom algorithms developed with the restrictions of limited infrastructures in mind. This includes the possibility of connecting multiple computers to speed up computing-intensive parts of the analysis, such as sequence alignment. We present a typical DNAseq workflow for NGS data analysis and the approach GensearchNGS takes to implement it. The presented workflow goes from raw data quality control to the final variant report. It includes features such as gene panels and the integration of online databases, like Ensembl for annotations or Cafe Variome for variant sharing.
KW  - next generation sequencing
KW  - genome browser
KW  - mutation
KW  - algorithm
KW  - database
KW  - format
KW  - discovery
KW  - exome
KW  - variants
KW  - alignment
Y1  - 2015
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-144527
IS  - 403497
ER  - 
TY  - JOUR
A1  - Kunz, Meik
A1  - Wolf, Beat
A1  - Schulze, Harald
A1  - Atlan, David
A1  - Walles, Thorsten
A1  - Walles, Heike
A1  - Dandekar, Thomas
T1  - Non-Coding RNAs in Lung Cancer: Contribution of Bioinformatics Analysis to the Development of Non-Invasive Diagnostic Tools
JF  - Genes
N2  - Lung cancer is currently the leading cause of cancer-related mortality due to late diagnosis and limited treatment options. Non-coding RNAs are not translated into proteins and have emerged as fundamental regulators of gene expression. Recent studies reported that microRNAs and long non-coding RNAs are involved in lung cancer development and progression. Moreover, they appear to be promising new non-invasive biomarkers for early lung cancer diagnosis. Here, we highlight their potential as biomarkers in lung cancer and show how bioinformatics can contribute to the development of non-invasive diagnostic tools. To this end, we discuss several bioinformatics algorithms and software tools for a comprehensive understanding and functional characterization of microRNAs and long non-coding RNAs.
KW  - lung cancer
KW  - non-invasive biomarkers
KW  - miRNAs
KW  - lncRNAs
KW  - bioinformatics
KW  - early diagnosis
KW  - algorithm
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-147990
VL  - 8
IS  - 1
ER  - 
TY  - THES
A1  - Wolf, Beat
T1  - Reducing the complexity of OMICS data analysis
T1  - Reducing the complexity of OMICS data analyses
N2  - The field of genetics faces many challenges and opportunities in both research and diagnostics due to the rise of next generation sequencing (NGS), a technology that allows DNA to be sequenced ever faster and more cheaply.
NGS is used to analyze not only DNA but also RNA, a very similar molecule also present in the cell; in both cases, large amounts of data are produced.
This large amount of data raises both infrastructure and usability problems, as powerful computing infrastructure is required and the data analysis involves many manual steps that are complicated to execute.
Both problems limit the use of NGS in the clinic and in research by creating a bottleneck, both computationally and in terms of manpower, as geneticists often lack the computing skills required for many analyses.
In this thesis we investigated how computer science can help improve this situation by reducing the complexity of this type of analysis.
We looked at how to make the analysis more accessible in order to increase the number of people who can perform OMICS data analysis (OMICS groups various genomics data sources).
To approach this problem, we developed a graphical NGS data analysis pipeline in close collaboration with the Human Genetics Department at the University of Würzburg; it is aimed at a diagnostics environment while still being useful in research.
The pipeline has been used in various research papers covering genomics, transcriptomics and epigenomics, including works with direct author participation.
To further validate the graphical pipeline, a user survey was carried out which confirmed that it lowers the complexity of OMICS data analysis.

We also studied how the computing infrastructure requirements of the data analysis can be reduced by improving the performance of certain analysis steps.
We did this both through speed improvements on a single computer (notably, variant calling became up to 18 times faster) and through distributed computing, to make better use of existing infrastructure.
The improvements were integrated into the previously described graphical pipeline, which was itself designed for low resource usage.

As a major contribution, and to help with the future development of parallel and distributed applications, whether for use in genetics or elsewhere, we also looked at how to make such applications easier to develop.
Based on the parallel object programming (POP) model, we created a Java language extension called POP-Java, which allows easy and transparent distribution of objects.
Through this development, we brought the POP model to the cloud and to Hadoop clusters, and we present a new collaborative distributed computing model called FriendComputing.

The advances made in the different domains of this thesis have been published in various works specified in this document.
N2  - The field of genetics faces many challenges, in both research and diagnostics, due to next generation sequencing (NGS), a technology that sequences DNA ever faster and more cheaply.
NGS is used to analyze not only DNA but also RNA, a molecule very similar to DNA; in both cases, large amounts of data are generated.
The large amount of data creates infrastructure and usability problems, as powerful computing infrastructures are required and the data analysis involves many manual steps that are complicated to carry out.
These two problems limit the use of NGS in the clinic and in research, as they create a bottleneck both in computing power and in personnel, since for many analyses geneticists lack the required computing skills.

In this work we investigated how computer science can help improve this situation by reducing the complexity of this type of analysis.
We examined how the analysis can be made more accessible in order to increase the number of people who can perform OMICS data analyses (OMICS groups various genetic data sources).
In close collaboration with the Institute of Human Genetics at the University of Würzburg, a graphical NGS data analysis pipeline was developed to address this question.
The graphical pipeline was developed for the diagnostics domain without losing sight of research.
The pipeline has therefore been used in various research fields, including publications with direct author participation in genomics, transcriptomics and epigenomics.
The pipeline was also validated through a user survey, which confirmed that our graphical pipeline reduces the complexity of OMICS data analysis.

We also investigated how the performance of the data analysis can be improved so that the necessary infrastructure becomes more accessible.
This was approached both by optimizing the available methods (for example, variant analysis became up to 18 times faster) and through distributed computing, in order to make better use of existing infrastructure.
The improvements were integrated into the previously described graphical pipeline, for which low resource usage was a general focus.

To support the future development of parallel and distributed applications, whether in genetics or elsewhere, we looked at how to make developing such applications easier.

This led to a major computer science result: based on the parallel object programming (POP) model, we developed an extension of the Java language called POP-Java, which allows easy and transparent distribution of objects.
Through this development, we brought the POP model to the cloud and to Hadoop clusters, and we present a new model for distributed collaborative computing called FriendComputing.

The various published parts of this dissertation are listed and discussed individually.
KW  - Bioinformatics
KW  - Human genetics
KW  - OMICS
KW  - Distributed computing
KW  - User interfaces
KW  - Distributed database system
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-153687
ER  - 
TY  - JOUR
A1  - Pluta, Natalie
A1  - Hoffjan, Sabine
A1  - Zimmer, Frederic
A1  - Köhler, Cornelia
A1  - Lücke, Thomas
A1  - Mohr, Jennifer
A1  - Vorgerd, Matthias
A1  - Nguyen, Hoa Huu Phuc
A1  - Atlan, David
A1  - Wolf, Beat
A1  - Zaum, Ann-Kathrin
A1  - Rost, Simone
T1  - Homozygous inversion on chromosome 13 involving SGCG detected by short read whole genome sequencing in a patient suffering from limb-girdle muscular dystrophy
JF  - Genes
N2  - New techniques in molecular genetic diagnostics now allow accurate diagnosis in a large proportion of patients with muscular diseases. Nevertheless, many patients remain without a genetic diagnosis, although the clinical history and/or the muscle biopsy give a clear indication of the involved genes. In many cases, there is a strong suspicion that the cause must lie in unexplored gene areas, such as deep-intronic or other non-coding regions. To find these changes, next-generation sequencing (NGS) methods are constantly evolving, making it possible to sequence entire genomes and reveal these previously uninvestigated regions. Here, we present a young woman who was strongly suspected of having a genetically unsolved sarcoglycanopathy based on her clinical history and muscle biopsy. Short read whole genome sequencing (WGS) revealed a homozygous inversion on chromosome 13 involving SGCG and LINC00621. The breakpoint in intron 2 of SGCG led to the absence of γ-sarcoglycan, resulting in the manifestation of autosomal recessive limb-girdle muscular dystrophy 5 (LGMDR5) in the young woman.
KW  - inversion
KW  - sarcoglycanopathy
KW  - whole genome sequencing (WGS)
KW  - next generation sequencing (NGS)
KW  - LGMDR5
KW  - muscle disease
KW  - genetic diagnostics
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-288122
SN  - 2073-4425
VL  - 13
IS  - 10
ER  -