The human gut is home to thousands of microbes that are important for human life. As most of these cannot be cultivated, metagenomics is a key means of understanding this community. To perform comparative metagenomic analysis of the human gut microbiome, I have developed SMASH (Simple metagenomic analysis shell), a computational pipeline. SMASH can also be used to assemble and analyze single genomes, and has been successfully applied to the bacterium Mycoplasma pneumoniae and the fungus Chaetomium thermophilum. In the context of the MetaHIT (Metagenomics of the human intestinal tract) consortium, in which our group participates, I used SMASH to validate the assembly and to estimate the assembly error rate of 576.7 Gb of metagenome sequence obtained using Illumina Solexa technology from fecal DNA of 124 European individuals. I also estimated the completeness of the gene catalogue of 3.3 million open reading frames obtained from these metagenomes. Finally, I used SMASH to analyze human gut metagenomes of 39 individuals from 6 countries, encompassing a wide range of host properties such as age, body mass index and disease states. We find that the variation in the gut microbiome is not continuous but stratified into enterotypes. Enterotypes are complex host-microbial symbiotic states that are not explained by host properties, nutritional habits or possible technical biases. The concept of enterotypes might have far-reaching implications, for example in explaining different responses to diet or drug intake. We also find several functional markers in the human gut microbiome that correlate with host properties such as body mass index, highlighting the need for functional analysis and raising hopes for the application of microbial markers as diagnostic or even prognostic tools for microbiota-associated human disorders.
In recent years, high-throughput experiments have provided a vast amount of data from all areas of molecular biology, including genomics, transcriptomics, proteomics and metabolomics. The analysis of these data with bioinformatics methods has developed accordingly, towards a systematic approach to understanding how genes and their resulting proteins give rise to biological form and function. Genes and proteins interact with each other and with other molecules in highly complex structures, which are explored in network biology. The in-depth knowledge of genes and proteins obtained from high-throughput experiments can be complemented by the architecture of molecular networks to gain a deeper understanding of biological processes. This thesis provides methods and statistical analyses for the integration of molecular data into biological networks and the identification of functional modules, as well as their application to distinct biological data. The integrated network approach is implemented as a software package, termed BioNet, for the statistical language R. The package includes the statistics for the integration of transcriptomic and functional data with biological networks, the scoring of nodes and edges of these networks, and methods for subnetwork search and visualisation. The exact algorithm is extensively tested in a simulation study and outperforms existing heuristic methods for this NP-hard problem in accuracy and robustness. The variability of the resulting solutions is assessed on perturbed data, generated for the integrated data and the network to mimic random or biased factors that obscure the biological signal. An optimal, robust module can be calculated using a consensus approach based on a resampling method. It summarizes an ensemble of solutions in a robust consensus module, with the estimated variability indicated by confidence values for the nodes and edges. The approach is subsequently applied to two gene expression data sets.
The first application analyses gene expression data for acute lymphoblastic leukaemia (ALL) and the differences between the subgroups with and without an oncogenic BCR/ABL gene fusion. In a second application, gene expression and survival data from diffuse large B-cell lymphomas are examined. The identified modules include and extend existing gene lists and signatures by further significant genes and their interactions. The most important novelty is that these genes are determined and visualised in the context of their interactions as a functional module and not as a list of independent and unrelated transcripts. In a third application, the integrative network approach is used to trace changes in tardigrade metabolism and to identify pathways responsible for their extreme resistance to environmental changes and their endurance in an inactive tun state. For the first time, a metabolic network approach is proposed to detect shifts in metabolic pathways, integrating transcriptome and metabolite data. In conclusion, the presented integrated network approach is an adequate technique for uniting high-throughput experimental data for single molecules with their intermolecular dependencies. It can be applied flexibly to diverse data, ranging from gene expression changes through metabolite abundances to protein modifications, in combination with a suitable molecular network. The exact algorithm is accurate and robust in comparison to heuristic approaches and delivers an optimal, robust solution in the form of a consensus module with confidence values. By integrating diverse sources of information and simultaneously inspecting a molecular event from different points of view, new and exhaustive insights into biological processes can be acquired.
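The module search described in this abstract can be illustrated with a small sketch. It is not BioNet's algorithm (BioNet is an R package, and its exact solver is not reproduced here). Assuming only that each node carries a score and that a module is a connected set of nodes with maximal total score, the following Python code finds the optimum of a toy network by exhaustive enumeration. The node names, scores and edges are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Return True if `nodes` induce a connected subgraph of `edges`."""
    nodes = set(nodes)
    if not nodes:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:  # depth-first traversal from an arbitrary start node
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes


def best_module(scores, edges):
    """Exhaustively find the connected node set with the highest total score."""
    best_nodes, best_score = None, float("-inf")
    names = sorted(scores)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if is_connected(subset, edges):
                total = sum(scores[n] for n in subset)
                if total > best_score:
                    best_nodes, best_score = set(subset), total
    return best_nodes, best_score


# Hypothetical toy network: a path A-B-C-D-E with made-up node scores.
scores = {"A": 2.0, "B": -1.0, "C": 3.0, "D": -4.0, "E": 1.5}
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
module, total = best_module(scores, edges)
print(sorted(module), total)
```

Enumeration is exponential in the number of nodes, which is why real networks call for an exact solver or a heuristic; the sketch only shows what "optimal module" means, including why a negatively scored node (B) can belong to the optimum when it connects two positive ones.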
This thesis investigates cellular networks with the aim of using the insights gained for medical and biotechnological purposes. To this end, protein domains and important regulatory RNA elements must first be identified. For regulatory elements in nucleic acids, this is done using iron-responsive elements (IREs) in Staphylococcus aureus as an example; such elements can be found in promising proximity to expressed sequences (T. Dandekar, F. Du, H. Bertram (2001) Nonlinear Analysis 47(1): 225-34). Even more significant as targets for drug development against parasites are domain differences in the structure and sequence of proteins (T. Dandekar, F. Du, H. Bertram (2001) Nonlinear Analysis 47(1): 225-34). Their identification is illustrated using a potential transport protein in Plasmodium falciparum. Subsequently, the interplay of regulatory elements and domains in networks is examined (including experimental data). This can lead to more general conclusions about network behaviour, and it can also be used for concrete applications. As an example we chose redox networks and the fight against plasmodia as the causative agents of malaria. Because the complete redox network of a living cell can be captured only inadequately by pH measurements, microcrystals of glutathione reductase are used experimentally, after digital amplification, as an indicator system and alternative measurement method for this network (H. Bertram, M. A. Keese, C. Boulin, R. H. Schirmer, R. Pepperkok, T. Dandekar (2002) Chemical Nanotechnology Talks III - Nano for Life Sciences). To also model complex redox networks bioinformatically, methods of metabolic flux analysis are presented and improved, in particular to do better justice to the interlinking of these networks and to describe them accurately with as few elementary flux modes as possible.
The reduction of the number of elementary modes in very large metabolic networks of a cell is achieved here using several methods and leads to a simplified representation of complex metabolic pathways. Each of these methods relies on a biochemically meaningful definition of external metabolites (T. Dandekar, F. Moldenhauer, S. Bulik, H. Bertram, S. Schuster (2003) Biosystems 70(3): 255-70). More generally, methods of protein domain classification and new strategies against microbial pathogens are considered. With regard to the automated division of proteins into domains, a new system by Taylor (2002b) is compared with established systems that require varying degrees of human intervention (H. Bertram, T. Dandekar (2002) Chemtracts 15: 735-9). In addition to a paper on the various methods for deriving information about a cell's metabolic network from genome data (H. Bertram, T. Dandekar (2004) it 46(1): 5-11), an overview of the main areas of bioinformatics research in Würzburg was compiled (H. Bertram, S. Balthasar, T. Dandekar (2003) Bioforum 1-2: 26-7). Finally, it is described how the pathogenomics and virulence of bacteria can be made accessible to bioinformatic analysis (H. Bertram, S. Balthasar, T. Dandekar (2003) Bioforum Eur. 3: 157-9). The last part presents metabolic flux analysis for identifying new strategies to combat plasmodia: the comparison of the glutathione and thioredoxin pathways in Plasmodium falciparum, Anopheles and humans aims at triggering targeted disruptions in the metabolism of the malaria parasite while sparing the host. Several interesting starting points emerge whose medical use can be pursued experimentally.
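The role of external metabolites in flux mode analysis can be made concrete with a minimal sketch. It assumes the standard steady-state condition S·v = 0, which is balanced only over internal metabolites; external metabolites are left out of the stoichiometric matrix S. The five-reaction toy network and all names below are hypothetical and are not taken from the thesis.

```python
def is_steady_state(stoichiometry, fluxes, tol=1e-9):
    """Check S.v = 0: every internal metabolite is produced as fast as it is consumed."""
    return all(
        abs(sum(coeff * flux for coeff, flux in zip(row, fluxes))) <= tol
        for row in stoichiometry
    )


# Hypothetical toy network (columns R1..R5):
#   R1: A_ext -> A,  R2: A -> B,  R3: A -> C,  R4: B -> B_ext,  R5: C -> C_ext
# Only the internal metabolites A, B and C get a row; the external ones are not balanced.
S = [
    [1, -1, -1, 0, 0],  # A
    [0, 1, 0, -1, 0],   # B
    [0, 0, 1, 0, -1],   # C
]

mode_via_b = [1, 1, 0, 1, 0]  # route A_ext -> A -> B -> B_ext
mode_via_c = [1, 0, 1, 0, 1]  # route A_ext -> A -> C -> C_ext
unbalanced = [1, 1, 1, 1, 1]  # drains A twice as fast as it is supplied

print(is_steady_state(S, mode_via_b))
print(is_steady_state(S, mode_via_c))
print(is_steady_state(S, unbalanced))
```

Declaring a metabolite external removes its row from S, which changes which flux vectors count as balanced. This is one way to see why the choice of external metabolites affects the number of elementary modes.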
Background: The incidence of non-Hodgkin lymphoma (NHL), the most frequently observed cancer, is still rising. Diffuse large B-cell lymphoma (DLBCL) is the most common NHL. There are two subgroups of DLBCL with different gene expression patterns: ABC (“activated B-like DLBCL”) and GCB (“germinal center B-like DLBCL”). Without therapy, patients often die within a few months; the ABC type is the more aggressive. Another B-cell lymphoma is mantle cell lymphoma (MCL), which is rare, has a very poor prognosis, and has no cure yet. Methods: In this project these B-cell lymphomas were examined with bioinformatics methods to find new characteristics or undiscovered events at the molecular level, with the goal of improving the understanding and therapy of lymphomas. For this purpose we used survival, gene expression and comparative genomic hybridization (CGH) data. Clinical studies can yield large data sets from which previously unknown trends can be revealed. Results (MCL): The published proliferation signature correlates directly with survival. Exploratory analyses of gene expression and CGH data of MCL samples (n=71) revealed a valid grouping according to the median of the proliferation signature values. The second axis of the correspondence analysis distinguishes between good and bad prognosis. Statistical testing (moderated t-test, Wilcoxon rank-sum test) showed differences in the cell cycle and yielded a network of kinases that are responsible for the difference between good and bad prognosis. A set of seven genes (CENPE, CDC20, HPRT1, CDC2, BIRC5, ASPM, IGF2BP3) predicted survival patterns about as well as the proliferation signature with 20 genes. Furthermore, in the exploratory analysis some chromosome bands could be associated with prognosis (chromosome 9: 9p24, 9p23, 9p22, 9p21, 9q33 and 9q34).
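The grouping of samples by the median of a signature value can be sketched as follows. The abstract does not state how the signature value is computed, so the sketch assumes it is the mean expression of the signature genes. Two gene names are borrowed from the seven-gene set above, but the sample names and all expression values are invented for illustration.

```python
from statistics import mean, median


def signature_score(expression, signature_genes):
    """Assumed definition: the signature value is the mean expression of its genes."""
    return mean(expression[gene] for gene in signature_genes)


def split_by_median(scores):
    """Split samples into a low group (<= median) and a high group (> median)."""
    cutoff = median(scores.values())
    low = sorted(sample for sample, value in scores.items() if value <= cutoff)
    high = sorted(sample for sample, value in scores.items() if value > cutoff)
    return cutoff, low, high


# Hypothetical expression values; only the gene names come from the abstract.
signature = ["CDC20", "BIRC5"]
samples = {
    "s1": {"CDC20": 1.0, "BIRC5": 3.0},
    "s2": {"CDC20": 4.0, "BIRC5": 6.0},
    "s3": {"CDC20": 3.0, "BIRC5": 4.0},
    "s4": {"CDC20": 7.0, "BIRC5": 9.0},
}
scores = {name: signature_score(expr, signature) for name, expr in samples.items()}
cutoff, low_group, high_group = split_by_median(scores)
print(cutoff, low_group, high_group)
```

The two groups could then be compared with survival data, for example by a log-rank test, which is outside the scope of this sketch.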
Results (DLBCL): A new normalization of the gene expression data of DLBCL patients yielded a better separation of risk groups by the signature-based predictor published in 2002. We achieved a similarly good separation with six genes. Exploratory analysis of the gene expression data confirmed the subgroups ABC and GCB. We recognized a clear difference between early and late cell cycle stages in the expression of cell cycle genes, which can separate ABC and GCB. Classical lymphoma genes and the best separating genes, together with gene sets that identify ABC and GCB, form a network that can classify and explain the ABC and GCB groups (ASB13, BCL2, BCL6, BCL7A, CCND2, COL3A1, CTGF, FN1, FOXP1, IGHM, IRF4, LMO2, LRMP, MAPK10, MME, MYBL1, NEIL1 and SH3BP5). Altogether, these findings are useful for diagnosis, prognosis and therapy (cytostatic drugs).
In this work models for molecular networks consisting of ordinary differential equations are extended by terms that include the interaction of the corresponding molecular network with the environment that the molecular network is embedded in. These terms model the effects of the external stimuli on the molecular network. The usability of this extension is demonstrated with a model of a circadian clock that is extended with certain terms and reproduces data from several experiments at the same time.
Once the model including external stimuli is set up, a framework is developed to calculate external stimuli that have a predefined desired effect on the molecular network. For this purpose, the task of finding appropriate external stimuli is formulated as a mathematical optimal control problem, for which many solution methods are available. Several of these methods are discussed and worked out to calculate a solution of the corresponding optimal control problem. The application of the framework to finding pharmacological intervention points or effective drug combinations is discussed. Furthermore, the framework is related to existing network analysis tools, and their combined use to find dedicated external stimuli is discussed.
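The idea of choosing an external stimulus by optimization can be illustrated with a deliberately simplified sketch. It is not the framework of the thesis. It assumes a single species with dynamics dx/dt = -k·x + u, a stimulus u that is constant in time, and a cost that penalizes both the distance of the final state from a target and the size of the stimulus. All parameter values are hypothetical, and a plain grid search stands in for a proper optimal control solver.

```python
def simulate(u, k=1.0, x0=0.0, t_end=5.0, steps=500):
    """Integrate dx/dt = -k*x + u with the explicit Euler method and return x(t_end)."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x += dt * (-k * x + u)
    return x


def cost(u, target=2.0, weight=0.01):
    """Squared distance of the final state from the target plus a penalty on the stimulus."""
    return (simulate(u) - target) ** 2 + weight * u ** 2


def best_constant_stimulus(candidates):
    """Grid search: return the candidate stimulus with the lowest cost."""
    return min(candidates, key=cost)


# Hypothetical search grid for the stimulus strength: 0.0, 0.05, ..., 5.0
candidates = [i * 0.05 for i in range(101)]
u_best = best_constant_stimulus(candidates)
print(u_best, simulate(u_best))
```

In a full optimal control formulation the stimulus is a function of time and the network has many coupled equations. The same pattern of simulating, scoring against a desired state, and adjusting the stimulus still applies, with gradient-based or other methods replacing the grid search.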
The complete framework is verified with biological examples by comparing the calculated results with data from the literature. For this purpose, platelet aggregation is investigated on the basis of a corresponding gene regulatory network, and associated receptors are detected. Furthermore, a transition from one type of T-helper cell to another is analyzed in a tumor setting, where the missing agents needed to induce the corresponding switch in vitro are calculated. Next, a gene regulatory network of a myocardiocyte is investigated, showing how the presented framework can be used to compare different treatment strategies quantitatively with respect to their beneficial effects and side effects. Moreover, a constitutively activated signaling pathway that causes harmful effects is modeled, and intervention points with corresponding treatment strategies are determined that steer the gene regulatory network from a pathological expression pattern back to a physiological one.
The emergence of new strains of resistant pathogens makes the search for novel agents against this steadily spreading threat urgently necessary. The interdisciplinary Collaborative Research Centre 630 (SFB630) of the University of Würzburg takes on this task by synthesizing novel xenobiotics and testing their efficacy. The dissertation presented here fits seamlessly into the various disciplines of SFB630: it provides an interface between the synthesis of the isoquinoline alkaloid derivatives produced within SFB630 and the analysis of their effects. Using the bioinformatic methods applied here, the most important metabolic pathways of S. epidermidis R62A, S. aureus USA300 and human cells were first reconstructed in so-called metabolic network models. Based on these models, enzyme activities could be calculated for different scenarios of added xenobiotics. The data required for this were obtained directly from gene expression analyses. The method was validated by metabolome measurements. For this purpose, S. aureus USA300 was treated with different concentrations of IQ-143 and processed according to the harvesting protocol presented in this dissertation. The results suggest that IQ-143 exerts strong effects on complex 1 of the respiratory chain; these results agree with those of the metabolic network analysis. For the compound IQ-238, despite its structural similarity to IQ-143, clearly different effects were found: this substance causes a direct drop in the enzyme activities of glycolysis. A non-specific toxicity of these substances based on their chemical structure could therefore be ruled out. Furthermore, the methods already applied to bacteria for IQ-143 and IQ-238 were successfully used to model the effects of methylene blue on different resistant strains of P. falciparum 3D7. This showed that methylene blue, in combination with other drugs against this parasite, on the one hand enhances the effect of the primary drug and on the other hand can, to some extent, reduce existing resistance to the primary drug. The work presented thus established a pipeline for identifying the metabolic effects of different agents on different pathogens. This pipeline can readily be extended to other organisms and therefore represents an important approach for elucidating the network effects of various potential drugs.
An essential task for synthetic biologists is to understand the structure and function of biological processes and the proteins involved, and to plan experiments accordingly. Remarkable progress has been made in recent years towards this goal. However, collecting and presenting all information on processes and functions is still cumbersome. The database tool GoSynthetic provides a new, simple and fast way to analyse biological processes using a hierarchical database. Four different search modes are implemented. Furthermore, protein interaction data and cross-links to organism-specific databases (17 organisms including six model organisms and their interactions), COG/KOG, GO and IntAct are warehoused. The built-in connection to technical and engineering terms enables simple switching between biological concepts and concepts from engineering, electronics and synthetic biology. The current version of GoSynthetic covers more than one million processes, proteins, COGs and GOs. It is illustrated by various application examples probing process differences and designing modifications.
The phylum Tardigrada consists of about 1000 species described to date. The animals live in marine, freshwater and terrestrial habitats all over the world. Tardigrades are polyextremophiles: they are capable of resisting extreme temperature, pressure or radiation. In the event of desiccation, tardigrades enter a so-called tun stage. The reason for their great tolerance of extreme environmental conditions has not yet been discovered. Our Funcrypta project aims to find out which mechanisms underlie these adaptation capabilities, particularly in the species Milnesium tardigradum. The first part of this thesis describes the establishment of expressed sequence tag (EST) libraries for different stages of M. tardigradum. From proteomics data we bioinformatically identified 144 proteins with a known function and an additional 36 proteins that seemed to be specific to M. tardigradum. The generation of a comprehensive web-based database allows us to merge the proteome and transcriptome data. To this end, we created an annotation pipeline for the functional annotation of the protein and nucleotide sequences. Additionally, we clustered the obtained proteome dataset and identified some tardigrade-specific proteins (TSPs) that did not show homology to known proteins. Moreover, we examined the heat shock proteins of M. tardigradum and their different expression levels depending on the state of the animals. In further bioinformatic analyses of the whole data set, we discovered promising proteins and pathways that have been described as correlated with stress tolerance, e.g. late embryogenesis abundant (LEA) proteins. In addition, we compared tardigrades with nematodes, rotifers, yeast and man to identify shared and tardigrade-specific stress pathways.
An analysis of the 5′ and 3′ untranslated regions (UTRs) demonstrates strong usage of stabilising motifs such as the 15-lipoxygenase differentiation control element (15-LOX-DICE) but also reveals a lack of other common UTR motifs, e.g. AU-rich elements. The second part of this thesis focuses on the relatedness of several cryptic species within the tardigrade genus Paramacrobiotus. For the first time, we used the sequence-structure information of the internal transcribed spacer 2 (ITS2) as a phylogenetic marker in tardigrades. This allowed the description of three new species that were indistinguishable using morphological characters or common molecular markers such as the 18S ribosomal ribonucleic acid (rRNA) or cytochrome c oxidase subunit I (COI). In a large in silico simulation study we also showed that adding structure information to the ITS2 sequence benefits phylogenetic tree reconstruction. Besides the genus Paramacrobiotus, we used the ITS2 to corroborate a monophyletic DO-group (Sphaeropleales) within the Chlorophyceae. Additionally, we redesigned another comprehensive database, the ITS2 database, doubling the number of ITS2 sequence-structure pairs. In conclusion, this thesis provides first insights (6 first-author publications and 4 co-author publications) into the reasons for the enormous adaptation capabilities of tardigrades and offers a solution to the debate on the phylogenetic relatedness within the tardigrade genus Paramacrobiotus.
Bioinformatics is an interdisciplinary science that addresses problems from all the life sciences using computer-based methods. Its goal is to enable the processing and interpretation of large amounts of data. It also supports the design of experiments in synthetic biology. Synthetic biology is concerned with the generation of new components and their properties, which arise from the treatment and manipulation of living organisms or parts of them. A particularly interesting topic here is two-component systems (TCS). TCS are important signalling cascades in bacteria that can transmit information from the environment into a cell and respond to it. This dissertation deals with the assessment, use and further development of bioinformatic methods for studying protein interactions and biological systems. The scientific contribution of this work can be divided into three aspects:
- Investigation and assessment of bioinformatic methods, and continuation of the results of the preceding diploma thesis on protein-protein interaction prediction.
- Analysis of general evolutionary modification possibilities of TCS, as well as their design and specific differences.
- Abstraction and transfer of the insights gained to technical and biological contexts, with the aim of simplifying the design of new experiments in synthetic biology and enabling comparison between technical and biological processes as well as between organisms.
The results of the study showed that two-component systems are highly conserved in their architecture. Nevertheless, many specific properties and three general modification possibilities were discovered.
The investigations enabled the identification of new promoter sites and also allowed the characterization of different signal binding sites. In addition, previously missing TCS components were discovered, as were new divergent TCS domains in the organism Mycoplasma. A combination of engineering approaches and synthetic biology simplified the targeted manipulation of TCS and other modular systems. The two-level module classification introduced here enabled a more efficient analysis of modular processes and thus allowed the molecular design of synthetic biological applications. To make this approach easy to use, the freely available software GoSynthetic was developed. Concrete examples demonstrated the practical applicability of this analysis software. The presented classification of synthetic-biological and technical units is intended to simplify the planning of future design experiments and to open new avenues for related fields. The main task of bioinformatics is not to replace experiments but to evaluate the resulting large amounts of data meaningfully and efficiently. From this, new ideas for further analyses and alternative applications should be gained, so that flawed or incorrect approaches can be recognized early. Bioinformatics offers modern technical methods that complement familiar but often laborious experimental routes with new, promising approaches to structuring and evaluating large amounts of data. New perspectives are fostered by making testing procedures easier. The resulting time savings also reduce costs.
Genome sequence analysis: A combination of genome analysis applications was established during this project. It offers an efficient platform for interactively comparing similar genome regions and revealing differences between loci. Genes and operons can be rapidly analyzed, and local collinear blocks (LCBs) categorized according to their function. Features of interest are parsed, recognized, and clustered into reports. Phylogenetic relationships, such as the evolution of critical factors or of a particular highly conserved region, can be readily examined. The resulting platform-independent software packages (GENOVA and inGeno) have proven efficient and easy to handle in a number of projects. The capabilities of the software allowed the investigation of virulence factors (e.g., rsbU), the biological design of strains, and in particular the storage and management of pathogenicity features. We have successfully investigated the genomes of Staphylococcus aureus strains (COL, N315, 8325, RN1HG, Newman), Listeria spp. (welshimeri, innocua and monocytogenes), E. coli strains (O157:H7 and MG1655) and Vaccinia strains (WR, Copenhagen, Lister, LIVP, GLV-1h68 and parental strains).
Metabolic network analysis: Our YANAsquare package offers a workbench for rapidly establishing genome-scale metabolic networks, such as that of Staphylococcus aureus, as well as specific metabolic networks of interest, such as the murine phagosome lipid signalling network. YANAsquare recruits reactions from online databases using an integrated KEGG browser, which reduces the effort of building large metabolic networks. The calculation routines (a METATOOL-derived wrapper or a native Java implementation) readily obtain all possible flux modes (EM/EP) for metabolite fluxes within the network. Advanced layout algorithms visualize the topological structure of the network. In addition, the generated structure can be dynamically modified in the graphical interface.
The generated network and the manipulated layout can be validated and stored as an XML file following the SBML Level 2 schema. This format can be parsed and analyzed by other systems biology software, such as CellDesigner. Moreover, the integrated robustness-evaluation routine examines how each single mutation affects synthesis rates throughout the whole network. We have successfully applied the method to simulate single and multiple gene knockouts, comprehensively revealing the affected fluxes. Recently we applied the method to proteomic data and extracellular metabolite data of staphylococci to study the physiological changes in flux distribution. Calculations at different time points and under different conditions, such as hypoxia or stress, show a good fit to experimental data. Moreover, using proteomic data (enzyme amounts) calculated from 2D-Gel-EP experiments, our study provides a way to compare the fluxome with enzyme expression.
Oncolytic vaccinia virus (VACV): We investigated the genetic differences between the de novo sequence of the recombinant oncolytic GLV-1h68 and other related VACVs, including function predictions for all genome differences found. Our phylogenetic analysis indicates that GLV-1h68 is closest to Lister strains but has lost several ORFs present in its parental LIVP strain, including genes encoding CrmE and a viral Golgi anti-apoptotic protein, v-GAAP. A comparison of viral genes in the Lister, WR and COP strains showed that their functions were strain-specific, tissue-specific or host-specific. This helps in rationally designing better optimized oncolytic virus strains to benefit cancer therapy in human patients. Differences identified in the comparison of open reading frames (ORFs) include genes for host-range selection, virulence and immune modulation proteins, e.g. ankyrin-like proteins, serine proteinase inhibitor SPI-2/CrmA, tumor necrosis factor (TNF) receptor homolog CrmC, and semaphorin-like and interleukin-1 receptor homolog proteins. The contribution of foreign gene expression cassettes in the therapeutic and oncolytic virus GLV-1h68 was studied, including the F14.5L, J2R and A56R loci. The contribution of F14.5L inactivation to the reduced virulence is demonstrated by comparing the virulence data of GLV-1h68 with those of its F14.5L-null and revertant viruses. The comparison suggests that inserting a foreign gene expression cassette into a nonessential locus of the viral genome is a practical way to attenuate VACVs, especially if the nonessential locus itself contains a virulence gene. This reduces the virulence of the virus without unduly compromising its replication competency, which is the key to its oncolytic activity. The reduced pathogenicity of GLV-1h68 was confirmed by our experimental collaboration partners in male mice bearing C6 rat glioma and in immunocompetent mice bearing B16-F10 murine melanoma. In conclusion, bioinformatics and experimental data show that GLV-1h68 is a promising engineered VACV variant for anticancer therapy with tumor-specific replication, reduced pathogenicity and benign tissue tropism.
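The single-knockout evaluation mentioned in the metabolic network part of this abstract can be illustrated with a minimal sketch. It does not reproduce the YANAsquare routine. It assumes a common simplification: an elementary mode survives a knockout only if it carries no flux through the removed reaction, and robustness is taken as the mean fraction of modes that survive each single knockout. The modes and reaction names below are hypothetical.

```python
def surviving_modes(modes, knocked_out):
    """Keep the modes that carry no flux through any knocked-out reaction."""
    return [
        mode for mode in modes
        if all(mode.get(reaction, 0) == 0 for reaction in knocked_out)
    ]


def robustness(modes, reactions):
    """Mean fraction of modes that survive each single-reaction knockout."""
    fractions = [
        len(surviving_modes(modes, [reaction])) / len(modes)
        for reaction in reactions
    ]
    return sum(fractions) / len(fractions)


# Hypothetical elementary modes, stored as {reaction: relative flux}.
mode_via_b = {"R1": 1, "R2": 1, "R4": 1}
mode_via_c = {"R1": 1, "R3": 1, "R5": 1}
modes = [mode_via_b, mode_via_c]
reactions = ["R1", "R2", "R3", "R4", "R5"]

print(surviving_modes(modes, ["R2"]))        # only the route via C remains
print(surviving_modes(modes, ["R2", "R3"]))  # a double knockout blocks both routes
print(robustness(modes, reactions))
```

Knocking out the shared uptake reaction R1 removes every mode, whereas knocking out a branch reaction leaves the alternative route intact. Differences of this kind between reactions are what a knockout analysis is meant to expose.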