Die SVM-gestützte Prädiktabilität der Bindungsspezifität von SH3-Domänen anhand ihrer Aminosäuresequenz

Axmacher, Franz

The SVM-based predictability of SH3-domain binding specificity by means of its amino-acid-sequence.

Please always quote using this URN: urn:nbn:de:bvb:20-opus-113349

Identifying the binding specificities of protein interaction domains, and ultimately being able to predict their potential binding partners in vivo, is fundamental to understanding the biological functions of these domains. This study investigated to what extent such predictions can be made for the SH3 domain, as an example of a protein interaction domain, using support vector machines (SVMs) trained exclusively on the information conserved within the amino acid sequence of the domain. A set of 51 SH3 domains, pre-classified into a system of eight classes according to their ligand preference, was used to train and cross-validate the SVM-based classifier. To convert the information conserved within the amino acid sequences into numeric values (a prerequisite for mathematics-based classifiers such as SVMs), each sequence was represented by its Fisher score vector. The resulting classification error was far below chance level, indicating that the binding specificity (class) of an SH3 domain can indeed be inferred from its amino acid sequence.

Introducing artificial, class-specifically emitted sequences into the training process to counter adverse overfitting effects, and additionally taking taxonomic information about the class system into account during training and cross-validation, lowered the classification error further, to as little as 35.29% (compare chance level: 7/8 = 87.50%). Feature selection as a means of countering overfitting also returned promising results, although its full potential could not be exploited because of software limitations.

Analysis of the positions in the sequence alignment that were most relevant to the SVM-based classifier showed that they frequently correlated with positions thought to play a pivotal role in determining binding specificity (class) in vivo as well. This not only underlines the reliability of the presented classifier, it also suggests that the method could possibly supplement other approaches that aim to identify the positions determining ligand preference in vivo. Such information is not only fundamental to a better understanding of the SH3 domain (and possibly other protein interaction domains) but is also likely to be of great interest from a pharmacological point of view.
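The error figures quoted in the abstract can be put in context with a small sketch. It is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say which cross-validation scheme was used, so leave-one-out is assumed here; the nearest-centroid classifier is a simple stand-in for the SVM; and all function names and the toy data are hypothetical. The remark that 35.29% matches 18 of 51 misclassified domains is an arithmetic inference, not a statement from the source.

```python
from fractions import Fraction


def chance_error(n_classes: int) -> Fraction:
    """Expected error of uniform random guessing among n_classes classes."""
    return Fraction(n_classes - 1, n_classes)


def error_rate(true_labels, predicted_labels) -> float:
    """Fraction of predictions that differ from the true class."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def leave_one_out_error(vectors, labels, fit, predict) -> float:
    """Leave-one-out cross-validation error for any fit/predict pair."""
    predictions = []
    for i in range(len(vectors)):
        # Train on everything except sample i, then predict sample i.
        train_x = vectors[:i] + vectors[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, vectors[i]))
    return error_rate(labels, predictions)


# Stand-in classifier (NOT an SVM): assign the class of the nearest centroid.
def fit_centroids(xs, ys):
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: sq_dist(model[y]))


# Hypothetical toy feature vectors; real input would be Fisher score vectors.
vectors = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [5.1, 5.0], [4.9, 5.2]]
labels = ["A", "A", "A", "B", "B", "B"]

print(float(chance_error(8)))      # 0.875, the 7/8 chance level for eight classes
print(round(18 / 51 * 100, 2))     # 35.29, consistent with 18 of 51 errors
print(leave_one_out_error(vectors, labels, fit_centroids, predict_centroid))
```

Because `leave_one_out_error` only needs a `fit` and a `predict` callable, an actual SVM could be swapped in for the stand-in without changing the evaluation loop.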

Metadata
Author:	Franz Axmacher
URN:	urn:nbn:de:bvb:20-opus-113349
Document Type:	Doctoral Thesis
Granting Institution:	Universität Würzburg, Medizinische Fakultät
Faculties:	Fakultät für Biologie / Theodor-Boveri-Institut für Biowissenschaften
Referee:	Prof. Dr. Thomas Dandekar, Prof. Dr. Dr. Dipl. Phys. Wolfgang Bauer
Date of final exam:	2015/04/21
Language:	German
Year of Completion:	2014
Dewey Decimal Classification:	6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
GND Keyword:	Support vector machine; Alignment <biochemistry>; Hidden Markov model; Cross-validation; Taxonomy
Tag:	Regularization; SH3 domain; Feature selection; Fisher score; PyMOL; WebLogo; e1071
CCS-Classification:	G. Mathematics of Computing / G.3 PROBABILITY AND STATISTICS / Multivariate statistics (NEW)
Release Date:	2015/06/02
Licence (German):	CC BY-NC-ND: Creative Commons licence: Attribution, NonCommercial, NoDerivatives
