TY - THES A1 - Kobs, Konstantin T1 - Think outside the Black Box: Model-Agnostic Deep Learning with Domain Knowledge T1 - Think outside the Black Box: Modellagnostisches Deep Learning mit Domänenwissen N2 - Deep Learning (DL) models are trained on a downstream task by feeding (potentially preprocessed) input data through a trainable Neural Network (NN) and updating its parameters to minimize the loss function between the predicted and the desired output. While this general framework has mainly remained unchanged over the years, the architectures of the trainable models have greatly evolved. Even though it is undoubtedly important to choose the right architecture, we argue that it is also beneficial to develop methods that address other components of the training process. We hypothesize that utilizing domain knowledge can be helpful to improve DL models in terms of performance and/or efficiency. Such model-agnostic methods can be applied to any existing or future architecture. Furthermore, the black box nature of DL models motivates the development of techniques to understand their inner workings. Considering the rapid advancement of DL architectures, it is again crucial to develop model-agnostic methods. In this thesis, we explore six principles that incorporate domain knowledge to understand or improve models. They are applied either on the input or output side of the trainable model. Each principle is applied to at least two DL tasks, leading to task-specific implementations. To understand DL models, we propose to use Generated Input Data coming from a controllable generation process requiring knowledge about the data properties. This way, we can understand the model’s behavior by analyzing how it changes when one specific high-level input feature changes in the generated data. On the output side, Gradient-Based Attribution methods create a gradient at the end of the NN and then propagate it back to the input, indicating which low-level input features have a large influence on the model’s prediction. The resulting input features can be interpreted by humans using domain knowledge. To improve the trainable model in terms of downstream performance, data and compute efficiency, or robustness to unwanted features, we explore principles that each address one of the training components besides the trainable model. Input Masking and Augmentation directly modifies the training input data, integrating knowledge about the data and its impact on the model’s output. We also explore the use of Feature Extraction using Pretrained Multimodal Models which can be seen as a beneficial preprocessing step to extract useful features. When no training data is available for the downstream task, using such features and domain knowledge expressed in other modalities can result in a Zero-Shot Learning (ZSL) setting, completely eliminating the trainable model. The Weak Label Generation principle produces new desired outputs using knowledge about the labels, giving either a good pretraining or even exclusive training dataset to solve the downstream task. Finally, improving and choosing the right Loss Function is another principle we explore in this thesis. Here, we enrich existing loss functions with knowledge about label interactions or utilize and combine multiple task-specific loss functions in a multitask setting. We apply the principles to classification, regression, and representation tasks as well as to image and text modalities. We propose, apply, and evaluate existing and novel methods to understand and improve the model. Overall, this thesis introduces and evaluates methods that complement the development and choice of DL model architectures. N2 - Deep-Learning-Modelle (DL-Modelle) werden trainiert, indem potenziell vorverarbeitete Eingangsdaten durch ein trainierbares Neuronales Netz (NN) geleitet und dessen Parameter aktualisiert werden, um die Verlustfunktion zwischen der Vorhersage und der gewünschten Ausgabe zu minimieren. Während sich dieser allgemeine Ablauf kaum geändert hat, haben sich die verwendeten NN-Architekturen erheblich weiterentwickelt. Auch wenn die Wahl der Architektur für die Aufgabe zweifellos wichtig ist, schlagen wir in dieser Arbeit vor, Methoden für andere Komponenten des Trainingsprozesses zu entwickeln. Wir vermuten, dass die Verwendung von Domänenwissen hilfreich bei der Verbesserung von DL-Modellen bezüglich ihrer Leistung und/oder Effizienz sein kann. Solche modellagnostischen Methoden sind dann bei jeder bestehenden oder zukünftigen NN-Architektur anwendbar. Die Black-Box-Natur von DL-Modellen motiviert zudem die Entwicklung von Methoden, die zum Verständnis der Funktionsweise dieser Modelle beitragen. Angesichts der schnellen Architektur-Entwicklung ist es wichtig, modellagnostische Methoden zu entwickeln. In dieser Arbeit untersuchen wir sechs Prinzipien, die Domänenwissen verwenden, um Modelle zu verstehen oder zu verbessern. Sie werden auf Trainingskomponenten im Eingang oder Ausgang des Modells angewendet. Jedes Prinzip wird dann auf mindestens zwei DL-Aufgaben angewandt, was zu aufgabenspezifischen Implementierungen führt. Um DL-Modelle zu verstehen, verwenden wir kontrolliert generierte Eingangsdaten, was Wissen über die Dateneigenschaften benötigt. So können wir das Verhalten des Modells verstehen, indem wir die Ausgabeänderung bei der Änderung von abstrahierten Eingabefeatures beobachten. Wir untersuchen zudem gradienten-basierte Attribution-Methoden, die am Ausgang des NN einen Gradienten anlegen und zur Eingabe zurückführen. Eingabefeatures mit großem Einfluss auf die Modellvorhersage können so identifiziert und von Menschen mit Domänenwissen interpretiert werden. Um Modelle zu verbessern (in Bezug auf die Ergebnisgüte, Daten- und Recheneffizienz oder Robustheit gegenüber ungewollten Eingaben), untersuchen wir Prinzipien, die jeweils eine Trainingskomponente neben dem trainierbaren Modell betreffen. Das Maskieren und Augmentieren von Eingangsdaten modifiziert direkt die Trainingsdaten und integriert dabei Wissen über ihren Einfluss auf die Modellausgabe. Die Verwendung von vortrainierten multimodalen Modellen zur Featureextraktion kann als ein Vorverarbeitungsschritt angesehen werden. Bei fehlenden Trainingsdaten können die Features und Domänenwissen in anderen Modalitäten als Zero-Shot Setting das trainierbare Modell gänzlich eliminieren. Das Weak-Label-Generierungs-Prinzip erzeugt neue gewünschte Ausgaben anhand von Wissen über die Labels, was zu einem Pretrainings- oder exklusiven Trainigsdatensatz führt. Schließlich ist die Verbesserung und Auswahl der Verlustfunktion ein weiteres untersuchtes Prinzip. Hier reichern wir bestehende Verlustfunktionen mit Wissen über Label-Interaktionen an oder kombinieren mehrere aufgabenspezifische Verlustfunktionen als Multi-Task-Ansatz. Wir wenden die Prinzipien auf Klassifikations-, Regressions- und Repräsentationsaufgaben sowie Bild- und Textmodalitäten an. Wir stellen bestehende und neue Methoden vor, wenden sie an und evaluieren sie für das Verstehen und Verbessern von DL-Modellen, was die Entwicklung und Auswahl von DL-Modellarchitekturen ergänzt. KW - Deep learning KW - Neuronales Netz KW - Maschinelles Lernen KW - Machine Learning KW - Model-Agnostic KW - Domain Knowledge Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-349689 ER - TY - JOUR A1 - Hentschel, Simon A1 - Kobs, Konstantin A1 - Hotho, Andreas T1 - CLIP knows image aesthetics JF - Frontiers in Artificial Intelligence N2 - Most Image Aesthetic Assessment (IAA) methods use a pretrained ImageNet classification model as a base to fine-tune. We hypothesize that content classification is not an optimal pretraining task for IAA, since the task discourages the extraction of features that are useful for IAA, e.g., composition, lighting, or style. On the other hand, we argue that the Contrastive Language-Image Pretraining (CLIP) model is a better base for IAA models, since it has been trained using natural language supervision. Due to the rich nature of language, CLIP needs to learn a broad range of image features that correlate with sentences describing the image content, composition, environments, and even subjective feelings about the image. While it has been shown that CLIP extracts features useful for content classification tasks, its suitability for tasks that require the extraction of style-based features like IAA has not yet been shown. We test our hypothesis by conducting a three-step study, investigating the usefulness of features extracted by CLIP compared to features obtained from the last layer of a comparable ImageNet classification model. In each step, we get more computationally expensive. First, we engineer natural language prompts that let CLIP assess an image's aesthetic without adjusting any weights in the model. To overcome the challenge that CLIP's prompting only is applicable to classification tasks, we propose a simple but effective strategy to convert multiple prompts to a continuous scalar as required when predicting an image's mean aesthetic score. Second, we train a linear regression on the AVA dataset using image features obtained by CLIP's image encoder. The resulting model outperforms a linear regression trained on features from an ImageNet classification model. It also shows competitive performance with fully fine-tuned networks based on ImageNet, while only training a single layer. Finally, by fine-tuning CLIP's image encoder on the AVA dataset, we show that CLIP only needs a fraction of training epochs to converge, while also performing better than a fine-tuned ImageNet model. Overall, our experiments suggest that CLIP is better suited as a base model for IAA methods than ImageNet pretrained networks. KW - Image Aesthetic Assessment KW - CLIP KW - language-image pre-training KW - text supervision KW - prompt engineering KW - AVA Y1 - 2022 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-297150 SN - 2624-8212 VL - 5 ER - TY - JOUR A1 - Steininger, Michael A1 - Kobs, Konstantin A1 - Davidson, Padraig A1 - Krause, Anna A1 - Hotho, Andreas T1 - Density-based weighting for imbalanced regression JF - Machine Learning N2 - In many real world settings, imbalanced data impedes model performance of learning algorithms, like neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning which is known to have advantages compared to sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called DenseWeight and a cost-sensitive learning approach for neural network regression with imbalanced data called DenseLoss based on our weighting scheme. DenseWeight weights data points according to their target value rarities through kernel density estimation (KDE). DenseLoss adjusts each data point’s influence on the loss according to DenseWeight, giving rare data points more influence on model training compared to common data points. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach provides more control over model training as it enables us to actively decide on the trade-off between focusing on common or rare cases through a single hyperparameter, allowing the training of better models for rare data points. KW - supervised learning KW - imbalanced regression KW - cost-sensitive learning KW - sample weighting KW - Kerneldensity estimation Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-269177 SN - 1573-0565 VL - 110 IS - 8 ER -