Oral antineoplastic drugs are an important component in the treatment of solid tumour diseases as well as haematological and immunological malignancies. Oral drug administration is associated with positive features (e.g., non-invasive drug administration, outpatient care with a high level of independence for the patient and reduced costs for the health care system). The systemic exposure after oral intake, however, is prone to high interindividual variability (IIV), as it strongly depends on gastrointestinal absorption processes, which are per se characterized by high inter- and intraindividual variability. Disease- and patient-specific characteristics (e.g., disease state, concomitant diseases, concomitant medication, patient demographics) may additionally contribute to variability in plasma concentrations between individual patients. In addition, many oral antineoplastic drugs show complex PK, which has not yet been fully investigated and elucidated for all substances. All of this may increase the risk of suboptimal plasma exposure (either subtherapeutic or toxic), which may ultimately jeopardise the success of therapy, either through a loss of efficacy or through increased, intolerable adverse drug reactions. TDM can be used to detect suboptimal plasma levels and prevent permanent under- or overexposure. It is essential in the treatment of ACC with mitotane, a substance with unfavourable PK and high IIV.
In the current work, an HPLC-UV method for the TDM of mitotane using VAMS was developed. The developed method uses a low sample volume (20 µl) of capillary blood, which facilitates dense sampling, e.g., at treatment initiation. However, no reference ranges for measurements from capillary blood have been established so far, and a simple conversion from capillary concentrations to plasma concentrations was not possible. To date, the therapeutic range is established only for plasma concentrations, so the observed capillary concentrations could not be reliably interpreted.
The multi-kinase inhibitor cabozantinib is also used for the treatment of ACC. However, not all of its PK properties, such as the characteristic second peak in the cabozantinib concentration-time profile, have been fully understood so far. To gain a mechanistic understanding of the compound, a PBPK model was developed and various theories for modelling the second peak were explored, revealing that EHC of the compound is most plausible. Cabozantinib is mainly metabolized via CYP3A4 and is susceptible to DDIs with, e.g., CYP3A4 inducers. The DDI between cabozantinib and rifampin was investigated with the developed PBPK model and revealed a 77% reduction in cabozantinib exposure (AUC). Hence, the combination of cabozantinib with strong CYP inducers should be avoided; if this is not possible, co-administration should be monitored using TDM. The model was also used to simulate cabozantinib plasma concentrations at different stages of liver injury, showing a 64% and 50% increase in total exposure for mild and moderate liver injury, respectively.
Ruxolitinib is used, among others, for patients with acute and chronic GvHD. These patients often also receive posaconazole for prophylaxis of invasive fungal infections, leading to a CYP3A4-mediated DDI between both substances. Different dosing recommendations from the FDA and EMA on the use of ruxolitinib in combination with posaconazole complicate clinical use. To simulate the effect of this relevant DDI, two separate PBPK models for ruxolitinib and posaconazole were developed and combined.
Predicted ruxolitinib exposure was compared to observed plasma concentrations obtained in GvHD patients. The model simulations showed that the observed ruxolitinib concentrations in these patients were generally higher than the simulated concentrations in healthy individuals, with standard dosing assumed in both scenarios. According to the developed model, the EMA-recommended ruxolitinib dose reduction appears plausible, since, owing to the complexity of the disease and the extensive co-medication, ruxolitinib plasma concentrations can be higher than expected.
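To make the exposure metrics concrete, the following minimal sketch (hypothetical concentration values, not output of the PBPK models described here) shows how an AUC and a relative AUC reduction in a DDI setting are typically computed from a concentration-time profile:

```python
# Minimal sketch (not the PBPK model from the thesis): quantifying an exposure
# change in a DDI setting from concentration-time profiles. Values are hypothetical.
import numpy as np

def auc_trapezoidal(times_h, conc_ng_ml):
    """Area under the concentration-time curve via the linear trapezoidal rule."""
    return np.trapz(conc_ng_ml, times_h)

# Hypothetical profiles: victim drug alone vs. with a strong CYP3A4 inducer
t = np.array([0, 1, 2, 4, 8, 12, 24, 48, 72], dtype=float)           # h
c_alone = np.array([0, 120, 310, 420, 380, 330, 240, 130, 60.0])     # ng/mL
c_with_inducer = np.array([0, 90, 200, 230, 170, 120, 60, 20, 5.0])  # ng/mL

auc_alone = auc_trapezoidal(t, c_alone)
auc_ddi = auc_trapezoidal(t, c_with_inducer)
reduction = 1 - auc_ddi / auc_alone
print(f"AUC alone: {auc_alone:.0f} ng*h/mL, with inducer: {auc_ddi:.0f} ng*h/mL")
print(f"Relative AUC reduction: {reduction:.0%}")
```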
Environmental issues have emerged especially since humans began burning fossil fuels, leading to air pollution and climate change that harm the environment. The substantial consequences of these issues have prompted strong efforts towards assessing the state of our environment.
Various environmental machine learning (ML) tasks aid these efforts. These tasks concern environmental data but are common ML tasks otherwise, i.e., datasets are split (training, validation, test), hyperparameters are optimized on validation data, and test set metrics measure a model’s generalizability. This work focuses on the following environmental ML tasks: Regarding air pollution, land use regression (LUR) estimates air pollutant concentrations at locations where no measurements are available based on measured locations and each location’s land use (e.g., industry, streets). For LUR, this work uses data from London (modeled) and Zurich (measured). Concerning climate change, a common ML task is model output statistics (MOS), where a climate model’s output for a study area is altered to better fit Earth observations and provide more accurate climate data. This work uses the regional climate model (RCM) REMO and Earth observations from the E-OBS dataset for MOS. Another task regarding climate is grain size distribution interpolation, where soil properties at locations without measurements are estimated based on the few measured locations. This can provide climate models with soil information that is important for hydrology. For this task, data from Lower Franconia is used.
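As a concrete illustration of the shared task setup described above, here is a minimal, generic sketch with synthetic data (not the London, Zurich, REMO/E-OBS, or Lower Franconia datasets): a train/validation/test split, hyperparameter selection on the validation set, and a test-set metric as the generalization estimate:

```python
# Generic supervised-regression workflow sketch with synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))              # e.g., land-use features per location
y = X[:, 0] * 2.0 + rng.normal(size=1000)   # e.g., pollutant concentration

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_depth, best_rmse = None, float("inf")
for depth in (2, 4, 8, None):               # hyperparameter search on validation data
    model = RandomForestRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
    if rmse < best_rmse:
        best_depth, best_rmse = depth, rmse

final = RandomForestRegressor(max_depth=best_depth, random_state=0).fit(X_train, y_train)
test_rmse = mean_squared_error(y_test, final.predict(X_test)) ** 0.5
print(f"best max_depth={best_depth}, test RMSE={test_rmse:.3f}")
```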
Such environmental ML tasks commonly have a number of properties: (i) geospatiality, i.e., their data refers to locations relative to the Earth’s surface. (ii) The environmental variables to estimate or predict are usually continuous. (iii) Data can be imbalanced due to relatively rare extreme events (e.g., extreme precipitation). (iv) Multiple related potential target variables can be available per location, since measurement devices often contain different sensors. (v) Labels are spatially often only sparsely available since conducting measurements at all locations of interest is usually infeasible. These properties present challenges but also opportunities when designing ML methods for such tasks.
In the past, environmental ML tasks have been tackled with conventional ML methods, such as linear regression or random forests (RFs). However, the field of ML has made tremendous leaps beyond these classic models through deep learning (DL). In DL, models use multiple layers of neurons, producing increasingly higher-level feature representations with growing layer depth. DL has made previously infeasible ML tasks feasible, significantly improved performance on many tasks compared to existing ML models, and eliminated the need for manual feature engineering in some domains due to its ability to learn features from raw data. To harness these advantages for environmental domains, it is promising to develop novel DL methods for environmental ML tasks.
This thesis presents methods for dealing with special challenges and exploiting opportunities inherent to environmental ML tasks in conjunction with DL. To this end, the proposed methods explore the following techniques: (i) Convolutions as in convolutional neural networks (CNNs) to exploit reoccurring spatial patterns in geospatial data. (ii) Posing the problems as regression tasks to estimate the continuous variables. (iii) Density-based weighting to improve estimation performance for rare and extreme events. (iv) Multi-task learning to make use of multiple related target variables. (v) Semi-supervised learning to cope with label sparsity. Using these techniques, this thesis considers four research questions: (i) Can air pollution be estimated without manual feature engineering? This is answered positively by the introduction of the CNN-based LUR model MapLUR as well as the off-the-shelf LUR solution OpenLUR. (ii) Can colocated pollution data improve spatial air pollution models? Multi-task learning for LUR is developed for this, showing potential for improvements with colocated data. (iii) Can DL models improve the quality of climate model outputs? The proposed DL climate MOS architecture ConvMOS demonstrates this. Additionally, semi-supervised training of multilayer perceptrons (MLPs) for grain size distribution interpolation is presented, which can provide improved input data. (iv) Can DL models be taught to better estimate climate extremes? To this end, density-based weighting for imbalanced regression (DenseLoss) is proposed and applied to the DL architecture ConvMOS, improving climate extremes estimation. These methods show how especially DL techniques can be developed for environmental ML tasks with their special characteristics in mind. This allows for better models than previously possible with conventional ML, leading to more accurate assessment and better understanding of the state of our environment.
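To illustrate one of these techniques, the sketch below shows the general idea behind density-based weighting for imbalanced regression: samples with rare target values receive larger loss weights. The weighting function and parameters are illustrative and do not reproduce the exact DenseLoss formulation:

```python
# Illustrative density-based sample weighting for imbalanced regression:
# rare (e.g., extreme) target values get larger weights.
import numpy as np
from scipy.stats import gaussian_kde

def density_based_weights(y, alpha=1.0, eps=1e-6):
    """Weight each sample inversely to the (normalized) density of its target value."""
    density = gaussian_kde(y)(y)
    density = density / density.max()          # scale to (0, 1]
    weights = (1.0 - alpha * density) + eps    # rare targets -> weight near 1
    return weights / weights.mean()            # keep the average weight at 1

# Hypothetical skewed target distribution (e.g., precipitation with rare extremes)
rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=2.0, size=5000)
w = density_based_weights(y)
print("mean weight of the top 1% targets:", w[y > np.quantile(y, 0.99)].mean())
# These weights would then scale each sample's loss, e.g., mean(w * (y_pred - y)**2).
```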
Empathy, the act of sharing another person’s affective state, is a ubiquitous driver for helping others and feeling close to them. These experiences are integral parts of human behavior and society. The studies presented in this dissertation aimed to investigate the sustainability and stability of social closeness and prosocial decision-making driven by empathy and other social motives. In this vein, four studies were conducted in which behavioral and neural indicators of empathy sustainability were identified using model-based functional magnetic resonance imaging (fMRI).
Applying reinforcement learning, drift-diffusion modelling (DDM), and fMRI, the first two studies were designed to investigate the formation and sustainability of empathy-related social closeness (study 1) and examined how sustainably empathy led to prosocial behavior (study 2). Using DDM and fMRI, the last two studies investigated how empathy, combined either with reciprocity (the social norm to return a favor) or with the motive of outcome maximization, altered the behavioral and neural social decision process.
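For readers unfamiliar with DDM, the following minimal sketch simulates a two-boundary drift-diffusion process with hypothetical parameters; the studies fitted such parameters to observed choices and reaction times rather than simulating with fixed values:

```python
# Minimal sketch of a drift-diffusion process: evidence accumulates with drift v
# and noise until it hits one of two decision bounds. Parameters are hypothetical.
import numpy as np

def simulate_ddm(v=0.8, a=1.0, z=0.5, t0=0.3, dt=0.001, sigma=1.0, max_t=5.0, rng=None):
    """Return (choice, reaction_time) for one trial of a two-boundary DDM."""
    rng = rng or np.random.default_rng()
    x, t = z * a, 0.0                      # start between bounds 0 and a
    while 0.0 < x < a and t < max_t:
        x += v * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
    choice = 1 if x >= a else 0            # upper bound, e.g., a prosocial choice
    return choice, t0 + t                  # add non-decision time t0

rng = np.random.default_rng(1)
trials = [simulate_ddm(rng=rng) for _ in range(2000)]
choices = np.array([c for c, _ in trials])
rts = np.array([rt for _, rt in trials])
print(f"P(upper) = {choices.mean():.2f}, mean RT = {rts.mean():.2f} s")
```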
The results showed that empathy-related social closeness and prosocial decision tendencies persisted even if empathy was rarely reinforced. The sustainability of these empathy effects was related to recalibration of the empathy-related social closeness learning signal (study 1) and the maintenance of a prosocial decision bias (study 2). The findings of study 3 showed that empathy boosted the processing of reciprocity-based social decisions, but not vice versa. Study 4 revealed that empathy-related decisions were modulated by the motive of outcome maximization, depending on individual differences in state empathy.
Together, the studies strongly support the concept of empathy as a sustainable driver of social closeness and prosocial behavior.
Today’s cloud data centers consume an enormous amount of energy, and their energy consumption will rise further in the future. An estimate from 2012 found that data centers consume about 30 billion watts of power, resulting in about 263 TWh of energy usage per year. Energy consumption is projected to rise to 1929 TWh by 2030. This projected rise in energy demand is fueled by a growing number of services deployed in the cloud: 50% of enterprise workloads have been migrated to the cloud in the last decade. Additionally, an increasing number of devices use the cloud to provide functionality, causing data centers to grow further. Estimates say more than 75 billion IoT devices will be in use by 2025.
The growing energy demand also increases CO2 emissions. Assuming a CO2 intensity of 200 g CO2 per kWh, this amounts to roughly 227 million tons of CO2, which is more than the emissions of all energy-producing power plants in Germany in 2020.
However, data centers consume energy because they respond to service requests that are fulfilled through computing resources. Hence, it is not the users and devices that consume the energy in the data center but the software that controls the hardware. While the hardware is physically consuming energy, it is not always responsible for wasting energy. The software itself plays a vital role in reducing the energy consumption and CO2 emissions of data centers. The scenario of our thesis is, therefore, focused on software development.
Nevertheless, we must first show developers that software contributes to energy consumption by providing evidence of its influence. The second step is to provide methods to assess an application’s power consumption during different phases of the development process and to support modern DevOps and agile development methods. We therefore need an automatic selection of system-level energy-consumption models that can accommodate rapid changes in the source code, as well as application-level models that allow developers to locate power-consuming software parts for continuous improvement. Afterward, we need emulation to assess the energy efficiency before the actual deployment.
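As a simple illustration of what a system-level energy-consumption model can look like, the sketch below interpolates power linearly between idle and full-load consumption based on CPU utilization; the coefficients are hypothetical and real models must be calibrated per host and may be non-linear:

```python
# Illustrative linear utilization-based power model with hypothetical coefficients.
def power_watts(cpu_utilization, p_idle=70.0, p_max=250.0):
    """Estimate instantaneous power draw (W) from CPU utilization in [0, 1]."""
    u = min(max(cpu_utilization, 0.0), 1.0)
    return p_idle + (p_max - p_idle) * u

def energy_kwh(utilization_trace, interval_s=60.0):
    """Integrate a utilization trace (one sample per interval) into energy in kWh."""
    joules = sum(power_watts(u) * interval_s for u in utilization_trace)
    return joules / 3.6e6

trace = [0.1] * 30 + [0.8] * 20 + [0.3] * 10   # hypothetical one-hour trace
print(f"Estimated energy: {energy_kwh(trace):.3f} kWh")
```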
Landslide susceptibility assessment in the Chiconquiaco Mountain Range area, Veracruz (Mexico)
(2022)
In Mexico, numerous landslides occur each year and Veracruz represents the state with the third highest number of events. Especially the Chiconquiaco Mountain Range, located in the central part of Veracruz, is highly affected by landslides and no detailed information on the spatial distribution of existing landslides or future occurrences is available. This leaves the local population exposed to an unknown threat and unable to react appropriately to this hazard or to consider the potential landslide occurrence in future planning processes.
Thus, the overall objective of the present study is to provide a comprehensive assessment of the landslide situation in the Chiconquiaco Mountain Range area. Here, the combination of a site-specific and a regional approach makes it possible to investigate the causes, triggers, and process types as well as to model the landslide susceptibility for the entire study area.
For the site-specific approach, the focus lies on characterizing the Capulín landslide, which represents one of the largest mass movements in the area. In this context, the task is to develop a multi-methodological concept, which concentrates on cost-effective, flexible and non-invasive methods. This approach shows that the applied methods complement each other very well and their combination allows for a detailed characterization of the landslide.
The analyses revealed that the Capulín landslide is a complex mass movement type. It comprises rotational movement in the upper parts and translational movement in the lower areas, as well as flow processes at the flank and foot area, and is therefore classified as a compound slide-flow according to Cruden and Varnes (1996). Furthermore, the investigations show that the Capulín landslide represents a reactivation of a former process. This is important new information, especially with regard to the other landslides identified in the study area. Both the road reconstructed after the landslide, which runs through the landslide mass, and the stream causing erosion processes at the foot of the landslide severely affect the stability of the landslide, making it highly susceptible to future reactivation processes. This is particularly important as the landslide is located only a few hundred meters from the village El Capulín and an extension of the landslide area could cause severe damage.
The next step in the landslide assessment consists of integrating the data obtained in the site-specific approach into the regional analysis. Here, the focus lies on transferring the generated data to the entire study area. The developed methodological concept yields applicable results, which is supported by different validation approaches.
The susceptibility modeling as well as the landslide inventory reveal that the highest probability of landslide occurrence is related to the areas with moderate slopes covered by slope deposits. These slope deposits comprise material from old mass movements and erosion processes and are highly susceptible to landslides. The results give new insights into the landslide situation in the Chiconquiaco Mountain Range area, since landslide occurrence was previously attributed to steep slopes of basalt and andesite.
The susceptibility map is a contribution to a better assessment of the landslide situation in the study area and simultaneously proves that it is crucial to include specific characteristics of the respective area in the modeling process; otherwise, the local conditions may not be represented correctly.
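For illustration only, the following sketch shows the general shape of a data-driven susceptibility model (a classifier trained on landslide/non-landslide locations with terrain predictors, producing a susceptibility probability per location); it uses synthetic data and a generic logistic regression, not the specific modeling method applied in this study:

```python
# Generic susceptibility-modeling sketch with synthetic terrain predictors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
slope_deg = rng.uniform(0, 45, n)                 # hypothetical predictors
is_slope_deposit = rng.integers(0, 2, n)
dist_to_stream_m = rng.uniform(0, 500, n)
X = np.column_stack([slope_deg, is_slope_deposit, dist_to_stream_m])

# Hypothetical labels: landslides more likely on moderate slopes in slope deposits
p = 1 / (1 + np.exp(-(0.08 * slope_deg - 0.002 * (slope_deg - 20) ** 2
                      + 1.5 * is_slope_deposit - 0.004 * dist_to_stream_m - 1.0)))
y = rng.random(n) < p

model = LogisticRegression(max_iter=1000).fit(X, y)
susceptibility = model.predict_proba(X)[:, 1]     # mapped per grid cell in practice
print(f"mean susceptibility in slope deposits: "
      f"{susceptibility[is_slope_deposit == 1].mean():.3f}")
```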
The first problem is that of the optimal volume allocation in procurement. The choice of this problem was motivated by a study whose objective was to support decision-making at two procurement organizations for the procurement of Depot Medroxyprogesterone Acetate (DMPA), an injectable contraceptive. At the time of this study, only one supplier that had undergone the costly and lengthy process of WHO pre-qualification was available to these organizations. However, a new entrant supplier was expected to receive WHO qualification within the next year, thus becoming a viable second source for DMPA procurement. When deciding how to allocate the procurement volume between the two suppliers, the buyers had to consider the impact on price as well as risk. Higher allocations to one supplier yield lower prices but expose a buyer to higher supply risks, while an even allocation will result in lower supply risk but also reduce competitive pressure, resulting in higher prices. Our research investigates this single- versus dual-sourcing problem and quantifies in one model the impact of the procurement volume on competition and risk. To support decision-makers, we develop a mathematical framework that accounts for the characteristics of donor-funded global health markets and models the effects of an entrant on purchasing costs and supply risks. Our in-depth analysis provides insights into how the optimal allocation decision is affected by various parameters and explores the trade-off between competition and supply risk. For example, we find that, even if the entrant supplier introduces longer lead times and a higher default risk, the buyer still benefits from dual sourcing. However, these risk-diversification benefits depend heavily on the entrant’s in-country registration: If the buyer can ship the entrant’s product to only a selected number of countries, the buyer does not benefit from dual sourcing as much as it would if the entrant’s product could be shipped to all supplied countries. We show that the buyer should be interested in qualifying the entrant’s product in countries with high demand first.
In the second problem we explore a new tendering mechanism called the postponement tender, which can be useful when buyers in the global health industry want to contract new generics suppliers with uncertain product quality. The mechanism allows a buyer to postpone part of the procurement volume’s allocation so the buyer can learn about the unknown quality before allocating the remaining volume to the best supplier in terms of both price and quality. We develop a mathematical model to capture the decision-maker’s trade-offs in setting the right split between the initial volume and the postponed volume. Our analysis shows that a buyer can benefit from this mechanism more than it can from a single-sourcing format, as it can decrease the risk of receiving poor quality (in terms of product quality and logistics performance) and even increase competitive pressure between the suppliers, thereby lowering the purchasing costs. By considering market parameters like the buyer’s size, the suppliers’ value (difference between quality and cost), quality uncertainty, and minimum order volumes, we derive optimal sourcing strategies for various market structures and explore how competition is affected by the buyer’s learning about the suppliers’ quality through the initial volume.
The third problem considers the repeated procurement problem of pharmacies in Kenya that have multi-product inventories. Coordinating orders allows pharmacies to achieve lower procurement prices by using the quantity discounts manufacturers offer and sharing fixed ordering costs, such as logistics costs. However, coordinating and optimizing orders for multiple products is complex and costly. To solve the coordinated procurement problem, also known as the Joint Replenishment Problem (JRP) with quantity discounts, a novel, data-driven inventory policy using sample-average approximation is proposed. The inventory policy is developed based on renewal theory and is evaluated using real-world sales data from Kenyan pharmacies. Multiple benchmarks are used to evaluate the performance of the approach. First, it is compared to the theoretically optimal policy --- that is, a dynamic-programming policy --- in the single-product setting without quantity discounts to show that the proposed policy results in comparable inventory costs. Second, the policy is evaluated for the original multi-product setting with quantity discounts and compared to ex-post optimal costs. The evaluation shows that the policy’s performance in the multi-product setting is similar to its performance in the single-product setting (with respect to ex-post optimal costs), suggesting that the proposed policy offers a promising, data-driven solution to these types of multi-product inventory problems.
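The following minimal sketch illustrates the general idea of sample-average approximation with hypothetical costs and a hypothetical demand distribution; it is a single-product, single-period simplification and not the renewal-theory-based JRP policy developed in the thesis:

```python
# Illustrative sample-average approximation (SAA): candidate order quantities are
# evaluated on sampled demand scenarios; the lowest-average-cost quantity is chosen.
import numpy as np

rng = np.random.default_rng(0)
demand_samples = rng.poisson(lam=100, size=5000)   # e.g., drawn from historical sales

holding_cost, stockout_cost, fixed_order_cost = 1.0, 8.0, 50.0

def avg_cost(order_qty, demand):
    leftover = np.maximum(order_qty - demand, 0)
    shortage = np.maximum(demand - order_qty, 0)
    return fixed_order_cost + holding_cost * leftover.mean() + stockout_cost * shortage.mean()

candidates = np.arange(60, 181)
costs = [avg_cost(q, demand_samples) for q in candidates]
best_q = candidates[int(np.argmin(costs))]
print(f"SAA-optimal order quantity: {best_q}, expected cost: {min(costs):.1f}")
```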
The present thesis considers the modelling of gas mixtures via a kinetic description. Fundamentals of the Boltzmann equation for gas mixtures and the BGK approximation are presented. In particular, issues in extending these models to gas mixtures are discussed. A non-reactive two-component gas mixture is considered. The two-species mixture is modelled by a system of kinetic BGK equations featuring two interaction terms to account for momentum and energy transfer between the two species. The model presented here contains several models from physicists and engineers as special cases. Consistency of this model is proven: conservation properties, positivity of all temperatures and the H-theorem. The form of the global equilibrium as Maxwell distributions is specified. Moreover, the usual macroscopic conservation laws can be derived.
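For orientation, a schematic form of such a two-species BGK system with two interaction terms per species is sketched below in standard notation; the precise mixture Maxwellians and collision frequencies are specified in the thesis itself:

```latex
% Schematic two-species BGK system with two interaction terms per species.
\begin{align}
\partial_t f_1 + v \cdot \nabla_x f_1 &= \nu_{11} n_1 \,(M_1 - f_1) + \nu_{12} n_2 \,(M_{12} - f_1),\\
\partial_t f_2 + v \cdot \nabla_x f_2 &= \nu_{22} n_2 \,(M_2 - f_2) + \nu_{21} n_1 \,(M_{21} - f_2),
\end{align}
% where f_k is the distribution function of species k, M_k its own Maxwellian, and
% M_{12}, M_{21} are mixture Maxwellians chosen such that momentum and energy are
% exchanged between the species while the total conservation laws are preserved.
```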
In the literature, there is another type of BGK model for gas mixtures developed by Andries, Aoki and Perthame, which contains only one interaction term. In this thesis, the advantages of these two types of models are discussed and the usefulness of the model presented here is shown by using this model to determine an unknown function in the energy exchange of the macroscopic equations for gas mixtures described in the literature by Dellacherie. In addition, for each of the two models existence and uniqueness of mild solutions is shown. Moreover, positivity of classical solutions is proven.
Then, the model presented here is applied to three physical applications: a plasma consisting of ions and electrons, a gas mixture which deviates from equilibrium and a gas mixture consisting of polyatomic molecules.
First, the model is extended to a model for charged particles. Then, the equations of magnetohydrodynamics are derived from this model. Next, we want to apply this extended model to a mixture of ions and electrons in a special physical constellation which can be found, for example, in a Tokamak. The mixture is partly in equilibrium in some regions, while in other regions it deviates from equilibrium. The model presented in this thesis is used for this purpose, since it has the advantage of separating the intra- and interspecies interactions. Then, a new model based on a micro-macro decomposition is proposed in order to capture the physical regime of being partly in equilibrium, partly not. Theoretical results, namely convergence rates to equilibrium in the space-homogeneous case and the Landau damping for mixtures, are presented in order to compare them with numerical results.
Second, the model presented here is applied to a gas mixture which deviates from equilibrium such that it is described by Navier-Stokes equations on the macroscopic level. In this macroscopic description, four physical coefficients are expected to appear, characterizing the physical behaviour of the gases, namely the diffusion coefficient, the viscosity coefficient, the heat conductivity and the thermal diffusion parameter. A Chapman-Enskog expansion of the model presented here is performed in order to capture three of these four physical coefficients. In addition, several possible extensions to an ellipsoidal statistical model for gas mixtures are proposed in order to capture the fourth coefficient. Three extensions are proposed: an extension which is as simple as possible, an intuitive extension copying the one-species case, and an extension which takes into account the physical motivation of the physicist Holway, who invented the ellipsoidal statistical model for one species. Consistency of the extended models (conservation properties, positivity of all temperatures and the H-theorem) is proven. The shape of the global Maxwell distributions in equilibrium is specified.
Third, the model presented here is applied to polyatomic molecules. A multi-component gas mixture with translational and internal energy degrees of freedom is considered. The two species are allowed to have different degrees of freedom in internal energy and are modelled by a system of kinetic ellipsoidal statistical equations. Consistency of this model is shown: conservation properties, positivity of the temperature, the H-theorem and the form of Maxwell distributions in equilibrium. For numerical purposes, the Chu reduction is applied to the developed model for polyatomic gases to reduce the complexity of the model, and an application to a mixture consisting of a monatomic and a diatomic gas is given.
Last, the limit from the model presented here to the dissipative Euler equations for gas mixtures is proven.
Nowadays, data centers are becoming increasingly dynamic due to the common adoption of virtualization technologies. Systems can scale their capacity on demand by growing and shrinking their resources dynamically based on the current load. However, the complexity and performance of modern data centers are influenced not only by the software architecture, middleware, and computing resources, but also by network virtualization, network protocols, network services, and configuration. The field of network virtualization is not as mature as server virtualization, and there are multiple competing approaches and technologies. Performance modeling and prediction techniques provide a powerful tool to analyze the performance of modern data centers. However, given the wide variety of network virtualization approaches, no common approach exists for modeling and evaluating the performance of virtualized networks.
The performance community has proposed multiple formalisms and models for evaluating the performance of infrastructures based on different network virtualization technologies. The existing performance models can be divided into two main categories: coarse-grained analytical models and highly detailed simulation models. Analytical performance models are normally defined at a high level of abstraction; they abstract many details of the real network and therefore have limited predictive power. On the other hand, simulation models are normally focused on a selected networking technology and take into account many specific performance-influencing factors, resulting in detailed models that are tightly bound to a given technology, infrastructure setup, or protocol stack.
Existing models are inflexible, that is, they provide a single solution method without giving the user means to influence the solution accuracy and solution overhead. To allow for flexibility in the performance prediction, the user is currently required to build multiple different performance models, obtaining multiple performance predictions. Each performance prediction may then have a different focus, different performance metrics, prediction accuracy, and solving time.
The goal of this thesis is to develop a modeling approach that does not require the user to have experience in any of the applied performance modeling formalisms. The approach offers flexibility in modeling and analysis by balancing between (a) the generic character and low overhead of coarse-grained analytical models and (b) the more detailed simulation models with higher prediction accuracy.
The contributions of this thesis intersect with technologies and research areas, such as: software engineering, model-driven software development, domain-specific modeling, performance modeling and prediction, networking and data center networks, network virtualization, Software-Defined Networking (SDN), Network Function Virtualization (NFV). The main contributions of this thesis compose the Descartes Network Infrastructure (DNI) approach and include:
• Novel modeling abstractions for virtualized network infrastructures. This includes two meta-models that define modeling languages for modeling data center network performance. The DNI and miniDNI meta-models provide means for representing network infrastructures at two different abstraction levels. Regardless of which variant of the DNI meta-model is used, the modeling language provides generic modeling elements that allow describing the majority of existing and future network technologies, while abstracting factors that have little influence on the overall performance. I focus on SDN and NFV as examples of modern virtualization technologies.
• Network deployment meta-model—an interface between DNI and other meta-models that allows defining mappings between DNI and other descriptive models. The integration with other domain-specific models allows capturing behaviors that are not reflected in the DNI model, for example, software bottlenecks, server virtualization, and middleware overheads.
• Flexible model solving with model transformations. The transformations enable solving a DNI model by transforming it into a predictive model. The model transformations vary in size and complexity depending on the amount of data abstracted in the transformation process and provided to the solver. In this thesis, I contribute six transformations that transform DNI models into various predictive models based on the following modeling formalisms: (a) OMNeT++ simulation, (b) Queueing Petri Nets (QPNs), (c) Layered Queueing Networks (LQNs). For each of these formalisms, multiple predictive models are generated (e.g., models with different levels of detail): (a) two for OMNeT++, (b) two for QPNs, (c) two for LQNs. Some predictive models can be solved using multiple alternative solvers, resulting in up to ten different automated solving methods for a single DNI model.
• A model extraction method that supports the modeler in the modeling process by automatically prefilling the DNI model with the network traffic data. The contributed traffic profile abstraction and optimization method provides a trade-off by balancing between the size and the level of detail of the extracted profiles.
• A method for selecting feasible solving methods for a DNI model. The method proposes a set of solvers based on trade-off analysis characterizing each transformation with respect to various parameters such as its specific limitations, expected prediction accuracy, expected run-time, required resources in terms of CPU and memory consumption, and scalability.
• An evaluation of the approach in the context of two realistic systems. I evaluate the approach with a focus on factors such as prediction of network capacity and interface throughput, applicability, and flexibility in trading off prediction accuracy and solving time. Despite not focusing on maximizing prediction accuracy, I demonstrate that in the majority of cases the prediction error is low—up to 20% for uncalibrated models and up to 10% for calibrated models, depending on the solving technique.
In summary, this thesis presents the first approach to flexible run-time performance prediction in data center networks, including networks based on SDN. It provides the ability to flexibly balance between performance prediction accuracy and solving overhead. The approach provides the following key benefits:
• It is possible to predict the impact of changes in the data center network on the performance. The changes include changes in network topology, hardware configuration, traffic load, and application deployment.
• DNI can successfully model and predict the performance of multiple different network infrastructures, including proactive SDN scenarios.
• The prediction process is flexible, that is, it provides a balance between the granularity of the predictive models and the solving time. Decreased prediction accuracy is usually rewarded with savings in solving time and in the resources required for solving.
• Users are able to conduct performance analysis using multiple different prediction methods without requiring expertise and experience in each of the modeling formalisms.
The components of the DNI approach can also be applied to scenarios that are not considered in this thesis. The approach is generalizable and applicable, for example, as follows: (a) networks outside of data centers may be analyzed with DNI as long as the background traffic profile is known; (b) uncalibrated DNI models may serve as a basis for design-time performance analysis; (c) the method for extracting and compacting traffic profiles may be used for other, non-network workloads as well.
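As an illustration of the kind of trade-off-based selection of solving methods described above, the following sketch ranks candidate methods by a weighted combination of expected error, run-time, and memory after filtering out infeasible ones; the attributes, values, and weights are hypothetical and do not reflect the DNI implementation:

```python
# Illustrative trade-off-based ranking of solving methods (hypothetical data).
from dataclasses import dataclass

@dataclass
class SolvingMethod:
    name: str
    expected_error: float      # relative prediction error, e.g., 0.10 = 10%
    expected_runtime_s: float
    memory_gb: float
    supports_sdn: bool

def rank_methods(methods, needs_sdn, w_error=0.6, w_time=0.3, w_mem=0.1):
    feasible = [m for m in methods if m.supports_sdn or not needs_sdn]
    max_t = max(m.expected_runtime_s for m in feasible)
    max_m = max(m.memory_gb for m in feasible)
    def score(m):
        return (w_error * m.expected_error
                + w_time * m.expected_runtime_s / max_t
                + w_mem * m.memory_gb / max_m)
    return sorted(feasible, key=score)

methods = [
    SolvingMethod("detailed simulation", 0.05, 3600, 8, True),
    SolvingMethod("coarse Petri net", 0.15, 120, 2, True),
    SolvingMethod("analytical queueing model", 0.20, 5, 1, False),
]
for m in rank_methods(methods, needs_sdn=True):
    print(m.name)
```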
Computer systems have replaced the human workforce in many parts of everyday life, but there still exists a large number of tasks that cannot be automated yet. This also includes tasks which we consider to be rather simple, like the categorization of image content or subjective ratings. Traditionally, these tasks have been completed by designated employees or outsourced to specialized companies. Recently, however, the crowdsourcing paradigm has been applied more and more to complete such human-labor-intensive tasks. Crowdsourcing aims at leveraging the huge number of Internet users all around the globe, who form a potentially highly available, low-cost, and easily accessible workforce.
To enable the distribution of work on a global scale, new web-based services emerged, so-called crowdsourcing platforms, which act as mediators between employers posting tasks and workers completing tasks. However, the crowdsourcing approach, especially the large anonymous worker crowd, results in two types of challenges. On the one hand, there are technical challenges like the dimensioning of crowdsourcing platform infrastructure or the interconnection of crowdsourcing platforms and machine clouds to build hybrid services. On the other hand, there are conceptual challenges like identifying reliable workers or migrating traditional off-line work to the crowdsourcing environment. To tackle these challenges, this monograph analyzes and models current crowdsourcing systems to optimize crowdsourcing workflows and the underlying infrastructure. First, a categorization of crowdsourcing tasks and platforms is developed to derive generalizable properties. Based on this categorization and an exemplary analysis of a commercial crowdsourcing platform, models for different aspects of crowdsourcing platforms and crowdsourcing mechanisms are developed. A special focus is put on quality assurance mechanisms for crowdsourcing tasks, where the models are used to assess the suitability and costs of existing approaches for different types of tasks. Further, a novel quality assurance mechanism solely based on user interactions is proposed and its feasibility is shown. The findings from the analysis of existing platforms, the derived models, and the developed quality assurance mechanisms are finally used to derive best practices for two crowdsourcing use-cases, crowdsourcing-based network measurements and crowdsourcing-based subjective user studies. These two exemplary use-cases cover aspects typical for a large range of crowdsourcing tasks and illustrate the potential benefits, but also the resulting challenges, of using crowdsourcing.
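For illustration, the sketch below shows a common baseline quality-assurance mechanism (gold-standard questions combined with reliability-weighted majority voting), with made-up tasks and workers; the interaction-based mechanism proposed in this monograph works differently:

```python
# Baseline crowdsourcing quality assurance: estimate worker reliability from gold
# questions, then aggregate answers by reliability-weighted majority vote.
from collections import Counter

# Hypothetical worker answers per task and known answers for gold tasks
answers = {
    "task1": {"w1": "cat", "w2": "dog", "w3": "cat"},
    "gold1": {"w1": "car", "w2": "bus", "w3": "car"},
    "gold2": {"w1": "tree", "w2": "tree", "w3": "house"},
}
gold = {"gold1": "car", "gold2": "tree"}

# 1) Estimate worker reliability from gold tasks
workers = {w for task in answers.values() for w in task}
reliability = {}
for w in workers:
    hits = [answers[t][w] == a for t, a in gold.items() if w in answers[t]]
    reliability[w] = sum(hits) / len(hits) if hits else 0.5

# 2) Aggregate regular tasks by reliability-weighted majority vote
def aggregate(task_answers):
    votes = Counter()
    for worker, answer in task_answers.items():
        votes[answer] += reliability[worker]
    return votes.most_common(1)[0][0]

print(reliability)
print(aggregate(answers["task1"]))   # -> "cat"
```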
With the ongoing digitalization and globalization of the labor markets, the crowdsourcing paradigm is expected to gain even more importance in the next years. This is already evident in the newly emerging fields of crowdsourcing, like enterprise crowdsourcing or mobile crowdsourcing. The models developed in the monograph enable platform providers to optimize their current systems and employers to optimize their workflows to increase their commercial success. Moreover, the results help to improve the general understanding of crowdsourcing systems, a key for identifying necessary adaptations and future improvements.
Irrigated agriculture in the Khorezm region in the arid inner Aral Sea Basin faces enormous challenges due to a legacy of cotton monoculture and non-sustainable water use. Regional crop growth monitoring and yield estimation continuously gain in importance, especially with regard to climate change and food security issues. Remote sensing is the ideal tool for regional-scale analysis, especially in regions where ground-truth data collection is difficult and data availability is scarce. New satellite systems promise higher spatial and temporal resolutions. So-called light use efficiency (LUE) models are based on the fraction of photosynthetically active radiation absorbed by vegetation (FPAR), a biophysical parameter that can be derived from satellite measurements. The general objective of this thesis was to use satellite data, in conjunction with an adapted LUE model, for inferring crop yield of cotton and rice at field (6.5 m) and regional (250 m) scale for multiple years (2003-2009), in order to assess crop yield variations in the study area.
Intensive field measurements of FPAR were conducted in the Khorezm region during the growing season 2009. RapidEye imagery was acquired approximately bi-weekly during this time. The normalized difference vegetation index (NDVI) was calculated for all images. Linear regression between image-based NDVI and field-based FPAR was conducted. The analyses resulted in high correlations, and the resulting regression equations were used to generate time series of FPAR at the RapidEye level. RapidEye-based FPAR was subsequently aggregated to the MODIS scale and used to validate the existing MODIS FPAR product. This step was carried out to evaluate the applicability of MODIS FPAR for regional vegetation monitoring. The validation revealed that the MODIS product generally overestimates RapidEye FPAR by about 6 to 15 %. Mixture of crop types was found to be a problem at the 1 km scale, but less severe at the 250 m scale. Consequently, high resolution FPAR was used to calibrate 8-day, 250 m MODIS NDVI data, this time by linear regression of RapidEye-based FPAR against MODIS-based NDVI. The established FPAR datasets, for both RapidEye and MODIS, were subsequently assimilated into a LUE model as the driving variable. The model operated at both satellite scales, both of which required the estimation of further parameters such as the photosynthetically active radiation (PAR) or the actual light use efficiency (LUEact). The latter is influenced by crop stress factors like temperature or water stress, which were accounted for in the model. Water stress was especially important and was calculated via the ratio of the actual (ETact) to the potential, crop-specific evapotranspiration (ETc). Results showed that water stress typically occurred between the beginning of May and mid-September for cotton and between the beginning of May and the end of July for rice. The mean water stress showed only minor differences between years. Exceptions occurred in 2008 and 2009, where the mean water stress was higher and lower, respectively. In 2008, this was likely caused by generally reduced water availability in the whole region.
Model estimations were evaluated using field-based harvest information (RapidEye) and statistical information at district level (MODIS). The results showed that the model at both the RapidEye and the MODIS scale can estimate regional crop yield with acceptable accuracy. The RMSE for the RapidEye scale amounted to 29.1 % for cotton and 30.4 % for rice.
At the MODIS scale, depending on the year and evaluated at Oblast level, the RMSE ranged from 10.5 % to 23.8 % for cotton and from -0.4 % to -19.4 % for rice. Altogether, the RapidEye-scale model slightly underestimated cotton (bias = 0.22) and rice yield (bias = 0.11). The MODIS-scale model, on the other hand, also underestimated official rice yield (bias from 0.01 to 0.87), but overestimated official cotton yield (bias from -0.28 to -0.6). Evaluation of the MODIS scale revealed that predictions were very accurate for some districts, but less so for others. The produced crop yield maps indicated that crop yield generally decreases with distance to the river. The lowest yields can be found in the southern districts, close to the desert. From a temporal point of view, there were areas characterized by low crop yields over the span of the seven years investigated. The study at hand showed that light use efficiency-based modeling, based on remote sensing data, is a viable way for regional crop yield prediction. The accuracies achieved were good within the boundaries of related research. From a methodological viewpoint, the work carried out made several improvements to the existing LUE models reported in the literature, e.g., the calibration of FPAR for the study region using in situ and high resolution RapidEye imagery and the incorporation of crop-specific water stress in the calculation.
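The following compact sketch, with purely hypothetical coefficients and input values, illustrates the modelling chain described above: NDVI from red/NIR reflectance, a linear NDVI-FPAR regression, and LUE-based biomass accumulation including a water-stress factor ETact/ETc. The regression coefficients and LUE value are placeholders; in the thesis they were fitted from field FPAR measurements and literature values:

```python
# Illustrative LUE modelling chain with hypothetical coefficients and inputs.
import numpy as np

def ndvi(nir, red):
    return (nir - red) / (nir + red)

def fpar_from_ndvi(ndvi_value, slope=1.2, intercept=-0.15):
    """Hypothetical regression coefficients (fitted from field FPAR in practice)."""
    return np.clip(slope * ndvi_value + intercept, 0.0, 0.95)

def biomass_increment(fpar, par_mj_m2, lue_max_g_mj, et_act, et_crop):
    water_stress = np.clip(et_act / et_crop, 0.0, 1.0)      # 1 = no stress
    return fpar * par_mj_m2 * lue_max_g_mj * water_stress   # g dry matter per m^2

# Hypothetical 8-day time series over part of a growing season
nir = np.array([0.35, 0.45, 0.55, 0.60, 0.55, 0.45])
red = np.array([0.20, 0.15, 0.10, 0.08, 0.10, 0.15])
par = np.array([90.0, 110.0, 120.0, 125.0, 115.0, 95.0])    # MJ/m^2 per period
et_act = np.array([30.0, 35.0, 30.0, 28.0, 25.0, 20.0])     # mm
et_crop = np.array([32.0, 40.0, 42.0, 40.0, 35.0, 25.0])    # mm

fpar = fpar_from_ndvi(ndvi(nir, red))
biomass = biomass_increment(fpar, par, lue_max_g_mj=1.0,
                            et_act=et_act, et_crop=et_crop).sum()
print(f"Accumulated above-ground biomass: {biomass:.0f} g/m^2")
```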