004 Data processing; Computer science
Our world is becoming increasingly connected. The Internet of Things (IoT) is one of the emerging technologies connecting a growing number of devices to the Internet and offering value-added services. The desire for ever more connectivity and smart control, the low technical effort, and the decreasing cost of Internet-enabled devices favor this trend. IoT enriches a variety of application domains, including home automation, smart grid, the Industrial Internet of Things (IIoT), vehicle-to-everything (V2X) communication, smart cities, health care, disaster forecasting, and the observation of animals and nature. IoT devices are highly diverse, ranging from tiny battery-powered sensors to artificial intelligence (AI) applications running on powerful hardware. The types of data transmission are equally heterogeneous, ranging from rarely transmitted sensor readings of just a few bytes to high-frequency messaging and large, continuous data streams.
The dynamics of IoT systems, the large number of IoT devices, and their high diversity introduce challenges for applications and infrastructure. The infrastructure components include computing nodes, backend software systems, databases, storage systems, as well as access and data center networks. These systems must take performance aspects into account in order to maintain the desired quality of experience (QoE) for users, and, due to varying requirements, they must do so both at design time and at runtime by supporting adaptation. This dissertation deals with design optimizations and runtime adaptations of networks from the perspective of application developers and data center operators through measurement, modeling, and prediction of network quality, with a focus on IoT. This work covers three research areas: (i) quality prediction in mobile networks, (ii) performance evaluation and testing of communication protocols, and (iii) adaptation of data center networks.
The first research area addresses quality prediction in mobile networks. Applications must react to changing network conditions, such as varying coverage or even dead spots, which are intrinsic to mobile networks. Such adaptations include, for example, reducing the video bitrate or deferring low-priority data transfers. A long-term network quality prediction is beneficial for such communication adaptation mechanisms. Furthermore, applications could inform users in advance about areas with limited network coverage, and route-planning software could incorporate the prediction. However, existing approaches focus primarily on short-term predictions and require current connectivity parameters. We identified the need for prediction approaches that provide long-term predictions with a horizon of several minutes or even hours.
The second research area deals with the performance evaluation and testing of communication protocols. Various communication protocols have emerged over time, differing in their suitability for specific scenarios. IoT system designers and application developers should be aware of the characteristics of these protocols and consider them when selecting a suitable one. Existing work on the performance evaluation of communication protocols focuses on scalability and efficiency in stable networks. This work addresses the shortcomings of such tools by supporting the performance evaluation of communication protocols under constrained and varying network quality conditions.
The third research area concerns the adaptation of data center networks. The varying number of devices and the varying load also affect the data center networks. Service providers that run their applications in the cloud define requirements for infrastructure performance through service-level agreements (SLAs) with the data center operators. Data center operators must continuously monitor their networks for SLA compliance and respond to performance bottlenecks. Existing network adaptation approaches focus on specific adaptation aspects and technologies, such as rerouting or software-defined networking (SDN) operations. This work identifies the need for an approach that supports different, technology-agnostic network adaptation strategies. Such an approach should suggest adaptations to the data center operators or apply them directly in order to cope with the dynamics of communication, and it should validate the adaptation options automatically before suggesting them.
Motivated by the identified shortcomings and needs, we summarize the contributions of this dissertation as follows.
Technique for continuously measuring cellular network quality: We design and implement a technique for collecting measurements in cellular networks with the following key characteristics. First, the measurement technique supports continuous and automatic measurements, enabling car drivers or cyclists to collect measurements without any user interaction. Second, the technique integrates a fast bitrate measurement algorithm with low data volume consumption, making it well-suited for measurements within short time intervals and data collection by volunteers. Third, we implemented the technique on the Android operating system, eliminating the need for specialized measurement equipment, which facilitates capturing measurements by a crowd of users using commodity smartphones.
Collecting, processing, and publishing a cellular network measurement dataset: Using our developed technique for quality measurements in cellular networks, we collected 326,157 measurements and published them as open data. We captured the measurements primarily within a small region in Germany. Repeated measurements along the same routes allow investigating how network quality changes over time and under different conditions. This dissertation analyzes the dataset and provides algorithms for detecting implausible and abnormal values. The cleaned dataset is particularly suitable for further processing in research, for example, as input for machine learning algorithms.
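As a rough illustration of such a cleaning step, the following sketch flags out-of-range values in a measurement table. The column names, bounds, and file name are hypothetical and only illustrate the idea of plausibility filtering, not the algorithms used in the dissertation.

```python
import pandas as pd

# Hypothetical plausibility bounds; the dissertation's actual rules may differ.
BOUNDS = {
    "download_kbit": (0, 300_000),   # downlink bitrates above ~300 Mbit/s are implausible here
    "rsrp_dbm": (-140, -43),         # valid RSRP reporting range
}

def flag_implausible(df: pd.DataFrame) -> pd.Series:
    """Return a boolean mask marking rows with out-of-range values."""
    mask = pd.Series(False, index=df.index)
    for column, (low, high) in BOUNDS.items():
        if column in df.columns:
            mask |= ~df[column].between(low, high)
    return mask

measurements = pd.read_csv("measurements.csv")          # hypothetical file name
cleaned = measurements[~flag_implausible(measurements)]
```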
Prediction models for connection quality in cellular networks: We developed an approach for predicting the download bitrate in cellular networks along a route. The approach uses a machine learning model trained on the collected dataset. The model requires only passive measurements along the route for a prediction. These measurements do not require sending or receiving any packets and can, therefore, be collected in the background by a crowd of users through an application. By eliminating the need for actual connection parameters, our approach allows predictions well in advance as well as what-if analyses for long-term decisions. The prediction enables applications to react in advance or to inform the user about the expected network quality and coverage.
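To make this concrete, the following sketch trains a regressor that maps passive features along a route to an expected download bitrate. The feature names, file names, and the choice of a random forest are assumptions for illustration, not the model described in the dissertation.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

data = pd.read_csv("cleaned_measurements.csv")            # hypothetical file name

# Passive features only: no active download is needed at prediction time.
features = ["latitude", "longitude", "rsrp_dbm", "speed_kmh", "hour_of_day"]
X, y = data[features], data["download_kbit"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Predict the expected bitrate for each point along a planned route.
route = pd.read_csv("planned_route.csv")                   # hypothetical file name
route["predicted_kbit"] = model.predict(route[features])
```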
Benchmarking framework for the performance analysis of publish/subscribe protocols: We developed ComBench, a benchmarking framework for publish/subscribe protocols, focusing on networks with limited and varying quality. ComBench emulates clients that send and receive messages according to a workload definition. These clients can switch between different publish/subscribe protocols. During the benchmark run, ComBench can apply constraints such as bitrate limitations or packet loss to the clients' network connections. ComBench provides built-in collection and reporting of performance metrics. IoT system designers can use ComBench to test IoT scenarios matching real-world conditions in wireless networks.
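To give an impression of what such a workload definition covers, the following sketch describes clients, message rates, and network constraints as a plain Python dictionary. The schema is invented for illustration; ComBench's actual configuration format may differ.

```python
# Hypothetical workload description, not ComBench's real configuration schema.
workload = {
    "duration_s": 300,
    "clients": [
        {"id": "sensor-1", "protocol": "MQTT", "publish_rate_hz": 10, "payload_bytes": 64},
        {"id": "camera-1", "protocol": "AMQP", "publish_rate_hz": 1, "payload_bytes": 50_000},
        {"id": "dashboard", "protocol": "MQTT", "subscribe": ["sensors/#", "cameras/#"]},
    ],
    # Constraints applied to the clients' network connections during the run.
    "network_constraints": [
        {"client": "sensor-1", "start_s": 60, "bandwidth_kbit": 128, "loss_percent": 5},
        {"client": "camera-1", "start_s": 120, "delay_ms": 200},
    ],
    "metrics": ["latency", "throughput", "message_loss"],
}
```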
Network emulator for testing the communication behavior of applications: Our IoT network emulator reproduces constrained and varying network conditions. Software developers can easily integrate it into software testing pipelines through its simple instantiation via the command line and its provided interfaces. These pipelines can then automatically test the desired behavior of an application under different network conditions for each new release.
Online adaptation framework for data center networks: We designed an online adaptation framework for data center networks that utilizes a network model to simulate the workload for detecting bottlenecks and SLA violations. A control loop applies adaptation strategies to the model, simulates the suggested configuration changes, and iteratively executes additional adaptation strategies as needed. The validation through simulation and automated optimization ensures that the framework only suggests SLA-compliant and cost-optimal configurations.
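The simulate-adapt-validate cycle can be pictured with a small, self-contained toy loop. The network model, the single "upgrade the congested link" strategy, and the cost proxy below are deliberately simplistic placeholders, not the framework's actual model or API.

```python
from dataclasses import dataclass, field

@dataclass
class NetworkModel:
    """Toy network model: link name -> capacity in Mbit/s (illustrative only)."""
    capacity: dict = field(default_factory=dict)
    upgrades: int = 0  # counts applied adaptations as a simple cost proxy

def simulate(model: NetworkModel, demand: dict) -> dict:
    """Return the utilization of each link for the given demand."""
    return {link: demand.get(link, 0) / cap for link, cap in model.capacity.items()}

def find_violations(utilization: dict, threshold: float = 0.9) -> list:
    """Links whose simulated utilization exceeds the SLA threshold."""
    return [link for link, u in utilization.items() if u > threshold]

def upgrade_link(model: NetworkModel, link: str) -> NetworkModel:
    """One example adaptation strategy: double the capacity of a congested link."""
    new = NetworkModel(dict(model.capacity), model.upgrades + 1)
    new.capacity[link] *= 2
    return new

def adapt(model: NetworkModel, demand: dict, max_iterations: int = 10) -> NetworkModel:
    """Simulate, detect violations, apply a strategy, and repeat until compliant."""
    for _ in range(max_iterations):
        violations = find_violations(simulate(model, demand))
        if not violations:
            return model
        model = upgrade_link(model, violations[0])
    raise RuntimeError("no compliant configuration found within the iteration budget")

compliant = adapt(NetworkModel({"spine-1": 10, "leaf-3": 1}), {"spine-1": 4, "leaf-3": 2})
print(compliant.capacity, "upgrades:", compliant.upgrades)
```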
Our developed approaches are valuable contributions to academic research as well as practical applications. To the best of our knowledge, our work is the first to provide a long-term prediction approach for mobile network quality without extensive prior bitrate measurements. The evaluation of the approach shows that it can estimate connection quality trends well. Regarding the performance evaluation of application layer protocols, our approaches differ from existing tools through their multi-protocol support and their capabilities for emulating network limitations and quality variations. A case study illustrates the different application domains of our performance evaluation approaches. The main characteristics of our approach for the online adaptation of data center networks are technology independence and validation of the suggested configuration changes. The evaluation shows that the adaptation algorithm scales well even for larger networks and is Pareto-optimal with respect to multiple cost dimensions.
In geographic data analysis, one is often given point data of different categories (such as facilities of a university categorized by department). Drawing upon recent research on set visualization, we want to visualize category membership by connecting points of the same category with visual links. Existing approaches that follow this path usually insist on connecting all members of a category, which may lead to many crossings and visual clutter. We propose an approach that avoids crossings between connections of different categories completely. Instead of connecting all data points of the same category, we subdivide categories into smaller, local clusters where needed. We conduct a case study comparing the legibility of drawings produced by our approach with that of drawings produced by existing approaches.
In our problem formulation, we are additionally given a graph G on the data points whose edges express some sort of proximity. Our aim is to find a subgraph G′ of G with the following properties: (i) edges connect only data points of the same category, (ii) no two edges cross, and (iii) the number of connected components (clusters) is minimized. We then visualize the clusters in G′. For arbitrary graphs, the resulting optimization problem, Cluster Minimization, is NP-hard (even to approximate). Therefore, we introduce two heuristics. We perform extensive benchmark tests on real-world data. Comparisons with exact solutions indicate that our heuristics do astonishingly well for certain relative-neighborhood graphs.
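A minimal greedy heuristic in this spirit could look as follows: insert same-category edges one by one, skip any edge that would cross an already chosen edge, and finally count the resulting clusters. This is only a simplified sketch of the idea, not one of the two heuristics evaluated in the paper.

```python
def segments_cross(p, q, r, s):
    """True if segments pq and rs properly cross (shared endpoints do not count)."""
    def orient(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if {p, q} & {r, s}:
        return False
    return (orient(p, q, r) * orient(p, q, s) < 0) and (orient(r, s, p) * orient(r, s, q) < 0)

def greedy_clusters(points, categories, edges):
    """points: list of (x, y); categories: parallel list; edges: index pairs of G."""
    chosen = []
    for u, v in edges:
        if categories[u] != categories[v]:
            continue                                       # property (i): same category only
        if any(segments_cross(points[u], points[v], points[a], points[b]) for a, b in chosen):
            continue                                       # property (ii): no crossings
        chosen.append((u, v))
    # Count connected components (clusters) of the chosen subgraph with union-find.
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in chosen:
        parent[find(u)] = find(v)
    clusters = len({find(i) for i in range(len(points))})
    return chosen, clusters
```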
The steadily increasing usage of smart meters generates a wealth of high-resolution data about the individual energy consumption and production of local energy systems. Private households install more and more photovoltaic systems, battery storage systems, and large consumers such as heat pumps. Our vision is therefore to augment the collected smart meter time series of a complete system (e.g., a city, a town, or a complex institution such as an airport) with simulated additions of the aforementioned components. To this end, we propose a novel digital twin of such an energy system, based on a complete set of smart meter data complemented by additional building data. Using the additional geospatial data, the twin is intended to represent the addition of the abovementioned components as realistically as possible. Outputs of the twin can serve as decision support, either for system operators deciding where to strengthen the system or for individual households deciding where and how to install photovoltaic systems and batteries. Meanwhile, the first local energy system operators have had such smart meter data for almost all residential consumers for several years. We acquired the data of an exemplary operator and discuss a case study presenting some features of our digital twin and highlighting the value of combining smart meter and geospatial data.
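As a simple illustration of such simulative augmentation, the following sketch adds a synthetic photovoltaic generation profile to a measured household load time series. The clear-sky half-sine shape, the constant base load, and the scaling by roof area are deliberately crude placeholders, not the models used in the digital twin.

```python
import numpy as np
import pandas as pd

def synthetic_pv_profile(index: pd.DatetimeIndex, peak_kw: float) -> pd.Series:
    """Very crude clear-sky PV profile: a half-sine between 6:00 and 18:00."""
    hours = index.hour + index.minute / 60
    daylight = np.clip(np.sin((hours - 6) / 12 * np.pi), 0, None)
    return pd.Series(peak_kw * daylight, index=index)

# Hypothetical smart meter time series (15-minute resolution) and building data.
index = pd.date_range("2023-06-01", periods=96, freq="15min")
household_load_kw = pd.Series(0.4, index=index)     # placeholder constant base load
roof_area_m2 = 40                                    # from hypothetical geospatial data
peak_kw = roof_area_m2 * 0.2                         # assume roughly 0.2 kWp per m^2

# Augment the measured load with the simulated PV feed-in (negative net load = export).
net_load_kw = household_load_kw - synthetic_pv_profile(index, peak_kw)
```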
Background
Colorectal cancer (CRC) is a leading cause of cancer-related deaths worldwide. The best method to prevent CRC is a colonoscopy. However, not all colon polyps carry the risk of becoming cancerous. Therefore, polyps are classified using different classification systems, and further treatment and procedures are based on the classification of the polyp. Nevertheless, classification is not easy. Therefore, we suggest two novel automated classification systems assisting gastroenterologists in classifying polyps based on the NICE and Paris classifications.
Methods
We build two classification systems. One classifies polyps based on their shape (Paris); the other classifies polyps based on their texture and surface patterns (NICE). For the Paris classification, we introduce a two-step process: first, detecting and cropping the polyp in the image; second, classifying the polyp based on the cropped area with a transformer network. For the NICE classification, we design a few-shot learning algorithm based on the Deep Metric Learning approach. The algorithm creates an embedding space for polyps, which allows classification from a few examples to account for the scarcity of NICE-annotated images in our database.
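The few-shot step can be sketched as follows: an embedding network maps each polyp image to a vector, and a new image is assigned the NICE class whose support-example centroid is closest in the embedding space. The ResNet-18 backbone, the nearest-centroid rule, and the untrained weights below are illustrative assumptions, not the architecture or training from the paper; in practice the embedding network would be trained with a metric-learning loss.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Embedding network: a backbone whose classification head is removed.
# (Assumed placeholder; the paper's embedding network is trained with Deep Metric Learning.)
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """Map a batch of images (N, 3, 224, 224) to L2-normalized embeddings."""
    return F.normalize(backbone(images), dim=1)

@torch.no_grad()
def classify_few_shot(query: torch.Tensor, support: dict) -> list:
    """support: NICE class -> tensor holding a few example images of that class."""
    centroids = {label: embed(imgs).mean(dim=0) for label, imgs in support.items()}
    labels = list(centroids)
    centroid_matrix = torch.stack([centroids[label] for label in labels])   # (C, D)
    distances = torch.cdist(embed(query), centroid_matrix)                  # (N, C)
    return [labels[i] for i in distances.argmin(dim=1).tolist()]
```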
Results
For the Paris classification, we achieve an accuracy of 89.35 %, surpassing all papers in the literature and establishing a new state-of-the-art and baseline accuracy for other publications on a public data set. For the NICE classification, we achieve a competitive accuracy of 81.13 % and thereby demonstrate the viability of the few-shot learning paradigm for polyp classification in data-scarce environments. Additionally, we show different ablations of the algorithms. Finally, we further elaborate on the explainability of the system by showing heat maps of the neural network that explain the neural activations.
Conclusion
Overall, we introduce two polyp classification systems to assist gastroenterologists. We achieve state-of-the-art performance in the Paris classification and demonstrate the viability of the few-shot learning paradigm in the NICE classification, addressing the prevalent data scarcity issues faced in medical machine learning.
Scalability is often mentioned in the literature, but a stringent definition is missing. In particular, there is no general scalability assessment that clearly indicates whether a system scales or not, or whether one system scales better than another. The key contribution of this article is the definition of a scalability index (SI), which quantifies whether a system scales in comparison to another system, a hypothetical system (e.g., a linear system), or the theoretically optimal system. The suggested SI generalizes different metrics from the literature, which are special cases of our SI. The primary target of our scalability framework is, however, the benchmarking of two systems, which does not require any reference system. The SI is demonstrated and evaluated for different use cases: (1) the performance of an IoT load balancer depending on the system load, (2) the availability of a communication system depending on the size and structure of the network, (3) the scalability comparison of different location selection mechanisms in fog computing with respect to delays and energy consumption, and (4) the comparison of time-sensitive networking (TSN) mechanisms in terms of efficiency and utilization. Finally, we discuss how to use and how not to use the SI and give recommendations and guidelines for practice. To the best of our knowledge, this is the first work that provides a general SI for the comparison and benchmarking of systems, which is the primary target of our scalability analysis.
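The article's formal definition of the SI is not reproduced here, but the underlying idea of relating a system's measured scaling behavior to a reference system can be illustrated with a toy computation. The ratio-to-linear-reference below, and all numbers in it, are made up purely for intuition; this is not the SI defined in the article.

```python
def scaling_ratio(performance, reference):
    """Toy comparison of a measured performance curve against a reference curve
    (e.g., linear scaling). Values above 1 mean the system scales better than
    the reference over the observed load range. NOT the article's SI, only a
    simplistic illustration of comparing a system to a reference system."""
    return (performance[-1] / performance[0]) / (reference[-1] / reference[0])

load = [1, 2, 4, 8]                         # e.g., number of IoT clients
throughput = [100, 190, 350, 600]           # measured requests/s (made-up numbers)
linear_reference = [100 * n for n in load]  # hypothetical linearly scaling system
print(scaling_ratio(throughput, linear_reference))   # 6.0 / 8.0 = 0.75: sub-linear
```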
In recent history, normalized digital surface models (nDSMs) have constantly gained importance as a means to solve large-scale geographic problems. High-resolution surface models are valuable, as they can provide detailed information for a specific area. However, measurements with a high resolution are time-consuming and costly. Only a few approaches exist to create high-resolution nDSMs for extensive areas. This article explores approaches to extract high-resolution nDSMs from low-resolution Sentinel-2 data, allowing us to derive large-scale models. We thereby utilize the advantages of Sentinel-2: it is open access, has global coverage, and provides steady updates through a high repetition rate. Several deep learning models are trained to overcome the gap in producing high-resolution surface maps from low-resolution input data. With U-Net as a base architecture, we extend the capabilities of our model by integrating tailored multiscale encoders with differently sized convolution kernels as well as conformed self-attention inside the skip connection gates. Using pixelwise regression, our U-Net base models achieve a mean height error of approximately 2 m. Moreover, through our enhancements to the model architecture, we reduce the model error by more than 7%.
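A minimal pixelwise regression setup of this kind could look roughly as follows. The tiny single-skip U-Net, the choice of four input bands, the patch size, and the L1 loss are all assumptions for illustration; the paper's models additionally use multiscale encoders and self-attention gates, which are omitted here.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Minimal U-Net-style network for pixelwise height regression (one skip connection)."""
    def __init__(self, in_channels=4, base=32):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)          # one output channel: height in meters

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.head(d1)

# Hypothetical training step: 4 Sentinel-2 bands in, nDSM heights out.
model = TinyUNet(in_channels=4)
images = torch.randn(8, 4, 128, 128)                # placeholder Sentinel-2 patches
heights = torch.rand(8, 1, 128, 128) * 30           # placeholder nDSM targets in meters
loss = nn.functional.l1_loss(model(images), heights)   # pixelwise regression (MAE)
loss.backward()
```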
Background: Due to the importance of radiologic examinations, such as X-rays or computed tomography scans, for many clinical diagnoses, the optimal use of the radiology department is 1 of the primary goals of many hospitals.
Objective: This study aims to calculate the key metrics of this use by creating a radiology data warehouse solution, where data from radiology information systems (RISs) can be imported and then queried using a query language as well as a graphical user interface (GUI).
Methods: Using a simple configuration file, the developed system allowed for the processing of radiology data exported from any kind of RIS into a Microsoft Excel, comma-separated value (CSV), or JavaScript Object Notation (JSON) file. These data were then imported into a clinical data warehouse. Additional values based on the radiology data were calculated during this import process by implementing 1 of several provided interfaces. Afterward, the query language and GUI of the data warehouse were used to configure and calculate reports on these data. For the most common types of requested reports, a web interface was created to view their numbers as graphics.
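The generic export-processing step can be pictured roughly as follows: a small configuration maps the columns of a RIS export to the fields expected by the warehouse, and additional values are computed by a pluggable function during import. The column names, configuration format, and file names are invented for illustration; the actual system is configured differently.

```python
import pandas as pd

# Hypothetical configuration mapping RIS export columns to warehouse fields.
config = {
    "input_format": "csv",
    "columns": {"UNTERS_BEGINN": "exam_start", "UNTERS_ENDE": "exam_end",
                "MODALITAET": "modality"},
}

def derived_values(row: pd.Series) -> pd.Series:
    """Example of a pluggable calculation: examination duration in minutes."""
    row["duration_min"] = (row["exam_end"] - row["exam_start"]).total_seconds() / 60
    return row

export = pd.read_csv("ris_export.csv", parse_dates=["UNTERS_BEGINN", "UNTERS_ENDE"])
export = export.rename(columns=config["columns"]).apply(derived_values, axis=1)
export.to_json("warehouse_import.json", orient="records", date_format="iso")
```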
Results: The tool was successfully tested with the data of 4 different German hospitals from 2018 to 2021, with a total of 1,436,111 examinations. The user feedback was good, since all their queries could be answered if the available data were sufficient. The initial processing of the radiology data for using them with the clinical data warehouse took (depending on the amount of data provided by each hospital) between 7 minutes and 1 hour 11 minutes. Calculating 3 reports of different complexities on the data of each hospital was possible in 1-3 seconds for reports with up to 200 individual calculations and in up to 1.5 minutes for reports with up to 8200 individual calculations.
Conclusions: A system was developed with the main advantage of being generic concerning the export of different RISs as well as concerning the configuration of queries for various reports. The queries could be configured easily using the GUI of the data warehouse, and their results could be exported into the standard formats Excel and CSV for further processing.
Group-based communication is a highly popular communication paradigm, which is especially prominent in mobile instant messaging (MIM) applications, such as WhatsApp. Chat groups in MIM applications facilitate the sharing of various types of messages (e.g., text, voice, image, video) among a large number of participants. As each message has to be transmitted to every other member of the group, which multiplies the traffic, group communication has a massive impact on the underlying communication networks. However, most chat groups are private, and network operators cannot obtain deep insights into MIM communication via network measurements due to end-to-end encryption. Thus, the generation of traffic is not well understood, given that it depends on the sizes of communication groups, the speed of communication, and the exchanged message types. In this work, we provide a huge data set of 5,956 private WhatsApp chat histories, which contains over 76 million messages from more than 117,000 users. We describe and model the properties of chat groups and users and the communication within these chat groups, which gives unprecedented insights into private MIM communication. In addition, we conduct exemplary measurements for the most popular message types, which empower the provided models to estimate the traffic over time in a chat group.
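A traffic estimate of this kind can be sketched as follows: the number of messages of each type in a time interval is multiplied by a per-type transmission size and by the number of receiving group members. The message sizes and counts below are made-up placeholders, not the values measured or modeled in the paper.

```python
# Hypothetical per-message transmission sizes (bytes) by message type.
BYTES_PER_MESSAGE = {"text": 700, "image": 180_000, "video": 3_500_000, "voice": 60_000}

def traffic_per_interval(message_counts: dict, group_size: int) -> int:
    """Estimate the bytes generated in one interval: each message is delivered
    to every other group member, which multiplies the traffic accordingly."""
    payload = sum(BYTES_PER_MESSAGE[t] * n for t, n in message_counts.items())
    return payload * (group_size - 1)

# Example: one 5-minute interval in a 50-member group (made-up message counts).
print(traffic_per_interval({"text": 40, "image": 3, "voice": 1}, group_size=50))
```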
Cooperative, connected, and automated mobility (CCAM) systems depend on reliable communication to provide their service and, more crucially, to ensure the safety of users. One way to ensure the reliability of a data transmission is to use multiple transmission technologies in combination with redundant flows. In this paper, we describe a system requiring multipath communication in the context of CCAM. To this end, we introduce a data plane-based scheduler that uses replication and integration modules to provide redundant and transparent multipath communication. We provide an analytical model for the full replication module of the system and give an overview of how and where the data-plane scheduler components can be realized.
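For intuition, full replication duplicates every packet over several paths, so a packet is delivered as long as at least one copy arrives; assuming independent losses per path, this yields the textbook-style expression sketched below. This is only a simplified illustration of the benefit of redundant flows, not the paper's analytical model.

```python
from math import prod

def delivery_probability(loss_rates):
    """Probability that at least one replica arrives when the same packet is sent
    over paths with independent loss rates (full replication, simplified model)."""
    return 1 - prod(loss_rates)

# Example: two cellular links with 1% and 5% packet loss.
print(delivery_probability([0.01, 0.05]))   # 1 - 0.01 * 0.05 = 0.9995
```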
Knowledge about ransomware is important for protecting sensitive data and for participating in public debates about suitable security regulation. However, as of now, this topic has received little to no attention in most school curricula. As such, it is desirable to analyze what citizens can learn about this topic outside of formal education, e.g., from news articles. This analysis is relevant both for understanding the public discourse about ransomware and for identifying which aspects of this topic should be included in the limited time available for it in formal education. Thus, this paper was motivated by both educational and media research. The central goal is to explore how the media reports on this topic and, additionally, to identify potential misconceptions that could stem from this reporting. To do so, we conducted an exploratory case study of the reporting in 109 media articles on a high-impact ransomware event: the shutdown of the Colonial Pipeline (located in the east of the USA). We analyzed how the articles introduced central terminology, what details were provided, what details were not, and what (mis-)conceptions readers might take away from them. Our results show that the articles' introduction of the terminology and technical concepts of security is insufficient for a complete understanding of the incident. Most importantly, the articles may lead to four misconceptions about ransomware that are likely to result in misleading conclusions about the responsibility for the incident and about possible political and technical options to prevent such attacks in the future.