004 Datenverarbeitung; Informatik
Refine
Has Fulltext
- yes (74)
Is part of the Bibliography
- yes (74) (remove)
Year of publication
Document Type
- Journal article (74) (remove)
Language
- English (74) (remove)
Keywords
- virtual reality (11)
- machine learning (5)
- augmented reality (3)
- immersion (3)
- Deep learning (2)
- Quadrocopter (2)
- Quadrotor (2)
- XR (2)
- artificial intelligence (2)
- automation (2)
Institute
- Institut für Informatik (74) (remove)
This article presents a novel method for controlling a virtual audience system (VAS) in Virtual Reality (VR) application, called STAGE, which has been originally designed for supervised public speaking training in university seminars dedicated to the preparation and delivery of scientific talks. We are interested in creating pedagogical narratives: narratives encompass affective phenomenon and rather than organizing events changing the course of a training scenario, pedagogical plans using our system focus on organizing the affects it arouses for the trainees. Efficiently controlling a virtual audience towards a specific training objective while evaluating the speaker’s performance presents a challenge for a seminar instructor: the high level of cognitive and physical demands required to be able to control the virtual audience, whilst evaluating speaker’s performance, adjusting and allowing it to quickly react to the user’s behaviors and interactions. It is indeed a critical limitation of a number of existing systems that they rely on a Wizard of Oz approach, where the tutor drives the audience in reaction to the user’s performance. We address this problem by integrating with a VAS a high-level control component for tutors, which allows using predefined audience behavior rules, defining custom ones, as well as intervening during run-time for finer control of the unfolding of the pedagogical plan. At its core, this component offers a tool to program, select, modify and monitor interactive training narratives using a high-level representation. The STAGE offers the following features: i) a high-level API to program pedagogical narratives focusing on a specific public speaking situation and training objectives, ii) an interactive visualization interface iii) computation and visualization of user metrics, iv) a semi-autonomous virtual audience composed of virtual spectators with automatic reactions to the speaker and surrounding spectators while following the pedagogical plan V) and the possibility for the instructor to embody a virtual spectator to ask questions or guide the speaker from within the Virtual Environment. We present here the design, and implementation of the tutoring system and its integration in STAGE, and discuss its reception by end-users.
Virtual environments (VEs) can evoke and support emotions, as experienced when playing emotionally arousing games. We theoretically approach the design of fear and joy evoking VEs based on a literature review of empirical studies on virtual and real environments as well as video games’ reviews and content analyses. We define the design space and identify central design elements that evoke specific positive and negative emotions. Based on that, we derive and present guidelines for emotion-inducing VE design with respect to design themes, colors and textures, and lighting configurations. To validate our guidelines in two user studies, we 1) expose participants to 360° videos of VEs designed following the individual guidelines and 2) immerse them in a neutral, positive and negative emotion-inducing VEs combining all respective guidelines in Virtual Reality. The results support our theoretically derived guidelines by revealing significant differences in terms of fear and joy induction.
The rapid development of green and sustainable materials opens up new possibilities in the field of applied research. Such materials include nanocellulose composites that can integrate many components into composites and provide a good chassis for smart devices. In our study, we evaluate four approaches for turning a nanocellulose composite into an information storage or processing device: 1) nanocellulose can be a suitable carrier material and protect information stored in DNA. 2) Nucleotide-processing enzymes (polymerase and exonuclease) can be controlled by light after fusing them with light-gating domains; nucleotide substrate specificity can be changed by mutation or pH change (read-in and read-out of the information). 3) Semiconductors and electronic capabilities can be achieved: we show that nanocellulose is rendered electronic by iodine treatment replacing silicon including microstructures. Nanocellulose semiconductor properties are measured, and the resulting potential including single-electron transistors (SET) and their properties are modeled. Electric current can also be transported by DNA through G-quadruplex DNA molecules; these as well as classical silicon semiconductors can easily be integrated into the nanocellulose composite. 4) To elaborate upon miniaturization and integration for a smart nanocellulose chip device, we demonstrate pH-sensitive dyes in nanocellulose, nanopore creation, and kinase micropatterning on bacterial membranes as well as digital PCR micro-wells. Future application potential includes nano-3D printing and fast molecular processors (e.g., SETs) integrated with DNA storage and conventional electronics. This would also lead to environment-friendly nanocellulose chips for information processing as well as smart nanocellulose composites for biomedical applications and nano-factories.
A key feature for Internet of Things (IoT) is to control what content is available to each user. To handle this access management, encryption schemes can be used. Due to the diverse usage of encryption schemes, there are various realizations of 1-to-1, 1-to-n, and n-to-n schemes in the literature. This multitude of encryption methods with a wide variety of properties presents developers with the challenge of selecting the optimal method for a particular use case, which is further complicated by the fact that there is no overview of existing encryption schemes. To fill this gap, we envision a cryptography encyclopedia providing such an overview of existing encryption schemes. In this survey paper, we take a first step towards such an encyclopedia by creating a sub-encyclopedia for secure group communication (SGC) schemes, which belong to the n-to-n category. We extensively surveyed the state-of-the-art and classified 47 different schemes. More precisely, we provide (i) a comprehensive overview of the relevant security features, (ii) a set of relevant performance metrics, (iii) a classification for secure group communication schemes, and (iv) workflow descriptions of the 47 schemes. Moreover, we perform a detailed performance and security evaluation of the 47 secure group communication schemes. Based on this evaluation, we create a guideline for the selection of secure group communication schemes.
Around 4.9 billion Internet users worldwide watch billions of hours of online video every day. As a result, streaming is by far the predominant type of traffic in communication networks. According to Google statistics, three out of five video views come from mobile devices. Thus, in view of the continuous technological advances in end devices and increasing mobile use, datasets for mobile streaming are indispensable in research but only sparsely dealt with in literature so far. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at network, transport, and application layer with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 different individual bandwidth settings measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3 G/4 G traces, and 4,022 runs with pre-defined bandwidth changes. This corresponds to 332 GB video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.
Background
Machine learning, especially deep learning, is becoming more and more relevant in research and development in the medical domain. For all the supervised deep learning applications, data is the most critical factor in securing successful implementation and sustaining the progress of the machine learning model. Especially gastroenterological data, which often involves endoscopic videos, are cumbersome to annotate. Domain experts are needed to interpret and annotate the videos. To support those domain experts, we generated a framework. With this framework, instead of annotating every frame in the video sequence, experts are just performing key annotations at the beginning and the end of sequences with pathologies, e.g., visible polyps. Subsequently, non-expert annotators supported by machine learning add the missing annotations for the frames in-between.
Methods
In our framework, an expert reviews the video and annotates a few video frames to verify the object’s annotations for the non-expert. In a second step, a non-expert has visual confirmation of the given object and can annotate all following and preceding frames with AI assistance. After the expert has finished, relevant frames will be selected and passed on to an AI model. This information allows the AI model to detect and mark the desired object on all following and preceding frames with an annotation. Therefore, the non-expert can adjust and modify the AI predictions and export the results, which can then be used to train the AI model.
Results
Using this framework, we were able to reduce workload of domain experts on average by a factor of 20 on our data. This is primarily due to the structure of the framework, which is designed to minimize the workload of the domain expert. Pairing this framework with a state-of-the-art semi-automated AI model enhances the annotation speed further. Through a prospective study with 10 participants, we show that semi-automated annotation using our tool doubles the annotation speed of non-expert annotators compared to a well-known state-of-the-art annotation tool.
Conclusion
In summary, we introduce a framework for fast expert annotation for gastroenterologists, which reduces the workload of the domain expert considerably while maintaining a very high annotation quality. The framework incorporates a semi-automated annotation system utilizing trained object detection models. The software and framework are open-source.
Towards LoRaWAN without data loss: studying the performance of different channel access approaches
(2022)
The Long Range Wide Area Network (LoRaWAN) is one of the fastest growing Internet of Things (IoT) access protocols. It operates in the license free 868 MHz band and gives everyone the possibility to create their own small sensor networks. The drawback of this technology is often unscheduled or random channel access, which leads to message collisions and potential data loss. For that reason, recent literature studies alternative approaches for LoRaWAN channel access. In this work, state-of-the-art random channel access is compared with alternative approaches from the literature by means of collision probability. Furthermore, a time scheduled channel access methodology is presented to completely avoid collisions in LoRaWAN. For this approach, an exhaustive simulation study was conducted and the performance was evaluated with random access cross-traffic. In a general theoretical analysis the limits of the time scheduled approach are discussed to comply with duty cycle regulations in LoRaWAN.
Presence is often considered the most important quale describing the subjective feeling of being in a computer-generated and/or computer-mediated virtual environment. The identification and separation of orthogonal presence components, i.e., the place illusion and the plausibility illusion, has been an accepted theoretical model describing Virtual Reality (VR) experiences for some time. This perspective article challenges this presence-oriented VR theory. First, we argue that a place illusion cannot be the major construct to describe the much wider scope of virtual, augmented, and mixed reality (VR, AR, MR: or XR for short). Second, we argue that there is no plausibility illusion but merely plausibility, and we derive the place illusion caused by the congruent and plausible generation of spatial cues and similarly for all the current model’s so-defined illusions. Finally, we propose congruence and plausibility to become the central essential conditions in a novel theoretical model describing XR experiences and effects.
Crowdsensing offers a cost-effective way to collect large amounts of environmental sensor data; however, the spatial distribution of crowdsensing sensors can hardly be influenced, as the participants carry the sensors, and, additionally, the quality of the crowdsensed data can vary significantly. Hybrid systems that use mobile users in conjunction with fixed sensors might help to overcome these limitations, as such systems allow assessing the quality of the submitted crowdsensed data and provide sensor values where no crowdsensing data are typically available. In this work, we first used a simulation study to analyze a simple crowdsensing system concerning the detection performance of spatial events to highlight the potential and limitations of a pure crowdsourcing system. The results indicate that even if only a small share of inhabitants participate in crowdsensing, events that have locations correlated with the population density can be easily and quickly detected using such a system. On the contrary, events with uniformly randomly distributed locations are much harder to detect using a simple crowdsensing-based approach. A second evaluation shows that hybrid systems improve the detection probability and time. Finally, we illustrate how to compute the minimum number of fixed sensors for the given detection time thresholds in our exemplary scenario.
Dynamic point cloud compression based on projections, surface reconstruction and video compression
(2021)
In this paper we will present a new dynamic point cloud compression based on different projection types and bit depth, combined with the surface reconstruction algorithm and video compression for obtained geometry and texture maps. Texture maps have been compressed after creating Voronoi diagrams. Used video compression is specific for geometry (FFV1) and texture (H.265/HEVC). Decompressed point clouds are reconstructed using a Poisson surface reconstruction algorithm. Comparison with the original point clouds was performed using point-to-point and point-to-plane measures. Comprehensive experiments show better performance for some projection maps: cylindrical, Miller and Mercator projections.
Global Navigation Satellite System (GNSS) provides accurate positioning data for vehicular navigation in open outdoor environment. In an indoor environment, Light Detection and Ranging (LIDAR) Simultaneous Localization and Mapping (SLAM) establishes a two-dimensional map and provides positioning data. However, LIDAR can only provide relative positioning data and it cannot directly provide the latitude and longitude of the current position. As a consequence, GNSS/Inertial Navigation System (INS) integrated navigation could be employed in outdoors, while the indoors part makes use of INS/LIDAR integrated navigation and the corresponding switching navigation will make the indoor and outdoor positioning consistent. In addition, when the vehicle enters the garage, the GNSS signal will be blurred for a while and then disappeared. Ambiguous GNSS satellite signals will lead to the continuous distortion or overall drift of the positioning trajectory in the indoor condition. Therefore, an INS/LIDAR seamless integrated navigation algorithm and a switching algorithm based on vehicle navigation system are designed. According to the experimental data, the positioning accuracy of the INS/LIDAR navigation algorithm in the simulated environmental experiment is 50% higher than that of the Dead Reckoning (DR) algorithm. Besides, the switching algorithm developed based on the INS/LIDAR integrated navigation algorithm can achieve 80% success rate in navigation mode switching.
To deliver the best user experience (UX), the human-centered design cycle (HCDC) serves as a well-established guideline to application developers. However, it does not yet cover network-specific requirements, which become increasingly crucial, as most applications deliver experience over the Internet. The missing network-centric view is provided by Quality of Experience (QoE), which could team up with UX towards an improved overall experience. By considering QoE aspects during the development process, it can be achieved that applications become network-aware by design. In this paper, the Quality of Experience Centered Design Cycle (QoE-CDC) is proposed, which provides guidelines on how to design applications with respect to network-specific requirements and QoE. Its practical value is showcased for popular application types and validated by outlining the design of a new smartphone application. We show that combining HCDC and QoE-CDC will result in an application design, which reaches a high UX and avoids QoE degradation.
Having a mixed-cultural membership becomes increasingly common in our modern society. It is thus beneficial in several ways to create Intelligent Virtual Agents (IVAs) that reflect a mixed-cultural background as well, e.g., for educational settings. For research with such IVAs, it is essential that they are classified as non-native by members of a target culture. In this paper, we focus on variations of IVAs’ speech to create the impression of non-native speakers that are identified as such by speakers of two different mother tongues. In particular, we investigate grammatical mistakes and identify thresholds beyond which the agents is clearly categorised as a non-native speaker. Therefore, we conducted two experiments: one for native speakers of German, and one for native speakers of English. Results of the German study indicate that beyond 10% of word order mistakes and 25% of infinitive mistakes German-speaking IVAs are perceived as non-native speakers. Results of the English study indicate that beyond 50% of omission mistakes and 50% of infinitive mistakes English-speaking IVAs are perceived as non-native speakers. We believe these thresholds constitute helpful guidelines for computational approaches of non-native speaker generation, simplifying research with IVAs in mixed-cultural settings.
Proximity dimensions and the emergence of collaboration: a HypTrails study on German AI research
(2021)
Creation and exchange of knowledge depends on collaboration. Recent work has suggested that the emergence of collaboration frequently relies on geographic proximity. However, being co-located tends to be associated with other dimensions of proximity, such as social ties or a shared organizational environment. To account for such factors, multiple dimensions of proximity have been proposed, including cognitive, institutional, organizational, social and geographical proximity. Since they strongly interrelate, disentangling these dimensions and their respective impact on collaboration is challenging. To address this issue, we propose various methods for measuring different dimensions of proximity. We then present an approach to compare and rank them with respect to the extent to which they indicate co-publications and co-inventions. We adapt the HypTrails approach, which was originally developed to explain human navigation, to co-author and co-inventor graphs. We evaluate this approach on a subset of the German research community, specifically academic authors and inventors active in research on artificial intelligence (AI). We find that social proximity and cognitive proximity are more important for the emergence of collaboration than geographic proximity.
In many real world settings, imbalanced data impedes model performance of learning algorithms, like neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning which is known to have advantages compared to sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called DenseWeight and a cost-sensitive learning approach for neural network regression with imbalanced data called DenseLoss based on our weighting scheme. DenseWeight weights data points according to their target value rarities through kernel density estimation (KDE). DenseLoss adjusts each data point’s influence on the loss according to DenseWeight, giving rare data points more influence on model training compared to common data points. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach provides more control over model training as it enables us to actively decide on the trade-off between focusing on common or rare cases through a single hyperparameter, allowing the training of better models for rare data points.
In the last decades, the classical Vehicle Routing Problem (VRP), i.e., assigning a set of orders to vehicles and planning their routes has been intensively researched. As only the assignment of order to vehicles and their routes is already an NP-complete problem, the application of these algorithms in practice often fails to take into account the constraints and restrictions that apply in real-world applications, the so called rich VRP (rVRP) and are limited to single aspects. In this work, we incorporate the main relevant real-world constraints and requirements. We propose a two-stage strategy and a Timeline algorithm for time windows and pause times, and apply a Genetic Algorithm (GA) and Ant Colony Optimization (ACO) individually to the problem to find optimal solutions. Our evaluation of eight different problem instances against four state-of-the-art algorithms shows that our approach handles all given constraints in a reasonable time.
A bipartite graph G=(U,V,E) is convex if the vertices in V can be linearly ordered such that for each vertex u∈U, the neighbors of u are consecutive in the ordering of V. An induced matching H of G is a matching for which no edge of E connects endpoints of two different edges of H. We show that in a convex bipartite graph with n vertices and m weighted edges, an induced matching of maximum total weight can be computed in O(n+m) time. An unweighted convex bipartite graph has a representation of size O(n) that records for each vertex u∈U the first and last neighbor in the ordering of V. Given such a compact representation, we compute an induced matching of maximum cardinality in O(n) time. In convex bipartite graphs, maximum-cardinality induced matchings are dual to minimum chain covers. A chain cover is a covering of the edge set by chain subgraphs, that is, subgraphs that do not contain induced matchings of more than one edge. Given a compact representation, we compute a representation of a minimum chain cover in O(n) time. If no compact representation is given, the cover can be computed in O(n+m) time. All of our algorithms achieve optimal linear running time for the respective problem and model, and they improve and generalize the previous results in several ways: The best algorithms for the unweighted problem versions had a running time of O(n\(^{2}\)) (Brandstädt et al. in Theor. Comput. Sci. 381(1–3):260–265, 2007. https://doi.org/10.1016/j.tcs.2007.04.006). The weighted case has not been considered before.
The joint 1st Workshop on Evaluations and Measurements in Self-Aware Computing Systems (EMSAC 2019) and Workshop on Self-Aware Computing (SeAC) was held as part of the FAS* conference alliance in conjunction with the 16th IEEE International Conference on Autonomic Computing (ICAC) and the 13th IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO) in Umeå, Sweden on 20 June 2019. The goal of this one-day workshop was to bring together researchers and practitioners from academic environments and from the industry to share their solutions, ideas, visions, and doubts in self-aware computing systems in general and in the evaluation and measurements of such systems in particular. The workshop aimed to enable discussions, partnerships, and collaborations among the participants. This special issue follows the theme of the workshop. It contains extended versions of workshop presentations as well as additional contributions.
Even today, the automatic digitisation of scanned documents in general, but especially the automatic optical music recognition (OMR) of historical manuscripts, still remains an enormous challenge, since both handwritten musical symbols and text have to be identified. This paper focuses on the Medieval so-called square notation developed in the 11th–12th century, which is already composed of staff lines, staves, clefs, accidentals, and neumes that are roughly spoken connected single notes. The aim is to develop an algorithm that captures both the neumes, and in particular its melody, which can be used to reconstruct the original writing. Our pipeline is similar to the standard OMR approach and comprises a novel staff line and symbol detection algorithm based on deep Fully Convolutional Networks (FCN), which perform pixel-based predictions for either staff lines or symbols and their respective types. Then, the staff line detection combines the extracted lines to staves and yields an F\(_1\) -score of over 99% for both detecting lines and complete staves. For the music symbol detection, we choose a novel approach that skips the step to identify neumes and instead directly predicts note components (NCs) and their respective affiliation to a neume. Furthermore, the algorithm detects clefs and accidentals. Our algorithm predicts the symbol sequence of a staff with a diplomatic symbol accuracy rate (dSAR) of about 87%, which includes symbol type and location. If only the NCs without their respective connection to a neume, all clefs and accidentals are of interest, the algorithm reaches an harmonic symbol accuracy rate (hSAR) of approximately 90%. In general, the algorithm recognises a symbol in the manuscript with an F\(_1\) -score of over 96%.
This study provides a systematic literature review of research (2001–2020) in the field of teaching and learning a foreign language and intercultural learning using immersive technologies. Based on 2507 sources, 54 articles were selected according to a predefined selection criteria. The review is aimed at providing information about which immersive interventions are being used for foreign language learning and teaching and where potential research gaps exist. The papers were analyzed and coded according to the following categories: (1) investigation form and education level, (2) degree of immersion, and technology used, (3) predictors, and (4) criterions. The review identified key research findings relating the use of immersive technologies for learning and teaching a foreign language and intercultural learning at cognitive, affective, and conative levels. The findings revealed research gaps in the area of teachers as a target group, and virtual reality (VR) as a fully immersive intervention form. Furthermore, the studies reviewed rarely examined behavior, and implicit measurements related to inter- and trans-cultural learning and teaching. Inter- and transcultural learning and teaching especially is an underrepresented investigation subject. Finally, concrete suggestions for future research are given. The systematic review contributes to the challenge of interdisciplinary cooperation between pedagogy, foreign language didactics, and Human-Computer Interaction to achieve innovative teaching-learning formats and a successful digital transformation.