004 Data processing; Computer science
Even today, the automatic digitisation of scanned documents in general, and especially the automatic optical music recognition (OMR) of historical manuscripts, remains an enormous challenge, since both handwritten musical symbols and text have to be identified. This paper focuses on the medieval so-called square notation developed in the 11th–12th century, which is already composed of staff lines, staves, clefs, accidentals, and neumes, which are, roughly speaking, connected single notes. The aim is to develop an algorithm that captures both the neumes and, in particular, their melody, which can be used to reconstruct the original writing. Our pipeline is similar to the standard OMR approach and comprises a novel staff line and symbol detection algorithm based on deep Fully Convolutional Networks (FCN), which perform pixel-based predictions for either staff lines or symbols and their respective types. Then, the staff line detection combines the extracted lines into staves and yields an F\(_1\)-score of over 99% for detecting both lines and complete staves. For the music symbol detection, we choose a novel approach that skips the step of identifying neumes and instead directly predicts note components (NCs) and their respective affiliation to a neume. Furthermore, the algorithm detects clefs and accidentals. Our algorithm predicts the symbol sequence of a staff with a diplomatic symbol accuracy rate (dSAR) of about 87%, which includes symbol type and location. If only the NCs without their respective connection to a neume, all clefs, and accidentals are of interest, the algorithm reaches a harmonic symbol accuracy rate (hSAR) of approximately 90%. In general, the algorithm recognises a symbol in the manuscript with an F\(_1\)-score of over 96%.
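For readers unfamiliar with the F\(_1\)-score reported above, the following minimal sketch shows how the metric is computed from detection counts. The function name and the counts in the usage note are illustrative assumptions and are not taken from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall.

    Returns 0.0 when there are no true positives, since precision or
    recall would then be zero or undefined.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

With hypothetical counts of 96 correctly detected symbols, 4 spurious detections, and 4 missed symbols, precision and recall are both 0.96, so `f1_score(96, 4, 4)` is also 0.96.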
Plenty of theories, models, measures, and investigations target the understanding of virtual presence, i.e., the sense of presence in immersive Virtual Reality (VR). Other varieties of the so-called eXtended Realities (XR), e.g., Augmented and Mixed Reality (AR and MR), incorporate immersive features to a lesser degree and continuously combine spatial cues from the real physical space and the simulated virtual space. This blurred separation questions the applicability of the accumulated knowledge about the similarities of virtual presence and presence occurring in other varieties of XR, and corresponding outcomes. The present work bridges this gap by analyzing the construct of presence in mixed realities (MR). To achieve this, we present (1) a short review of definitions, dimensions, and measurements of presence in VR, and (2) the state-of-the-art views on MR. Additionally, we (3) derive a working definition of MR, extending the Milgram continuum. This definition is based on entities ranging from real to virtual manifestations at one point in time. Entities possess different degrees of referential power, determining the selection of the frame of reference. Furthermore, we (4) identify three research desiderata, including research questions about the frame of reference, the corresponding dimension of transportation, and the dimension of realism in MR. Mainly, the relationship between the main aspects of virtual presence in immersive VR, i.e., the place illusion and the plausibility illusion, and the referential power of MR entities is discussed regarding the concept, measures, and design of presence in MR. Finally, we (5) suggest an experimental setup to reveal the research heuristic behind experiments investigating presence in MR. The present work contributes to the theories and meaning of presence in MR and to approaches for simulating and measuring it.
We hypothesize that research about essential underlying factors determining user experience (UX) in MR simulations and experiences is still in its infancy and hope this article provides an encouraging starting point to tackle related questions.
The rating of perceived exertion (RPE) is a subjective load marker and may assist in individualizing training prescription, particularly by adjusting running intensity. Unfortunately, RPE has shortcomings (e.g., underreporting) and cannot be monitored continuously and automatically throughout a training session. In this pilot study, we aimed to predict two classes of RPE (≤15, “Somewhat hard to hard” on Borg’s 6–20 scale, vs. RPE >15) in runners by analyzing data recorded by a commercially available smartwatch with machine learning algorithms. Twelve trained and untrained runners performed long continuous runs at a constant self-selected pace to volitional exhaustion. Untrained runners reported their RPE each kilometer, whereas trained runners reported every five kilometers. The kinetics of heart rate, step cadence, and running velocity were recorded continuously (1 Hz) with a commercially available smartwatch (Polar V800). We trained different machine learning algorithms to estimate the two classes of RPE based on the time series sensor data derived from the smartwatch. Predictions were analyzed in different settings: accuracy overall and per runner type, i.e., accuracy for trained and untrained runners independently. We achieved top accuracies of 84.8% for the whole dataset, 81.8% for the trained runners, and 86.1% for the untrained runners. We predict two classes of RPE with high accuracy using machine learning and smartwatch data. This approach might aid in individualizing training prescriptions.
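The two-class labelling and the per-group accuracy evaluation described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(group, reported_rpe, predicted_class)`, the function names, and the sample values are hypothetical, and the classifier that produces the predictions is not shown.

```python
from collections import defaultdict

RPE_THRESHOLD = 15  # class boundary from the abstract (Borg 6-20 scale)


def rpe_class(rpe: int) -> int:
    """Map a Borg 6-20 rating to a binary label: 0 for <=15, 1 for >15."""
    if not 6 <= rpe <= 20:
        raise ValueError("RPE must lie on Borg's 6-20 scale")
    return int(rpe > RPE_THRESHOLD)


def accuracy_by_group(records):
    """Compute accuracy per runner group and overall (key "all").

    records: iterable of (group, reported_rpe, predicted_class) tuples,
    a hypothetical layout chosen for this sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, reported_rpe, predicted_class in records:
        correct = int(rpe_class(reported_rpe) == predicted_class)
        for key in (group, "all"):
            hits[key] += correct
            totals[key] += 1
    return {key: hits[key] / totals[key] for key in totals}
```

Reporting the overall figure next to the per-group figures mirrors the evaluation settings named in the abstract: accuracy for the whole dataset and for trained and untrained runners independently.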
In the present work, a simulation system is proposed that can be used as an educational tool by physicians for training basic skills of minimally invasive vascular interventions. To accomplish this objective, the physical model of the wire proposed by Konings was first improved. As a result, a simpler and more stable method was obtained to calculate the equilibrium configuration of the wire. In addition, a geometrical method is developed to perform relaxations. It is particularly useful when the wire is hindered in the physical method because of the boundary conditions. Then a recipe is given to merge the physical and the geometrical methods, resulting in efficient relaxations. Moreover, tests have shown that the shape of the virtual wire agrees with the experiment. The proposed algorithm allows real-time execution, and furthermore, the hardware to assemble the simulator has a low cost.
Group-based communication is a highly popular communication paradigm, which is especially prominent in mobile instant messaging (MIM) applications, such as WhatsApp. Chat groups in MIM applications facilitate the sharing of various types of messages (e.g., text, voice, image, video) among a large number of participants. As each message has to be transmitted to every other member of the group, which multiplies the traffic, this has a massive impact on the underlying communication networks. However, most chat groups are private and network operators cannot obtain deep insights into MIM communication via network measurements due to end-to-end encryption. Thus, the generation of traffic is not well understood, given that it depends on sizes of communication groups, speed of communication, and exchanged message types. In this work, we provide a huge data set of 5,956 private WhatsApp chat histories, which contains over 76 million messages from more than 117,000 users. We describe and model the properties of chat groups and users, and the communication within these chat groups, which gives unprecedented insights into private MIM communication. In addition, we conduct exemplary measurements for the most popular message types, which empower the provided models to estimate the traffic over time in a chat group.
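The traffic multiplication mentioned above can be made concrete with a small sketch. It assumes a simple fan-out model in which every message is delivered once to each other group member and ignores protocol overhead, compression, and members who never download media. These simplifications, the function names, and the sample sizes are assumptions for illustration, not the models provided in the paper.

```python
def delivery_traffic(message_bytes: int, group_size: int) -> int:
    """Bytes delivered to recipients for one message in a chat group.

    Assumes one copy per other member (group_size - 1 recipients).
    """
    if group_size < 2:
        raise ValueError("a chat group needs at least two members")
    if message_bytes < 0:
        raise ValueError("message size cannot be negative")
    return message_bytes * (group_size - 1)


def group_traffic(message_sizes, group_size: int) -> int:
    """Total delivery traffic for a sequence of message sizes in bytes."""
    return sum(delivery_traffic(size, group_size) for size in message_sizes)
```

Under this model, a hypothetical 1,000-byte message in a group of 10 causes 9,000 bytes of delivery traffic, which shows why group size and message type both matter for traffic estimation.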
Semantic Fusion for Natural Multimodal Interfaces using Concurrent Augmented Transition Networks
(2018)
Semantic fusion is a central requirement of many multimodal interfaces. Procedural methods like finite-state transducers and augmented transition networks have proven to be beneficial for implementing semantic fusion. They are compatible with the rapid development cycles that are common in the development of user interfaces, in contrast to machine-learning approaches that require time-consuming training and optimization. We identify seven fundamental requirements for the implementation of semantic fusion: action derivation, continuous feedback, context-sensitivity, temporal relation support, access to the interaction context, as well as the support of chronologically unsorted and probabilistic input. A subsequent analysis reveals, however, that there is currently no solution for fulfilling the latter two requirements. As the main contribution of this article, we thus present the Concurrent Cursor concept to compensate for these shortcomings. In addition, we showcase a reference implementation, the Concurrent Augmented Transition Network (cATN), that validates the concept’s feasibility in a series of proof-of-concept demonstrations as well as through a comparative benchmark. The cATN fulfills all identified requirements and fills the gap left by previous solutions. It supports the rapid prototyping of multimodal interfaces by means of five concrete traits: its declarative nature, the recursiveness of the underlying transition network, the network abstraction constructs of its description language, the utilized semantic queries, and an abstraction layer for lexical information. Our reference implementation has been and is still used in various student projects, theses, and master-level courses. It is openly available and showcases that non-experts can effectively implement multimodal interfaces, even for non-trivial applications in mixed and virtual reality.
Proximity dimensions and the emergence of collaboration: a HypTrails study on German AI research
(2021)
Creation and exchange of knowledge depends on collaboration. Recent work has suggested that the emergence of collaboration frequently relies on geographic proximity. However, being co-located tends to be associated with other dimensions of proximity, such as social ties or a shared organizational environment. To account for such factors, multiple dimensions of proximity have been proposed, including cognitive, institutional, organizational, social and geographical proximity. Since they strongly interrelate, disentangling these dimensions and their respective impact on collaboration is challenging. To address this issue, we propose various methods for measuring different dimensions of proximity. We then present an approach to compare and rank them with respect to the extent to which they indicate co-publications and co-inventions. We adapt the HypTrails approach, which was originally developed to explain human navigation, to co-author and co-inventor graphs. We evaluate this approach on a subset of the German research community, specifically academic authors and inventors active in research on artificial intelligence (AI). We find that social proximity and cognitive proximity are more important for the emergence of collaboration than geographic proximity.
Crowdsensing offers a cost-effective way to collect large amounts of environmental sensor data; however, the spatial distribution of crowdsensing sensors can hardly be influenced, as the participants carry the sensors, and, additionally, the quality of the crowdsensed data can vary significantly. Hybrid systems that use mobile users in conjunction with fixed sensors might help to overcome these limitations, as such systems allow assessing the quality of the submitted crowdsensed data and provide sensor values where no crowdsensing data are typically available. In this work, we first used a simulation study to analyze a simple crowdsensing system concerning the detection performance of spatial events to highlight the potential and limitations of a pure crowdsourcing system. The results indicate that even if only a small share of inhabitants participate in crowdsensing, events that have locations correlated with the population density can be easily and quickly detected using such a system. On the contrary, events with uniformly randomly distributed locations are much harder to detect using a simple crowdsensing-based approach. A second evaluation shows that hybrid systems improve the detection probability and time. Finally, we illustrate how to compute the minimum number of fixed sensors for the given detection time thresholds in our exemplary scenario.
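To give a rough intuition for why more sensors improve detection, the sketch below uses a deliberately simplified model: each sensor independently covers the event location with the same probability. This independence model, the function names, and the numbers are assumptions made for illustration. It is not the simulation study described in the abstract, and it considers detection probability only, not detection time.

```python
import math


def detection_probability(coverage_probability: float, sensors: int) -> float:
    """Probability that at least one of `sensors` independent sensors
    covers the event, each with the same coverage probability."""
    if not 0.0 <= coverage_probability <= 1.0:
        raise ValueError("coverage_probability must be in [0, 1]")
    if sensors < 0:
        raise ValueError("sensors must be non-negative")
    return 1.0 - (1.0 - coverage_probability) ** sensors


def min_sensors(coverage_probability: float, target: float) -> int:
    """Smallest sensor count whose detection probability reaches `target`
    under the independence assumption above."""
    if not 0.0 < coverage_probability < 1.0:
        raise ValueError("coverage_probability must be in (0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - coverage_probability))
```

For a hypothetical per-sensor coverage probability of 0.1 and a target detection probability of 0.9, `min_sensors(0.1, 0.9)` yields 22 sensors under this model.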
The issue of sustainability is at the top of the political and societal agenda, being considered of extreme importance and urgency. Human individual action impacts the environment both locally (e.g., local air/water quality, noise disturbance) and globally (e.g., climate change, resource use). Urban environments represent a crucial example, with an increasing realization that the most effective way of producing a change is involving the citizens themselves in monitoring campaigns (a citizen science bottom-up approach). This is possible by developing novel technologies and IT infrastructures enabling large citizen participation. Here, in the wider framework of one of the first such projects, we show results from an international competition where citizens were involved in mobile air pollution monitoring using low cost sensing devices, combined with a web-based game to monitor perceived levels of pollution. Measures of shift in perceptions over the course of the campaign are provided, together with insights into participatory patterns emerging from this study. Interesting effects related to inertia and to direct involvement in measurement activities rather than indirect information exposure are also highlighted, indicating that direct involvement can enhance learning and environmental awareness. In the future, this could result in better adoption of policies towards decreasing pollution.
The strict restrictions introduced by the COVID-19 lockdowns, which started in March 2020, changed people’s daily lives and habits on many different levels. In this work, we investigate the impact of the lockdown on the communication behavior in the mobile instant messaging application WhatsApp. Our evaluations are based on a large dataset of 2577 private chat histories with 25,378,093 messages from 51,973 users. The analysis of the one-to-one and group conversations confirms that the lockdown severely altered the communication in WhatsApp chats compared to pre-pandemic periods. In particular, we observe short-term effects, which caused an increased message frequency in the first lockdown months and a shifted communication activity during the day in March and April 2020. Moreover, we also see long-term effects of the ongoing pandemic situation until February 2021, which indicate a change of communication behavior towards more regular messaging, as well as a persisting change in activity during the day. The results of our work show that even anonymized chat histories can tell us a lot about people’s behavior, and especially about behavioral changes during the COVID-19 pandemic, and thus are of great relevance for behavioral researchers. Furthermore, looking at the pandemic from an Internet provider perspective, these insights can be used during the next pandemic, or if the current COVID-19 situation worsens, to adapt communication networks to the changed usage behavior early on and thus avoid network congestion.