OPUS Würzburg | 004 Datenverarbeitung; Informatik

004 Datenverarbeitung; Informatik

38 search hits

21 to 30

Sort by

Line-level layout recognition of historical documents with background knowledge (2023)

Fischer, Norbert ; Hartelt, Alexander ; Puppe, Frank

Digitization and transcription of historic documents offer new research opportunities for humanists and are the topics of many edition projects. However, manual work is still required for the main phases of layout recognition and the subsequent optical character recognition (OCR) of early printed documents. This paper describes and evaluates how deep learning approaches recognize text lines and can be extended to layout recognition using background knowledge. The evaluation was performed on five corpora of early prints from the 15th and 16th Centuries, representing a variety of layout features. While the main text with standard layouts could be recognized in the correct reading order with a precision and recall of up to 99.9%, also complex layouts were recognized at a rate as high as 90% by using background knowledge, the full potential of which was revealed if many pages of the same source were transcribed.

Leveraging deep learning for identification and structural determination of novel protein complexes from \(in\) \(situ\) electron cryotomography of \(Mycoplasma\) \(pneumoniae\) (2023)

Somody, Joseph Christian Campbell

The holy grail of structural biology is to study a protein in situ, and this goal has been fast approaching since the resolution revolution and the achievement of atomic resolution. A cell's interior is not a dilute environment, and proteins have evolved to fold and function as needed in that environment; as such, an investigation of a cellular component should ideally include the full complexity of the cellular environment. Imaging whole cells in three dimensions using electron cryotomography is the best method to accomplish this goal, but it comes with a limitation on sample thickness and produces noisy data unamenable to direct analysis. This thesis establishes a novel workflow to systematically analyse whole-cell electron cryotomography data in three dimensions and to find and identify instances of protein complexes in the data to set up a determination of their structure and identity for success. Mycoplasma pneumoniae is a very small parasitic bacterium with fewer than 700 protein-coding genes, is thin enough and small enough to be imaged in large quantities by electron cryotomography, and can grow directly on the grids used for imaging, making it ideal for exploratory studies in structural proteomics. As part of the workflow, a methodology for training deep-learning-based particle-picking models is established. As a proof of principle, a dataset of whole-cell Mycoplasma pneumoniae tomograms is used with this workflow to characterize a novel membrane-associated complex observed in the data. Ultimately, 25431 such particles are picked from 353 tomograms and refined to a density map with a resolution of 11 Å. Making good use of orthogonal datasets to filter search space and verify results, structures were predicted for candidate proteins and checked for suitable fit in the density map. In the end, with this approach, nine proteins were found to be part of the complex, which appears to be associated with chaperone activity and interact with translocon machinery. Visual proteomics refers to the ultimate potential of in situ electron cryotomography: the comprehensive interpretation of tomograms. The workflow presented here is demonstrated to help in reaching that potential.

KIETA: Key-insight extraction from scientific tables (2023)

Kempf, Sebastian ; Krug, Markus ; Puppe, Frank

An important but very time consuming part of the research process is literature review. An already large and nevertheless growing ground set of publications as well as a steadily increasing publication rate continue to worsen the situation. Consequently, automating this task as far as possible is desirable. Experimental results of systems are key-insights of high importance during literature review and usually represented in form of tables. Our pipeline KIETA exploits these tables to contribute to the endeavor of automation by extracting them and their contained knowledge from scientific publications. The pipeline is split into multiple steps to guarantee modularity as well as analyzability, and agnosticim regarding the specific scientific domain up until the knowledge extraction step, which is based upon an ontology. Additionally, a dataset of corresponding articles has been manually annotated with information regarding table and knowledge extraction. Experiments show promising results that signal the possibility of an automated system, while also indicating limits of extracting knowledge from tables without any context.

JCAS-Enabled Sensing as a Service in 6th-Generation Mobile Communication Networks (2023)

Rauber, Christof A. O. ; Brechtel, Lukas ; Schotten, Hans D.

The introduction of new types of frequency spectrum in 6G technology facilitates the convergence of conventional mobile communications and radar functions. Thus, the mobile network itself becomes a versatile sensor system. This enables mobile network operators to offer a sensing service in addition to conventional data and telephony services. The potential benefits are expected to accrue to various stakeholders, including individuals, the environment, and society in general. The paper discusses technological development, possible integration, and use cases, as well as future development areas.

How to Model and Predict the Scalability of a Hardware-In-The-Loop Test Bench for Data Re-Injection? (2023)

Funda, Christoph ; Konheiser, Tobias ; German, Reinhard ; Hielscher, Kai-Steffen

This paper describes a novel application of an empirical network calculus model based on measurements of a hardware-in-the-loop (HIL) test system. The aim is to predict the performance of a HIL test bench for open-loop re-injection in the context of scalability. HIL test benches are distributed computer systems including software, hardware, and networking devices. They are used to validate complex technical systems, but have not yet been system under study themselves. Our approach is to use measurements from the HIL system to create an empirical model for arrival and service curves. We predict the performance and design the previously unknown parameters of the HIL simulator with network calculus (NC), namely the buffer sizes and the minimum needed pre-buffer time for the playback buffer. We furthermore show, that it is possible to estimate the CPU load from arrival and service-curves based on the utilization theorem, and hence estimate the scalability of the HIL system in the context of the number of sensor streams.

Giving historical photographs a new perspective: introducing camera orientation parameters as new metadata in a large-scale 4D application (2023)

Maiwald, Ferdinand ; Bruschke, Jonas ; Schneider, Danilo ; Wacker, Markus ; Niebling, Florian

The ongoing digitization of historical photographs in archives allows investigating the quality, quantity, and distribution of these images. However, the exact interior and exterior camera orientations of these photographs are usually lost during the digitization process. The proposed method uses content-based image retrieval (CBIR) to filter exterior images of single buildings in combination with metadata information. The retrieved photographs are automatically processed in an adapted structure-from-motion (SfM) pipeline to determine the camera parameters. In an interactive georeferencing process, the calculated camera positions are transferred into a global coordinate system. As all image and camera data are efficiently stored in the proposed 4D database, they can be conveniently accessed afterward to georeference newly digitized images by using photogrammetric triangulation and spatial resection. The results show that the CBIR and the subsequent SfM are robust methods for various kinds of buildings and different quantity of data. The absolute accuracy of the camera positions after georeferencing lies in the range of a few meters likely introduced by the inaccurate LOD2 models used for transformation. The proposed photogrammetric method, the database structure, and the 4D visualization interface enable adding historical urban photographs and 3D models from other locations.

Federated Learning for Service Placement in Fog and Edge Computing (2023)

Dworzak, Manuel ; Großmann, Marcel ; Le, Duy Thanh

Service orchestration requires enormous attention and is a struggle nowadays. Of course, virtualization provides a base level of abstraction for services to be deployable on a lot of infrastructures. With container virtualization, the trend to migrate applications to a micro-services level in order to be executable in Fog and Edge Computing environments increases manageability and maintenance efforts rapidly. Similarly, network virtualization adds effort to calibrate IP flows for Software-Defined Networks and eventually route it by means of Network Function Virtualization. Nevertheless, there are concepts like MAPE-K to support micro-service distribution in next-generation cloud and network environments. We want to explore, how a service distribution can be improved by adopting machine learning concepts for infrastructure or service changes. Therefore, we show how federated machine learning is integrated into a cloud-to-fog-continuum without burdening single nodes.

Emulation of Multipath Transmissions in P4 Networks with Kathará (2023)

Großmann, Marcel ; Homeyer, Tobias

Packets sent over a network can either get lost or reach their destination. Protocols like TCP try to solve this problem by resending the lost packets. However, retransmissions consume a lot of time and are cumbersome for the transmission of critical data. Multipath solutions are quite common to address this reliability issue and are available on almost every layer of the ISO/OSI model. We propose a solution based on a P4 network to duplicate packets in order to send them to their destination via multiple routes. The last network hop ensures that only a single copy of the traffic is further forwarded to its destination by adopting a concept similar to Bloom filters. Besides, if fast delivery is requested we provide a P4 prototype, which randomly forwards the packets over different transmission paths. For reproducibility, we implement our approach in a container-based network emulation system called Kathará.

Defining and Implementing Domain-Specific Languages with Prolog (2023)

Nogatz, Falco

The landscape of today’s programming languages is manifold. With the diversity of applications, the difficulty of adequately addressing and specifying the used programs increases. This often leads to newly designed and implemented domain-specific languages. They enable domain experts to express knowledge in their preferred format, resulting in more readable and concise programs. Due to its flexible and declarative syntax without reserved keywords, the logic programming language Prolog is particularly suitable for defining and embedding domain-specific languages. This thesis addresses the questions and challenges that arise when integrating domain-specific languages into Prolog. We compare the two approaches to define them either externally or internally, and provide assisting tools for each. The grammar of a formal language is usually defined in the extended Backus–Naur form. In this work, we handle this formalism as a domain-specific language in Prolog, and define term expansions that allow to translate it into equivalent definite clause grammars. We present the package library(dcg4pt) for SWI-Prolog, which enriches them by an additional argument to automatically process the term’s corresponding parse tree. To simplify the work with definite clause grammars, we visualise their application by a web-based tracer. The external integration of domain-specific languages requires the programmer to keep the grammar, parser, and interpreter in sync. In many cases, domain-specific languages can instead be directly embedded into Prolog by providing appropriate operator definitions. In addition, we propose syntactic extensions for Prolog to expand its expressiveness, for instance to state logic formulas with their connectives verbatim. This allows to use all tools that were originally written for Prolog, for instance code linters and editors with syntax highlighting. We present the package library(plammar), a standard-compliant parser for Prolog source code, written in Prolog. It is able to automatically infer from example sentences the required operator definitions with their classes and precedences as well as the required Prolog language extensions. As a result, we can automatically answer the question: Is it possible to model these example sentences as valid Prolog clauses, and how? We discuss and apply the two approaches to internal and external integrations for several domain-specific languages, namely the extended Backus–Naur form, GraphQL, XPath, and a controlled natural language to represent expert rules in if-then form. The created toolchain with library(dcg4pt) and library(plammar) yields new application opportunities for static Prolog source code analysis, which we also present.

Deep neural network regression for normalized digital surface model generation with Sentinel-2 imagery (2023)

Müller, Konstantin ; Leppich, Robert ; Geiß, Christian ; Borst, Vanessa ; Pelizari, Patrick Aravena ; Kounev, Samuel ; Taubenböck, Hannes

In recent history, normalized digital surface models (nDSMs) have been constantly gaining importance as a means to solve large-scale geographic problems. High-resolution surface models are precious, as they can provide detailed information for a specific area. However, measurements with a high resolution are time consuming and costly. Only a few approaches exist to create high-resolution nDSMs for extensive areas. This article explores approaches to extract high-resolution nDSMs from low-resolution Sentinel-2 data, allowing us to derive large-scale models. We thereby utilize the advantages of Sentinel 2 being open access, having global coverage, and providing steady updates through a high repetition rate. Several deep learning models are trained to overcome the gap in producing high-resolution surface maps from low-resolution input data. With U-Net as a base architecture, we extend the capabilities of our model by integrating tailored multiscale encoders with differently sized kernels in the convolution as well as conformed self-attention inside the skip connection gates. Using pixelwise regression, our U-Net base models can achieve a mean height error of approximately 2 m. Moreover, through our enhancements to the model architecture, we reduce the model error by more than 7%.

21 to 30

004 Datenverarbeitung; Informatik

Refine

Has Fulltext

Is part of the Bibliography

Year of publication

Document Type

Language

Keywords

Author

Institute

Sonstige beteiligte Institutionen

EU-Project number / Contract (GA) number

38 search hits