Acceleration is a central aim of clinical and technical research in magnetic resonance imaging (MRI) today, with the potential to increase robustness, accessibility, and patient comfort, reduce cost, and enable entirely new kinds of examinations. A key component in this endeavor is image reconstruction, as most modern approaches build on advanced signal and image processing. Here, deep learning (DL)-based methods have recently shown considerable potential, with numerous publications demonstrating benefits for MRI reconstruction. However, these methods often come at the cost of an increased risk of subtle yet critical errors. Therefore, the aim of this thesis is to advance DL-based MRI reconstruction while ensuring high quality and fidelity with measured data. A network architecture specifically suited for this purpose is the variational network (VN). To investigate the benefits VNs can bring to non-Cartesian cardiac imaging, the first part presents an application of VNs specifically adapted to the reconstruction of accelerated spiral acquisitions. The proposed method is compared to a segmented exam, a U-Net, and a compressed sensing (CS) model using qualitative and quantitative measures. While the U-Net performed poorly, both the VN and the CS reconstruction showed good output quality. In functional cardiac imaging, the proposed real-time method with VN reconstruction substantially accelerates examinations over the gold standard, from over 10 minutes to just 1 minute. Clinical parameters agreed on average.
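The core ingredient that keeps VN-style reconstructions faithful to the measured data is the data-consistency gradient step, which is interleaved with learned regularization. The following is a minimal sketch of that step alone, with the forward operator modeled as a toy undersampled FFT; it is not the thesis implementation, and all sizes and the sampling mask are assumptions for illustration.

```python
import numpy as np

# Toy setup: forward operator A = undersampled 2D FFT, adjoint A^H,
# and repeated gradient steps x <- x - lam * A^H (A x - y) on ||Ax - y||^2,
# i.e. the data-consistency update a variational network interleaves
# with its learned regularizer (omitted here).

rng = np.random.default_rng(0)
n = 64
x_true = rng.standard_normal((n, n))   # toy ground-truth image
mask = rng.random((n, n)) < 0.4        # toy k-space undersampling mask

def A(x):   # forward operator: masked orthonormal FFT
    return mask * np.fft.fft2(x, norm="ortho")

def AH(k):  # adjoint (restricted to real images): inverse FFT of masked k-space
    return np.real(np.fft.ifft2(mask * k, norm="ortho"))

y = A(x_true)   # "measured" k-space data
x = AH(y)       # zero-filled initial reconstruction

lam = 1.0
for _ in range(10):                    # plain gradient descent on the data term
    x = x - lam * AH(A(x) - y)

err0 = np.linalg.norm(A(AH(y)) - y)    # initial data-consistency residual
err = np.linalg.norm(A(x) - y)         # residual after the gradient steps
```

Because the orthonormal FFT with a binary mask has operator norm at most one, a step size of 1 makes the residual non-increasing, which is the fidelity property the thesis emphasizes.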
Generally in MRI reconstruction, the assessment of image quality is complex, in particular for modern non-linear methods. Therefore, advanced techniques for precise evaluation of quality were subsequently demonstrated.
With two distinct methods, resolution and amplification or suppression of noise are quantified locally in each pixel of a reconstruction. Using these, local maps of resolution and noise in parallel imaging (GRAPPA), CS, U-Net and VN reconstructions were determined for MR images of the brain. In the tested images, GRAPPA delivers uniform and ideal resolution, but amplifies noise noticeably. The other methods adapt their behavior to image structure, where different levels of local blurring were observed at edges compared to homogeneous areas, and noise was suppressed except at edges. Overall, VNs were found to combine a number of advantageous properties, including a good trade-off between resolution and noise, fast reconstruction times, and high overall image quality and fidelity of the produced output. Therefore, this network architecture seems highly promising for MRI reconstruction.
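The local noise maps described above can be approximated by Monte-Carlo perturbation: repeatedly add noise of known strength to the input, run the reconstruction, and measure the per-pixel output standard deviation relative to the input noise level. The sketch below illustrates only this idea; the "reconstruction" is a stand-in 3x3 mean filter and all parameters are assumptions, not the methods used in the thesis.

```python
import numpy as np

# Monte-Carlo estimate of per-pixel noise amplification/suppression:
# values < 1 in noise_map mean the operator locally suppresses noise.
rng = np.random.default_rng(1)
n, trials, sigma = 32, 200, 0.1
image = np.zeros((n, n))
image[:, n // 2:] = 1.0                # toy image with a single vertical edge

def reconstruct(img):
    # stand-in reconstruction: 3x3 mean filter (a denoising-like operator)
    p = np.pad(img, 1, mode="edge")
    return sum(p[i:i + n, j:j + n] for i in range(3) for j in range(3)) / 9.0

outs = np.stack([reconstruct(image + sigma * rng.standard_normal((n, n)))
                 for _ in range(trials)])
noise_map = outs.std(axis=0) / sigma   # local noise map (ratio of std devs)
```

For a 3x3 mean filter the ratio is about 1/3 everywhere; for the nonlinear methods discussed above, such maps instead vary with local image structure.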
Green spaces are among the most important environmental factors in people's residential surroundings. On the one hand, they have positive effects on physical and mental health; on the other, green spaces can also mitigate the negative effects of other factors, such as the heat events that are becoming more frequent in the course of climate change. Nevertheless, green spaces are not equally accessible to the entire population. Existing research in the context of environmental justice (EJ) has shown that different socio-economic and demographic groups of the German population have differing access to green spaces. A criticism of existing analyses of environmental factors in the EJ context is that geographic data are often evaluated at too highly aggregated a level, so that locally specific exposures are no longer accurately represented. This applies in particular to large-scale studies, and important spatial information is thus lost. Yet modern Earth observation data and geodata are more detailed than ever, and machine learning methods enable their efficient processing into higher-value information.
The overarching goal of this thesis is to demonstrate and carry out, using the example of green spaces in Germany, the methodological steps for systematically converting comprehensive geodata into relevant geoinformation for large-scale, high-resolution analysis of environmental characteristics. At the intersection of remote sensing, geoinformatics, social geography, and environmental justice research, the potential of modern methods for improving the spatial and semantic resolution of geoinformation is explored. To this end, machine learning methods are used to map land cover and land use at the national level. These developments are intended to help close existing data gaps and to shed light on the distributional justice of green spaces.
This dissertation is structured into three conceptual parts. In the first part, Earth observation data from the Sentinel-2 satellites are used for a Germany-wide classification of land cover. A machine learning model is trained in combination with point-based reference data from the Europe-wide Land Use and Coverage Area Frame Survey (LUCAS). In this context, various preprocessing steps for the LUCAS data and their influence on classification accuracy are examined. The classification approach is able to derive land cover information with high accuracy even in complex urban areas. One result of this part is a Germany-wide land cover classification with an overall accuracy of 93.07%, which is used later in the thesis to spatially quantify green land cover (GLC).
The second conceptual part focuses on a differentiated view of green spaces, using the example of public green space (PGS), which is a frequent subject of EJ research. However, a commonly used source of spatial data on public green spaces, the European Urban Atlas (EUA), has so far not been compiled for all of Germany. This part pursues a data-driven approach to determine the availability of public green at the neighborhood level for all of Germany, with areas already covered by the EUA serving as reference. Combining Earth observation data with information from the OpenStreetMap project, a deep learning-based fusion network is built that quantifies the available area of public green. The result of this step is a model that is used to estimate the amount of public green space in a neighborhood (R² = 0.952).
The third part builds on the results of the first two and examines the distribution of green spaces in Germany with the addition of georeferenced population data. This exemplary analysis distinguishes two types of green space: GLC and PGS. First, descriptive statistics are used to examine the general distribution of green spaces across the German population. Next, distributional justice is quantified using common equity metrics. Finally, the relationships between the demographic composition of a neighborhood and the amount of available green space are examined for three exemplary sociodemographic groups. The analysis shows strong differences in the availability of PGS between urban and rural areas. A higher percentage of the urban population has access to the minimum amount of PGS recommended by the World Health Organization. The results also show a clear difference in distributional justice between GLC and PGS, underlining the relevance of distinguishing green space types for such studies. The concluding examination of different population groups works out differences at the sociodemographic level.
Taken together, this thesis demonstrates how modern geodata and machine learning methods can be used to overcome previous limitations of spatial datasets. Using the example of green spaces in the residential surroundings of the German population, it shows that nationwide environmental justice analyses can be enriched by high-resolution, locally fine-grained geographic information. The thesis illustrates how methods from Earth observation and geoinformatics can make an important contribution to identifying inequalities in people's residential environments and, ultimately, to supporting and monitoring sustainable settlement development with objective information.
Fast and accurate yield estimates remain a goal for precision agriculture and food security, given the increasing availability and variety of global satellite products and the rapid development of new algorithms. However, the consistency and reliability of suitable methodologies that provide accurate crop yield outcomes still need to be explored. This study investigates the coupling of crop modeling and machine learning (ML) to improve the yield prediction of winter wheat (WW) and oil seed rape (OSR), with examples for the Free State of Bavaria (70,550 km²), Germany, in 2019. The main objective is to determine whether a coupling approach [Light Use Efficiency (LUE) + Random Forest (RF)] results in better and more accurate yield predictions than models not using the LUE. Four different RF models [RF1 (input: Normalized Difference Vegetation Index (NDVI)), RF2 (input: climate variables), RF3 (input: NDVI + climate variables), RF4 (input: LUE-generated biomass + climate variables)] and one semi-empirical LUE model were designed with different input requirements to find the best predictors for crop monitoring. The results indicate that the individual use of the NDVI (in RF1) or of the climate variables (in RF2) is not the most accurate, reliable, and precise solution for crop monitoring; their combined use (in RF3), however, resulted in higher accuracies. Notably, coupling the LUE model variables into the RF4 model reduced the relative root mean square error (RRMSE) by 8% (WW) and 1.6% (OSR) and increased the R² by 14.3% (for both WW and OSR) compared to results relying on the LUE alone. Moreover, the research compares the models' yield outputs for three different spatial inputs: Sentinel-2(S)-MOD13Q1 (10 m), Landsat (L)-MOD13Q1 (30 m), and MOD13Q1 (MODIS) (250 m). The S-MOD13Q1 data improved model performance, with higher mean R² [0.80 (WW), 0.69 (OSR)] and lower RRMSE (9.18% and 10.21%) compared to L-MOD13Q1 (30 m) and MOD13Q1 (250 m). Satellite-based crop biomass, solar radiation, and temperature were found to be the most influential variables in the yield prediction of both crops.
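The RF4 coupling idea and the RRMSE metric can be sketched as follows. All data here is synthetic and the variable names are assumptions for illustration; the study's actual LUE model, predictors, and yield data are not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# RF4-style coupling sketch: a random forest predicts yield from
# model-generated biomass plus climate variables, evaluated with the
# relative root mean square error (RRMSE = 100 * RMSE / mean yield).
rng = np.random.default_rng(42)
n = 500
biomass = rng.uniform(5, 15, n)        # stand-in for LUE-generated biomass
radiation = rng.uniform(10, 25, n)     # stand-in climate variable
temperature = rng.uniform(5, 20, n)    # stand-in climate variable
yield_t = (0.4 * biomass + 0.1 * radiation + 0.05 * temperature
           + rng.normal(0, 0.3, n))    # synthetic yield (t/ha)

X = np.column_stack([biomass, radiation, temperature])
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[:400], yield_t[:400])         # simple hold-out split
pred = rf.predict(X[400:])

rmse = np.sqrt(np.mean((pred - yield_t[400:]) ** 2))
rrmse = 100.0 * rmse / yield_t[400:].mean()   # RRMSE in percent
```

The same template with NDVI-only or climate-only feature matrices corresponds to the RF1 and RF2 variants compared in the study.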
Ever-growing data availability combined with rapid progress in analytics has laid the foundation for the emergence of business process analytics. Organizations strive to leverage predictive process analytics to obtain insights. However, current implementations are designed to deal with homogeneous data. Consequently, there is limited practical use in an organization with heterogeneous data sources. The paper proposes a method for predictive end-to-end enterprise process network monitoring leveraging multi-headed deep neural networks to overcome this limitation. A case study performed with a medium-sized German manufacturing company highlights the method’s utility for organizations.
Artificial intelligence (AI) has already arrived in many areas of our lives and, because of the increasing availability of computing power, can now be used for complex tasks in medicine and dentistry. This is reflected by an exponential increase in scientific publications aiming to integrate AI into everyday clinical routines. Applications of AI in orthodontics are already manifold and range from the identification of anatomical/pathological structures or reference points in imaging to the support of complex decision-making in orthodontic treatment planning. The aim of this article is to give the reader an overview of the current state of the art regarding applications of AI in orthodontics and to provide a perspective for the use of such AI solutions in clinical routine. For this purpose, we present various use cases for AI in orthodontics for which research is already available. Considering the current scientific progress, it is not unreasonable to assume that AI will become an integral part of orthodontic diagnostics and treatment planning in the near future. At the same time, AI will likely not be able to replace the knowledge and experience of human experts in the foreseeable future; it probably will, however, be able to support practitioners, thus serving as a quality-assuring component in orthodontic patient care.
Artificial intelligence (AI) is predicted to play an increasingly important role in perioperative medicine in the very near future. However, little is known about what anesthesiologists know and think about AI in this context. This is important because the successful introduction of new technologies depends on the understanding and cooperation of end users. We sought to investigate how much anesthesiologists know about AI and what they think about the introduction of AI-based technologies into the clinical setting. In order to better understand what anesthesiologists think of AI, we recruited 21 anesthesiologists from 2 university hospitals for face-to-face structured interviews. The interview transcripts were subdivided sentence-by-sentence into discrete statements, and statements were then grouped into key themes. Subsequently, a survey of closed questions based on these themes was sent to 70 anesthesiologists from 3 university hospitals for rating. In the interviews, the base level of knowledge of AI was good at 86 of 90 statements (96%), although awareness of the potential applications of AI in anesthesia was poor at only 7 of 42 statements (17%). Regarding the implementation of AI in anesthesia, statements were split roughly evenly between pros (46 of 105, 44%) and cons (59 of 105, 56%). Interviewees considered that AI could usefully be used in diverse tasks such as risk stratification, the prediction of vital sign changes, or as a treatment guide. The validity of these themes was probed in a follow-up survey of 70 anesthesiologists with a response rate of 70%, which confirmed an overall positive view of AI in this group. Anesthesiologists hold a range of opinions, both positive and negative, regarding the application of AI in their field of work. Survey-based studies do not always uncover the full breadth of nuance of opinion amongst clinicians. Engagement with specific concerns, both technical and ethical, will prove important as this technology moves from research to the clinic.
Gait disturbances are common manifestations of Parkinson’s disease (PD), with unmet therapeutic needs. Inertial measurement units (IMUs) are capable of monitoring gait, but they lack neurophysiological information that may be crucial for studying gait disturbances in these patients. Here, we present a machine learning approach to approximate IMU angular velocity profiles, and subsequently gait events, using electromyographic (EMG) channels during overground walking in patients with PD. We recorded six parkinsonian patients while they walked for at least three minutes. Patient-agnostic regression models were trained on temporally embedded EMG time series of different combinations of up to five leg muscles bilaterally (i.e., tibialis anterior, soleus, gastrocnemius medialis, gastrocnemius lateralis, and vastus lateralis). Gait events could be detected with high temporal precision (median displacement of <50 ms), low numbers of missed events (<2%), and next to no false-positive event detections (<0.1%). Swing and stance phases could thus be determined with high fidelity (median F1-score of ~0.9). Interestingly, the best performance was obtained using as few as two EMG probes placed on the left and right vastus lateralis. Our results demonstrate the practical utility of the proposed EMG-based system for gait event prediction, which additionally provides a simultaneously acquired electromyographic signal for analysis. This gait analysis approach has the potential to make additional measurement devices such as IMUs and force plates less essential, thereby reducing financial and preparation overheads and discomfort factors in gait studies.
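The temporal embedding used above (lagged copies of each EMG channel as regression features) can be sketched with synthetic signals. Everything here is a toy stand-in: the signals, the two-channel setup, and the ridge regressor are assumptions for illustration, not the study's recordings or model.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Two toy "EMG envelope" channels (agonist/antagonist-like rectified
# activity) regress a toy "IMU angular velocity" via temporal embedding:
# each feature column is one channel shifted by one lag.
rng = np.random.default_rng(7)
t = np.linspace(0, 10, 2000)
target = np.sin(2 * np.pi * 1.0 * t)            # toy angular velocity (1 Hz gait)
emg = np.stack([
    np.clip(np.sin(2 * np.pi * 1.0 * t), 0, None) + 0.05 * rng.standard_normal(t.size),
    np.clip(-np.sin(2 * np.pi * 1.0 * t), 0, None) + 0.05 * rng.standard_normal(t.size),
])

lags = 25                                        # temporal embedding width
X = np.column_stack([np.roll(emg[c], k)
                     for c in range(emg.shape[0]) for k in range(lags)])
X, y = X[lags:], target[lags:]                   # drop wrapped-around rows

model = Ridge(alpha=1.0).fit(X[:1500], y[:1500])
r2 = model.score(X[1500:], y[1500:])             # held-out R^2
```

Gait events would then be derived from the predicted angular velocity profile, e.g., by thresholding or peak detection.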
Improved wall temperature prediction for the LUMEN rocket combustion chamber with neural networks
(2023)
Accurate calculations of the heat transfer and the resulting maximum wall temperature are essential for the optimal design of reliable and efficient regenerative cooling systems. However, predicting the heat transfer of supercritical methane flowing in cooling channels of a regeneratively cooled rocket combustor presents a significant challenge. High-fidelity CFD calculations provide sufficient accuracy but are computationally too expensive to be used within elaborate design optimization routines. In previous work, it was shown that a surrogate model based on neural networks is able to predict the maximum wall temperature along straight cooling channels with convincing precision when trained with data from CFD simulations of simple cooling channel segments. In this paper, the methodology is extended to cooling channels with curvature. The predictions of the extended model are tested against CFD simulations with different boundary conditions for the representative LUMEN combustor contour with varying geometries and heat flux densities. The high accuracy of the extended model’s predictions suggests that it will be a valuable tool for designing and analyzing regenerative cooling systems with greater efficiency and effectiveness.
(1) Background: C-X-C Motif Chemokine Receptor 4 (CXCR4) and Fibroblast Activation Protein Alpha (FAP) are promising theranostic targets. However, it is unclear whether CXCR4 and FAP positivity mark distinct microenvironments, especially in solid tumors. (2) Methods: Using Random Forest (RF) analysis, we searched for entity-independent mRNA and microRNA signatures related to CXCR4 and FAP overexpression in our pan-cancer cohort from The Cancer Genome Atlas (TCGA) database, representing n = 9242 specimens from 29 tumor entities. CXCR4- and FAP-positive samples were assessed via StringDB cluster analysis, EnrichR, Metascape, and Gene Set Enrichment Analysis (GSEA). Findings were validated via correlation analyses in n = 1541 tumor samples. TIMER2.0 was used to analyze the association between CXCR4/FAP expression and infiltration levels of immune-related cells. (3) Results: We identified entity-independent CXCR4 and FAP gene signatures representative of the majority of solid cancers. While CXCR4 positivity marked an immune-related microenvironment, FAP overexpression highlighted an angiogenesis-associated niche. TIMER2.0 analysis confirmed characteristic infiltration levels of CD8+ cells for CXCR4-positive tumors and endothelial cells for FAP-positive tumors. (4) Conclusions: CXCR4- and FAP-directed PET imaging could provide a non-invasive decision aid for entity-agnostic, microenvironment-directed treatment of solid malignancies. Moreover, this machine learning workflow can easily be transferred to other theranostic targets.
Bioimages frequently exhibit low signal-to-noise ratios due to experimental conditions, specimen characteristics, and imaging trade-offs. Reliable segmentation of such ambiguous images is difficult and laborious. Here we introduce deepflash2, a deep learning-enabled segmentation tool for bioimage analysis. The tool addresses typical challenges that may arise during the training, evaluation, and application of deep learning models on ambiguous data. The tool’s training and evaluation pipeline uses multiple expert annotations and deep model ensembles to achieve accurate results. The application pipeline supports various use-cases for expert annotations and includes a quality assurance mechanism in the form of uncertainty measures. Benchmarked against other tools, deepflash2 offers both high predictive accuracy and efficient computational resource usage. The tool is built upon established deep learning libraries and enables sharing of trained model ensembles with the research community. deepflash2 aims to simplify the integration of deep learning into bioimage analysis projects while improving accuracy and reliability.
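The ensemble-plus-uncertainty idea behind the quality assurance mechanism can be illustrated in a few lines: the ensemble mean serves as the prediction and the inter-model disagreement as a per-pixel uncertainty map. This is a toy sketch with synthetic probabilities, not deepflash2's implementation or API.

```python
import numpy as np

# Five toy per-pixel foreground probability maps; the "models" agree on
# the left half of the image and disagree on the right half.
rng = np.random.default_rng(3)
n_models, h, w = 5, 16, 16
probs = np.clip(rng.normal(0.9, 0.05, (n_models, h, w)), 0, 1)
probs[:, :, w // 2:] = rng.uniform(0.2, 0.8, (n_models, h, w // 2))

mean_prob = probs.mean(axis=0)        # ensemble prediction
uncertainty = probs.std(axis=0)       # disagreement as uncertainty measure
segmentation = mean_prob > 0.5        # final binary mask
```

High-uncertainty regions are exactly where a quality assurance workflow would route the image back to a human expert.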
Colorectal cancer (CRC) is a leading cause of cancer-related deaths worldwide. The best method to prevent CRC is a colonoscopy, during which the gastroenterologist searches for polyps. However, polyps can be missed by the gastroenterologist, and automated polyp detection can assist during a colonoscopy. Publications examining the problem of polyp detection already exist in the literature; nevertheless, most of these systems are used only in a research context and are not implemented for clinical application. Therefore, we introduce the first fully open-source automated polyp-detection system that scores best on current benchmark data and is implemented ready for clinical application. To create the polyp-detection system (ENDOMIND-Advanced), we combined our own data collected from different hospitals and practices in Germany with open-source datasets, yielding a dataset of over 500,000 annotated images. ENDOMIND-Advanced leverages a post-processing technique based on video detection to work in real time on a stream of images. It is integrated into a prototype ready for application in clinical interventions. We achieve better performance than the best system in the literature, with an F1-score of 90.24% on the open-source CVC-VideoClinicDB benchmark.
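Video-based post-processing of per-frame detections can be illustrated with a sliding-window majority vote that suppresses isolated false positives and fills short detection gaps. This is a deliberately simple stand-in; ENDOMIND-Advanced's actual post-processing is more elaborate, and the window size and flags below are assumptions.

```python
import numpy as np

def smooth_detections(frames, window=5):
    """Majority vote over a centered sliding window of per-frame polyp flags."""
    frames = np.asarray(frames, dtype=int)
    pad = window // 2
    padded = np.pad(frames, pad, mode="edge")   # repeat edge values at the ends
    return np.array([padded[i:i + window].sum() > window // 2
                     for i in range(frames.size)])

# noisy detector output: a spurious hit at frame 2, a missed frame at 9
raw = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
smoothed = smooth_detections(raw)
```

After smoothing, the isolated detection at frame 2 is removed and the one-frame gap inside the polyp sequence is closed, which stabilizes on-screen alerts during an intervention.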
Machine learning techniques are excellent for analyzing expression data from single cells, with impact on fields ranging from cell annotation and clustering to signature identification. The presented framework evaluates how well gene selection sets separate defined phenotypes or cell groups. This overcomes the present limitation in objectively and correctly identifying a small gene set of high information content with respect to separating phenotypes; the corresponding code scripts are provided. The small but meaningful subset of the original genes (or feature space) facilitates human interpretation of the differences between the phenotypes, including those found by machine learning, and may even turn correlations between genes and phenotypes into a causal explanation. For the feature selection task, principal feature analysis is utilized, which reduces redundant information while selecting genes that carry the information for separating the phenotypes. In this context, the presented framework shows explainability of unsupervised learning, as it reveals cell-type-specific signatures. Apart from a Seurat preprocessing tool and the PFA script, the pipeline uses mutual information to balance accuracy and size of the gene set if desired. A validation part to evaluate the gene selection for its information content regarding the separation of the phenotypes is provided as well; binary classification and multiclass classification of 3 or 4 groups are studied. Results from different single-cell datasets are presented. In each, only about ten out of more than 30,000 genes are identified as carrying the relevant information. The code is provided in a GitHub repository at https://github.com/AC-PHD/Seurat_PFA_pipeline.
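The mutual-information step can be sketched as ranking genes by their mutual information with the phenotype label and keeping a small, informative subset. The expression matrix below is synthetic and the principal feature analysis step of the pipeline is not reproduced; this only illustrates the information-content criterion.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Toy expression matrix: 300 cells x 100 genes; only genes 0-2 carry
# a phenotype signal (4-fold higher expression in phenotype 1).
rng = np.random.default_rng(5)
n_cells, n_genes = 300, 100
X = rng.lognormal(0.0, 1.0, (n_cells, n_genes))
labels = rng.integers(0, 2, n_cells)          # two phenotypes
X[labels == 1, :3] *= 4.0                     # inject the signal

mi = mutual_info_classif(X, labels, random_state=0)   # MI of each gene vs. label
top_genes = np.argsort(mi)[::-1][:5]                  # keep the most informative genes
```

In real data, this ranking would be applied after Seurat preprocessing and balanced against the desired size of the gene set, as described above.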
During the COVID-19 pandemic, the novel coronavirus had an impact not only on public health but also on the mental health of the population. Public sentiment on mental health and depression is often captured only in small, survey-based studies, while work based on Twitter data often only looks at the period during the pandemic and does not make comparisons with the pre-pandemic situation. We collected tweets that included the hashtags #MentalHealth and #Depression from before and during the pandemic (8.5 months each). We used LDA (Latent Dirichlet Allocation) for topic modeling and LIWC, VADER, and NRC for sentiment analysis. We used three machine-learning classifiers to seek evidence regarding an automatically detectable change in tweets before vs. during the pandemic: (1) based on TF-IDF values, (2) based on the values from the sentiment libraries, (3) based on tweet content (deep-learning BERT classifier). Topic modeling revealed that Twitter users who explicitly used the hashtags #Depression and especially #MentalHealth did so to raise awareness. We observed an overall positive sentiment, and in tough times such as during the COVID-19 pandemic, tweets with #MentalHealth were often associated with gratitude. Among the three classification approaches, the BERT classifier showed the best performance, with an accuracy of 81% for #MentalHealth and 79% for #Depression. Although the data may have come from users familiar with mental health, these findings can help gauge public sentiment on the topic. The combination of (1) sentiment analysis, (2) topic modeling, and (3) tweet classification with machine learning proved useful in gaining comprehensive insight into public sentiment and could be applied to other data sources and topics.
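The first of the three classification approaches (TF-IDF features feeding a conventional classifier) can be sketched with toy tweets. The texts and labels below are invented for illustration; the study's corpus, its sentiment libraries, and the BERT classifier are not reproduced.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy pre- vs. during-pandemic tweets (entirely synthetic examples).
pre = ["enjoying the gym and friends today",
       "great concert last night",
       "coffee with colleagues at the office",
       "holiday plans are coming together"]
during = ["lockdown is hard stay safe everyone",
          "missing my friends during quarantine",
          "working from home again today",
          "grateful for healthcare workers in the pandemic"]
texts = pre + during
y = np.array([0] * len(pre) + [1] * len(during))   # 0 = pre, 1 = during

vec = TfidfVectorizer()                 # TF-IDF term weighting
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, y)    # pre/during classifier
train_acc = clf.score(X, y)
```

In the study, a detectable difference between the two periods under such a classifier is what indicates an automatically measurable shift in tweet content.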
Background
Medical resource management can be improved by assessing the likelihood of prolonged length of stay (LOS) for head and neck cancer surgery patients. The objective of this study was to develop predictive models that could be used to determine whether a patient's LOS after cancer surgery falls within the normal range of the cohort.
Methods
We conducted a retrospective analysis of a dataset consisting of 300 consecutive patients who underwent head and neck cancer surgery between 2017 and 2022 at a single university medical center. Prolonged LOS was defined as LOS exceeding the 75th percentile of the cohort. Feature importance analysis was performed to evaluate the most important predictors for prolonged LOS. We then constructed 7 machine learning and deep learning algorithms for the prediction modeling of prolonged LOS.
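The labeling and modeling setup described above can be sketched as follows. The patients, features, and coefficients below are synthetic assumptions; only the 75th-percentile label definition and the train/test evaluation mirror the described design, with a random forest standing in for the seven models actually constructed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic cohort: LOS driven by operation time and ICU stay (toy model).
rng = np.random.default_rng(11)
n = 300
op_time = rng.normal(300, 60, n)       # operation time (min), toy
asa = rng.integers(1, 5, n)            # ASA score, toy
icu_days = rng.poisson(1.0, n)         # intensive care stay (days), toy
los = 3 + 0.02 * op_time + 1.5 * icu_days + rng.normal(0, 1, n)

# Prolonged LOS = LOS above the cohort's 75th percentile.
prolonged = (los > np.percentile(los, 75)).astype(int)

X = np.column_stack([op_time, asa, icu_days])
Xtr, Xte, ytr, yte = train_test_split(X, prolonged, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
test_acc = clf.score(Xte, yte)
```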
Results
The algorithms reached accuracy values of 75.40% (radial basis function neural network) to 97.92% (Random Trees) for the training set and 64.90% (multilayer perceptron neural network) to 84.14% (Random Trees) for the testing set. The leading parameters predicting prolonged LOS were operation time, ischemia time, the graft used, the ASA score, the intensive care stay, and the pathological stages. The results revealed that patients with a higher number of harvested lymph nodes (LNs) had a lower probability of recurrence but also a greater LOS. However, patients with prolonged LOS were also at greater risk of recurrence, particularly when fewer LNs were extracted. Further, LOS was more strongly correlated with the overall number of extracted LNs than with the number of positive LNs or the ratio of positive to overall extracted LNs, indicating that unnecessary lymph node extraction in particular might be associated with prolonged LOS.
Conclusions
The results emphasize the need for a closer follow-up of patients who experience prolonged LOS. Prospective trials are warranted to validate the present results.
Highlights
• Brain connectivity states identified by cofluctuation strength.
• CMEP as new method to robustly predict human traits from brain imaging data.
• Network-identifying connectivity ‘events’ are not predictive of cognitive ability.
• Sixteen temporally independent fMRI time frames allow for significant prediction.
• Neuroimaging-based assessment of cognitive ability requires sufficient scan lengths.
Abstract
Human functional brain connectivity can be temporally decomposed into states of high and low cofluctuation, defined as coactivation of brain regions over time. Rare states of particularly high cofluctuation have been shown to reflect fundamentals of intrinsic functional network architecture and to be highly subject-specific. However, it is unclear whether such network-defining states also contribute to individual variations in cognitive abilities – which strongly rely on the interactions among distributed brain regions. By introducing CMEP, a new eigenvector-based prediction framework, we show that as few as 16 temporally separated time frames (< 1.5% of 10 min resting-state fMRI) can significantly predict individual differences in intelligence (N = 263, p < .001). Against previous expectations, individuals' network-defining time frames of particularly high cofluctuation do not predict intelligence. Multiple functional brain networks contribute to the prediction, and all results replicate in an independent sample (N = 831). Our results suggest that although fundamentals of person-specific functional connectomes can be derived from few time frames of highest connectivity, temporally distributed information is necessary to extract information about cognitive abilities. This information is not restricted to specific connectivity states, like network-defining high-cofluctuation states, but rather is reflected across the entire length of the brain connectivity time series.
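Edge-wise cofluctuation, the quantity from which high- and low-cofluctuation states are defined, is the product of z-scored regional time series at each time frame. The sketch below computes it on synthetic data and selects the 16 highest-amplitude frames; CMEP itself (the eigenvector-based prediction framework) is not reproduced here, and the region/frame counts are assumptions.

```python
import numpy as np

# Toy fMRI: 200 time frames x 10 brain regions of white noise.
rng = np.random.default_rng(2)
T, R = 200, 10
ts = rng.standard_normal((T, R))
z = (ts - ts.mean(axis=0)) / ts.std(axis=0)      # z-score each region

i, j = np.triu_indices(R, k=1)                   # unique region pairs (edges)
cofluct = z[:, i] * z[:, j]                      # edge-wise cofluctuation, T x n_edges
amplitude = np.sqrt((cofluct ** 2).sum(axis=1))  # root-sum-square amplitude per frame

top_frames = np.argsort(amplitude)[::-1][:16]    # 16 highest-cofluctuation frames
```

In the study, it is frames like `top_frames` (network-defining, high-cofluctuation) that, perhaps surprisingly, do not carry the intelligence-predictive information.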
Variability of gene expression due to stochasticity of transcription or variation of extrinsic signals, termed biological noise, is a potential driving force of cellular differentiation. Utilizing single-cell RNA-sequencing, we develop VarID2 for the quantification of biological noise at single-cell resolution. VarID2 reveals enhanced nuclear versus cytoplasmic noise, and distinct regulatory modes stratified by correlation between noise, expression, and chromatin accessibility. Noise levels are minimal in murine hematopoietic stem cells (HSCs) and increase during differentiation and ageing. Differential noise identifies myeloid-biased Dlk1+ long-term HSCs in aged mice with enhanced quiescence and self-renewal capacity. VarID2 reveals noise dynamics invisible to conventional single-cell transcriptome analysis.
Objectives
Open-access cancer imaging datasets have become integral for evaluating novel AI approaches in radiology. However, their use in quantitative analysis with radiomics features presents unique challenges, such as incomplete documentation, low visibility, non-uniform data formats, data inhomogeneity, and complex preprocessing. These issues may cause problems with reproducibility and standardization in radiomics studies.
Methods
We systematically reviewed imaging datasets with public copyright licenses, published up to March 2023 across four large online cancer imaging archives. We included only datasets with tomographic images (CT, MRI, or PET), segmentations, and clinical annotations, specifically identifying those suitable for radiomics research. Reproducible preprocessing and feature extraction were performed for each dataset to enable their easy reuse.
Results
We discovered 29 datasets with corresponding segmentations and labels in the form of health outcomes, tumor pathology, staging, imaging-based scores, genetic markers, or repeated imaging. We compiled a repository encompassing 10,354 patients and 49,515 scans. Of the 29 datasets, 15 were licensed under Creative Commons licenses, allowing both non-commercial and commercial usage and redistribution, while others featured custom or restricted licenses. Studies spanned from the early 1990s to 2021, with the majority concluding after 2013. Seven different formats were used for the imaging data. Preprocessing and feature extraction were successfully performed for each dataset.
Conclusion
RadiomicsHub is a comprehensive public repository with radiomics features derived from a systematic review of public cancer imaging datasets. By converting all datasets to a standardized format and ensuring reproducible and traceable processing, RadiomicsHub addresses key reproducibility and standardization challenges in radiomics.
Critical relevance statement
This study critically addresses the challenges associated with locating, preprocessing, and extracting quantitative features from open-access datasets, to facilitate more robust and reliable evaluations of radiomics models.
Key points
- Through a systematic review, we identified 29 cancer imaging datasets suitable for radiomics research.
- A public repository with collection overview and radiomics features, encompassing 10,354 patients and 49,515 scans, was compiled.
- Most datasets can be shared, used, and built upon freely under a Creative Commons license.
- All 29 identified datasets have been converted into a common format to enable reproducible radiomics feature extraction.
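Radiomics pipelines such as the one described above typically rely on dedicated libraries for feature extraction; purely as an illustration of what "first-order" radiomics features are, the sketch below computes mean, variance, and histogram entropy over a segmented region of a toy 2D image. The image, mask, and bin width are made up for the example.

```python
import math

def first_order_features(image, mask):
    """First-order radiomics features over the masked region.

    `image` and `mask` are equally sized 2D lists; mask entries of 1
    select voxels belonging to the segmented tumor.
    """
    voxels = [v for row_img, row_msk in zip(image, mask)
                for v, m in zip(row_img, row_msk) if m]
    n = len(voxels)
    mean_v = sum(voxels) / n
    var_v = sum((v - mean_v) ** 2 for v in voxels) / n
    # Shannon entropy over a coarse intensity histogram (bin width 10).
    hist = {}
    for v in voxels:
        b = v // 10
        hist[b] = hist.get(b, 0) + 1
    entropy = -sum((c / n) * math.log2(c / n) for c in hist.values())
    return {"mean": mean_v, "variance": var_v, "entropy": entropy}

image = [[10, 52, 55], [48, 60, 12], [11, 50, 58]]
mask  = [[0, 1, 1], [1, 1, 0], [0, 1, 1]]
print(first_order_features(image, mask))
```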
Background
Machine learning, especially deep learning, is becoming increasingly relevant in medical research and development. For supervised deep learning applications, data is the most critical factor for successful implementation and for sustaining the progress of the machine learning model. Gastroenterological data in particular, which often involve endoscopic videos, are cumbersome to annotate: domain experts are needed to interpret and annotate the videos. To support these domain experts, we developed a framework. With this framework, instead of annotating every frame in a video sequence, experts only perform key annotations at the beginning and the end of sequences with pathologies, e.g., visible polyps. Non-expert annotators, supported by machine learning, then add the missing annotations for the frames in between.
Methods
In our framework, an expert reviews the video and annotates a few video frames to verify the object's annotations for the non-expert. Once the expert has finished, relevant frames are selected and passed to an AI model, which detects and marks the desired object on all following and preceding frames. The non-expert, with visual confirmation of the given object, can then adjust and modify the AI predictions with AI assistance and export the results, which can in turn be used to train the AI model.
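The framework itself propagates annotations with a trained object detection model; as a much simpler stand-in that illustrates how key annotations at the boundaries of a sequence can seed annotations for the frames in between, the sketch below linearly interpolates a bounding box between two expert key frames. The box format (x, y, w, h) and frame indices are assumptions for the example.

```python
def interpolate_boxes(key_frames, frame_idx):
    """Linearly interpolate a bounding box (x, y, w, h) for frame_idx
    between the two nearest expert-annotated key frames."""
    frames = sorted(key_frames)
    # Clamp to the annotated range.
    if frame_idx <= frames[0]:
        return key_frames[frames[0]]
    if frame_idx >= frames[-1]:
        return key_frames[frames[-1]]
    # Find the surrounding key frames and blend their boxes.
    for lo, hi in zip(frames, frames[1:]):
        if lo <= frame_idx <= hi:
            t = (frame_idx - lo) / (hi - lo)
            a, b = key_frames[lo], key_frames[hi]
            return tuple(round(av + t * (bv - av), 1)
                         for av, bv in zip(a, b))

# Expert key annotations at the start and end of a polyp sequence.
keys = {0: (100, 80, 40, 30), 10: (160, 100, 50, 40)}
print(interpolate_boxes(keys, 5))  # → (130.0, 90.0, 45.0, 35.0)
```

A real AI-assisted annotator would replace the linear blend with model predictions and let the non-expert correct the result frame by frame.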
Results
Using this framework, we were able to reduce the workload of domain experts on our data by a factor of 20 on average. This is primarily due to the structure of the framework, which is designed to minimize the workload of the domain expert. Pairing the framework with a state-of-the-art semi-automated AI model increases the annotation speed further. In a prospective study with 10 participants, we show that semi-automated annotation using our tool doubles the annotation speed of non-expert annotators compared to a well-known state-of-the-art annotation tool.
Conclusion
In summary, we introduce a framework for fast expert annotation for gastroenterologists, which reduces the workload of the domain expert considerably while maintaining a very high annotation quality. The framework incorporates a semi-automated annotation system utilizing trained object detection models. The software and framework are open-source.
Prediction of tinnitus perception based on daily life mHealth data using country origin and season
(2022)
Tinnitus is an auditory phantom perception that occurs without an external sound stimulus. This chronic perception can severely affect quality of life. Because tinnitus symptoms are highly heterogeneous, multimodal data analyses are increasingly used to gain new insights. mHealth data sources are a promising avenue here, particularly for uncovering country- and season-specific differences. We therefore examined data from the TrackYourTinnitus (TYT) mHealth platform to create symptom profiles of TYT users. We used gradient boosting engines to classify momentary tinnitus and to regress tinnitus loudness, using country of origin and season as features. At the daily assessment level, tinnitus loudness can be regressed with a mean absolute error of 7.9 percentage points, and momentary tinnitus can be classified with an F1 score of 93.79%. Both results indicate differences in the tinnitus of TYT users with respect to season and country of origin. The significance of the features was evaluated using statistical and explainable machine learning methods; it was further shown that tinnitus varies with temperature in certain countries. These results show that season and country of origin appear to be valuable features when combined with longitudinal mHealth data at the daily assessment level.
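The abstract names country of origin and season as model features but does not detail their encoding; a hypothetical sketch of how such features could be derived from a daily assessment record (season from the calendar month, country one-hot encoded) is shown below. The country codes and record layout are illustrative assumptions.

```python
def season_from_month(month, hemisphere="north"):
    """Map a calendar month (1-12) to a meteorological season."""
    seasons = {12: "winter", 1: "winter", 2: "winter",
               3: "spring", 4: "spring", 5: "spring",
               6: "summer", 7: "summer", 8: "summer",
               9: "autumn", 10: "autumn", 11: "autumn"}
    s = seasons[month]
    if hemisphere == "south":  # seasons are shifted by half a year
        s = {"winter": "summer", "summer": "winter",
             "spring": "autumn", "autumn": "spring"}[s]
    return s

def encode_assessment(record, countries):
    """One-hot encode country and season for one daily assessment."""
    season = season_from_month(record["month"])
    features = [1 if record["country"] == c else 0 for c in countries]
    features += [1 if season == s else 0
                 for s in ("winter", "spring", "summer", "autumn")]
    return features

countries = ["DE", "US", "NL"]
print(encode_assessment({"country": "US", "month": 7}, countries))
# → [0, 1, 0, 0, 0, 1, 0]
```

Feature vectors of this shape could then be fed to a gradient boosting engine alongside the longitudinal assessment data.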
Purpose
Machine learning based on radiomics features has seen huge success in a variety of clinical applications. However, the need for standardization and reproducibility has been increasingly recognized as a necessary step for future clinical translation. We developed a novel, intuitive open-source framework to facilitate all data analysis steps of a radiomics workflow in an easy and reproducible manner and evaluated it by reproducing classification results in eight available open-source datasets from different clinical entities.
Methods
The framework performs image preprocessing, feature extraction, feature selection, modeling, and model evaluation, and can automatically choose the optimal parameters for a given task. All analysis steps can be reproduced with a web application, which offers an interactive user interface and does not require programming skills. We evaluated our method in seven different clinical applications using eight public datasets: six datasets from the recently published WORC database, and two prostate MRI datasets, Prostate MRI and Ultrasound With Pathology and Coordinates of Tracked Biopsy (Prostate-UCLA) and PROSTATEx.
Results
In the analyzed datasets, AutoRadiomics successfully created and optimized models using radiomics features. For WORC datasets, we achieved AUCs ranging from 0.56 for lung melanoma metastases detection to 0.93 for liposarcoma detection and thereby managed to replicate the previously reported results. No significant overfitting between training and test sets was observed. For the prostate cancer detection task, results were better in the PROSTATEx dataset (AUC = 0.73 for prostate and AUC = 0.72 for lesion mask) than in the Prostate-UCLA dataset (AUC = 0.61 for prostate and AUC = 0.65 for lesion mask), with external validation results varying from AUC = 0.51 to AUC = 0.77.
Conclusion
AutoRadiomics is a robust tool for radiomic studies, which can be used as a comprehensive solution, one of the analysis steps, or an exploratory tool. Its wide applicability was confirmed by the results obtained in the diverse analyzed datasets. The framework, as well as code for this analysis, are publicly available under https://github.com/pwoznicki/AutoRadiomics.
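The AUC values reported in studies like those above can be estimated directly from raw model scores with the rank-based Mann-Whitney formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A minimal sketch with made-up labels and scores:

```python
def auc(labels, scores):
    """Rank-based AUC estimate (Mann-Whitney U statistic): the
    probability that a random positive case receives a higher
    score than a random negative case; ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auc(labels, scores))  # 8/9 ≈ 0.89
```

An AUC of 0.5 corresponds to random scoring, which is why values such as 0.51 in external validation indicate essentially no discriminative power.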