Refine
Has Fulltext
- yes (50)
Is part of the Bibliography
- yes (50) (remove)
Document Type
- Journal article (49)
- Doctoral Thesis (1)
Language
- English (50) (remove)
Keywords
- machine learning (50) (remove)
Institute
- Institut für Geographie und Geologie (9)
- Institut für Informatik (9)
- Center for Computational and Theoretical Biology (5)
- Institut für Klinische Epidemiologie und Biometrie (5)
- Medizinische Klinik und Poliklinik II (4)
- Pathologisches Institut (4)
- Theodor-Boveri-Institut für Biowissenschaften (4)
- Betriebswirtschaftliches Institut (3)
- Institut für diagnostische und interventionelle Radiologie (Institut für Röntgendiagnostik) (3)
- Klinik und Poliklinik für Mund-, Kiefer- und Plastische Gesichtschirurgie (3)
Sonstige beteiligte Institutionen
Background
Germinal center-derived B cell lymphomas are tumors of the lymphoid tissues representing one of the most heterogeneous malignancies. Here we characterize the variety of transcriptomic phenotypes of this disease based on 873 biopsy specimens collected in the German Cancer Aid MMML (Molecular Mechanisms in Malignant Lymphoma) consortium. They include diffuse large B cell lymphoma (DLBCL), follicular lymphoma (FL), Burkitt’s lymphoma, mixed FL/DLBCL lymphomas, primary mediastinal large B cell lymphoma, multiple myeloma, IRF4-rearranged large cell lymphoma, MYC-negative Burkitt-like lymphoma with chr. 11q aberration and mantle cell lymphoma.
Methods
We apply self-organizing map (SOM) machine learning to microarray-derived expression data to generate a holistic view on the transcriptome landscape of lymphomas, to describe the multidimensional nature of gene regulation and to pursue a modular view on co-expression. Expression data were complemented by pathological, genetic and clinical characteristics.
Results
We present a transcriptome map of B cell lymphomas that allows visual comparison between the SOM portraits of different lymphoma strata and individual cases. It decomposes into one dozen modules of co-expressed genes related to different functional categories, to genetic defects and to the pathogenesis of lymphomas. On a molecular level, this disease rather forms a continuum of expression states than clearly separated phenotypes. We introduced the concept of combinatorial pattern types (PATs) that stratifies the lymphomas into nine PAT groups and, on a coarser level, into five prominent cancer hallmark types with proliferation, inflammation and stroma signatures. Inflammation signatures in combination with healthy B cell and tonsil characteristics associate with better overall survival rates, while proliferation in combination with inflammation and plasma cell characteristics worsens it. A phenotypic similarity tree is presented that reveals possible progression paths along the transcriptional dimensions. Our analysis provided a novel look on the transition range between FL and DLBCL, on DLBCL with poor prognosis showing expression patterns resembling that of Burkitt’s lymphoma and particularly on ‘double-hit’ MYC and BCL2 transformed lymphomas.
Conclusions
The transcriptome map provides a tool that aggregates, refines and visualizes the data collected in the MMML study and interprets them in the light of previous knowledge to provide orientation and support in current and future studies on lymphomas and on other cancer entities.
Supraglacial meltwater accumulation on ice sheets can be a main driver for accelerated ice discharge, mass loss, and global sea-level-rise. With further increasing surface air temperatures, meltwater-induced hydrofracturing, basal sliding, or surface thinning will cumulate and most likely trigger unprecedented ice mass loss on the Greenland and Antarctic ice sheets. While the Greenland surface hydrological network as well as its impacts on ice dynamics and mass balance has been studied in much detail, Antarctic supraglacial lakes remain understudied with a circum-Antarctic record of their spatio-temporal development entirely lacking. This study provides the first automated supraglacial lake extent mapping method using Sentinel-1 synthetic aperture radar (SAR) imagery over Antarctica and complements the developed optical Sentinel-2 supraglacial lake detection algorithm presented in our companion paper. In detail, we propose the use of a modified U-Net for semantic segmentation of supraglacial lakes in single-polarized Sentinel-1 imagery. The convolutional neural network (CNN) is implemented with residual connections for optimized performance as well as an Atrous Spatial Pyramid Pooling (ASPP) module for multiscale feature extraction. The algorithm is trained on 21,200 Sentinel-1 image patches and evaluated in ten spatially or temporally independent test acquisitions. In addition, George VI Ice Shelf is analyzed for intra-annual lake dynamics throughout austral summer 2019/2020 and a decision-level fused Sentinel-1 and Sentinel-2 maximum lake extent mapping product is presented for January 2020 revealing a more complete supraglacial lake coverage (~770 km\(^2\)) than the individual single-sensor products. Classification results confirm the reliability of the proposed workflow with an average Kappa coefficient of 0.925 and a F\(_1\)-score of 93.0% for the supraglacial water class across all test regions. Furthermore, the algorithm is applied in an additional test region covering supraglacial lakes on the Greenland ice sheet which further highlights the potential for spatio-temporal transferability. Future work involves the integration of more training data as well as intra-annual analyses of supraglacial lake occurrence across the whole continent and with focus on supraglacial lake development throughout a summer melt season and into Antarctic winter.
Colorectal cancer (CRC) is a leading cause of cancer-related deaths worldwide. The best method to prevent CRC is with a colonoscopy. During this procedure, the gastroenterologist searches for polyps. However, there is a potential risk of polyps being missed by the gastroenterologist. Automated detection of polyps helps to assist the gastroenterologist during a colonoscopy. There are already publications examining the problem of polyp detection in the literature. Nevertheless, most of these systems are only used in the research context and are not implemented for clinical application. Therefore, we introduce the first fully open-source automated polyp-detection system scoring best on current benchmark data and implementing it ready for clinical application. To create the polyp-detection system (ENDOMIND-Advanced), we combined our own collected data from different hospitals and practices in Germany with open-source datasets to create a dataset with over 500,000 annotated images. ENDOMIND-Advanced leverages a post-processing technique based on video detection to work in real-time with a stream of images. It is integrated into a prototype ready for application in clinical interventions. We achieve better performance compared to the best system in the literature and score a F1-score of 90.24% on the open-source CVC-VideoClinicDB benchmark.
The identification of biomarker signatures is important for cancer diagnosis and prognosis. However, the detection of clinical reliable signatures is influenced by limited data availability, which may restrict statistical power. Moreover, methods for integration of large sample cohorts and signature identification are limited. We present a step-by-step computational protocol for functional gene expression analysis and the identification of diagnostic and prognostic signatures by combining meta-analysis with machine learning and survival analysis. The novelty of the toolbox lies in its all-in-one functionality, generic design, and modularity. It is exemplified for lung cancer, including a comprehensive evaluation using different validation strategies. However, the protocol is not restricted to specific disease types and can therefore be used by a broad community. The accompanying R package vignette runs in ~1 h and describes the workflow in detail for use by researchers with limited bioinformatics training.
Objectives
Open-access cancer imaging datasets have become integral for evaluating novel AI approaches in radiology. However, their use in quantitative analysis with radiomics features presents unique challenges, such as incomplete documentation, low visibility, non-uniform data formats, data inhomogeneity, and complex preprocessing. These issues may cause problems with reproducibility and standardization in radiomics studies.
Methods
We systematically reviewed imaging datasets with public copyright licenses, published up to March 2023 across four large online cancer imaging archives. We included only datasets with tomographic images (CT, MRI, or PET), segmentations, and clinical annotations, specifically identifying those suitable for radiomics research. Reproducible preprocessing and feature extraction were performed for each dataset to enable their easy reuse.
Results
We discovered 29 datasets with corresponding segmentations and labels in the form of health outcomes, tumor pathology, staging, imaging-based scores, genetic markers, or repeated imaging. We compiled a repository encompassing 10,354 patients and 49,515 scans. Of the 29 datasets, 15 were licensed under Creative Commons licenses, allowing both non-commercial and commercial usage and redistribution, while others featured custom or restricted licenses. Studies spanned from the early 1990s to 2021, with the majority concluding after 2013. Seven different formats were used for the imaging data. Preprocessing and feature extraction were successfully performed for each dataset.
Conclusion
RadiomicsHub is a comprehensive public repository with radiomics features derived from a systematic review of public cancer imaging datasets. By converting all datasets to a standardized format and ensuring reproducible and traceable processing, RadiomicsHub addresses key reproducibility and standardization challenges in radiomics.
Critical relevance statement
This study critically addresses the challenges associated with locating, preprocessing, and extracting quantitative features from open-access datasets, to facilitate more robust and reliable evaluations of radiomics models.
Key points
- Through a systematic review, we identified 29 cancer imaging datasets suitable for radiomics research.
- A public repository with collection overview and radiomics features, encompassing 10,354 patients and 49,515 scans, was compiled.
- Most datasets can be shared, used, and built upon freely under a Creative Commons license.
- All 29 identified datasets have been converted into a common format to enable reproducible radiomics feature extraction.
An approach to aerodynamically optimizing cycling posture and reducing drag in an Ironman (IM) event was elaborated. Therefore, four commonly used positions in cycling were investigated and simulated for a flow velocity of 10 m/s and yaw angles of 0–20° using OpenFoam-based Nabla Flow CFD simulation software software. A cyclist was scanned using an IPhone 12, and a special-purpose meshing software BLENDER was used. Significant differences were observed by changing and optimizing the cyclist’s posture. Aerodynamic drag coefficient (CdA) varies by more than a factor of 2, ranging from 0.214 to 0.450. Within a position, the CdA tends to increase slightly at yaw angles of 5–10° and decrease at higher yaw angles compared to a straight head wind, except for the time trial (TT) position. The results were applied to the IM Hawaii bike course (180 km), estimating a constant power output of 300 W. Including the wind distributions, two different bike split models for performance prediction were applied. Significant time saving of roughly 1 h was found. Finally, a machine learning approach to deduce 3D triangulation for specific body shapes from 2D pictures was tested.
o build, run, and maintain reliable manufacturing machines, the condition of their components has to be continuously monitored. When following a fine-grained monitoring of these machines, challenges emerge pertaining to the (1) feeding procedure of large amounts of sensor data to downstream processing components and the (2) meaningful analysis of the produced data. Regarding the latter aspect, manifold purposes are addressed by practitioners and researchers. Two analyses of real-world datasets that were generated in production settings are discussed in this paper. More specifically, the analyses had the goals (1) to detect sensor data anomalies for further analyses of a pharma packaging scenario and (2) to predict unfavorable temperature values of a 3D printing machine environment. Based on the results of the analyses, it will be shown that a proper management of machines and their components in industrial manufacturing environments can be efficiently supported by the detection of anomalies. The latter shall help to support the technical evangelists of the production companies more properly.
Artificial intelligence (AI) has already arrived in many areas of our lives and, because of the increasing availability of computing power, can now be used for complex tasks in medicine and dentistry. This is reflected by an exponential increase in scientific publications aiming to integrate AI into everyday clinical routines. Applications of AI in orthodontics are already manifold and range from the identification of anatomical/pathological structures or reference points in imaging to the support of complex decision-making in orthodontic treatment planning. The aim of this article is to give the reader an overview of the current state of the art regarding applications of AI in orthodontics and to provide a perspective for the use of such AI solutions in clinical routine. For this purpose, we present various use cases for AI in orthodontics, for which research is already available. Considering the current scientific progress, it is not unreasonable to assume that AI will become an integral part of orthodontic diagnostics and treatment planning in the near future. Although AI will equally likely not be able to replace the knowledge and experience of human experts in the not-too-distant future, it probably will be able to support practitioners, thus serving as a quality-assuring component in orthodontic patient care.
Background: Tinnitus is often described as the phantom perception of a sound and is experienced by 5.1% to 42.7% of the population worldwide, at least once during their lifetime. The symptoms often reduce the patient's quality of life. The TrackYourTinnitus (TYT) mobile health (mHealth) crowdsensing platform was developed for two operating systems (OS)-Android and iOS-to help patients demystify the daily moment-to-moment variations of their tinnitus symptoms. In all platforms developed for more than one OS, it is important to investigate whether the crowdsensed data predicts the OS that was used in order to understand the degree to which the OS is a confounder that is necessary to consider.
Associations between periodontitis and COPD: An artificial intelligence-based analysis of NHANES III
(2022)
A number of cross-sectional epidemiological studies suggest that poor oral health is associated with respiratory diseases. However, the number of cases within the studies was limited, and the studies had different measurement conditions. By analyzing data from the National Health and Nutrition Examination Survey III (NHANES III), this study aimed to investigate possible associations between chronic obstructive pulmonary disease (COPD) and periodontitis in the general population. COPD was diagnosed in cases where FEV (1)/FVC ratio was below 70% (non-COPD versus COPD; binary classification task). We used unsupervised learning utilizing k-means clustering to identify clusters in the data. COPD classes were predicted with logistic regression, a random forest classifier, a stochastic gradient descent (SGD) classifier, k-nearest neighbors, a decision tree classifier, Gaussian naive Bayes (GaussianNB), support vector machines (SVM), a custom-made convolutional neural network (CNN), a multilayer perceptron artificial neural network (MLP), and a radial basis function neural network (RBNN) in Python. We calculated the accuracy of the prediction and the area under the curve (AUC). The most important predictors were determined using feature importance analysis. Results: Overall, 15,868 participants and 19 feature variables were included. Based on k-means clustering, the data were separated into two clusters that identified two risk characteristic groups of patients. The algorithms reached AUCs between 0.608 (DTC) and 0.953% (CNN) for the classification of COPD classes. Feature importance analysis of deep learning algorithms indicated that age and mean attachment loss were the most important features in predicting COPD. Conclusions: Data analysis of a large population showed that machine learning and deep learning algorithms could predict COPD cases based on demographics and oral health feature variables. This study indicates that periodontitis might be an important predictor of COPD. Further prospective studies examining the association between periodontitis and COPD are warranted to validate the present results.
Artificial intelligence (AI) is predicted to play an increasingly important role in perioperative medicine in the very near future. However, little is known about what anesthesiologists know and think about AI in this context. This is important because the successful introduction of new technologies depends on the understanding and cooperation of end users. We sought to investigate how much anesthesiologists know about AI and what they think about the introduction of AI-based technologies into the clinical setting. In order to better understand what anesthesiologists think of AI, we recruited 21 anesthesiologists from 2 university hospitals for face-to-face structured interviews. The interview transcripts were subdivided sentence-by-sentence into discrete statements, and statements were then grouped into key themes. Subsequently, a survey of closed questions based on these themes was sent to 70 anesthesiologists from 3 university hospitals for rating. In the interviews, the base level of knowledge of AI was good at 86 of 90 statements (96%), although awareness of the potential applications of AI in anesthesia was poor at only 7 of 42 statements (17%). Regarding the implementation of AI in anesthesia, statements were split roughly evenly between pros (46 of 105, 44%) and cons (59 of 105, 56%). Interviewees considered that AI could usefully be used in diverse tasks such as risk stratification, the prediction of vital sign changes, or as a treatment guide. The validity of these themes was probed in a follow-up survey of 70 anesthesiologists with a response rate of 70%, which confirmed an overall positive view of AI in this group. Anesthesiologists hold a range of opinions, both positive and negative, regarding the application of AI in their field of work. Survey-based studies do not always uncover the full breadth of nuance of opinion amongst clinicians. Engagement with specific concerns, both technical and ethical, will prove important as this technology moves from research to the clinic.
Background
Colorectal cancer is a leading cause of cancer-related deaths worldwide. The best method to prevent CRC is a colonoscopy. However, not all colon polyps have the risk of becoming cancerous. Therefore, polyps are classified using different classification systems. After the classification, further treatment and procedures are based on the classification of the polyp. Nevertheless, classification is not easy. Therefore, we suggest two novel automated classifications system assisting gastroenterologists in classifying polyps based on the NICE and Paris classification.
Methods
We build two classification systems. One is classifying polyps based on their shape (Paris). The other classifies polyps based on their texture and surface patterns (NICE). A two-step process for the Paris classification is introduced: First, detecting and cropping the polyp on the image, and secondly, classifying the polyp based on the cropped area with a transformer network. For the NICE classification, we design a few-shot learning algorithm based on the Deep Metric Learning approach. The algorithm creates an embedding space for polyps, which allows classification from a few examples to account for the data scarcity of NICE annotated images in our database.
Results
For the Paris classification, we achieve an accuracy of 89.35 %, surpassing all papers in the literature and establishing a new state-of-the-art and baseline accuracy for other publications on a public data set. For the NICE classification, we achieve a competitive accuracy of 81.13 % and demonstrate thereby the viability of the few-shot learning paradigm in polyp classification in data-scarce environments. Additionally, we show different ablations of the algorithms. Finally, we further elaborate on the explainability of the system by showing heat maps of the neural network explaining neural activations.
Conclusion
Overall we introduce two polyp classification systems to assist gastroenterologists. We achieve state-of-the-art performance in the Paris classification and demonstrate the viability of the few-shot learning paradigm in the NICE classification, addressing the prevalent data scarcity issues faced in medical machine learning.
Synaptic vesicles (SVs) are a key component of neuronal signaling and fulfil different roles depending on their composition. In electron micrograms of neurites, two types of vesicles can be distinguished by morphological criteria, the classical “clear core” vesicles (CCV) and the typically larger “dense core” vesicles (DCV), with differences in electron density due to their diverse cargos. Compared to CCVs, the precise function of DCVs is less defined. DCVs are known to store neuropeptides, which function as neuronal messengers and modulators [1]. In C. elegans, they play a role in locomotion, dauer formation, egg-laying, and mechano- and chemosensation [2]. Another type of DCVs, also referred to as granulated vesicles, are known to transport Bassoon, Piccolo and further constituents of the presynaptic density in the center of the active zone (AZ), and therefore are important for synaptogenesis [3].
To better understand the role of different types of SVs, we present here a new automated approach to classify vesicles. We combine machine learning with an extension of our previously developed vesicle segmentation workflow, the ImageJ macro 3D ART VeSElecT. With that we reliably distinguish CCVs and DCVs in electron tomograms of C. elegans NMJs using image-based features. Analysis of the underlying ground truth data shows an increased fraction of DCVs as well as a higher mean distance between DCVs and AZs in dauer larvae compared to young adult hermaphrodites. Our machine learning based tools are adaptable and can be applied to study properties of different synaptic vesicle pools in electron tomograms of diverse model organisms.
Supraglacial lakes can have considerable impact on ice sheet mass balance and global sea-level-rise through ice shelf fracturing and subsequent glacier speedup. In Antarctica, the distribution and temporal development of supraglacial lakes as well as their potential contribution to increased ice mass loss remains largely unknown, requiring a detailed mapping of the Antarctic surface hydrological network. In this study, we employ a Machine Learning algorithm trained on Sentinel-2 and auxiliary TanDEM-X topographic data for automated mapping of Antarctic supraglacial lakes. To ensure the spatio-temporal transferability of our method, a Random Forest was trained on 14 training regions and applied over eight spatially independent test regions distributed across the whole Antarctic continent. In addition, we employed our workflow for large-scale application over Amery Ice Shelf where we calculated interannual supraglacial lake dynamics between 2017 and 2020 at full ice shelf coverage. To validate our supraglacial lake detection algorithm, we randomly created point samples over our classification results and compared them to Sentinel-2 imagery. The point comparisons were evaluated using a confusion matrix for calculation of selected accuracy metrics. Our analysis revealed wide-spread supraglacial lake occurrence in all three Antarctic regions. For the first time, we identified supraglacial meltwater features on Abbott, Hull and Cosgrove Ice Shelves in West Antarctica as well as for the entire Amery Ice Shelf for years 2017–2020. Over Amery Ice Shelf, maximum lake extent varied strongly between the years with the 2019 melt season characterized by the largest areal coverage of supraglacial lakes (~763 km\(^2\)). The accuracy assessment over the test regions revealed an average Kappa coefficient of 0.86 where the largest value of Kappa reached 0.98 over George VI Ice Shelf. Future developments will involve the generation of circum-Antarctic supraglacial lake mapping products as well as their use for further methodological developments using Sentinel-1 SAR data in order to characterize intraannual supraglacial meltwater dynamics also during polar night and independent of meteorological conditions. In summary, the implementation of the Random Forest classifier enabled the development of the first automated mapping method applied to Sentinel-2 data distributed across all three Antarctic regions.
Purpose
Machine learning based on radiomics features has seen huge success in a variety of clinical applications. However, the need for standardization and reproducibility has been increasingly recognized as a necessary step for future clinical translation. We developed a novel, intuitive open-source framework to facilitate all data analysis steps of a radiomics workflow in an easy and reproducible manner and evaluated it by reproducing classification results in eight available open-source datasets from different clinical entities.
Methods
The framework performs image preprocessing, feature extraction, feature selection, modeling, and model evaluation, and can automatically choose the optimal parameters for a given task. All analysis steps can be reproduced with a web application, which offers an interactive user interface and does not require programming skills. We evaluated our method in seven different clinical applications using eight public datasets: six datasets from the recently published WORC database, and two prostate MRI datasets—Prostate MRI and Ultrasound With Pathology and Coordinates of Tracked Biopsy (Prostate-UCLA) and PROSTATEx.
Results
In the analyzed datasets, AutoRadiomics successfully created and optimized models using radiomics features. For WORC datasets, we achieved AUCs ranging from 0.56 for lung melanoma metastases detection to 0.93 for liposarcoma detection and thereby managed to replicate the previously reported results. No significant overfitting between training and test sets was observed. For the prostate cancer detection task, results were better in the PROSTATEx dataset (AUC = 0.73 for prostate and 0.72 for lesion mask) than in the Prostate-UCLA dataset (AUC 0.61 for prostate and 0.65 for lesion mask), with external validation results varying from AUC = 0.51 to AUC = 0.77.
Conclusion
AutoRadiomics is a robust tool for radiomic studies, which can be used as a comprehensive solution, one of the analysis steps, or an exploratory tool. Its wide applicability was confirmed by the results obtained in the diverse analyzed datasets. The framework, as well as code for this analysis, are publicly available under https://github.com/pwoznicki/AutoRadiomics.
This paper describes the estimation of the body weight of a person in front of an RGB-D camera. A survey of different methods for body weight estimation based on depth sensors is given. First, an estimation of people standing in front of a camera is presented. Second, an approach based on a stream of depth images is used to obtain the body weight of a person walking towards a sensor. The algorithm first extracts features from a point cloud and forwards them to an artificial neural network (ANN) to obtain an estimation of body weight. Besides the algorithm for the estimation, this paper further presents an open-access dataset based on measurements from a trauma room in a hospital as well as data from visitors of a public event. In total, the dataset contains 439 measurements. The article illustrates the efficiency of the approach with experiments with persons lying down in a hospital, standing persons, and walking persons. Applicable scenarios for the presented algorithm are body weight-related dosing of emergency patients.
Bioimages frequently exhibit low signal-to-noise ratios due to experimental conditions, specimen characteristics, and imaging trade-offs. Reliable segmentation of such ambiguous images is difficult and laborious. Here we introduce deepflash2, a deep learning-enabled segmentation tool for bioimage analysis. The tool addresses typical challenges that may arise during the training, evaluation, and application of deep learning models on ambiguous data. The tool’s training and evaluation pipeline uses multiple expert annotations and deep model ensembles to achieve accurate results. The application pipeline supports various use-cases for expert annotations and includes a quality assurance mechanism in the form of uncertainty measures. Benchmarked against other tools, deepflash2 offers both high predictive accuracy and efficient computational resource usage. The tool is built upon established deep learning libraries and enables sharing of trained model ensembles with the research community. deepflash2 aims to simplify the integration of deep learning into bioimage analysis projects while improving accuracy and reliability.
In most countries, freight is predominantly transported by road cargo trucks. We present a new satellite remote sensing method for detecting moving trucks on roads using Sentinel-2 data. The method exploits a temporal sensing offset of the Sentinel-2 multispectral instrument, causing spatially and spectrally distorted signatures of moving objects. A random forest classifier was trained (overall accuracy: 84%) on visual-near-infrared-spectra of 2500 globally labelled targets. Based on the classification, the target objects were extracted using a developed recursive neighbourhood search. The speed and the heading of the objects were approximated. Detections were validated by employing 350 globally labelled target boxes (mean F\(_1\) score: 0.74). The lowest F\(_1\) score was achieved in Kenya (0.36), the highest in Poland (0.88). Furthermore, validated at 26 traffic count stations in Germany on in sum 390 dates, the truck detections correlate spatio-temporally with station figures (Pearson r-value: 0.82, RMSE: 43.7). Absolute counts were underestimated on 81% of the dates. The detection performance may differ by season and road condition. Hence, the method is only suitable for approximating the relative truck traffic abundance rather than providing accurate absolute counts. However, existing road cargo monitoring methods that rely on traffic count stations or very high resolution remote sensing data have limited global availability. The proposed moving truck detection method could fill this gap, particularly where other information on road cargo traffic are sparse by employing globally and freely available Sentinel-2 data. It is inferior to the accuracy and the temporal detail of station counts, but superior in terms of spatial coverage.
Background
Machine learning, especially deep learning, is becoming more and more relevant in research and development in the medical domain. For all the supervised deep learning applications, data is the most critical factor in securing successful implementation and sustaining the progress of the machine learning model. Especially gastroenterological data, which often involves endoscopic videos, are cumbersome to annotate. Domain experts are needed to interpret and annotate the videos. To support those domain experts, we generated a framework. With this framework, instead of annotating every frame in the video sequence, experts are just performing key annotations at the beginning and the end of sequences with pathologies, e.g., visible polyps. Subsequently, non-expert annotators supported by machine learning add the missing annotations for the frames in-between.
Methods
In our framework, an expert reviews the video and annotates a few video frames to verify the object’s annotations for the non-expert. In a second step, a non-expert has visual confirmation of the given object and can annotate all following and preceding frames with AI assistance. After the expert has finished, relevant frames will be selected and passed on to an AI model. This information allows the AI model to detect and mark the desired object on all following and preceding frames with an annotation. Therefore, the non-expert can adjust and modify the AI predictions and export the results, which can then be used to train the AI model.
Results
Using this framework, we were able to reduce workload of domain experts on average by a factor of 20 on our data. This is primarily due to the structure of the framework, which is designed to minimize the workload of the domain expert. Pairing this framework with a state-of-the-art semi-automated AI model enhances the annotation speed further. Through a prospective study with 10 participants, we show that semi-automated annotation using our tool doubles the annotation speed of non-expert annotators compared to a well-known state-of-the-art annotation tool.
Conclusion
In summary, we introduce a framework for fast expert annotation for gastroenterologists, which reduces the workload of the domain expert considerably while maintaining a very high annotation quality. The framework incorporates a semi-automated annotation system utilizing trained object detection models. The software and framework are open-source.
Highlights
• Brain connectivity states identified by cofluctuation strength.
• CMEP as new method to robustly predict human traits from brain imaging data.
• Network-identifying connectivity ‘events’ are not predictive of cognitive ability.
• Sixteen temporally independent fMRI time frames allow for significant prediction.
• Neuroimaging-based assessment of cognitive ability requires sufficient scan lengths.
Abstract
Human functional brain connectivity can be temporally decomposed into states of high and low cofluctuation, defined as coactivation of brain regions over time. Rare states of particularly high cofluctuation have been shown to reflect fundamentals of intrinsic functional network architecture and to be highly subject-specific. However, it is unclear whether such network-defining states also contribute to individual variations in cognitive abilities – which strongly rely on the interactions among distributed brain regions. By introducing CMEP, a new eigenvector-based prediction framework, we show that as few as 16 temporally separated time frames (< 1.5% of 10 min resting-state fMRI) can significantly predict individual differences in intelligence (N = 263, p < .001). Against previous expectations, individual's network-defining time frames of particularly high cofluctuation do not predict intelligence. Multiple functional brain networks contribute to the prediction, and all results replicate in an independent sample (N = 831). Our results suggest that although fundamentals of person-specific functional connectomes can be derived from few time frames of highest connectivity, temporally distributed information is necessary to extract information about cognitive abilities. This information is not restricted to specific connectivity states, like network-defining high-cofluctuation states, but rather reflected across the entire length of the brain connectivity time series.