Refine
Has Fulltext
- yes (51)
Is part of the Bibliography
- yes (51)
Document Type
- Journal article (51) (remove)
Keywords
- machine learning (51) (remove)
Institute
- Institut für Geographie und Geologie (9)
- Institut für Informatik (9)
- Center for Computational and Theoretical Biology (5)
- Institut für Klinische Epidemiologie und Biometrie (5)
- Betriebswirtschaftliches Institut (4)
- Medizinische Klinik und Poliklinik II (4)
- Pathologisches Institut (4)
- Theodor-Boveri-Institut für Biowissenschaften (4)
- Klinik und Poliklinik für Dermatologie, Venerologie und Allergologie (3)
- Klinik und Poliklinik für Mund-, Kiefer- und Plastische Gesichtschirurgie (3)
Sonstige beteiligte Institutionen
Objectives
Open-access cancer imaging datasets have become integral for evaluating novel AI approaches in radiology. However, their use in quantitative analysis with radiomics features presents unique challenges, such as incomplete documentation, low visibility, non-uniform data formats, data inhomogeneity, and complex preprocessing. These issues may cause problems with reproducibility and standardization in radiomics studies.
Methods
We systematically reviewed imaging datasets with public copyright licenses, published up to March 2023 across four large online cancer imaging archives. We included only datasets with tomographic images (CT, MRI, or PET), segmentations, and clinical annotations, specifically identifying those suitable for radiomics research. Reproducible preprocessing and feature extraction were performed for each dataset to enable their easy reuse.
Results
We discovered 29 datasets with corresponding segmentations and labels in the form of health outcomes, tumor pathology, staging, imaging-based scores, genetic markers, or repeated imaging. We compiled a repository encompassing 10,354 patients and 49,515 scans. Of the 29 datasets, 15 were licensed under Creative Commons licenses, allowing both non-commercial and commercial usage and redistribution, while others featured custom or restricted licenses. Studies spanned from the early 1990s to 2021, with the majority concluding after 2013. Seven different formats were used for the imaging data. Preprocessing and feature extraction were successfully performed for each dataset.
Conclusion
RadiomicsHub is a comprehensive public repository with radiomics features derived from a systematic review of public cancer imaging datasets. By converting all datasets to a standardized format and ensuring reproducible and traceable processing, RadiomicsHub addresses key reproducibility and standardization challenges in radiomics.
Critical relevance statement
This study critically addresses the challenges associated with locating, preprocessing, and extracting quantitative features from open-access datasets, to facilitate more robust and reliable evaluations of radiomics models.
Key points
- Through a systematic review, we identified 29 cancer imaging datasets suitable for radiomics research.
- A public repository with collection overview and radiomics features, encompassing 10,354 patients and 49,515 scans, was compiled.
- Most datasets can be shared, used, and built upon freely under a Creative Commons license.
- All 29 identified datasets have been converted into a common format to enable reproducible radiomics feature extraction.
Purpose
Machine learning based on radiomics features has seen huge success in a variety of clinical applications. However, the need for standardization and reproducibility has been increasingly recognized as a necessary step for future clinical translation. We developed a novel, intuitive open-source framework to facilitate all data analysis steps of a radiomics workflow in an easy and reproducible manner and evaluated it by reproducing classification results in eight available open-source datasets from different clinical entities.
Methods
The framework performs image preprocessing, feature extraction, feature selection, modeling, and model evaluation, and can automatically choose the optimal parameters for a given task. All analysis steps can be reproduced with a web application, which offers an interactive user interface and does not require programming skills. We evaluated our method in seven different clinical applications using eight public datasets: six datasets from the recently published WORC database, and two prostate MRI datasets—Prostate MRI and Ultrasound With Pathology and Coordinates of Tracked Biopsy (Prostate-UCLA) and PROSTATEx.
Results
In the analyzed datasets, AutoRadiomics successfully created and optimized models using radiomics features. For WORC datasets, we achieved AUCs ranging from 0.56 for lung melanoma metastases detection to 0.93 for liposarcoma detection and thereby managed to replicate the previously reported results. No significant overfitting between training and test sets was observed. For the prostate cancer detection task, results were better in the PROSTATEx dataset (AUC = 0.73 for prostate and 0.72 for lesion mask) than in the Prostate-UCLA dataset (AUC 0.61 for prostate and 0.65 for lesion mask), with external validation results varying from AUC = 0.51 to AUC = 0.77.
Conclusion
AutoRadiomics is a robust tool for radiomic studies, which can be used as a comprehensive solution, one of the analysis steps, or an exploratory tool. Its wide applicability was confirmed by the results obtained in the diverse analyzed datasets. The framework, as well as code for this analysis, are publicly available under https://github.com/pwoznicki/AutoRadiomics.
Deep convolutional generative adversarial networks (GAN) allow for creating images from existing databases. We applied a modified light-weight GAN (FastGAN) algorithm to cerebral blood flow SPECTs and aimed to evaluate whether this technology can generate created images close to real patients. Investigating three anatomical levels (cerebellum, CER; basal ganglia, BG; cortex, COR), 551 normal (248 CER, 174 BG, 129 COR) and 387 pathological brain SPECTs using N-isopropyl p-I-123-iodoamphetamine (123I-IMP) were included. For the latter scans, cerebral ischemic disease comprised 291 uni- (66 CER, 116 BG, 109 COR) and 96 bilateral defect patterns (44 BG, 52 COR). Our model was trained using a three-compartment anatomical input (dataset ‘A’; including CER, BG, and COR), while for dataset ‘B’, only one anatomical region (COR) was included. Quantitative analyses provided mean counts (MC) and left/right (LR) hemisphere ratios, which were then compared to quantification from real images. For MC, ‘B’ was significantly different for normal and bilateral defect patterns (P < 0.0001, respectively), but not for unilateral ischemia (P = 0.77). Comparable results were recorded for LR, as normal and ischemia scans were significantly different relative to images acquired from real patients (P ≤ 0.01, respectively). Images provided by ‘A’, however, revealed comparable quantitative results when compared to real images, including normal (P = 0.8) and pathological scans (unilateral, P = 0.99; bilateral, P = 0.68) for MC. For LR, only uni- (P = 0.03), but not normal or bilateral defect scans (P ≥ 0.08) reached significance relative to images of real patients. With a minimum of only three anatomical compartments serving as stimuli, created cerebral SPECTs are indistinguishable to images from real patients. The applied FastGAN algorithm may allow to provide sufficient scan numbers in various clinical scenarios, e.g., for “data-hungry” deep learning technologies or in the context of orphan diseases.
Highlights
• Brain connectivity states identified by cofluctuation strength.
• CMEP as new method to robustly predict human traits from brain imaging data.
• Network-identifying connectivity ‘events’ are not predictive of cognitive ability.
• Sixteen temporally independent fMRI time frames allow for significant prediction.
• Neuroimaging-based assessment of cognitive ability requires sufficient scan lengths.
Abstract
Human functional brain connectivity can be temporally decomposed into states of high and low cofluctuation, defined as coactivation of brain regions over time. Rare states of particularly high cofluctuation have been shown to reflect fundamentals of intrinsic functional network architecture and to be highly subject-specific. However, it is unclear whether such network-defining states also contribute to individual variations in cognitive abilities – which strongly rely on the interactions among distributed brain regions. By introducing CMEP, a new eigenvector-based prediction framework, we show that as few as 16 temporally separated time frames (< 1.5% of 10 min resting-state fMRI) can significantly predict individual differences in intelligence (N = 263, p < .001). Against previous expectations, individual's network-defining time frames of particularly high cofluctuation do not predict intelligence. Multiple functional brain networks contribute to the prediction, and all results replicate in an independent sample (N = 831). Our results suggest that although fundamentals of person-specific functional connectomes can be derived from few time frames of highest connectivity, temporally distributed information is necessary to extract information about cognitive abilities. This information is not restricted to specific connectivity states, like network-defining high-cofluctuation states, but rather reflected across the entire length of the brain connectivity time series.
In the past decades, various Earth observation-based time series products have emerged, which have enabled studies and analysis of global change processes. Besides their contribution to understanding past processes, time series datasets hold enormous potential for predictive modeling and thereby meet the demands of decision makers on future scenarios. In order to further exploit these data, a novel pixel-based approach has been introduced, which is the spatio-temporal matrix (STM). The approach integrates the historical characteristics of a specific land cover at a high temporal frequency in order to interpret the spatial and temporal information for the neighborhood of a given target pixel. The provided information can be exploited with common predictive models and algorithms. In this study, this approach was utilized and evaluated for the prediction of future urban/built-settlement growth. Random forest and multi-layer perceptron were employed for the prediction. The tests have been carried out with training strategies based on a one-year and a ten-year time span for the urban agglomerations of Surat (India), Ho-Chi-Minh City (Vietnam), and Abidjan (Ivory Coast). The slope, land use, exclusion, urban, transportation, hillshade (SLEUTH) model was selected as a baseline indicator for the performance evaluation. The statistical results from the receiver operating characteristic curve (ROC) demonstrate a good ability of the STM to facilitate the prediction of future settlement growth and its transferability to different cities, with area under the curve (AUC) values greater than 0.85. Compared with SLEUTH, the STM-based model achieved higher AUC in all of the test cases, while being independent of the additional datasets for the restricted and the preferential development areas.
Associations between periodontitis and COPD: An artificial intelligence-based analysis of NHANES III
(2022)
A number of cross-sectional epidemiological studies suggest that poor oral health is associated with respiratory diseases. However, the number of cases within the studies was limited, and the studies had different measurement conditions. By analyzing data from the National Health and Nutrition Examination Survey III (NHANES III), this study aimed to investigate possible associations between chronic obstructive pulmonary disease (COPD) and periodontitis in the general population. COPD was diagnosed in cases where FEV (1)/FVC ratio was below 70% (non-COPD versus COPD; binary classification task). We used unsupervised learning utilizing k-means clustering to identify clusters in the data. COPD classes were predicted with logistic regression, a random forest classifier, a stochastic gradient descent (SGD) classifier, k-nearest neighbors, a decision tree classifier, Gaussian naive Bayes (GaussianNB), support vector machines (SVM), a custom-made convolutional neural network (CNN), a multilayer perceptron artificial neural network (MLP), and a radial basis function neural network (RBNN) in Python. We calculated the accuracy of the prediction and the area under the curve (AUC). The most important predictors were determined using feature importance analysis. Results: Overall, 15,868 participants and 19 feature variables were included. Based on k-means clustering, the data were separated into two clusters that identified two risk characteristic groups of patients. The algorithms reached AUCs between 0.608 (DTC) and 0.953% (CNN) for the classification of COPD classes. Feature importance analysis of deep learning algorithms indicated that age and mean attachment loss were the most important features in predicting COPD. Conclusions: Data analysis of a large population showed that machine learning and deep learning algorithms could predict COPD cases based on demographics and oral health feature variables. This study indicates that periodontitis might be an important predictor of COPD. Further prospective studies examining the association between periodontitis and COPD are warranted to validate the present results.
Background: Oro-antral communication (OAC) is a common complication following the extraction of upper molar teeth. The Archer and the Root Sinus (RS) systems can be used to classify impacted teeth in panoramic radiographs. The Archer classes B-D and the Root Sinus classes III, IV have been associated with an increased risk of OAC following tooth extraction in the upper molar region. In our previous study, we found that panoramic radiographs are not reliable for predicting OAC. This study aimed to (1) determine the feasibility of automating the classification (Archer/RS classes) of impacted teeth from panoramic radiographs, (2) determine the distribution of OAC stratified by classification system classes for the purposes of decision tree construction, and (3) determine the feasibility of automating the prediction of OAC utilizing the mentioned classification systems. Methods: We utilized multiple supervised pre-trained machine learning models (VGG16, ResNet50, Inceptionv3, EfficientNet, MobileNetV2), one custom-made convolutional neural network (CNN) model, and a Bag of Visual Words (BoVW) technique to evaluate the performance to predict the clinical classification systems RS and Archer from panoramic radiographs (Aim 1). We then used Chi-square Automatic Interaction Detectors (CHAID) to determine the distribution of OAC stratified by the Archer/RS classes to introduce a decision tree for simple use in clinics (Aim 2). Lastly, we tested the ability of a multilayer perceptron artificial neural network (MLP) and a radial basis function neural network (RBNN) to predict OAC based on the high-risk classes RS III, IV, and Archer B-D (Aim 3). Results: We achieved accuracies of up to 0.771 for EfficientNet and MobileNetV2 when examining the Archer classification. For the AUC, we obtained values of up to 0.902 for our custom-made CNN. In comparison, the detection of the RS classification achieved accuracies of up to 0.792 for the BoVW and an AUC of up to 0.716 for our custom-made CNN. Overall, the Archer classification was detected more reliably than the RS classification when considering all algorithms. CHAID predicted 77.4% correctness for the Archer classification and 81.4% for the RS classification. MLP (AUC: 0.590) and RBNN (AUC: 0.590) for the Archer classification as well as MLP 0.638) and RBNN (0.630) for the RS classification did not show sufficient predictive capability for OAC. Conclusions: The results reveal that impacted teeth can be classified using panoramic radiographs (best AUC: 0.902), and the classification systems can be stratified according to their relationship to OAC (81.4% correct for RS classification). However, the Archer and RS classes did not achieve satisfactory AUCs for predicting OAC (best AUC: 0.638). Additional research is needed to validate the results externally and to develop a reliable risk stratification tool based on the present findings.
Background
Medical resource management can be improved by assessing the likelihood of prolonged length of stay (LOS) for head and neck cancer surgery patients. The objective of this study was to develop predictive models that could be used to determine whether a patient's LOS after cancer surgery falls within the normal range of the cohort.
Methods
We conducted a retrospective analysis of a dataset consisting of 300 consecutive patients who underwent head and neck cancer surgery between 2017 and 2022 at a single university medical center. Prolonged LOS was defined as LOS exceeding the 75th percentile of the cohort. Feature importance analysis was performed to evaluate the most important predictors for prolonged LOS. We then constructed 7 machine learning and deep learning algorithms for the prediction modeling of prolonged LOS.
Results
The algorithms reached accuracy values of 75.40 (radial basis function neural network) to 97.92 (Random Trees) for the training set and 64.90 (multilayer perceptron neural network) to 84.14 (Random Trees) for the testing set. The leading parameters predicting prolonged LOS were operation time, ischemia time, the graft used, the ASA score, the intensive care stay, and the pathological stages. The results revealed that patients who had a higher number of harvested lymph nodes (LN) had a lower probability of recurrence but also a greater LOS. However, patients with prolonged LOS were also at greater risk of recurrence, particularly when fewer (LN) were extracted. Further, LOS was more strongly correlated with the overall number of extracted lymph nodes than with the number of positive lymph nodes or the ratio of positive to overall extracted lymph nodes, indicating that particularly unnecessary lymph node extraction might be associated with prolonged LOS.
Conclusions
The results emphasize the need for a closer follow-up of patients who experience prolonged LOS. Prospective trials are warranted to validate the present results.
The identification of biomarker signatures is important for cancer diagnosis and prognosis. However, the detection of clinical reliable signatures is influenced by limited data availability, which may restrict statistical power. Moreover, methods for integration of large sample cohorts and signature identification are limited. We present a step-by-step computational protocol for functional gene expression analysis and the identification of diagnostic and prognostic signatures by combining meta-analysis with machine learning and survival analysis. The novelty of the toolbox lies in its all-in-one functionality, generic design, and modularity. It is exemplified for lung cancer, including a comprehensive evaluation using different validation strategies. However, the protocol is not restricted to specific disease types and can therefore be used by a broad community. The accompanying R package vignette runs in ~1 h and describes the workflow in detail for use by researchers with limited bioinformatics training.
Objectives
Embedded in the Collaborative Research Center “Fear, Anxiety, Anxiety Disorders” (CRC‐TRR58), this bicentric clinical study aims at identifying biobehavioral markers of treatment (non‐)response by applying machine learning methodology with an external cross‐validation protocol. We hypothesize that a priori prediction of treatment (non‐)response is possible in a second, independent sample based on multimodal markers.
Methods
One‐session virtual reality exposure treatment (VRET) with patients with spider phobia was conducted on two sites. Clinical, neuroimaging, and genetic data were assessed at baseline, post‐treatment and after 6 months. The primary and secondary outcomes defining treatment response are as follows: 30% reduction regarding the individual score in the Spider Phobia Questionnaire and 50% reduction regarding the individual distance in the behavioral avoidance test.
Results
N = 204 patients have been included (n = 100 in Würzburg, n = 104 in Münster). Sample characteristics for both sites are comparable.
Discussion
This study will offer cross‐validated theranostic markers for predicting the individual success of exposure‐based therapy. Findings will support clinical decision‐making on personalized therapy, bridge the gap between basic and clinical research, and bring stratified therapy into reach. The study is registered at ClinicalTrials.gov (ID: NCT03208400).