An approach to aerodynamically optimizing cycling posture and reducing drag in an Ironman (IM) event was elaborated. To this end, four commonly used cycling positions were investigated and simulated for a flow velocity of 10 m/s and yaw angles of 0–20° using the OpenFOAM-based Nabla Flow CFD simulation software. A cyclist was scanned using an iPhone 12, and the special-purpose meshing software Blender was used. Significant differences were observed by changing and optimizing the cyclist’s posture. The aerodynamic drag coefficient (CdA) varies by more than a factor of 2, ranging from 0.214 to 0.450. Within a position, the CdA tends to increase slightly at yaw angles of 5–10° and decrease at higher yaw angles compared to a straight head wind, except for the time trial (TT) position. The results were applied to the IM Hawaii bike course (180 km), assuming a constant power output of 300 W. Including the wind distributions, two different bike split models for performance prediction were applied. A significant time saving of roughly 1 h was found. Finally, a machine learning approach to deduce 3D triangulation for specific body shapes from 2D pictures was tested.
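As a rough illustration of how CdA feeds into a bike split, the following sketch solves the steady-state power balance P = 0.5 * rho * CdA * v^3 + Crr * m * g * v for the speed v on a flat, windless course. It is not one of the two bike split models used in the study: the air density, rolling resistance coefficient, and system mass are assumed values, and yaw, wind, and elevation are ignored.

```python
def steady_speed(power_w, cda_m2, rho=1.2, crr=0.004, mass_kg=80.0, g=9.81):
    """Return the speed (m/s) at which the required power equals power_w.

    rho, crr and mass_kg are illustrative assumptions, not values from the study.
    """
    def required_power(v):
        aero = 0.5 * rho * cda_m2 * v ** 3   # aerodynamic drag power
        rolling = crr * mass_kg * g * v      # rolling resistance power
        return aero + rolling

    low, high = 0.0, 30.0  # search bracket in m/s
    for _ in range(60):    # bisection: required_power is increasing in v
        mid = 0.5 * (low + high)
        if required_power(mid) < power_w:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def split_hours(distance_km, power_w, cda_m2):
    """Ride time in hours at constant power on a flat, windless course."""
    speed = steady_speed(power_w, cda_m2)
    return distance_km * 1000.0 / speed / 3600.0


if __name__ == "__main__":
    for cda in (0.214, 0.450):  # CdA range reported in the abstract
        print(f"CdA={cda:.3f}: {split_hours(180, 300, cda):.2f} h")
```

Because drag power grows with the cube of speed, halving CdA does not halve the ride time; the cubic term is why the speed gain is much smaller than the change in CdA.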
Snow is a vital environmental parameter and dynamically responsive to climate change, particularly in mountainous regions. Snow cover can be monitored at variable spatial scales using Earth Observation (EO) data. Long-lasting remote sensing missions enable the generation of multi-decadal time series and thus the detection of long-term trends. However, there have been few attempts to use these to model future snow cover dynamics. In this study, we, therefore, explore the potential of such time series to forecast the Snow Line Elevation (SLE) in the European Alps. We generate monthly SLE time series from the entire Landsat archive (1985–2021) in 43 Alpine catchments. Positive long-term SLE change rates are detected, with the highest rates (5–8 m/y) in the Western and Central Alps. We utilize this SLE dataset to implement and evaluate seven uni-variate time series modeling and forecasting approaches. The best results were achieved by Random Forests, with a Nash–Sutcliffe efficiency (NSE) of 0.79 and a Mean Absolute Error (MAE) of 258 m, Telescope (0.76, 268 m), and seasonal ARIMA (0.75, 270 m). Since the model performance varies strongly with the input data, we developed a combined forecast based on the best-performing methods in each catchment. This approach was then used to forecast the SLE for the years 2022–2029. In the majority of the catchments, the shift of the forecast median SLE level retained the sign of the long-term trend. In cases where a deviating SLE dynamic is forecast, a discussion based on the unique properties of the catchment and past SLE dynamics is required. In the future, we expect major improvements in our SLE forecasting efforts by including external predictor variables in a multi-variate modeling approach.
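The two evaluation metrics named above can be computed as in the following minimal sketch. The function names and the sample values are illustrative assumptions; the formulas are the standard definitions of the Nash–Sutcliffe efficiency and the Mean Absolute Error.

```python
def nash_sutcliffe(observed, predicted):
    """NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """MAE = mean(|obs - pred|), in the unit of the data (here metres)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Hypothetical monthly SLE values in metres, for illustration only.
    sle_observed = [2000.0, 2500.0, 3000.0]
    sle_forecast = [2100.0, 2400.0, 3100.0]
    print(nash_sutcliffe(sle_observed, sle_forecast))
    print(mean_absolute_error(sle_observed, sle_forecast))
```

An NSE of 1 indicates a perfect forecast, while an NSE of 0 means the forecast is no better than always predicting the observed mean.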
Predicting hypertension subtypes with machine learning using targeted metabolites and their ratios
(2022)
Hypertension is a major global health problem with high prevalence and complex associated health risks. Primary hypertension (PHT) is the most common form, and its causes are largely unknown. Endocrine hypertension (EHT) is another complex form of hypertension with an estimated prevalence varying from 3 to 20% depending on the population studied. It occurs due to underlying conditions associated with hormonal excess, mainly related to adrenal tumours, and is sub-categorised into primary aldosteronism (PA), Cushing’s syndrome (CS), and pheochromocytoma or functional paraganglioma (PPGL). Endocrine hypertension is often misdiagnosed as primary hypertension, causing delays in treatment for the underlying condition, reduced quality of life, and costly antihypertensive treatment that is often ineffective. This study systematically used targeted metabolomics and high-throughput machine learning methods to predict the key biomarkers in classifying and distinguishing the various subtypes of endocrine and primary hypertension. The trained models successfully classified CS from PHT and EHT from PHT with 92% specificity on the test set. The most prominent targeted metabolites and metabolite ratios for hypertension identification for different disease comparisons were C18:1, C18:2, and Orn/Arg. Sex was identified as an important feature in CS vs. PHT classification.
In the past decades, various Earth observation-based time series products have emerged, which have enabled studies and analysis of global change processes. Besides their contribution to understanding past processes, time series datasets hold enormous potential for predictive modeling and thereby meet the demands of decision makers on future scenarios. In order to further exploit these data, a novel pixel-based approach has been introduced, which is the spatio-temporal matrix (STM). The approach integrates the historical characteristics of a specific land cover at a high temporal frequency in order to interpret the spatial and temporal information for the neighborhood of a given target pixel. The provided information can be exploited with common predictive models and algorithms. In this study, this approach was utilized and evaluated for the prediction of future urban/built-settlement growth. Random forest and multi-layer perceptron were employed for the prediction. The tests have been carried out with training strategies based on a one-year and a ten-year time span for the urban agglomerations of Surat (India), Ho-Chi-Minh City (Vietnam), and Abidjan (Ivory Coast). The slope, land use, exclusion, urban, transportation, hillshade (SLEUTH) model was selected as a baseline indicator for the performance evaluation. The statistical results from the receiver operating characteristic curve (ROC) demonstrate a good ability of the STM to facilitate the prediction of future settlement growth and its transferability to different cities, with area under the curve (AUC) values greater than 0.85. Compared with SLEUTH, the STM-based model achieved higher AUC in all of the test cases, while being independent of the additional datasets for the restricted and the preferential development areas.
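To make the idea of combining a pixel's neighborhood with its history more concrete, the sketch below flattens the neighborhood of a target pixel across several time steps into one feature vector that a generic classifier could consume. This is only an illustrative feature layout under assumed conventions (binary built-up masks, a square window, zero padding at the edges); it is not the published STM definition, and the function name is hypothetical.

```python
def neighbourhood_features(stack, row, col, radius=1):
    """Flatten a (2*radius+1)^2 window around (row, col) for every time step.

    stack: list of 2D lists, one per time step, holding 0/1 built-up flags
           (assumed input format). Cells outside the grid are padded with 0.
    """
    features = []
    for grid in stack:
        n_rows, n_cols = len(grid), len(grid[0])
        for d_row in range(-radius, radius + 1):
            for d_col in range(-radius, radius + 1):
                r, c = row + d_row, col + d_col
                inside = 0 <= r < n_rows and 0 <= c < n_cols
                features.append(grid[r][c] if inside else 0)
    return features


if __name__ == "__main__":
    # Two hypothetical yearly masks of a 3x3 area; settlement grows from the left.
    year_1 = [[1, 0, 0], [1, 0, 0], [0, 0, 0]]
    year_2 = [[1, 1, 0], [1, 1, 0], [1, 0, 0]]
    print(neighbourhood_features([year_1, year_2], row=1, col=1))
```

Each target pixel then yields one row of a training table, with the label taken from a later time step.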
In most countries, freight is predominantly transported by road cargo trucks. We present a new satellite remote sensing method for detecting moving trucks on roads using Sentinel-2 data. The method exploits a temporal sensing offset of the Sentinel-2 multispectral instrument, which causes spatially and spectrally distorted signatures of moving objects. A random forest classifier was trained (overall accuracy: 84%) on visual-near-infrared-spectra of 2500 globally labelled targets. Based on the classification, the target objects were extracted using a developed recursive neighbourhood search. The speed and the heading of the objects were approximated. Detections were validated by employing 350 globally labelled target boxes (mean F\(_1\) score: 0.74). The lowest F\(_1\) score was achieved in Kenya (0.36), the highest in Poland (0.88). Furthermore, validated at 26 traffic count stations in Germany on a total of 390 dates, the truck detections correlate spatio-temporally with station figures (Pearson r-value: 0.82, RMSE: 43.7). Absolute counts were underestimated on 81% of the dates. The detection performance may differ by season and road condition. Hence, the method is only suitable for approximating the relative truck traffic abundance rather than providing accurate absolute counts. However, existing road cargo monitoring methods that rely on traffic count stations or very high resolution remote sensing data have limited global availability. The proposed moving truck detection method could fill this gap, particularly where other information on road cargo traffic is sparse, by employing globally and freely available Sentinel-2 data. It is inferior to station counts in accuracy and temporal detail, but superior in terms of spatial coverage.
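The validation metrics reported above (F\(_1\) score, Pearson r, RMSE) follow standard definitions, sketched below in plain Python. The function names and the example counts are assumptions for illustration and do not come from the study's data.

```python
import math


def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall; 0.0 if there are no true positives."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


if __name__ == "__main__":
    # Hypothetical station counts vs. satellite detections on three dates.
    station = [120, 200, 310]
    detected = [90, 150, 260]
    print(pearson_r(station, detected), rmse(station, detected))
```

A high Pearson r combined with a large RMSE is the pattern the abstract describes: detections track the relative traffic level well even though absolute counts are underestimated.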
Deep convolutional generative adversarial networks (GAN) allow for creating images from existing databases. We applied a modified light-weight GAN (FastGAN) algorithm to cerebral blood flow SPECTs and aimed to evaluate whether this technology can generate images that closely resemble those of real patients. Investigating three anatomical levels (cerebellum, CER; basal ganglia, BG; cortex, COR), 551 normal (248 CER, 174 BG, 129 COR) and 387 pathological brain SPECTs using N-isopropyl p-I-123-iodoamphetamine (123I-IMP) were included. For the latter scans, cerebral ischemic disease comprised 291 uni- (66 CER, 116 BG, 109 COR) and 96 bilateral defect patterns (44 BG, 52 COR). Our model was trained using a three-compartment anatomical input (dataset ‘A’; including CER, BG, and COR), while for dataset ‘B’, only one anatomical region (COR) was included. Quantitative analyses provided mean counts (MC) and left/right (LR) hemisphere ratios, which were then compared to quantification from real images. For MC, ‘B’ was significantly different for normal and bilateral defect patterns (P < 0.0001, respectively), but not for unilateral ischemia (P = 0.77). Comparable results were recorded for LR, as normal and ischemia scans were significantly different relative to images acquired from real patients (P ≤ 0.01, respectively). Images provided by ‘A’, however, revealed comparable quantitative results when compared to real images, including normal (P = 0.8) and pathological scans (unilateral, P = 0.99; bilateral, P = 0.68) for MC. For LR, only uni- (P = 0.03), but not normal or bilateral defect scans (P ≥ 0.08) reached significance relative to images of real patients. With a minimum of only three anatomical compartments serving as stimuli, created cerebral SPECTs are indistinguishable from images of real patients.
The applied FastGAN algorithm may make it possible to provide sufficient scan numbers in various clinical scenarios, e.g., for “data-hungry” deep learning technologies or in the context of orphan diseases.
Background
Germinal center-derived B cell lymphomas are tumors of the lymphoid tissues representing one of the most heterogeneous malignancies. Here we characterize the variety of transcriptomic phenotypes of this disease based on 873 biopsy specimens collected in the German Cancer Aid MMML (Molecular Mechanisms in Malignant Lymphoma) consortium. They include diffuse large B cell lymphoma (DLBCL), follicular lymphoma (FL), Burkitt’s lymphoma, mixed FL/DLBCL lymphomas, primary mediastinal large B cell lymphoma, multiple myeloma, IRF4-rearranged large cell lymphoma, MYC-negative Burkitt-like lymphoma with chr. 11q aberration and mantle cell lymphoma.
Methods
We apply self-organizing map (SOM) machine learning to microarray-derived expression data to generate a holistic view on the transcriptome landscape of lymphomas, to describe the multidimensional nature of gene regulation and to pursue a modular view on co-expression. Expression data were complemented by pathological, genetic and clinical characteristics.
Results
We present a transcriptome map of B cell lymphomas that allows visual comparison between the SOM portraits of different lymphoma strata and individual cases. It decomposes into one dozen modules of co-expressed genes related to different functional categories, to genetic defects and to the pathogenesis of lymphomas. On a molecular level, this disease forms a continuum of expression states rather than clearly separated phenotypes. We introduced the concept of combinatorial pattern types (PATs) that stratifies the lymphomas into nine PAT groups and, on a coarser level, into five prominent cancer hallmark types with proliferation, inflammation and stroma signatures. Inflammation signatures in combination with healthy B cell and tonsil characteristics are associated with better overall survival rates, while proliferation in combination with inflammation and plasma cell characteristics worsens them. A phenotypic similarity tree is presented that reveals possible progression paths along the transcriptional dimensions. Our analysis provided a novel look at the transition range between FL and DLBCL, at DLBCL with poor prognosis showing expression patterns resembling those of Burkitt’s lymphoma, and particularly at ‘double-hit’ MYC and BCL2 transformed lymphomas.
Conclusions
The transcriptome map provides a tool that aggregates, refines and visualizes the data collected in the MMML study and interprets them in the light of previous knowledge to provide orientation and support in current and future studies on lymphomas and on other cancer entities.
Associations between periodontitis and COPD: An artificial intelligence-based analysis of NHANES III
(2022)
A number of cross-sectional epidemiological studies suggest that poor oral health is associated with respiratory diseases. However, the number of cases within the studies was limited, and the studies had different measurement conditions. By analyzing data from the National Health and Nutrition Examination Survey III (NHANES III), this study aimed to investigate possible associations between chronic obstructive pulmonary disease (COPD) and periodontitis in the general population. COPD was diagnosed in cases where the FEV1/FVC ratio was below 70% (non-COPD versus COPD; binary classification task). We used unsupervised learning utilizing k-means clustering to identify clusters in the data. COPD classes were predicted with logistic regression, a random forest classifier, a stochastic gradient descent (SGD) classifier, k-nearest neighbors, a decision tree classifier, Gaussian naive Bayes (GaussianNB), support vector machines (SVM), a custom-made convolutional neural network (CNN), a multilayer perceptron artificial neural network (MLP), and a radial basis function neural network (RBNN) in Python. We calculated the accuracy of the prediction and the area under the curve (AUC). The most important predictors were determined using feature importance analysis. Results: Overall, 15,868 participants and 19 feature variables were included. Based on k-means clustering, the data were separated into two clusters that identified two risk characteristic groups of patients. The algorithms reached AUCs between 0.608 (DTC) and 0.953 (CNN) for the classification of COPD classes. Feature importance analysis of deep learning algorithms indicated that age and mean attachment loss were the most important features in predicting COPD. Conclusions: Data analysis of a large population showed that machine learning and deep learning algorithms could predict COPD cases based on demographics and oral health feature variables.
This study indicates that periodontitis might be an important predictor of COPD. Further prospective studies examining the association between periodontitis and COPD are warranted to validate the present results.
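The labeling rule and the AUC metric described in this abstract can be sketched as follows. This is a minimal illustration, not the study's code: the function names and sample values are assumptions, and the AUC is computed with the rank-based (Mann–Whitney) formulation rather than any particular library routine.

```python
def has_copd(fev1_litres, fvc_litres, threshold=0.70):
    """Label a participant as COPD when FEV1/FVC is below the threshold (70%)."""
    return fev1_litres / fvc_litres < threshold


def roc_auc(labels, scores):
    """Probability that a random positive case scores above a random negative one.

    Ties count as half a win. Requires at least one positive and one negative.
    """
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical spirometry values (litres) and model scores, for illustration.
    labels = [has_copd(2.0, 4.0), has_copd(3.2, 4.0), has_copd(2.5, 4.0)]
    scores = [0.9, 0.2, 0.6]
    print(labels, roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why AUC values are reported on a 0 to 1 scale.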
Colorectal cancer (CRC) is a leading cause of cancer-related deaths worldwide. The best method to prevent CRC is a colonoscopy. During this procedure, the gastroenterologist searches for polyps. However, there is a risk of polyps being missed by the gastroenterologist. Automated detection of polyps can assist the gastroenterologist during a colonoscopy. There are already publications examining the problem of polyp detection in the literature. Nevertheless, most of these systems are only used in the research context and are not implemented for clinical application. Therefore, we introduce the first fully open-source automated polyp-detection system that scores best on current benchmark data and is implemented ready for clinical application. To create the polyp-detection system (ENDOMIND-Advanced), we combined our own collected data from different hospitals and practices in Germany with open-source datasets to create a dataset with over 500,000 annotated images. ENDOMIND-Advanced leverages a post-processing technique based on video detection to work in real-time with a stream of images. It is integrated into a prototype ready for application in clinical interventions. We achieve better performance than the best system in the literature, with an F1-score of 90.24% on the open-source CVC-VideoClinicDB benchmark.
Purpose
Machine learning based on radiomics features has seen huge success in a variety of clinical applications. However, the need for standardization and reproducibility has been increasingly recognized as a necessary step for future clinical translation. We developed a novel, intuitive open-source framework to facilitate all data analysis steps of a radiomics workflow in an easy and reproducible manner and evaluated it by reproducing classification results in eight available open-source datasets from different clinical entities.
Methods
The framework performs image preprocessing, feature extraction, feature selection, modeling, and model evaluation, and can automatically choose the optimal parameters for a given task. All analysis steps can be reproduced with a web application, which offers an interactive user interface and does not require programming skills. We evaluated our method in seven different clinical applications using eight public datasets: six datasets from the recently published WORC database, and two prostate MRI datasets—Prostate MRI and Ultrasound With Pathology and Coordinates of Tracked Biopsy (Prostate-UCLA) and PROSTATEx.
Results
In the analyzed datasets, AutoRadiomics successfully created and optimized models using radiomics features. For WORC datasets, we achieved AUCs ranging from 0.56 for lung melanoma metastases detection to 0.93 for liposarcoma detection and thereby managed to replicate the previously reported results. No significant overfitting between training and test sets was observed. For the prostate cancer detection task, results were better in the PROSTATEx dataset (AUC = 0.73 for prostate and 0.72 for lesion mask) than in the Prostate-UCLA dataset (AUC = 0.61 for prostate and 0.65 for lesion mask), with external validation results varying from AUC = 0.51 to AUC = 0.77.
Conclusion
AutoRadiomics is a robust tool for radiomic studies, which can be used as a comprehensive solution, as one of the analysis steps, or as an exploratory tool. Its wide applicability was confirmed by the results obtained in the diverse analyzed datasets. The framework and the code for this analysis are publicly available at https://github.com/pwoznicki/AutoRadiomics.