TY - JOUR A1 - Borchers, Svenja A1 - Müller, Laura A1 - Synofzik, Matthis A1 - Himmelbach, Marc T1 - Guidelines and quality measures for the diagnosis of optic ataxia JF - Frontiers in Human Neuroscience N2 - Since Balint's first description of systematic misreaching in 1909, a reasonable number of patients showing a similar phenomenology, later termed optic ataxia (OA), have been described. However, there is surprising inconsistency regarding the behavioral measures that are used to detect OA in experimental and clinical reports, if the respective measures are reported at all. A typical screening method that was presumably used by most researchers and clinicians, reaching for a target object in the peripheral visual space, has never been evaluated. We developed a set of instructions and evaluation criteria for scoring a semi-standardized version of this reaching task. We tested 36 healthy participants, a group of 52 acute and chronic stroke patients, and 24 patients suffering from cerebellar ataxia. In the stroke sample, we found high interrater reliability and moderate test-retest reliability, comparable to other clinical instruments. The calculation of cut-off thresholds based on healthy control and cerebellar patient data showed an unexpectedly high number of false positives in these samples, due to individual outliers who made a considerable number of errors in peripheral reaching. This study provides the first empirical data from large control and patient groups for a screening procedure that seems to be widely used but rarely explicitly reported. It prepares the ground for its use as a standard tool for describing patients included in single-case or group studies addressing optic ataxia, similar to the use of neglect, extinction, or apraxia screening tools.
KW - systems KW - deficit KW - target KW - damage KW - delay KW - posterior cortical atrophy KW - Balints-Syndrome KW - hand KW - impairments KW - reliability KW - cerebellar atrophy KW - cerebellar ataxia KW - cerebellum KW - parietal lobe KW - optic ataxia KW - bedside test Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-122439 SN - 1662-5161 VL - 7 IS - 324 ER - TY - JOUR A1 - Grube, Maike Miriam A1 - Koennecke, Hans-Christian A1 - Walter, Georg A1 - Meisel, Andreas A1 - Sobesky, Jan A1 - Nolte, Christian Hans A1 - Wellwood, Ian A1 - Heuschmann, Peter Ulrich T1 - Influence of Acute Complications on Outcome 3 Months after Ischemic Stroke JF - PLOS ONE N2 - Background: Early medical complications are potentially modifiable factors influencing in-hospital outcome. We investigated the influence of acute complications on mortality and poor outcome 3 months after ischemic stroke. Methods: Data were obtained from patients admitted to one of 13 stroke units of the Berlin Stroke Registry (BSR) who participated in a 3-month follow-up between June 2010 and September 2012. We examined the influence of the cumulative number of early in-hospital complications on mortality and poor outcome (death, disability or institutionalization) 3 months after stroke using multivariable logistic regression analyses and calculated attributable fractions to determine the impact of early complications on mortality and poor outcome. Results: A total of 2349 ischemic stroke patients alive at discharge from acute care were included in the analysis. Older age, stroke severity, pre-stroke dependency and early complications were independent predictors of mortality 3 months after stroke. Poor outcome was independently associated with older age, stroke severity, pre-stroke dependency, previous stroke and early complications.
More than 60% of deaths and poor outcomes were attributed to age, pre-stroke dependency and stroke severity, whereas in-hospital complications contributed to 12.3% of deaths and 9.1% of poor outcomes 3 months after stroke. Conclusion: The majority of deaths and poor outcomes after stroke were attributed to non-modifiable factors. However, early in-hospital complications significantly affect outcome in patients who survived the acute phase after stroke, underlining the need to improve prevention and treatment of complications in hospital. KW - hospital medical complications KW - quality-of-care KW - term mortality KW - Barthel-Index KW - rankin scale KW - risk-factors KW - trial KW - reliability KW - dependency KW - predictors Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-128362 SN - 1932-6203 VL - 8 IS - 9 ER - TY - JOUR A1 - Gupta, Shishir K. A1 - Srivastava, Mugdha A1 - Osmanoglu, Oezge A1 - Dandekar, Thomas T1 - Genome-wide inference of the Camponotus floridanus protein-protein interaction network using homologous mapping and interacting domain profile pairs JF - Scientific Reports N2 - Apart from some model organisms, the interactome of most organisms is largely unidentified. High-throughput experimental techniques to determine protein-protein interactions (PPIs) are resource-intensive and highly susceptible to noise. Computational methods of PPI determination can accelerate biological discovery by identifying the most promising interacting pairs of proteins and by assessing the reliability of identified PPIs. Here we present the first in-depth study providing a global view of the interactome of the ant Camponotus floridanus. Although several ant genomes have been sequenced in the last eight years, studies exploring and investigating PPIs in ants are lacking. Our study attempts to fill this gap, and the presented interactome will also serve as a template for determining PPIs in other ants in the future. Our C.
floridanus interactome covers 51,866 non-redundant PPIs among 6,274 proteins, including 20,544 interactions supported by domain-domain interactions (DDIs), 13,640 interactions supported by DDIs and subcellular localization, and 10,834 high-confidence interactions mediated by 3,289 proteins. These interactions cover 30.6% of the entire C. floridanus proteome. KW - interaction map KW - drosophila KW - identification KW - evolutionary KW - reliability KW - annotation KW - database KW - target KW - cycle Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-229406 VL - 10 IS - 1 ER - TY - JOUR A1 - Huflage, Henner A1 - Fieber, Tabea A1 - Färber, Christian A1 - Knarr, Jonas A1 - Veldhoen, Simon A1 - Jordan, Martin C. A1 - Gilbert, Fabian A1 - Bley, Thorsten Alexander A1 - Meffert, Rainer H. A1 - Grunz, Jan-Peter A1 - Schmalzl, Jonas T1 - Interobserver reliability of scapula fracture classifications in intra- and extra-articular injury patterns JF - BMC Musculoskeletal Disorders N2 - Background Morphology and glenoid involvement determine the necessity of surgical management in scapula fractures. Although scapula fractures are present in only a small share of patients with shoulder trauma, numerous classification systems have been used over the years to categorize them. The purpose of this study was to evaluate the established AO/OTA classification in comparison to the classification system of Euler and Rüedi (ER) with regard to interobserver reliability and confidence in clinical practice. Methods Based on CT imaging, 149 patients with scapula fractures were retrospectively categorized by two trauma surgeons and two radiologists using the classification systems of ER and AO/OTA. To measure interrater reliability, Fleiss' kappa (κ) was calculated independently for both fracture classifications. Rater confidence was stated subjectively on a five-point scale and compared with Wilcoxon signed-rank tests.
Additionally, we computed the intraclass correlation coefficient (ICC) based on absolute agreement in a two-way random effects model to assess the diagnostic confidence agreement between observers. Results In scapula fractures involving the glenoid fossa, interrater reliability was substantial (κ = 0.722; 95% confidence interval [CI] 0.676–0.769) for the AO/OTA classification in contrast to moderate agreement (κ = 0.579; 95% CI 0.525–0.634) for the ER classification system. Diagnostic confidence for intra-articular fracture patterns was superior using the AO/OTA classification compared to ER (p < 0.001) with higher confidence agreement (ICC: 0.882 versus 0.831). For extra-articular fractures, ER (κ = 0.817; 95% CI 0.771–0.863) provided better interrater reliability compared to AO/OTA (κ = 0.734; 95% CI 0.692–0.776) with higher diagnostic confidence (p < 0.001) and superior agreement between confidence ratings (ICC: 0.881 versus 0.912). Conclusions The AO/OTA classification is most suitable to categorize intra-articular scapula fractures with glenoid involvement, whereas the classification system of Euler and Rüedi appears to be superior in extra-articular injury patterns with fractures involving only the scapula body, spine, acromion and coracoid process. KW - confidence KW - scapula KW - glenoid KW - fracture KW - classification KW - reliability Y1 - 2022 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-299795 VL - 23 IS - 1 ER - TY - JOUR A1 - Manchia, Mirko A1 - Adli, Mazda A1 - Akula, Nirmala A1 - Arda, Raffaella A1 - Aubry, Jean-Michel A1 - Backlund, Lena A1 - Banzato, Claudio E. M. A1 - Baune, Bernhard T. A1 - Bellivier, Frank A1 - Bengesser, Susanne A1 - Biernacka, Joanna M. A1 - Brichant-Petitjean, Clara A1 - Bui, Elise A1 - Calkin, Cynthia V. A1 - Cheng, Andrew Tai Ann A1 - Chillotti, Caterina A1 - Cichon, Sven A1 - Clark, Scott A1 - Czerski, Piotr M. A1 - Dantas, Clarissa A1 - Del Zompo, Maria A1 - DePaulo, J. 
Raymond A1 - Detera-Wadleigh, Sevilla D. A1 - Etain, Bruno A1 - Falkai, Peter A1 - Frisén, Louise A1 - Frye, Mark A. A1 - Fullerton, Jan A1 - Gard, Sébastien A1 - Garnham, Julie A1 - Goes, Fernando S. A1 - Grof, Paul A1 - Gruber, Oliver A1 - Hashimoto, Ryota A1 - Hauser, Joanna A1 - Heilbronner, Urs A1 - Hoban, Rebecca A1 - Hou, Liping A1 - Jamain, Stéphane A1 - Kahn, Jean-Pierre A1 - Kassem, Layla A1 - Kato, Tadafumi A1 - Kelsoe, John R. A1 - Kittel-Schneider, Sarah A1 - Kliwicki, Sebastian A1 - Kuo, Po-Hsiu A1 - Kusumi, Ichiro A1 - Laje, Gonzalo A1 - Lavebratt, Catharina A1 - Leboyer, Marion A1 - Leckband, Susan G. A1 - López Jaramillo, Carlos A. A1 - Maj, Mario A1 - Malafosse, Alain A1 - Martinsson, Lina A1 - Masui, Takuya A1 - Mitchell, Philip B. A1 - Mondimore, Frank A1 - Monteleone, Palmiero A1 - Nallet, Audrey A1 - Neuner, Maria A1 - Novák, Tomás A1 - O'Donovan, Claire A1 - Ösby, Urban A1 - Ozaki, Norio A1 - Perlis, Roy H. A1 - Pfennig, Andrea A1 - Potash, James B. A1 - Reich-Erkelenz, Daniela A1 - Reif, Andreas A1 - Reininghaus, Eva A1 - Richardson, Sara A1 - Rouleau, Guy A. A1 - Rybakowski, Janusz K. A1 - Schalling, Martin A1 - Schofield, Peter R. A1 - Schubert, Oliver K. A1 - Schweizer, Barbara A1 - Seemüller, Florian A1 - Grigoroiu-Serbanescu, Maria A1 - Severino, Giovanni A1 - Seymour, Lisa R. A1 - Slaney, Claire A1 - Smoller, Jordan W. A1 - Squassina, Alessio A1 - Stamm, Thomas A1 - Steele, Jo A1 - Stopkova, Pavla A1 - Tighe, Sarah K. A1 - Tortorella, Alfonso A1 - Turecki, Gustavo A1 - Wray, Naomi R. A1 - Wright, Adam A1 - Zandi, Peter P. A1 - Zilles, David A1 - Bauer, Michael A1 - Rietschel, Marcella A1 - McMahon, Francis J. A1 - Schulze, Thomas G. 
A1 - Alda, Martin T1 - Assessment of Response to Lithium Maintenance Treatment in Bipolar Disorder: A Consortium on Lithium Genetics (ConLiGen) Report JF - PLoS ONE N2 - Objective: The assessment of response to lithium maintenance treatment in bipolar disorder (BD) is complicated by variable length of treatment, unpredictable clinical course, and often inconsistent compliance. Prospective and retrospective methods of assessment of lithium response have been proposed in the literature. In this study we report the key phenotypic measures of the "Retrospective Criteria of Long-Term Treatment Response in Research Subjects with Bipolar Disorder" scale currently used in the Consortium on Lithium Genetics (ConLiGen) study. Materials and Methods: Twenty-nine ConLiGen sites took part in a two-stage case-vignette rating procedure to examine inter-rater agreement [kappa (κ)] and reliability [intra-class correlation coefficient (ICC)] of lithium response. Annotated first-round vignettes and rating guidelines were circulated to expert research clinicians for training purposes between the two stages. Further, we analyzed the distributional properties of the treatment response scores available for 1,308 patients using mixture modeling. Results: Substantial and moderate agreement was shown across sites in the first and second sets of vignettes (κ = 0.66 and κ = 0.54, respectively), without significant improvement from training. However, defining response using the A score as a quantitative trait and selecting cases with B criteria of 4 or less showed an improvement between the two stages (ICC₁ = 0.71 and ICC₂ = 0.75, respectively). Mixture modeling of the score distribution indicated three subpopulations (full responders, partial responders, non-responders). Conclusions: We identified two definitions of lithium response, one dichotomous and the other continuous, with moderate to substantial inter-rater agreement and reliability.
Accurate phenotypic measurement of lithium response is crucial for the ongoing ConLiGen pharmacogenomic study. KW - age KW - observer agreement KW - prophylactic lithium KW - mapping susceptibility genes KW - mood disorders KW - onset KW - association KW - reliability KW - mortality KW - illness Y1 - 2013 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-130938 VL - 8 IS - 6 ER - TY - JOUR A1 - Mayr, Stefan A1 - Klein, Igor A1 - Rutzinger, Martin A1 - Kuenzer, Claudia T1 - Determining temporal uncertainty of a global inland surface water time series JF - Remote Sensing N2 - Earth observation time series are well suited to monitoring global surface dynamics. However, data products that are aimed at assessing large-area dynamics with a high temporal resolution often face various error sources (e.g., retrieval errors, sampling errors) in their acquisition chain. Addressing uncertainties in a spatiotemporally consistent manner is challenging, as extensive high-quality validation data are typically scarce. Here we propose a new method that uses information inherent in the time series to assess the temporal interpolation uncertainty of time series datasets. For this, we used data from the DLR-DFD Global WaterPack (GWP), which provides daily information on global inland surface water. As the time series is primarily based on optical MODIS (Moderate Resolution Imaging Spectroradiometer) images, the need to interpolate data gaps caused by clouds constitutes the main source of uncertainty in the product. With a focus on different temporal and spatial characteristics of surface water dynamics, seven auxiliary layers were derived. Each layer provides probability and reliability estimates regarding water observations at the pixel level. This enables the quantification of uncertainty across the full spatiotemporal range of the product.
Furthermore, the ability of temporal layers to approximate unknown pixel states was evaluated for stratified artificial gaps, which were introduced into the original time series of four climatologically diverse test regions. Results show that uncertainty is quantified accurately (>90%), thereby enhancing the product's quality with respect to its use for modeling and by the geoscientific community. KW - Earth observation KW - interpolation KW - MODIS KW - optical remote sensing KW - probability KW - reliability KW - validation KW - variability Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-245234 SN - 2072-4292 VL - 13 IS - 17 ER - TY - JOUR A1 - Meule, Adrian A1 - Hermann, Tina A1 - Kübler, Andrea T1 - A short version of the Food Cravings Questionnaire—Trait: the FCQ-T-reduced N2 - One of the most frequently used instruments for the assessment of food cravings is the Food Cravings Questionnaire (FCQ), which consists of a trait (FCQ-T; 39 items) and a state (FCQ-S; 15 items) version. Scores on the FCQ-T have been found to be positively associated with eating pathology, body mass index (BMI), low dieting success and increases in state food craving during cognitive tasks involving appealing food stimuli. The current studies evaluated the reliability and validity of a reduced version of the FCQ-T consisting of only 15 items (FCQ-T-r). Study 1 was a questionnaire study conducted online among students (N = 323). In study 2, female students (N = 70) performed a working memory task involving food and neutral pictures. Study 1 indicated a one-factorial structure and high internal consistency (α = 0.94) of the FCQ-T-r. Scores of the FCQ-T-r were positively correlated with BMI and negatively correlated with dieting success. In study 2, participants reported higher state food craving after the task than before it. This increase was positively correlated with the FCQ-T-r.
Hours since the last meal positively predicted food craving before the task when controlling for FCQ-T-r scores and the interaction of both variables. By contrast, FCQ-T-r scores positively predicted food craving after the task when controlling for food deprivation and the interaction term. Thus, trait food craving was specifically associated with state food craving triggered by palatable food cues, but not with state food craving related to plain hunger. Results indicate high reliability of the FCQ-T-r. Consistent with studies that used the long version, small-to-medium correlations with BMI and dieting success were found. Finally, scores on the FCQ-T-r predicted cue-elicited food craving, providing further support for its validity. The FCQ-T-r constitutes a succinct, valid and reliable self-report measure to efficiently assess experiences of food craving as a trait. KW - food craving KW - Food Cravings Questionnaire KW - psychometric properties KW - validity KW - reliability KW - body mass index KW - dieting success KW - food-cues Y1 - 2014 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-112748 ER - TY - JOUR A1 - Müller, Christina A1 - Domokos, Bruno A1 - Amersbach, Tanja A1 - Hausmayer, Eva-Maria A1 - Roßmann, Christin A1 - Wallmann-Sperlich, Birgit A1 - Bucksch, Jens T1 - Development and reliability testing of an audit toolbox for the assessment of the physical activity friendliness of urban and rural environments in Germany JF - Frontiers in Public Health N2 - Background: According to socio-ecological theories, physical activity behaviors are linked to the physical and social neighborhood environment. Reliable and contextually adapted instruments are needed to assess environmental characteristics related to physical activity.
This work aims to develop an audit toolbox adapted to the German context, to urban and rural settings, to different population groups, and to different types of physical activity, and to evaluate its inter-rater reliability. Methods: We conducted a systematic literature search to collect existing audit tools and to identify the latest evidence on environmental factors influencing physical activity in general, as well as in German populations. The results guided the construction of a category system for the toolbox. Items were assigned to the categories based on their relevance to physical activity and to the German context, as well as their comprehensibility. We piloted the toolbox in different urban and rural areas (100 street segments, 15 parks, and 21 playgrounds) and calculated inter-rater reliability using Cohen's kappa. Results: The audit toolbox comprises a basic streetscape audit with seven categories (land use and destinations, traffic safety, pedestrian infrastructure, cycling infrastructure, attractiveness, social environment, and subjective assessment), as well as supplementary tools for children and adolescents, seniors and people with impaired mobility, parks and public open spaces, playgrounds, and rural areas. Of all included items, 76% had moderate, substantial, or almost perfect inter-rater reliability (κ > 0.4). Conclusions: The audit toolbox is an innovative and reliable instrument for the assessment of the physical activity friendliness of urban and rural environments in Germany. KW - built environment KW - physical activity KW - reliability KW - rural KW - urban KW - walkability Y1 - 2023 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-326116 SN - 2296-2565 VL - 11 ER - TY - JOUR A1 - Smith, Craig J. A1 - Bray, Benjamin D. A1 - Hoffman, Alex A1 - Meisel, Andreas A1 - Heuschmann, Peter U. A1 - Wolfe, Charles D. A. A1 - Tyrrell, Pippa J. A1 - Rudd, Anthony G.
T1 - Can a novel clinical risk score improve pneumonia prediction in acute stroke care? A UK multicenter cohort study JF - Journal of the American Heart Association N2 - Background Pneumonia frequently complicates stroke and has a major impact on outcome. We derived and internally validated a simple clinical risk score for predicting stroke-associated pneumonia (SAP), and compared its performance with an existing score (A²DS²). Methods and Results We extracted data for patients with ischemic stroke or intracerebral hemorrhage from the Sentinel Stroke National Audit Programme multicenter UK registry. The data were randomly allocated into derivation (n=11 551) and validation (n=11 648) samples. A multivariable logistic regression model was fitted to the derivation data to predict SAP in the first 7 days of admission. The characteristics of the score were evaluated using receiver operating characteristic analysis (discrimination) and by plotting predicted versus observed SAP frequency in deciles of risk (calibration). Prevalence of SAP was 6.7% overall. The final 22-point score (ISAN: prestroke Independence [modified Rankin scale], Sex, Age, National Institutes of Health Stroke Scale) exhibited good discrimination in the ischemic stroke derivation (C-statistic 0.79; 95% CI 0.77 to 0.81) and validation (C-statistic 0.78; 95% CI 0.76 to 0.80) samples. It was well calibrated in ischemic stroke and was further stratified into meaningful risk groups (low 0 to 5, medium 6 to 10, high 11 to 14, and very high ≥15) associated with SAP frequencies of 1.6%, 4.9%, 12.6%, and 26.4%, respectively, in the validation sample. Discrimination for both scores was similar, although they performed less well in the intracerebral hemorrhage patients, with an apparent ceiling effect. Conclusions The ISAN score is a simple tool for predicting SAP in clinical practice. External validation is required in ischemic and hemorrhagic stroke cohorts.
KW - acute ischemic stroke KW - medical complications KW - infection KW - diagnosis KW - stroke-associated pneumonia KW - clinical risk score KW - pneumonia KW - stroke, acute KW - metaanalysis KW - reliability KW - dysphagia KW - scale KW - mortality KW - intracerebral hemorrhage Y1 - 2015 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-144602 VL - 4 IS - 1 ER - TY - JOUR A1 - Strahl, André A1 - Gerlich, Christian A1 - Alpers, Georg W. A1 - Gehrke, Jörg A1 - Müller-Garnn, Annette A1 - Vogel, Heiner T1 - An instrument for quality assurance in work capacity evaluation: development, evaluation, and inter-rater reliability JF - BMC Health Services Research N2 - Background: Employees covered by pension insurance who are incapable of working due to ill health are entitled to a disability pension. To assess whether an individual meets the medical requirements to be considered disabled, a work capacity evaluation is conducted. However, there are no official guidelines on how to perform external quality assurance for this evaluation process. Furthermore, the quality of medical reports in the field of insurance medicine can vary substantially, and systematic evaluations are scarce. Reliability studies using peer review have repeatedly shown insufficient ability to distinguish between high, moderate and low quality. Following recommendations in the literature, we developed an instrument to examine the quality of medical experts’ reports. Methods: The peer review manual developed contains six quality domains (formal structure, clarity, transparency, completeness, medical-scientific principles, and efficiency) comprising 22 items. In addition, a superordinate criterion (survey confirmability) ranks the overall quality and usefulness of a report. This criterion evaluates problems of inner logic and reasoning. Development of the manual was assisted by experienced physicians in a pre-test.
We examined the observable variance in peer judgements and reliability as the most important outcome criteria. To evaluate inter-rater reliability, 20 anonymous experts’ reports detailing the work capacity evaluation were reviewed by 19 trained raters (peers). Percentage agreement and Kendall’s W, a reliability measure of concordance between two or more peers, were calculated. A total of 325 reviews were conducted. Results: Agreement of peer judgements with respect to the superordinate criterion ranged from 29.2 to 87.5%. Kendall’s W for the quality domain items varied greatly, ranging from 0.09 to 0.88. With respect to the superordinate criterion, Kendall’s W was 0.39, which indicates fair agreement. The results of the percentage agreement revealed systematic peer preferences for certain deficit scale categories. Conclusion: The superordinate criterion was not sufficiently reliable. However, in comparison to other reliability studies, this criterion showed an equivalent reliability value. This report aims to encourage further efforts to improve evaluation instruments. To reduce disagreement between peer judgements, we propose revising the peer review instrument and developing and implementing standardized rater training to improve reliability. KW - work capacity evaluation KW - insurance medicine KW - quality assurance KW - peer review KW - reliability Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-200289 VL - 19 ER - TY - JOUR A1 - Waltmann, Maria A1 - Schlagenhauf, Florian A1 - Deserno, Lorenz T1 - Sufficient reliability of the behavioral and computational readouts of a probabilistic reversal learning task JF - Behavior Research Methods N2 - Task-based measures that capture neurocognitive processes can help bridge the gap between brain and behavior.
To transfer tasks to clinical application, reliability is a crucial benchmark because it imposes an upper bound on potential correlations with other variables (e.g., symptom or brain data). However, the reliability of many task readouts is low. In this study, we scrutinized the retest reliability of a probabilistic reversal learning task (PRLT) that is frequently used to characterize cognitive flexibility in psychiatric populations. We analyzed data from N = 40 healthy subjects, who completed the PRLT twice. We focused on how individual metrics are derived, i.e., whether data were partially pooled across participants and whether priors were used to inform estimates. We compared the reliability of the resulting indices across sessions, as well as the internal consistency of a selection of indices. We found good to excellent reliability for behavioral indices derived from mixed-effects models that included data from both sessions. The internal consistency was good to excellent. For indices derived from computational modeling, we found excellent reliability when using hierarchical estimation with empirical priors and including data from both sessions. Our results indicate that the PRLT is well equipped to measure individual differences in cognitive flexibility in reinforcement learning. However, this depends heavily on hierarchical modeling of the longitudinal data (whether sessions are modeled separately or jointly), on estimation methods, and on the combination of parameters included in computational models. We discuss implications for the applicability of PRLT indices in psychiatric research and as diagnostic tools. KW - probabilistic reversal learning KW - reliability KW - reinforcement learning KW - computational modeling KW - hierarchical modeling Y1 - 2022 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:20-opus-324246 SN - 1554-3528 VL - 54 IS - 6 ER -