Agreement Between Nail Psoriasis Severity Index Scores by a Convolutional Neural Network and Dermatologists: A Retrospective Study at an Academic New York City Institution.
Academic Article
Overview
Abstract
BACKGROUND: Nail psoriasis (NP) affects up to 90% and 86% of patients with cutaneous psoriasis and psoriatic arthritis, respectively, with a significant impact on quality of life. The Nail Psoriasis Severity Index (NAPSI) is infrequently used in clinical practice owing to its labor-intensive nature and variable interobserver reliability. OBJECTIVE: The objective of this study was to assess performance and inter-reader agreement between artificial intelligence (AI)-determined NAPSI scores and dermatologist-assigned scores. METHODS: This cross-sectional study used retrospectively collected clinical images of psoriatic fingernails from a specialized nail clinic in New York City. A convolutional neural network (CNN) model was trained and used for NAPSI classification of the psoriatic fingernail images, and seven dermatologist nail experts scored the same images. The primary outcome was the intraclass correlation coefficient (ICC), using a one-way analysis of variance (ANOVA) fixed effects model for single-rater absolute agreement, between the average dermatologist-assigned NAPSI score and the AI-assigned score. RESULTS: In total, 240 images of psoriatic fingernails were included. The ICCs for overall NAPSI, matrix (NAPSIm), and bed (NAPSIb) scores among the dermatologists were 0.43 (95% confidence interval [CI] 0.33-0.55), 0.56 (95% CI 0.46-0.67), and 0.53 (95% CI 0.43-0.65), respectively. Comparing the AI algorithm-assigned NAPSI, NAPSIm, and NAPSIb scores with the average dermatologist-assigned scores, ICCs were 0.81 (95% CI 0.74-0.86), 0.75 (95% CI 0.65-0.82), and 0.81 (95% CI 0.74-0.86), respectively. CONCLUSIONS: We found an excellent correlation between AI-derived NAPSI scores and dermatologist-assigned scores, underscoring the potential of CNNs to improve accuracy and reliability in NAPSI scoring.
Limitations of this study include the small sample size, undetermined CNN diagnostic accuracy, incomplete data, and potential underrepresentation of racial/ethnic minority groups.
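The agreement statistic named in the methods, a one-way ANOVA, single-rater, absolute-agreement ICC, is commonly written as ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between-image and within-image mean squares and k is the number of ratings per image. The following is a minimal sketch of that textbook formula, not the authors' analysis code. The function name `icc_one_way` and the example scores are hypothetical, and pairing the average dermatologist score with the AI score as two "raters" per image is an assumption about how the comparison could be set up.

```python
from statistics import mean


def icc_one_way(ratings):
    """Return ICC(1,1) for a table of ratings.

    ratings: list of rows, one row per image, one column per rater.
    Uses the one-way ANOVA mean squares (single rater, absolute agreement).
    """
    n = len(ratings)                      # number of images
    k = len(ratings[0]) if n else 0       # ratings per image
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 images and 2 ratings per image")

    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Between-image mean square: spread of per-image means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-image mean square: disagreement among ratings of the same image.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical NAPSI scores (0-8 per nail): [mean dermatologist score, AI score].
example_scores = [[4, 5], [2, 2], [7, 6], [0, 1]]
example_icc = icc_one_way(example_scores)
print(f"ICC(1,1) = {example_icc:.2f}")
```

A value of 1 indicates identical ratings for every image, while values near or below 0 indicate that ratings of the same image differ as much as ratings of different images. Confidence intervals such as those reported above require an additional F-distribution step that this sketch omits.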