MedReadr: Development and Evaluation of an In-Browser, Rule-Based Natural Language Processing Algorithm to Estimate the Reliability of Consumer Health Articles.
Abstract
INTRODUCTION: The internet is a major source of medical information for patients, yet the quality of online health content remains highly variable. Existing assessment tools are often labor-intensive, unvalidated, or limited in scope. We developed and validated MedReadr, an in-browser, rule-based natural language processing (NLP) algorithm that automatically estimates the reliability of consumer health articles for patients and providers.

METHODS: Thirty-five consumer medical articles were independently assessed by two reviewers using validated manual scoring systems (QUEST and Sandvik). Interrater reliability was evaluated with Cohen's κ, and metrics with κ > 0.6 were selected for model fitting. MedReadr extracted key features from article text and metadata using predefined NLP rules. A multivariable linear regression model was trained to predict the manual reliability scores, with internal validation performed on an independent set of 20 articles.

RESULTS: High interrater reliability was achieved across all QUEST and most Sandvik domains (Cohen's κ > 0.6). The MedReadr model performed strongly, achieving R² = 0.90 and RMSE = 0.05 on the development set and R² = 0.83 and RMSE = 0.07 on the validation set. All model coefficients were statistically significant (p < 0.05). Key predictive features included currency and reference scores, sentiment polarity, engagement content, and the frequency of provider-contact, intervention-endorsement, intervention-mechanism, and intervention-uncertainty phrases.

CONCLUSION: MedReadr demonstrates that structural reliability scoring of online health articles can be automated with a transparent, rule-based NLP approach. Applied to English-language articles from mainstream search results on common medical conditions, the tool showed strong agreement with validated manual scoring systems. However, it has been validated only on a narrow scope of content and is not designed to analyze search results for specific questions or to detect misinformation. Future research should assess its performance across a broader range of web content and evaluate whether its integration improves patient comprehension, digital health literacy, and clinician-patient communication.
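The pipeline described in the METHODS section — rule-based phrase extraction, interrater screening with Cohen's κ, and a multivariable linear regression evaluated by R² and RMSE — can be sketched as follows. This is a minimal illustration only: the phrase lexicons, feature definitions, and normalization below are hypothetical stand-ins, not MedReadr's actual rules or coefficients.

```python
import re
import numpy as np

# Hypothetical phrase lexicons standing in for MedReadr's predefined NLP rules.
ENDORSEMENT = re.compile(r"\b(recommended|effective|proven)\b", re.I)
UNCERTAINTY = re.compile(r"\b(may|might|unclear|further research)\b", re.I)

def extract_features(text):
    """Map article text to rule-based phrase frequencies (counts per word)."""
    words = max(len(text.split()), 1)
    return [
        len(ENDORSEMENT.findall(text)) / words,  # endorsement-phrase frequency
        len(UNCERTAINTY.findall(text)) / words,  # uncertainty-phrase frequency
    ]

def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels on the same items."""
    a, b = np.asarray(a), np.asarray(b)
    labels = np.unique(np.concatenate([a, b]))
    po = np.mean(a == b)                                     # observed agreement
    pe = sum(np.mean(a == l) * np.mean(b == l) for l in labels)  # chance agreement
    return (po - pe) / (1 - pe)

def fit_linear_model(X, y):
    """Ordinary least squares with an intercept (multivariable linear regression)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r2_rmse(y_true, y_pred):
    """Goodness-of-fit metrics reported in the RESULTS section."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot, float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

In this sketch, only rating metrics whose `cohen_kappa` exceeds 0.6 would be carried forward as regression targets, mirroring the κ > 0.6 screening step; the fitted coefficients would then be applied in-browser to score new articles.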