Shamsudeen, Mohamed Arsath and Ahmad, Arqam Mibsaam and Kazi, Faaiza and Kazi, Syed Faazil and Khanday, Ayesha Zaffer and Arif, Shifan (2025) Evaluating Diagnostic Performance of Laypersons, Physicians, and AI-Augmented Physicians Across Clinical Complexity Levels. International Journal of Innovative Science and Research Technology, 10 (7): 25jul620. pp. 1048-1056. ISSN 2456-2165
Background: Large language models (LLMs) like ChatGPT are rapidly entering clinical contexts. While these models can generate fluent, guideline-aligned responses and perform well on exams, linguistic fluency does not equal clinical competence. Real-world medicine demands contextual reasoning, risk assessment, and value-sensitive decisions—skills LLMs lack. The growing public access to LLMs raises safety concerns, particularly when untrained users interpret AI outputs as medical advice.

Objective: This study evaluated whether AI’s clinical value depends on the expertise of its user. We compared three groups: laypersons using ChatGPT, physicians acting independently, and physicians using ChatGPT for decision support.

Methods: In a simulation-based study, 150 participants (50 per group) assessed 15 clinical cases of varying complexity. For each case, participants provided a diagnosis, a next step, and a brief justification. Responses were scored by blinded physicians using standardized rubrics. Analyses included ANOVA, effect size estimation, and content review of reasoning quality.

Results: Diagnostic accuracy was highest among physicians using ChatGPT (94.4%), followed by physicians alone (88.0%) and laypersons with ChatGPT (60.7%). Management quality mirrored this pattern. AI-assisted physicians submitted more comprehensive plans and took more time, suggesting deeper engagement. Laypersons often reproduced AI outputs uncritically, lacking contextual understanding and raising safety risks.

Conclusion: AI does not equalize clinical skill—it magnifies it. When used by trained professionals, ChatGPT enhances diagnostic accuracy and decision quality. In untrained hands, it can lead to error and overconfidence. Integrating LLMs into healthcare demands thoughtful oversight, clinician training, and safeguards to prevent misuse. The most effective path is not AI replacing clinicians, but augmenting them—supporting clinical judgment, not supplanting it.