WEDNESDAY, Feb. 28, 2024 (HealthDay News) -- A large language model (LLM) chatbot outperforms glaucoma and retina specialists in diagnostic accuracy, according to a study published online Feb. 22 in JAMA Ophthalmology.
Andy S. Huang, M.D., from the Icahn School of Medicine at Mount Sinai in New York City, and colleagues conducted a comparative cross-sectional study to compare the diagnostic accuracy and comprehensiveness of responses from an LLM chatbot with those of fellowship-trained glaucoma and retina specialists. The study recruited 15 participants aged 31 to 67 years, including 12 attending physicians and three senior trainees. Participants used a Likert scale to rate responses to 20 questions (10 on glaucoma and 10 on retina) and 20 deidentified cases (10 glaucoma and 10 retina).
The researchers found that the combined question-case mean ranks for accuracy were 506.2 for the LLM chatbot and 403.4 for glaucoma specialists; the corresponding mean ranks for completeness were 528.3 and 398.7. For retina, the mean ranks for accuracy were 235.3 for the LLM chatbot and 216.1 for retina specialists; the corresponding mean ranks for completeness were 258.3 and 208.7. In the Dunn test, ratings of chatbot completeness differed significantly in all pairwise comparisons except specialist versus trainee. Both trainees and specialists rated the chatbot's accuracy and completeness more favorably than those of the human specialists' responses, with specialists noting a significant difference in the chatbot's accuracy and completeness.
"These findings support the possibility that artificial intelligence tools could play a pivotal role as both diagnostic and therapeutic adjuncts," the authors write.