Can AI Reliably Diagnose Glaucoma?

Large language models (LLMs), such as ChatGPT, have skyrocketed in popularity in the last year due to their ability to draw on vast amounts of information, but could they be used to diagnose ocular disease?

Ophthalmology researchers have begun digging into the question. Earlier this year, Malik Kahook, MD, professor in the Department of Ophthalmology and the Slater Family Endowed Chair in Ophthalmology at the University of Colorado School of Medicine, joined researchers at the University of Tennessee Memphis to compare the diagnostic accuracy of ChatGPT with that of senior ophthalmology residents.

In the study, published in September in the journal Ophthalmology and Therapy, researchers entered detailed text case reports of 11 clinical cases of primary and secondary glaucoma from a public database into ChatGPT and asked for a diagnosis. The same case information was presented to three senior ophthalmology residents for comparison.

The results?

ChatGPT gave a correct diagnosis in eight of the 11 cases. The three residents were correct in six, eight, and eight cases, respectively. When glaucoma presented in a common way, both the residents and ChatGPT gave accurate diagnoses, but in cases considered atypical or complex, accuracy dropped for both the LLM and the residents.

The team of researchers concluded that ChatGPT’s accuracy was comparable to that of the residents, and “with further development, ChatGPT may have the potential to be used in clinical care settings, such as primary care offices, for triaging and in eye care clinical practices to provide objective and quick diagnoses of patients with glaucoma.”

The researchers also say the technology still has deficiencies, such as its need for detailed, structured information that might not always be available in real-world settings.

Here’s how Kahook explains the future of LLMs in ophthalmology and glaucoma diagnosis as the technology continues to evolve.

Let’s start with how artificial intelligence (AI) is already being used in ophthalmology.

AI is increasingly being used in ophthalmology practice, but at this stage there are many more examples of AI being used in research settings.

For instance, Moorfields Eye Hospital in London collaborated with Google’s DeepMind on research to develop an AI system that analyzes visual field tests and optical coherence tomography (OCT) scans. The system detected eye conditions, including glaucoma and macular degeneration, from scans and is slowly finding its way into wider clinical practice. IBM’s Watson has been employed to analyze medical records to aid treatment decisions.

It’s important to note that AI technology is present in many of the diagnostic devices we use in clinic today, such as visual field machines and OCT devices. However, these uses take the form of data analysis and presentation, without the features the general public associates with AI, such as the conversational ChatGPT LLM system with which we are now familiar. The growing number of real-world applications of AI, including LLMs, highlights its potential to enhance diagnostic testing and improve the accuracy and efficiency of patient care in ophthalmology and beyond.

In this study, you focused on how useful AI might be in diagnosing glaucoma, where it could add another layer of interpretation. Why is diagnosing this disease still considered relatively subjective?

Variability in glaucoma diagnosis stems from the complexity of the disease, which has diverse presentations and lacks clear symptoms in its early stages. Diagnostic tools such as visual field testing and imaging also have limitations, and differences in interpretation, patient factors, and evolving diagnostic criteria all contribute. Standardizing glaucoma assessment remains a challenge, which leads to subjective and varied diagnoses.

Do you see a future where LLMs could act as an aid in diagnosing glaucoma?

Yes, I anticipate LLMs will aid glaucoma diagnosis, and they could be of particular benefit where glaucoma expertise is not available. This could include low- and middle-income countries with less access to ophthalmologists and subspecialty-trained glaucoma experts, primary care settings where ocular disease is not easy to diagnose, and residency programs where trainees can augment their knowledge through LLM-based digital assistants.

One of the strengths of LLMs is their ability to process vast amounts of data, which could assist in early glaucoma detection and more nuanced detection of disease at all stages. In primary care, quick LLM assessments could triage patients efficiently and ensure timely referrals to specialists. This approach could help prevent vision loss, making it highly beneficial across many health care settings.

Limitations of earlier versions of ChatGPT, such as version 3.5, included an inability to assess multimodal data. Why is that capability important for diagnosing glaucoma?

Assessing multimodal data, including imaging and visual field tests, provides comprehensive insights that allow for proper glaucoma care. Combining imaging data with historical and clinical exam data helps refine diagnostic accuracy. Imaging such as OCT reveals optic nerve changes, while visual field tests track visual field loss; neither can be obtained by reviewing text-based input data. Patient history informs risk factors such as age, ethnicity, medication use, and more. Integrating these data types offers a holistic view, which is crucial for precise glaucoma diagnosis and personalized treatment planning.

What kind of information or data would contribute to smarter AI diagnosis?

In the future, the goal would be to integrate several types of information that are important to the identification and treatment of glaucoma. These include data from the following:

Imaging data: High-resolution optic nerve imaging, like OCT, for precise assessment of nerve fiber layer thickness and optic disc changes. This is more of an objective test of disease presence and longitudinal follow-up.

Visual field tests: Monitoring visual field loss with perimetry as a functional test, which is a more subjective measure of disease presence and longitudinal follow-up.

Patient history: Including age, ethnicity, family history, medical conditions, medication use, and lifestyle habits, which influence glaucoma risk.

Genetic information: Understanding genetic predispositions aids in early detection for at-risk individuals.

Intraocular pressure (IOP) measurements: Monitoring IOP levels and fluctuations is crucial in glaucoma treatment.

Corneal thickness: A factor that influences IOP measurements and is itself a risk factor for disease progression.

Other biomarkers: Exploring additional biomarkers and adding verified, reliable data to AI algorithms would be very beneficial, helping to enhance diagnosis and gauge the risk of progression.