ChatGPT Passes Radiology Board Exam

Hana M | May 22, 2023 | 10:00 AM | Technology

Two recently published research studies in Radiology shed light on the immense potential of large language models like the latest version of ChatGPT. These studies showed that ChatGPT successfully cleared a radiology board-style exam made up of text-only questions, demonstrating its ability to handle exam-level radiology content. However, the studies also emphasized the limitations that currently hinder the reliability of such models in the field of radiology. [1]

Figure 1. Robot Thinking.

Figure 1 shows an illustration of a robot thinking. ChatGPT, an AI chatbot, uses a deep learning model to generate human-like responses by recognizing patterns and relationships between words in its training data. However, because that training data contains no definitive source of truth, the tool may produce factually incorrect responses. [1]

“The use of large language models like ChatGPT is exploding and only going to increase,” said lead author Rajesh Bhayana, MD, FRCPC, an abdominal radiologist and technology lead at University Medical Imaging Toronto, Toronto General Hospital. “Our research provides insight into ChatGPT’s performance in a radiology context, highlighting the incredible potential of large language models, along with the current limitations that make it unreliable.” [1]

Dr. Bhayana highlighted that ChatGPT, recognized as the fastest-growing consumer application in history, is being integrated into popular search engines such as Google and Bing. These chatbots are increasingly utilized by physicians and patients to access medical information. [1]

Dr. Bhayana and colleagues conducted a study to evaluate the performance of ChatGPT, specifically the widely used GPT-3.5 version, on radiology board exam questions. The researchers administered 150 multiple-choice questions that were carefully crafted to align with the style, content, and level of difficulty seen in the Canadian Royal College and American Board of Radiology exams. This approach allowed them to explore the strengths and limitations of the model in a controlled setting. [1]
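
For illustration only, the sketch below shows how a text-only, board-style multiple-choice question could be posed to a chat model programmatically. The source does not describe the authors' actual administration workflow; the OpenAI Python client, the model name, the prompt wording, and the placeholder question are all assumptions made for this example.

    # Illustrative sketch only: the study's actual administration protocol is not
    # described in the source. Model name, prompt, and question are placeholders.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    question = (
        "A board-style, text-only multiple-choice question would go here.\n"
        "A) Option one\nB) Option two\nC) Option three\nD) Option four"
    )

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; the studies compared GPT-3.5 and GPT-4
        messages=[
            {"role": "system", "content": "Answer with the single best option letter."},
            {"role": "user", "content": question},
        ],
    )

    # Print the model's chosen answer for later scoring against the answer key.
    print(response.choices[0].message.content)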

The questions used in the study did not involve images and were categorized based on the type of thinking required for answering them. This categorization included lower-order thinking, which focused on knowledge recall and basic understanding, and higher-order thinking, which involved applying, analyzing, and synthesizing information. Furthermore, the higher-order thinking questions were subcategorized based on the type of skills required, such as describing imaging findings, clinical management, calculation and classification, and disease associations. [1]

The performance of ChatGPT was evaluated overall, by question type, and by topic. The confidence of the language used in its responses was also assessed. [1]
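
As a minimal sketch of this kind of scoring, assuming a simple list of per-question records (the data below is invented for illustration, not taken from the study), accuracy overall and by category can be tallied as follows:

    # Toy example of tallying accuracy overall and by question category.
    # The records are invented placeholders, not data from the study.
    from collections import defaultdict

    results = [
        {"category": "lower-order", "correct": True},
        {"category": "lower-order", "correct": False},
        {"category": "higher-order: imaging findings", "correct": True},
        {"category": "higher-order: calculation and classification", "correct": False},
    ]

    tally = defaultdict(lambda: [0, 0])  # category -> [correct, attempted]
    for r in results:
        tally[r["category"]][0] += int(r["correct"])
        tally[r["category"]][1] += 1

    total_correct = sum(c for c, _ in tally.values())
    total_asked = sum(n for _, n in tally.values())
    print(f"Overall: {total_correct}/{total_asked} ({total_correct / total_asked:.0%})")
    for category, (c, n) in tally.items():
        print(f"{category}: {c}/{n} ({c / n:.0%})")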

The researchers observed that ChatGPT using the GPT-3.5 model achieved a 69% accuracy rate in answering the questions, correctly responding to 104 out of 150 questions. This performance was close to the passing grade of 70% employed by the Royal College in Canada. Notably, the model demonstrated relatively strong performance in answering questions that involved lower-order thinking, with an 84% accuracy rate (51 out of 61). However, it faced challenges in responding to questions requiring higher-order thinking, achieving a 60% accuracy rate (53 out of 89) in that category. [1]

More specifically, it struggled with higher-order questions involving the description of imaging findings (61%, 28 of 46), calculation and classification (25%, 2 of 8), and application of concepts (30%, 3 of 10). Its poor performance on higher-order thinking questions was not surprising given its lack of radiology-specific pretraining. [1]

In a subsequent study, GPT-4, which was released in limited form to paid users in March 2023, demonstrated notable advancements in its advanced reasoning capabilities compared to GPT-3.5. GPT-4 correctly answered 81% (121 out of 150) of the same questions, surpassing both the performance of GPT-3.5 and the passing threshold of 70%. Particularly, GPT-4 excelled in higher-order thinking questions, achieving an 81% accuracy rate, with specific strengths in describing imaging findings (85%) and applying concepts (90%). [1]

These findings indicate that GPT-4's claimed enhancements in advanced reasoning capabilities have tangible benefits in the field of radiology. The improved contextual understanding of radiology-specific terminology, including imaging descriptions, holds promise for future applications in this domain. [1]

“Our study demonstrates an impressive improvement in the performance of ChatGPT in radiology over a short period, highlighting the growth potential of large language models in this context,” Dr. Bhayana said. [1]

GPT-4 showed no improvement on lower-order thinking questions (80% vs. 84%) and answered 12 questions incorrectly that GPT-3.5 had answered correctly, raising questions about its reliability for information gathering. [1]

“We were initially surprised by ChatGPT’s accurate and confident answers to some challenging radiology questions, but then equally surprised by some very illogical and inaccurate assertions,” Dr. Bhayana said. “Of course, given how these models work, the inaccurate responses should not be particularly surprising.” [1]

ChatGPT’s dangerous tendency to produce inaccurate responses, termed hallucinations, is less frequent in GPT-4 but still limits usability in medical education and practice at present. [1]

Both studies showed that ChatGPT consistently used confident language, even when it was incorrect. Dr. Bhayana notes that this is particularly dangerous if the tool is relied on as a sole source of information, especially for novices who may not recognize confident but incorrect responses as inaccurate. [1]

“To me, this is its biggest limitation. At present, ChatGPT is best used to spark ideas, help start the medical writing process, and in data summarization. If used for quick information recall, it always needs to be fact-checked,” Dr. Bhayana said. [1]

Source: RSNA

References:

  1. https://www.rsna.org/news/2023/may/chatgpt-passes-board-exam

Cite this article:

Hana M (2023), ChatGPT Passes Radiology Board Exam, AnaTechmaz, pp.257
