Large Language Model GPT-4 Compared to Endocrinologist Responses on Initial Choice of Glucose-Lowering Medication Under Conditions of Clinical Uncertainty

waddelma

Feb 24, 2025, 4:37 PM

Flory, James H.; Ancker, Jessica S.; Kim, Scott Y. H.; Kuperman, Gilad; Petrov, Aleksandr; Vickers, Andrew. “Large Language Model GPT-4 Compared to Endocrinologist Responses on Initial Choice of Glucose-Lowering Medication Under Conditions of Clinical Uncertainty.” Diabetes Care, vol. 48, no. 2, 2025, pp. 185-192, https://doi.org/10.2337/dc24-1067.

This study examines how GPT-4, a commercially available large language model (LLM), compares with endocrinologists when answering clinical questions that have no clear-cut answer. Specifically, it looks at how GPT-4 and endocrinologists make the initial choice between metformin and alternative glucose-lowering medications.

To assess this, the researchers compared responses from GPT-4 with those from 31 endocrinologists to a set of hypothetical clinical scenarios. The primary focus was whether metformin or another medication was chosen. With a basic prompt, GPT-4 selected metformin in 12% of cases, while endocrinologists chose it 31% of the time. When the prompt was adjusted to make metformin the default, GPT-4's selection rate rose to 25%. The model rarely recommended metformin for patients with impaired kidney function or a history of gastrointestinal distress (2.9% of cases), whereas endocrinologists still prescribed it in these situations 21% of the time. GPT-4's responses were consistent across repeated runs, except when kidney function was in an intermediate range.
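The comparison above comes down to simple proportions: the share of answers that chose metformin for each respondent type, broken out by eGFR, plus how often repeated GPT-4 runs on the same scenario agree. The Python sketch below illustrates that kind of tallying under stated assumptions. The record layout, the respondent labels, and the function names (`metformin_rate`, `rate_by_egfr`, `run_consistency`) are hypothetical, the toy records are made up, and this is not the authors' analysis code. Measuring consistency as agreement with the most common answer is one simple choice; the paper may define it differently.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    # Hypothetical record layout; NOT the study's actual data format.
    respondent: str   # e.g. "endocrinologist", "gpt4_basic", "gpt4_default_metformin"
    vignette_id: int  # which hypothetical clinical scenario was answered
    egfr: int         # eGFR value in the scenario (mL/min/1.73 m^2)
    choice: str       # medication chosen, e.g. "metformin" or another agent


def metformin_rate(responses, respondent):
    """Fraction of one respondent type's answers that chose metformin."""
    subset = [r for r in responses if r.respondent == respondent]
    if not subset:
        return float("nan")
    return sum(r.choice == "metformin" for r in subset) / len(subset)


def rate_by_egfr(responses, respondent):
    """Metformin selection rate at each eGFR level, sorted by eGFR."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in responses:
        if r.respondent == respondent:
            totals[r.egfr] += 1
            hits[r.egfr] += r.choice == "metformin"  # bool counts as 0 or 1
    return {egfr: hits[egfr] / totals[egfr] for egfr in sorted(totals)}


def run_consistency(responses, respondent):
    """Per scenario, share of repeated answers matching the most common answer."""
    by_vignette = defaultdict(list)
    for r in responses:
        if r.respondent == respondent:
            by_vignette[r.vignette_id].append(r.choice)
    return {
        vid: Counter(choices).most_common(1)[0][1] / len(choices)
        for vid, choices in sorted(by_vignette.items())
    }


# Toy, made-up records for illustration only (NOT the study's data).
toy = [
    Response("gpt4_basic", 1, 90, "metformin"),
    Response("gpt4_basic", 1, 90, "metformin"),
    Response("gpt4_basic", 2, 40, "metformin"),
    Response("gpt4_basic", 2, 40, "sglt2_inhibitor"),
    Response("endocrinologist", 1, 90, "metformin"),
    Response("endocrinologist", 2, 40, "glp1_agonist"),
]

print(metformin_rate(toy, "gpt4_basic"))   # 0.75
print(rate_by_egfr(toy, "gpt4_basic"))     # {40: 0.5, 90: 1.0}
print(run_consistency(toy, "gpt4_basic"))  # {1: 1.0, 2: 0.5}
```

Grouping by eGFR rather than reporting a single overall rate is what makes a pattern like the intermediate-kidney-function inconsistency visible; a single pooled percentage would hide it.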

Overall, GPT-4 gave reasonable responses, but its decisions differed from those of endocrinologists in clinically important ways. These differences highlight the need for careful evaluation of AI-generated medical advice. Before LLMs are relied on in healthcare, their recommendations should not only align with clinical guidelines but also reflect patient and clinician preferences or demonstrate improved outcomes over standard care.

Figure 2

Rate of metformin prescribing by estimated glomerular filtration rate (eGFR) and respondent type. Solid line: endocrinologist responses; dashed line: original GPT-4 prompt; dash-dotted line: GPT-4 prompt with metformin as the default.
