
Rewards, risks with AI chatbots in chronic disease care


New research has found that an AI chatbot outperformed human doctors in some tasks but also created safety risks and amplified social inequities.

Human physicians in China were evaluated against ERNIE Bot, China’s most widely used AI chatbot, and two other advanced AI systems, ChatGPT-4o and DeepSeek R1. The large simulation study, conducted by an international research team that included scientists from the Yale School of Public Health, involved 384 simulated patient consultations for unstable angina and asthma. Both are non-communicable chronic diseases, a group of conditions that is among the leading causes of death and illness worldwide.

Such conditions are responsible for 41 million deaths annually, with 77% occurring in low- and middle-income countries. The promise of AI chatbots is that they could bridge crucial gaps in these countries, where many patients remain undiagnosed or poorly managed because of a shortage of qualified health care providers.

In the study, ERNIE (Enhanced Representation through kNowledge IntEgration) Bot achieved a 77.3% diagnostic accuracy rate and 94.3% accuracy in prescribing correct medications, far outperforming frontline doctors in China’s primary care system, who gave correct diagnoses only 25% of the time and correct prescriptions only 10% of the time.

The chatbot also ordered unnecessary lab tests in 91.9% of cases and prescribed potentially inappropriate or harmful medications to 57.8% of the simulated patients. In addition, older simulated patients (aged 65 versus 55) and wealthier patients received more accurate diagnoses and more intensive, though often excessive, treatment than their younger or poorer counterparts.

ERNIE Bot also completed only 14.5% of the full standard diagnostic checklist and 20.3% of its essential steps, raising concerns that it prioritized efficiency over thoroughness.

Despite the mixed results, the study’s authors, an international team from Australia, China, and the United States, wrote that the chatbots “hold promise in alleviating the burden of non-communicable chronic diseases by extending diagnostic and treatment capabilities in settings where resources are scarce.” But they cautioned that “our findings also emphasize balancing AI’s potential with necessary safeguards.”

“We must prioritize safety, equity, and human oversight if we want AI to strengthen global health systems.”

Xi Chen, PhD
Associate Professor of Public Health (Health Policy) and Associate Professor at the Institution for Social and Policy Studies

In a smaller comparison study, ChatGPT-4o and DeepSeek R1 scored higher than ERNIE Bot in diagnostic and prescribing accuracy but also showed higher rates of overprescription. ChatGPT-4o scored 92.5% in diagnostic accuracy and 100% in prescription accuracy, while DeepSeek R1 scored a perfect 100% in both. However, ChatGPT-4o ordered unnecessary lab tests in 92.5% of cases and made inappropriate prescriptions 67.5% of the time. Similarly, DeepSeek R1 ordered unnecessary tests in 95% of cases and prescribed inappropriate medications in 60% of the trials.

Chatbots have gained rapid adoption in China, where they are widely used for medical consultation; ERNIE Bot alone has more than 200 million users. ERNIE Bot was developed specifically for the Chinese language, cultural context, and medical literature, and it has passed the Chinese National Medical Licensing Examination. The current study is believed to be one of the first to empirically evaluate ERNIE Bot for quality, safety, and disparities in care.

“Our findings suggest that integrating AI into health care requires much more than technical accuracy,” said Xi Chen, PhD, associate professor of public health (health policy) at the Yale School of Public Health and a co-author of the study. “We must prioritize safety, equity, and human oversight if we want AI to strengthen global health systems.”

The researchers said chatbots could help doctors triage patients and support clinicians in making treatment decisions. But robust safeguards, accountability, and equity-focused systems must also be in place.

The study was published in the Springer Nature journal npj Digital Medicine on September 25, 2025.


Author

Colin Poitras
Senior Communications Officer
