As artificial intelligence chatbots become embedded in daily life, health professionals and researchers are raising urgent questions about whether the public can — or should — rely on AI tools for medical guidance. A new study published in the journal Digital Health Quarterly found that while leading AI chatbots answered basic clinical questions with roughly 79 percent accuracy, that figure dropped sharply to 54 percent when questions involved drug interactions, dosage thresholds, or conditions requiring differential diagnosis. The findings are prompting renewed calls for clearer labeling and regulatory oversight of consumer AI health tools.
AI chatbots capable of synthesizing medical literature have proliferated rapidly. In the past three years, dozens of platforms have been launched or updated with health-specific features, ranging from symptom checkers to detailed explanations of laboratory results. Proponents argue that AI tools can democratize access to health information, particularly for patients in underserved regions with limited access to primary care physicians. Critics counter that the technology’s limitations are poorly communicated to users and that the gap between accurate information and confident-sounding misinformation can be clinically dangerous.
The Digital Health Quarterly study evaluated six major publicly available AI chatbots against a standardized battery of 1,200 clinical questions developed by a panel of board-certified physicians across internal medicine, emergency medicine, and pharmacology. Questions ranged from “What are the symptoms of appendicitis?” to “What is the maximum daily dose of acetaminophen for an adult with mild liver disease?” While chatbots performed well on broad factual queries, they struggled consistently with nuanced scenarios requiring patient-specific clinical judgment. In 12 percent of cases involving medication questions, the chatbots provided responses that researchers categorized as potentially harmful if followed without professional consultation.
“These tools are extraordinarily useful for general health literacy,” said Dr. Amara Nwachukwu, a hospitalist at a mid-sized academic medical center in Ohio and one of the study’s co-authors. “Where they become dangerous is when users treat them as a substitute for an actual clinician who can examine them, review their full history, and account for comorbidities that the chatbot simply has no way of knowing about.” Dr. Nwachukwu noted that patients with chronic conditions were particularly vulnerable to acting on AI advice that failed to account for their unique medication regimens or contraindications.
Industry representatives from several AI platform companies emphasized that their products are designed to complement, not replace, professional medical care. Most chatbots include disclaimers recommending that users consult a physician before acting on any health information provided. However, consumer behavior research suggests those disclaimers have limited effect. A survey of 3,000 adults conducted alongside the Digital Health Quarterly study found that 41 percent of respondents who used a chatbot for a health question reported taking some action — such as adjusting medication dosage or delaying a medical appointment — based on the AI’s response, without first consulting a healthcare provider.
Regulatory bodies in several countries are beginning to examine how AI health advice tools should be classified. In the European Union, discussions are underway about whether chatbots that provide individualized health guidance should be subject to the same regulatory pathway as medical devices. In the United States, the Food and Drug Administration has issued draft guidance indicating that AI tools offering “clinical decision support” may require oversight depending on the specificity of their outputs, though the framework remains in development.
Health literacy advocates argue that rather than discouraging AI use entirely, the priority should be educating the public about its appropriate scope. “People are going to use these tools regardless,” said Celeste Obinna, director of a nonprofit that promotes digital health education in low-income communities. “Our job is to help them understand what questions AI can reliably answer and when they absolutely need a real doctor.” She recommended that health systems integrate AI literacy into patient education programs and that clinicians proactively discuss with patients how they are using digital tools between appointments.
Researchers say the next phase of study will focus on specific populations, including elderly patients managing multiple chronic conditions and parents making pediatric health decisions, to assess whether AI health advice poses greater risks in those groups. They also called on AI developers to publish standardized accuracy benchmarks for health queries, a practice currently observed by only a minority of platforms. Without transparent benchmarking, experts warn, neither patients nor regulators have enough information to decide when AI health advice can be trusted.