ChatGPT can be a useful tool for patients who are seeking medical information and guidance, but the artificial intelligence tool can’t fully replace the value of a human physician – it says so itself.
“While I am a language model that has been trained on a vast amount of information, I am not a licensed medical professional and I am not capable of providing medical diagnoses, treatments, or advice,” the chatbot wrote in response to a question from CNN.
Still, new research published this week suggests that physicians may have some things to learn from the chatbot when it comes to patient communication.
A panel of licensed health care professionals assessed responses to about 200 different medical questions posed to a public online forum, including patient inquiries about medical diagnoses, need for medical attention and more.
Responses from ChatGPT were “preferred over physician responses and rated significantly higher for both quality and empathy,” according to a study published Friday.
More than a quarter of responses from physicians were considered less than acceptable in quality, compared with less than 3% of those from ChatGPT. Meanwhile, nearly half of responses from ChatGPT (45%) were considered empathetic, compared with less than 5% of those from physicians.
On average, ChatGPT’s responses scored 21% higher than physicians’ for quality and 41% higher for empathy, according to the study.
In one example provided in the study, a patient posed a question to a social media forum about the risk of going blind after a splash of bleach in the eye. ChatGPT started its response by apologizing for the scare, followed by seven more sentences of advice and encouragement about the “unlikely” result of going blind. Meanwhile, one physician responded with “sounds like you will be fine,” followed by the phone number for Poison Control. All clinicians evaluating these responses preferred ChatGPT’s response.
As in this example, experts note that responses from ChatGPT were typically much longer than those from physicians, which could affect perceptions of quality and empathy.
“Without controlling for the length of the response, we cannot know for sure whether the raters judged for style (e.g., verbose and flowery discourse) rather than content,” wrote Mirella Lapata, professor of natural language processing at the University of Edinburgh.
Earlier this month, Dr. David Asch, a professor of medicine and senior vice dean at the University of Pennsylvania, asked ChatGPT how it could be useful in health care. He found the responses to be thorough, but verbose.
“It turns out ChatGPT is sort of chatty,” he said. “It didn’t sound like someone talking to me. It sounded like someone trying to be very comprehensive.”
Asch, who ran Penn Medicine Center for Health Care Innovation for 10 years, says he’d be excited to meet a young physician who answered questions as comprehensively and thoughtfully as ChatGPT answered his, but warns that the AI tool isn’t yet ready to be fully entrusted with patients.
“I think we worry about the garbage in, garbage out problem. And because I don’t really know what’s under the hood with ChatGPT, I worry about the amplification of misinformation. I worry about that with any kind of search engine,” he said. “A particular challenge with ChatGPT is it really communicates very effectively. It has this kind of measured tone and it communicates in a way that instills confidence. And I’m not sure that that confidence is warranted.”
Additional research published this week compared postoperative care instructions for eight common pediatric procedures that were provided by ChatGPT, Google and Stanford University. The responses were analyzed based on a standardized scale for understandability, actionability and specificity.
Overall, instructions directly from the medical institution received the highest scores. ChatGPT and Google were about even in terms of understandability, both scoring better than 80%. And while ChatGPT scored well in actionability (73%), Google responses were rated higher (83%).
While ChatGPT did not outperform other resources, the researchers say it still has value and some advantages – including the ability to customize responses to different literacy levels. For this analysis, ChatGPT was asked to provide instructions at a fifth grade reading level.
“ChatGPT provides direct answers that are often well-written, detailed, and in if-then format, which give patients access to immediate information while waiting to reach a clinician,” the researchers wrote.
Still, Asch says that ChatGPT is better viewed as a support for doctors than as a guide for patients. It’s best used “one step removed from the clinical encounter,” in situations that are low-risk to the patient, he said.
“I have a very optimistic sense of this, but it’s all predicated on operating within the guardrails of truth. And at the moment, I don’t know that guardrails of truth exist in the way in which ChatGPT constructs its answers,” he said.