Barbara, a 52-year-old woman, appears one morning at a Baltimore hospital emergency room complaining of a sore foot. After the emergency room physicians examine her, they diagnose dry gangrene of the toe and admit her to a general ward for monitoring.
On the third day, Barbara shows signs of what looks like pneumonia. Her doctors examine her and prescribe the usual treatment, a course of antibiotics. However, three days later, Barbara’s heart rate suddenly rises, and in the afternoon she starts breathing faster. (Barbara is not her real name.)
During the following twelve hours, her condition deteriorates rapidly until she enters septic shock—a severe condition with mortality rates as high as 50%. Barbara’s doctors hurriedly prepare her for transfer to the intensive care unit.
But, unfortunately for Barbara, it is already too late. Despite receiving every possible treatment during her next three weeks in the ICU, the woman grows steadily worse until her kidneys and then her lungs finally fail, and on the 22nd day of her hospitalization, she dies.
Barbara’s mild foot infection had escalated into fatal septic shock almost entirely undetected.
Barbara’s story is far from unusual. Every year, about 250,000 people in the U.S. die from sepsis, which occurs when the body exhibits an extreme reaction to an infection. Diagnosis is particularly difficult because sepsis often shows no obvious signs until organ dysfunction has already begun, says Suchi Saria, a machine learning researcher at Johns Hopkins University whose current focus is precision health care. “Early interventions such as antibiotics have been shown to improve outcomes,” Saria says, “but the disease often goes undetected for too long.”
Driven by the nation’s aging population, the rise of drug-resistant bacteria, and the growth of medical interventions, sepsis has become one of the most common reasons for hospitalization in the U.S. More than 1.5 million domestic sepsis cases now occur annually, accounting for more than $20 billion a year in health-care spending, stemming mainly from the need for more intensive care and longer hospital stays.
Worse, sepsis now contributes to up to half of all deaths of hospitalized patients, “making it the eleventh biggest cause of death in the U.S., killing more people than breast cancer, prostate cancer, and AIDS combined,” Saria says.
She says she tells Barbara’s story because it helps cut through the scary statistics to personalize the ongoing tragedy of sepsis in America. But it’s also because she and her team at Johns Hopkins’ Machine Augmented Decision Making Lab think that Barbara is just the kind of patient that their new computer system should be able to help save, perhaps soon.
“The signs that help pinpoint the diagnosis may already be in your data, but the data in electronic medical records are really messy,” Saria says. “We’re designing computer algorithms based on statistical and computational techniques that were recently developed to allow clinical experts to identify sepsis faster.”
When Delay Means Death
Researchers have shown that every hour a sepsis victim goes untreated adds about 8% to their mortality rate, says Dr. Peter J. Pronovost, who directs Johns Hopkins Medicine’s Armstrong Institute for Patient Safety and Quality and is collaborating with Saria on this work. “So, every second’s delay in recognizing it is critical—literally a matter of life or death.”
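Pronovost’s 8%-per-hour figure compounds quickly. The back-of-the-envelope sketch below assumes a simple multiplicative model of relative risk; the compounding form and the delay values are illustrative assumptions, not figures from the research itself.

```python
# Illustrative only: compound an ~8% relative increase in mortality
# risk for each hour sepsis goes untreated (the per-hour figure is
# from the article; the compounding model is an assumption).
def relative_risk(hours_untreated, hourly_increase=0.08):
    """Relative mortality risk after a given treatment delay, vs. no delay."""
    return (1 + hourly_increase) ** hours_untreated

for h in (1, 6, 12, 24):
    print(f"{h:2d} h delay -> {relative_risk(h):.2f}x baseline risk")
```

Under this simple model, a half-day of delay roughly doubles a patient’s relative risk, which is why the researchers measure the value of early warnings in hours gained.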
“The key with sepsis is early diagnosis,” says Dr. Emanuel Rivers, a professor of critical care medicine at Henry Ford Hospital in Detroit and a pioneer in the development of the procedures widely used today to detect and treat the condition. “You need to closely monitor high-risk patients, to recognize any signs of sepsis as soon as possible, and then have something to do about it that works.”
Since Rivers and his colleagues published that sepsis work in 2001, “what had been ignored has become a hot area,” he says. “A whole group of scientists have focused on early diagnosis, bringing to bear different expertise, different technologies and approaches.”
To improve sepsis care, Rivers continues, many health research organizations have begun to implement various screening tools that can send an alert or alarm to a clinician or nurse when a patient meets certain criteria that classify them as high risk. He says that at least five healthcare companies are planning to introduce commercial sepsis-alert products soon, noting that they resemble existing monitoring/alert products for cardiology patients. “A computer algorithm analyzes all the health signals, such as vital signs and metabolic levels, from the patients’ electronic records to tell which have the greatest likelihood of getting sepsis.”
“The trick is to monitor all the variables that really mean something,” Rivers says, “and to, as they say, ‘get rid of the noise in the data’ that could lead to system hiccups where the monitoring data somehow gets translated into some unwanted action.”
“One key issue with these screening systems is false alarms,” Pronovost says. “You can’t have the alarm system crying wolf or it will be ignored.” So-called alarm fatigue results when people learn to ignore computerized flags, alerts, and notifications because too many turn out to be false.
In 2015, Saria and her team first showed that a computer algorithm they developed could sift through patients’ records and predict septic shock—the deadliest version of sepsis—in 85% of cases, usually more than a day before onset. Two-thirds of the time the system predicted sepsis before it inflicted any damage. That’s 60% better detection performance compared to current screening tests without raising the rate of false alerts.
Saria believes that reducing the rate of false alerts further is crucial for practical adoption. “Doctors are busy people, and their every moment is valuable.”
The Bayesian Way
Medical data are challenging to work with because of widespread omissions, inputs from different types of instrumentation with different measurement practices, and other inconsistencies that have long made such computer analysis seem intractable. The team’s tool, the Targeted Real-time Early Warning System, or TREWS, aims to deploy machine-learning methods inside hospitals to help save lives.
The researchers realized early on that AI algorithms that leverage so-called Bayesian techniques in probability and statistics might do the trick because “they are fundamentally suited to dealing with uncertainty,” Saria says.
Bayesian machine learning allows researchers to encode in models their prior beliefs about what those models should look like and how they should behave. Then, as additional information comes in, they can update those beliefs. To get a feel for how Bayesian inference works, consider the example of a physician who sees a patient complaining of chest pain. The doctor will have a very different expectation of whether the diagnosis is heart disease if the patient is a 65-year-old man versus a 12-year-old boy. As she receives new evidence such as electrocardiogram (ECG) test results, the doctor will update her beliefs to incorporate the new information. Bayesian algorithms can similarly adjust to new information, but at the same time are able to operate in domains where data is patchy and sometimes inaccurate.
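The chest-pain example above can be written directly as Bayes’ rule. In this sketch, every number (the priors for each patient, and how much more likely an abnormal ECG is given heart disease) is invented for illustration; none are clinical estimates.

```python
# A minimal sketch of the Bayesian update described above.
# All probabilities are made up for illustration.
def bayes_update(prior, p_evidence_given_disease, p_evidence_given_healthy):
    """Posterior probability of disease after observing the evidence."""
    numerator = prior * p_evidence_given_disease
    denominator = numerator + (1 - prior) * p_evidence_given_healthy
    return numerator / denominator

# Prior belief in heart disease differs sharply by patient (assumed values):
prior_65yo_man = 0.20
prior_12yo_boy = 0.001

# Suppose an abnormal ECG is 8x more likely when heart disease is present:
p_abnormal_ecg_diseased = 0.80
p_abnormal_ecg_healthy = 0.10

print(bayes_update(prior_65yo_man, p_abnormal_ecg_diseased, p_abnormal_ecg_healthy))
print(bayes_update(prior_12yo_boy, p_abnormal_ecg_diseased, p_abnormal_ecg_healthy))
```

The same abnormal ECG pushes the 65-year-old’s probability far higher than the boy’s, because the evidence is weighed against very different starting beliefs, which is exactly the behavior the doctor exhibits.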
This flexibility is critical for models used in health care because “medical data comprise hundreds of different signals, each with its own significance and effect—or not,” Saria says. In effect, it’s a moving haystack hiding a few, potentially fake needles. While there are many signals, not all of them are relevant to forecasting for a specific patient; many readings are missing at various points, and data errors are frequent. The probabilistic foundations of Bayesian machine-learning techniques make them well equipped to characterize the uncertainties that accompany flawed data sets, allowing better, cleaner evaluation of the quality of the modeling results.
“It’s a question of signal-to-noise, of competently estimating and always propagating the uncertainties associated with the entire system and each part,” Saria says. Essentially, each data point is always linked with a certain uncertainty, or error bar, that serves as a gauge of the signal’s trustworthiness. That uncertainty can be incorporated into the future modeling analyses, ensuring that it’s not lost along the way. Knowing the uncertainties associated with each data point ultimately makes the final result that much more valuable.
TREWS uses Bayesian inference methods to propagate the uncertainties of each medical data point throughout the analysis. For example, this approach is able to start out by asking—and then accommodating—something as basic as, how accurate and reliable are all those blood pressure measurements in the first place? It automatically weights more accurate or trustworthy variables more heavily in the output, and it reduces the weight given to less reliable measures—exactly the sort of strategy that well-trained humans use.
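The weighting strategy Saria describes can be illustrated with standard inverse-variance fusion: each measurement carries an error bar, and noisier readings contribute less to the combined estimate. This is a generic textbook sketch, not the actual TREWS algorithm, and the blood-pressure numbers and error bars are invented.

```python
# Sketch of uncertainty-weighted data fusion: each reading is a
# (value, std_dev) pair, and noisier readings get smaller weights.
# Standard inverse-variance weighting, not the TREWS implementation.
def fuse(measurements):
    """Combine (value, std_dev) pairs into one (value, std_dev) estimate."""
    weights = [1.0 / (sd ** 2) for _, sd in measurements]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, measurements)) / total
    return value, (1.0 / total) ** 0.5

# Two hypothetical blood-pressure readings: a precise arterial line
# and a noisy cuff. The fused estimate stays close to the precise one.
readings = [(120.0, 2.0), (135.0, 10.0)]
print(fuse(readings))
```

Note that the fused estimate also comes with its own, smaller error bar, which is what lets the uncertainty be carried forward into later stages of the analysis rather than lost.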
This machine learning approach differs from the more familiar AI technique of deep learning, which uses interconnected stacks of mathematical optimization functions that operate in parallel and “learn” by tackling similar tasks or problems over and over again. Deep learning has shown tremendous success in diverse imaging-based, pattern-recognition, gaming, and other applications.
“But in medicine, where reliability is paramount,” Saria says, “we need techniques that are well-adapted to solving time-based prediction tasks with messy, limited data.” In joint work with Hossein Soleimani, a postdoctoral fellow in her lab, Saria shows that their newer Bayesian approach boosts “true positive” rates of detection by 200% to 300% over alternative state-of-the-art approaches, she says.
“What is also novel about this system is that it could recommend the best personalized course of treatment for each sepsis patient,” Pronovost says. When the system recognizes that a patient is at risk, it gives care providers the option to tailor their therapy based upon the type of infection the patient has.
Rivers, the sepsis expert, thinks that improved, TREWS-type analysis of patient monitoring could in time “turn up some new clues about sepsis” that lead to more effective therapies. He suspects, for instance, that close monitoring of patients’ biorhythms could provide useful clues that would justify triggering alarms at certain points.
Introducing a new predictive tool into the complex, often-interrupted and high-stakes environment of a hospital is challenging, something that the TREWS team is now focusing on as it prepares for real-world trials at Howard County General Hospital in Columbia, Maryland. These trials will determine whether the interactive system actually works in the field. The prototype algorithm is being embedded into the hospital’s electronic medical records system.
There’s still lots to be worked out, from physician and nurse workflows to general usability tweaks, data validation, and training, all of which must be addressed beforehand if the implementation is to succeed. Ultimately, Saria says, TREWS is designed to integrate seamlessly into doctors’ routines, saving them time in sifting through the information they need to make decisions.
“We need the systems to produce answers that can be relied upon,” she says. “We designed the system to operate in real-time and be highly interactive. But to start, we conducted a pilot trial in the background that allows us to measure physicians’ behavior and how the system could affect their practice.”
“Right now we’re digging into where it should live within the clinician’s workflow,” says Dr. Anirudh Sridharan, a hospitalist and geriatrician at Howard County General Hospital who is working with Saria’s team. According to Sridharan, the researchers are currently homing in on the best way to notify clinicians of potential sepsis cases and to develop effective protocols for responding.
“We’re addressing things as basic as ensuring we’re alerting the right person on the team,” Sridharan says. “Asking questions like: Should the alerts be active? Passive? Do you want a text notification on your phone? It’s all part of deploying the tool to best effect.”
“We’re deploying a tool that can potentially deliver orders of magnitude better care by surfacing the affected patients earlier so we can help them before it’s too late,” he adds. “And from what I’ve seen, we’re going to see a lot more AI in hospitals and health care.”
Making Use of Electronic Medical Records
It’s been a long wait for electronic medical data technology to truly pay off, Sridharan points out. “The old care model basically has been to replicate what was on paper records, and it certainly brought advantages such as better, updatable, trackable records that, for example, eliminated mistakes from chicken-scratch scripts and so forth,” he says. “But what everyone’s been waiting for is for everybody to stop just entering digital data, and somebody to finally take all this digital information and turn it into a real aid in making medical decisions.” Earlier systems provided “answers that were so simplistic as to be useless,” he adds.
“Electronic medical records are the single greatest, most disagreeable, disappointment of modern practice,” says Dr. Michael Gropper, chairman of critical care medicine at the University of California, San Francisco’s School of Medicine. “We haven’t been smart about how we created the electronic records, and now we’re paying the price.”
Meanwhile, the practice of medicine has changed, Gropper says. “Medicine used to be based on the hero model with the physician in the role of single, all-knowing decision-maker. That’s all changed; medicine is a team effort now.”
“Improved outcomes rely on leveraging all the members as well as all the technology to deliver the very best care. But when we’ve introduced these kinds of screening tools into our ICU, we’ve found that the transition is usually not straightforward,” he warns.
Changing the System
Dr. Jonathan H. Chen, an expert in medical informatics at Stanford University, agrees. “Trying to unlock the power of electronic medical data sources to improve clinical care outcomes is a great goal, but I believe implementation will be harder than some may expect,” he says. “First, the high accuracy measures that researchers report may just be predicting things the doctors already know or are false alarms that are only distracting them, leading to alert fatigue.” Second, “just because an accurate prediction is made, doesn’t mean we have an intervention to change the outcomes.”
“For example, the TREWS model was trained on data from patients already in the ICU receiving the highest level of monitoring,” Chen points out. “But having an ‘early warning system’ doesn’t make much sense at this point.” In addition, TREWS appears to be predicting sepsis in cases where doctors had already recognized the signs and were intervening. “The number you care about is positive predictive value,” he says. “Given that the alarm goes off, what is the chance the patient will end up having septic shock?” After a look at the TREWS papers, he calculates that roughly two out of three times the algorithm would be giving doctors a false alarm.
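Chen’s positive-predictive-value calculation follows directly from Bayes’ rule. In the sketch below, the 85% sensitivity is the detection figure reported earlier in the article, but the specificity and prevalence values are invented for illustration; they are not Chen’s or the TREWS papers’ actual numbers.

```python
# Positive predictive value: P(septic shock | alarm fires).
# Sensitivity is from the article; specificity and prevalence
# are assumed values, chosen only to illustrate the effect.
def ppv(sensitivity, specificity, prevalence):
    """Probability a flagged patient truly progresses to septic shock."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# When the condition is rare, even a sensitive, fairly specific
# screen yields mostly false alarms:
print(ppv(sensitivity=0.85, specificity=0.95, prevalence=0.03))
```

With these illustrative numbers the PPV lands near one in three, which is the shape of Chen’s concern: most alarms would not correspond to patients who go on to septic shock.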
“Of course, it’s easy to point out these technical issues,” Chen says, “but I commend Professor Saria’s team for continuing to push through and actually get systems implemented and evaluated in real clinical settings. This takes an enormous amount of work, resources, and political will to accomplish, far more than the relatively sterile steps of data analytics to produce the prediction model.”
How Saria’s AI system got good enough to even try to wade through mountains of heterogeneous hospital data to sniff out something as subterranean as sepsis has a lot to do with her frustration with the so-far unfulfilled promise of electronic record-keeping. Like the other researchers, she longs for progress. “For seven or eight years now, we’ve had the digitized medical record data, but not much has happened yet to really exploit it,” she says. “I want to change that.”