Disease Trackers Examine Search Engine Data for Clues

A research team demonstrated the potential by looking at Google searches around the time of an outbreak of listeria, an infection caused by food-borne bacteria, that resulted in nearly 60 cases and 21 deaths in Canada in 2008.

The listeria outbreak was caused by an accumulation of bacteria in meat-slicing equipment at a Toronto plant.

The researchers found that searches for the term “listeriosis” spiked nearly a month before federal officials publicly announced the outbreak on Aug. 17. The results of the analysis, published in the Canadian Medical Association Journal last month, show that the timing of those searches lines up more closely with the peak of the outbreak than did news reports or the official announcement, which came as the number of new cases was declining.
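To make the idea of a search "spike" concrete, here is a minimal Python sketch of one common way to flag one: compare each week's search count with the average of the few weeks before it. This is an illustration only, not the method used in the study. The function name, the four-week baseline, the threshold of three standard deviations and the weekly counts are all assumptions made up for the example.

```python
from statistics import mean, stdev


def find_spikes(counts, baseline_weeks=4, threshold=3.0):
    """Return indices of weeks whose count sits more than `threshold`
    standard deviations above the mean of the preceding weeks."""
    spikes = []
    for i in range(baseline_weeks, len(counts)):
        baseline = counts[i - baseline_weeks:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # A perfectly flat baseline has zero spread; fall back to 1.0
        # so the division below is always defined.
        if sigma == 0:
            sigma = 1.0
        if (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Made-up weekly search volumes for one term (NOT data from the study).
weekly_searches = [10, 12, 11, 9, 10, 11, 48, 52, 30, 14]
spike_weeks = find_spikes(weekly_searches)
print(spike_weeks)
```

With these invented numbers only the first week of the jump is flagged. Once the surge enters the trailing baseline, later high weeks no longer stand out, which is one reason a real system would need a more careful baseline.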

“Our findings were a bit unexpected,” said Dr. Kumanan Wilson, co-author of the paper and research chair of public health policy at the Ottawa Health Research Institute. The spike appeared in searches for “listeriosis,” a more technical term than “listeria”; searches for the latter did not spike as early.

“It could have been medical professionals [performing the searches], it could have been public health officials at early stages, so there are a variety of possibilities,” said Wilson.

Analyzing search-term data is difficult because researchers do not necessarily know who is searching for disease-related terms or why. However, health workers in a region might be able to respond to an outbreak more quickly if they had access to aggregated search trends and could see that others nearby were searching for the same diseases or symptoms, Wilson said.

“With any of these types of instruments, you are going to get a lot more signals than actual events. There are going to be a lot of false positives; there can be spikes for unrelated reasons,” Wilson said.

“So it requires vetting by public health officials, but nevertheless it could be worth it … with the listeria outbreak it could have allowed us to bring in a recall earlier and prevented several deaths.”
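Wilson's point about false positives can be illustrated with a small Python sketch that sorts alerts into those followed shortly by a confirmed outbreak and those that were not. The function name, the two-week window and all of the week numbers are hypothetical and chosen only for the example; none of them comes from the study.

```python
def vet_alerts(alert_weeks, confirmed_weeks, window=2):
    """Split alerts into those raised within `window` weeks before
    (or in the same week as) a confirmed outbreak, and all the rest."""
    true_alerts, false_alerts = [], []
    for week in alert_weeks:
        matched = any(
            0 <= confirmed - week <= window for confirmed in confirmed_weeks
        )
        if matched:
            true_alerts.append(week)
        else:
            false_alerts.append(week)
    return true_alerts, false_alerts


# Invented week numbers: five search spikes, two confirmed outbreaks.
alerts = [3, 11, 19, 27, 40]
confirmed = [12, 41]

true_alerts, false_alerts = vet_alerts(alerts, confirmed)
precision = len(true_alerts) / len(alerts)
print(true_alerts, false_alerts, precision)
```

In this invented case three of the five alerts lead nowhere, which is the kind of ratio that would make vetting by public health officials necessary before acting on a signal.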

In January, Google.org and the Centers for Disease Control and Prevention published results in Nature demonstrating the public health potential of search-term surveillance for tracking influenza in the United States.

The Google Flu Trends project revealed that the relative frequency of flu-related searches is highly correlated with the percentage of doctor visits by patients who have flu-like symptoms.
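"Highly correlated" here means the two measures rise and fall together week to week. As a rough sketch of how that can be quantified, the Python below computes a Pearson correlation coefficient between two short series. The numbers are invented for illustration and are not Flu Trends or CDC data; the variable names are likewise assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented weekly values (NOT real data):
# share of all searches that are flu-related, in percent
search_share = [0.8, 1.1, 1.9, 3.2, 2.7, 1.5]
# share of doctor visits involving flu-like symptoms, in percent
visit_pct = [1.2, 1.5, 2.4, 4.1, 3.6, 2.0]

r = pearson(search_share, visit_pct)
print(round(r, 3))
```

A value of r near 1 indicates that the two series move together closely. It does not by itself show that searches predict visits ahead of time.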

Through Flu Trends, it is possible to accurately estimate the current level of flu-related cases in each region of the United States, “with a reporting lag of about one day,” the researchers reported. These estimates were consistently a week to two weeks ahead of the CDC’s flu surveillance reports.

“Although it doesn’t replace the need for real viral surveillance data, Flu Trends is a good model,” said the CDC’s Arleen Porcell-Pharr. The CDC currently uses data provided by a variety of state and local health departments, as well as clinics, labs and emergency departments around the country for its weekly flu surveillance reports.

The CDC will evaluate the Google Flu Trends data at the end of the flu season to see if it was useful in responding to changes in influenza incidence.

“We will have to wait and see how useful the tool was, but in theory I think it can be used for any disease,” Porcell-Pharr said.

Both research teams emphasized that search-term surveillance should not be a replacement for more traditional forms of disease reporting, or for news report aggregators, which show more specifics than search-term analysis.

Tools and systems that aggregate isolated health news reports, such as the Global Public Health Intelligence Network, created by Health Canada, started to pop up in the 1990s, and are now widely used by public health officials and the World Health Organization.

Larry Madoff, editor of ProMED-mail, an electronic reporting system for outbreaks of emerging diseases, says official sources “aren’t always the best, earliest warning of disease outbreaks.”

ProMED-mail uses a combination of news reports and local, on-the-ground sources to alert readers to emerging problems. The service has 50,000 subscribers around the world and was one of the first programs to utilize the Internet for disease surveillance.

The benefits of a more nimble system were on display when ProMED-mail provided the first public report of the SARS outbreak that began in China in 2002. One of its readers was participating in a teachers’ chat room in China and heard talk of a massive outbreak of pneumonia, with hospitals closing and people dying. The network shared that message days before other sources started to pick up on it.

Readers in Toronto became aware of the disease through the network and were able to identify the symptoms when cases of SARS started to appear there.

As in search-term surveillance, one of the biggest challenges is determining when media or on-the-ground reports of a possible outbreak have reached a significant level, given the information overload the Web produces, Madoff said.

ProMED-mail is partnering with HealthMap, which maps health news reports and alerts, to highlight important trends and find a balance between human and automated power.

“I don’t think that any single method is ‘the answer,’” said Madoff, but the addition of search-term data to existing alert networks could be a good thing. “It’s new. It’s exciting. It’s another clue.”