Phlegm. Temperature. Sneezing. Apparently sick people are predictable. Before consulting a physician or shopping for DayQuil, we Google our symptoms – or, at least, we do so consistently enough to provide statistically valuable information.
Google researchers recognized this after comparing search query data with confirmed influenza data from the Centers for Disease Control and Prevention (CDC), and released Google Flu Trends in 2008 to much acclaim – and some alarm. The tool lets Regular Joes view flu trends around the world, broken down geographically and in near real time, as indicated by the frequency of these searches.
This is exciting because the search engine data is processed in about a day, while the CDC has a one- to two-week reporting lag, according to Nature. The faster a potential epidemic is detected, the faster it can be contained. Google Flu Trends has been shown to be effective for many strains of the flu, including H1N1. It also tracks bacterial infections, including methicillin-resistant Staphylococcus aureus (MRSA).
The surprising success of this tool was largely the inspiration for another application released this week – Google Correlate. The idea is that you can expand your own personal search for meaning by bringing in datasets. Provide it with one word or phrase, and it will chart other terms that have been searched for with similar frequency over the past eight years. Provide it with an unrelated dataset (frog mating habits? consumer price index? Anything that can be quantified into weekly averages will do) and it will show you the search terms that correlate with it. Individuals’ querying patterns, once considered private, are aggregated into a trend and then compared with other measurable aspects of our world, in the hope that this could explain a few things.
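For the curious, here is a rough idea of what "showing the search terms that correlate" could look like in code. This is only a sketch, assuming a plain Pearson correlation over weekly values; it is not Google's implementation, and the function names, term names and numbers are all made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_terms(target, candidates):
    """Sort candidate terms from most positively to most negatively correlated."""
    scored = [(term, pearson(target, series)) for term, series in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weekly averages: an uploaded dataset and three search terms.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "term_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises in step with the target
    "term_b": [5.0, 4.0, 3.0, 2.0, 1.0],   # mirror image of the target
    "term_c": [1.0, 3.0, 2.0, 5.0, 4.0],   # similar, but noisier
}

ranking = rank_terms(target, candidates)
for term, r in ranking:
    print(f"{term}: r = {r:.2f}")
```

A score near 1 means the two lines rise and fall together, and a score near -1 means one is a mirror image of the other. Neither says anything about why.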
I tried out this tool with a straightforward example — the Dow Jones industrial average, downloaded from Yahoo! Finance.
Correlating positively with the Dow (moving in the same direction, with the same relative amplitude), one finds a strikingly high percentage of search terms relating to finance and investment tools, and several seeming oddities, including Biagio’s Restaurant (fine dining), TSP Talk (a government chat service), and Jordan Retros (a good-looking shoe).
Correlating negatively with the Dow (moving in the opposite direction, with the same amplitude – a mirror image), one finds terms related directly to unemployment and bankruptcy, with one outlier being Air Terra Humara (another good-looking, but, actually, less pricey sneaker). If you shift those values by one week (that is, look for searches that mirror changes in the Dow a week later), you see, at the very top of the list of things Googlers wanted, a “funny site.” Indeed.
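The one-week shift can be sketched the same way: drop the last week of the Dow and the first week of the searches, so that each Dow value lines up with the following week's search volume, then correlate. Again, this is an illustration under my own assumptions (Pearson correlation, invented numbers, hypothetical function names), not a description of how Google Correlate works internally.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_pearson(target, series, lag_weeks):
    """Correlate target[t] with series[t + lag_weeks], for lag_weeks >= 0."""
    if lag_weeks < 0:
        raise ValueError("lag_weeks must be non-negative")
    if lag_weeks == 0:
        return pearson(target, series)
    # Trim the ends that have no partner once the series is shifted.
    return pearson(target[:-lag_weeks], series[lag_weeks:])


# Invented weekly values standing in for the Dow.
dow = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
# Invented search volume built to mirror the Dow one week late:
# searches[t + 1] = 10 - dow[t]; the first week is an arbitrary filler value.
searches = [5.0] + [10.0 - value for value in dow[:-1]]

same_week = lagged_pearson(dow, searches, 0)
one_week_later = lagged_pearson(dow, searches, 1)
print(f"same week:      r = {same_week:.2f}")
print(f"one week later: r = {one_week_later:.2f}")
```

With this made-up data the mirror image only shows up once the searches are shifted by a week, which is the kind of delayed reaction the shifted list is meant to surface.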
In short, people are always searching for all these terms, but to varying degrees. When the Dow is up, they search more frequently for ways to invest and spend their money, and when it is down, they are more often looking for ways to cope with economic hardship.
These scenarios all describe phenomena graphed against time. One can, alternatively, use geography, as the flu application does. This is where privacy issues become thorniest. Possible paths of exploration include identifying the movement of memes, consumer tendencies, or perceived geographical and meteorological shifts. Plausible reactions to anything found there include targeted marketing, complex policy decisions and improved emergency responses.
To explain its tool to the non-statistician, Google uses a comic, which admonishes, “Correlation is not causation!” This may be true, but inevitably some of us will be unable to resist trying to find a meaningful connection. Please, share your results.