Prediction by the Numbers

Discover why some predictions succeed and others fail as experts forecast the future. Aired February 28, 2018, at 9 pm on PBS.

Program Description

Predictions underlie nearly every aspect of our lives, from sports, politics, and medical decisions to the morning commute. With the explosion of digital technology, the internet, and “big data,” the science of forecasting is flourishing. But why do some predictions succeed spectacularly while others fail abysmally? And how can we find meaningful patterns amidst chaos and uncertainty? From the glitz of casinos and TV game shows to the life-and-death stakes of storm forecasts and the flaws of opinion polls that can swing an election, “Prediction by the Numbers” explores stories of statistics in action. Yet advances in machine learning and big data models that increasingly rule our lives are also posing big, disturbing questions. How much should we trust predictions made by algorithms when we don’t understand how they arrive at them? And how far ahead can we really forecast?


PBS Airdate: February 28, 2018

NARRATOR: The future unfolds before our eyes, but is it always beyond our grasp? What was once the province of the gods has now come more clearly into view through mathematics and data. Out of some early observations about gambling arose tools that guide our scientific understanding of the world and more, through the power of prediction.

From our decisions about the weather…

NEWSCASTER: …the strongest hurricane ever on record.

NARRATOR: …to finding someone lost at sea…

BOATSWAIN'S MATE 2 EDWARD NYGREN (United States Coast Guard): Commencing search pattern.

BOATSWAIN'S MATE 1 LUKE SCHAFFER (United States Coast Guard): Keep a good look out.

NARRATOR: …every day mathematics and data combine to help us envision what might be.

LIBERTY VITTERT (University of Glasgow): It's the best crystal ball that humankind can have.

NARRATOR: Take a trip on the wings of probability, into the future.

MONA CHALABI (The Guardian, United States Edition): We are thinking about luck or misfortune, but they just, basically, are a question of math, right?

NARRATOR: Prediction by the Numbers, right now, on NOVA.

The Orange County Fair, held in Southern California: in theory, these crowds hold a predictive power that can have startling accuracy, but it doesn't belong to any individual, only the group. And even then, it has to be viewed through the lens of mathematics. The theory is known as the "wisdom of crowds," a phenomenon first documented about a hundred years ago.

Statistician Talithia Williams is here to see if the theory checks out and to spend some time with the Fair's most beloved animal, Patches, a 14-year-old ox.

TALITHIA WILLIAMS (Harvey Mudd College): It was a fair, kind of like this one, where, in 1906, Sir Francis Galton came across a contest where you had to guess the weight of an ox, like Patches, you see here behind me.

NARRATOR: After the ox weight-guessing contest was over, Galton took all the entries home and analyzed them statistically. To his surprise, while none of the individual guesses were correct, the average of all the guesses was off by less than one percent. That's the wisdom of crowds.

But is it still true?

TALITHIA WILLIAMS: So, here's how I think we can test that today. What if we ask a random sample of people, here at the fair, if they can guess how many jellybeans they think are in the jar, and then we take those numbers and average them and see if that's actually close to the true number of jellybeans?

Guess how many jellybeans are in here.

Come on, guys. Everybody's got to have their guess.

I see your mind churning.

FAIR GOER 1: Twelve-twenty-seven.

FAIR GOER 2: Eight-forty-six.

FAIR GOER 3: Probably, like 925?

FAIR GOER 4: I think 1,000.

TALITHIA WILLIAMS: So, just write your number down. Uh huh, there you go.

CHILD AT FAIR: Can I have a jellybean?

NARRATOR: The 135 guesses gathered from the crowd vary wildly.

TALITHIA WILLIAMS: The range of our guesses was from…the smallest was 183, the largest was 12,000. So, you can tell, folks were really guessing.

But when we take the average of our guesses, we get 1,522. So, the question is, how close is our average to the actual number of jellybeans? Well, now's the moment of truth.

All right, so the real number of jellybeans was 1,676. The average of our guesses was off by less than 10 percent, so there actually was some wisdom in our crowd.

NARRATOR: Though off by about 10 percent, the average of the crowd's estimates was still more accurate than the vast majority of the individual guesses. Even so, the wisdom of crowds does have limits. It can be easily undermined by outside influences and tends to work best on questions with clear answers, like a number.
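The arithmetic behind the jar test is easy to reproduce. The short Python sketch below computes the crowd's average guess, its relative error, and the share of individual guesses that missed by more than the average did. The function name `crowd_estimate` is illustrative, and the five guesses in the usage line are made up for demonstration; only the 1,522 average and the 1,676 true count come from the fair.

```python
from statistics import mean

def crowd_estimate(guesses, truth):
    """Return the crowd's average guess, its relative error, and the share of
    individual guesses that missed the truth by more than the average did."""
    avg = mean(guesses)
    avg_miss = abs(avg - truth)
    beaten = sum(1 for g in guesses if abs(g - truth) > avg_miss)
    return avg, avg_miss / truth, beaten / len(guesses)

# Figures reported at the fair: crowd average 1,522, true count 1,676.
print(f"fair crowd missed by {abs(1522 - 1676) / 1676:.1%}")  # 9.2%

# Hypothetical guesses, made up for illustration (not the fair's 135 entries).
avg, rel_err, share = crowd_estimate([1000, 2000, 1500, 3000, 900], truth=1676)
print(avg, f"{rel_err:.2%}", f"{share:.0%} of individuals did worse")
```

In the made-up example, every individual guess is far off, but the errors on either side largely cancel, which is the mechanism the wisdom of crowds relies on.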

The steps Talithia took reflect a process going on all around us these days in the work of statisticians.

TALITHIA WILLIAMS: Thanks everybody.

So, we collected this data, right? We analyzed it mathematically, and we got an estimate that was pretty close to the actual true value. That's math and statistics at work.

NARRATOR: We didn't always use math and statistics to make predictions. The Romans studied the flights and cries of birds; the Chinese cracked "oracle" bones with a hot metal rod and read the results; nineteenth-century Russians used chickens. Throughout history, we've sought the future in moles on people's faces, clouds in the sky or a pearl cast into an iron pot. And that list of things used for predicting goes on and on.

But more recently, over the last couple hundred years, we've turned to science to see into the future and made some remarkable predictions: the existence of Neptune, radio waves and black holes, and even the future location of a comet, with such precision that we could land a space probe on it.

But, if you pop the hood of science, inside you'll find a field of applied mathematics that's made many of those predictions possible: statistics.

REGINA NUZZO (Gallaudet University): Statistics is kind of unique. It's not an empirical science, itself, but it's not pure math, but it's not philosophy either. It's the, the framework, the language, the rules by which we do science.

RICHARD "DICK" DE VEAUX (Williams College): From that, we can make decisions, we can make conclusions, we can make predictions. That's what, that's what statisticians try to do.

LIBERTY VITTERT: Why I love statistics is that it predicts the likelihood of future occurrences, which really means it's the best crystal ball that humankind can have.

NARRATOR: Ultimately, all the predictive power of statistics rests on a revolutionary insight from about 500 years ago: that chance, itself, can be tamed through the mathematics of probability.

Viva Las Vegas! Here's a city full of palaces built on understanding probability and fueled by gambling, which may seem a funny place to find mathematician Keith Devlin. But mathematics and gambling have been tied together for centuries.

Today, in a casino you'll find roulette, slot machines, blackjack. Playing craps is also known as "rolling the bones," which is more accurate than you might think.

KEITH DEVLIN (Stanford University): Humans have been gambling since the beginnings of modern civilization. The ancient Greeks, the ancient Egyptians would use the anklebones of sheep as a form of early dice.

NARRATOR: Surprisingly, while the Greeks laid the foundation for our mathematics, they didn't spend any effort trying to analyze games of chance.

KEITH DEVLIN: It seems to have never occurred to them, or indeed to anybody, way up until the 15th, 16th century, that you could apply mathematics to calculate the way these games would come out.

NARRATOR: The sixteenth-century Italian mathematician Gerolamo Cardano made a key early observation: the more times a game of chance is played, the better mathematical probability predicts the outcome. That idea was later proven as the "law of large numbers." Examples of the law of large numbers at work surround us.

KEITH DEVLIN: When I flip this coin, we have no way of knowing whether it's going to come up heads or tails.

That time it was heads.

On the other hand, if I were to toss a coin a hundred times, roughly 50 percent of the time it would come up heads, and 50 percent of the time it would come up tails. We can't predict a single toss; we can predict the aggregate behavior over a hundred tosses. That's the law of large numbers.
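A quick simulation makes the law of large numbers concrete. This is a minimal Python sketch, assuming a perfectly fair coin modeled with Python's `random` module; the helper name `heads_fraction` and the seed are arbitrary choices for illustration. As the number of tosses grows, the fraction of heads should settle ever closer to 0.5, even though no single toss is predictable.

```python
import random

def heads_fraction(n_tosses, rng):
    """Simulate n_tosses fair coin flips and return the fraction that land heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(2018)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    frac = heads_fraction(n, rng)
    print(f"{n:>7} tosses: {frac:.4f} heads (off by {abs(frac - 0.5):.4f})")
```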

NARRATOR: In fact, casinos are a testament to the iron hand of the law of large numbers. The games are designed to give the casinos a slight edge over the gambler.

Take American roulette: on the wheel are the numbers one through 36, half red and half black. Betting a dollar on one color or the other seems like a 50-50 proposition. But the wheel also has two green slots with zeros. If the ball lands in those, the casino wins all the bets on either red or black. And that's the kind of edge that makes the casino money over the long run.

KEITH DEVLIN: Customers are gambling. The casino is absolutely not gambling, because they may lose money, they may lose a lot of money to one or two players, but if you have thousands and thousands of players, by the law of large numbers, you are guaranteed to make money.
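The casino's edge in American roulette can be computed exactly and then watched emerging through the law of large numbers. In this Python sketch (the names are illustrative), a $1 bet on red wins $1 on 18 of the 38 pockets and loses on the other 20, including the two green ones; it ignores every other kind of bet on the table.

```python
import random
from fractions import Fraction

POCKETS = 38  # American wheel: 1 through 36, plus 0 and 00
RED = 18      # pockets that pay off a bet on red

def player_expected_value(stake=1):
    """Exact expected gain for the player on an even-money bet on red."""
    p_win = Fraction(RED, POCKETS)
    return stake * p_win - stake * (1 - p_win)

def casino_take_per_dollar(n_bets, rng):
    """Simulate n_bets one-dollar bets on red; return the casino's net win per dollar."""
    net = 0
    for _ in range(n_bets):
        pocket = rng.randrange(POCKETS)   # pockets 0..17 stand in for the red ones
        net += -1 if pocket < RED else 1  # casino pays on red, collects otherwise
    return net / n_bets

ev = player_expected_value()
print(f"player's expected value per $1: {ev} (about {float(ev):.4f})")
rng = random.Random(7)
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} bets: casino keeps {casino_take_per_dollar(n, rng):+.4f} per dollar")
```

Per dollar wagered, the player's expected loss is 1/19, a little over 5 cents. A handful of bets can go either way, but as the number of bets climbs, the casino's take settles near that figure.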

NARRATOR: The law of large numbers comes into play outside of gambling too.

In basketball, a field goal or shooting percentage is the number of baskets made, divided by the number of shots taken. But early in the season, when it's based on a low number of attempts, that percentage can be misleading.

JORDAN ELLENBERG (University of Wisconsin-Madison): At the beginning of the season, a less skilled player might get off a few lucky shots in a row, and at that point, they'd have a super-high shooting percentage.

NARRATOR: Meanwhile, a very skilled player might miss a few at the beginning of the season and have a low shooting percentage. But as the season goes on, and the total number of shots climbs, their shooting percentages will soon reflect their true skill level. That's the law of large numbers at work.

A small sample, like just a few shots, can be deceptive, while a large sample, like a lot of shots, gives you a better picture.
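How deceptive can a small sample be? The Python sketch below computes, exactly, the probability that a weaker shooter ends up with a higher shooting percentage than a stronger one when both take the same number of shots. The 40 percent and 50 percent skill levels are hypothetical, and the model assumes every shot is independent, which real basketball doesn't guarantee.

```python
from math import comb

def binomial_pmf(n, p):
    """Probability of making exactly k of n independent shots, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def prob_weaker_shooter_leads(n_shots, p_weak=0.40, p_strong=0.50):
    """Chance the weaker shooter makes strictly more baskets (and so has the
    higher percentage) when both players take n_shots shots."""
    weak = binomial_pmf(n_shots, p_weak)
    strong = binomial_pmf(n_shots, p_strong)
    return sum(weak[i] * strong[j] for i in range(n_shots + 1) for j in range(i))

for n in (5, 20, 100, 500):
    print(f"{n:>3} shots each: {prob_weaker_shooter_leads(n):.4f}")
```

Early in a season the weaker shooter leads surprisingly often; after hundreds of shots that chance becomes very small.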

The gambling observations that led to the law of large numbers were a start. But what really launched probability theory and opened up a door to a whole new way of thinking about the future was a series of letters exchanged between two French mathematicians, Blaise Pascal and Pierre de Fermat, in the 1650s, about another gambling problem that had been kicking around for a few centuries.

A simplified version of the problem goes like this: two players, let's call them Blaise and Pierre, are flipping a coin. Blaise has chosen heads and Pierre, tails. The game is the best of five flips and each has put money into the pot. They flip the coin three times and Blaise is ahead two to one. But then the game is interrupted. What is the fair way to split the pot?

KEITH DEVLIN: The question is: how do they divide up the pot so that it's fair to what might have happened if they'd been able to complete the game?

NARRATOR: Fermat suggested imagining the possible future outcomes if the game had continued. There are just two more coin flips, creating four possible combinations: heads, heads; heads, tails; tails, heads; and tails, tails.

In the first three, Blaise wins with enough heads; Pierre only wins in the last case, so Fermat suggested that a three to one split was the correct solution.
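Fermat's counting argument translates almost line for line into code. This Python sketch (the function name is illustrative) plays out every possible sequence of the remaining flips, even flips that would not actually be needed once someone had won, and counts the futures each player wins. Blaise needs one more head and Pierre needs two more tails, so at most two flips remain.

```python
from itertools import product

def split_pot(blaise_needs, pierre_needs):
    """Fermat's method: list every sequence of the remaining flips and count
    who would win in each imagined future."""
    max_flips = blaise_needs + pierre_needs - 1  # enough flips to settle the game
    blaise_wins = pierre_wins = 0
    for future in product("HT", repeat=max_flips):
        if future.count("H") >= blaise_needs:
            blaise_wins += 1
        else:
            pierre_wins += 1  # too few heads means Pierre got his tails
    return blaise_wins, pierre_wins

print(split_pot(blaise_needs=1, pierre_needs=2))  # (3, 1)
```

The 3-to-1 count means Blaise should take three quarters of the pot. The same function handles other interrupted scores; for example, `split_pot(2, 2)` gives an even split.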

The key breakthrough was imagining the future, mathematically, something even Pascal had trouble with.

KEITH DEVLIN: Because what Fermat did was say, "Let's look into the future. Look at possible futures, and we'll count the way things could have happened in different possible futures." It was a simple arithmetic issue, but the idea of counting things in the future was just completely new, and Pascal couldn't wrap his mind around it.

NARRATOR: Eventually, Pascal accepted Fermat's solution, as did others, and today, that exchange of letters is regarded as the birth of modern probability theory.

KEITH DEVLIN: People realized the future wasn't blank. You didn't know exactly what was going to happen, but you could calculate with great precision what the likelihood of things happening were. You could make all of the predictions we make today and take for granted. You could make them using mathematics.

NARRATOR: It was a fundamental insight and one of the doors that led to the modern world. Inherent in all our attempts to predict the future, from the stock market to insurance to web retailers trying to figure out what you might buy next, is the idea that, with the right data, the likelihood of future events can be calculated. In fact, one of the great success stories in the science of prediction yields a forecast that many of us check every day, to answer the question, "Do I need an umbrella or a storm shelter?"

The hurricane season of 2017 will be remembered for its ferocity and destruction.

NEWSCASTER: …the strongest ever on record.

CARMEN YULÍN CRUZ (Mayor of San Juan, Puerto Rico): The Puerto Rico and the San Juan that we knew yesterday is no longer there.

NARRATOR: The storms formed and gained in intensity with surprising speed, leaving forecasters to emphasize the uncertainty of where they might land.

NEWS ANCHOR: Maria is now a Category 3 hurricane.

MALE METEOROLOGIST: Exactly what it's going to look like, we just don't know yet.

FEMALE METEOROLOGIST: There's still great uncertainty.

NARRATOR: In weather forecasting, the only certainty is uncertainty.

LOUIS UCCELLINI (Director, National Weather Service): One thing we know for sure is we cannot give you a perfect forecast. Given the nature of how we make a forecast, from the global observations to equations running on computers, stepping out in time, I don't think there'll ever be a perfect forecast.

NARRATOR: To fight that uncertainty, forecasters have turned to more data, lots more data. Here at the National Weather Service Baltimore/Washington office, meteorologist Isha Renta prepares for the afternoon launch of a weather balloon.

Twice a day, every day, all across the U.S. and around the world, at the very same time, balloons are released to carry a package of instruments up through the atmosphere. The package transmits readings about every 10 meters of altitude.

ISHA RENTA (National Weather Service): It's my understanding that they have developed other ways to get vertical profiles of the atmosphere, but still, the accuracy and the resolution that the weather balloon will give you is a lot higher, so that's why we still depend on them.

NARRATOR: The data from Isha's weather balloon ends up at the National Centers for Environmental Prediction in College Park, Maryland, the starting point for nearly all weather forecasts in the United States. Her readings become one drop in a very large bucket of data taken in each day.

GREG CARBIN (National Weather Service): Temperature, pressure, wind speed and direction in the atmosphere…tens of thousands of point observations are used every hour of every day as kind of a starting point. That's where we begin the simulation, from those observations.

NARRATOR: It all feeds into a process that has been described as one of the great intellectual achievements of the 20th century: numerical forecasting.

The first step in numerical forecasting is to divide a nearly 40-mile-thick section of the atmosphere into a three-dimensional grid. Then each grid point is assigned numerical values for different aspects of the weather, based on the billions of measurements continually pouring into the weather service.

GREG CARBIN: So, you'll have an understanding of temperature, pressure and values in terms of wind and wind direction, at each one of these points, within this grid that covers the globe.

NARRATOR: From there, equations from the physics of fluids and thermodynamics are applied to each grid point.

GREG CARBIN: Not only do you change the characteristics at each grid point, but the changes at those grid points affect neighboring grid points, and the neighboring grid points affect other grid points, and so you evolve the atmosphere through time in this three-dimensional space.
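The idea that each grid point changes and then nudges its neighbors can be illustrated with a toy model. This Python sketch is not the weather service's model: it evolves a single quantity (think of it as temperature) on a one-dimensional ring of grid points using a simple diffusion rule, an assumed stand-in for the full fluid-dynamics and thermodynamics equations. Even so, it shows a change at one point spreading step by step to its neighbors.

```python
def diffusion_step(values, alpha=0.2):
    """Advance a ring of grid points by one time step. Each point moves toward
    the average of its two neighbors (explicit finite-difference diffusion)."""
    n = len(values)
    return [
        values[i] + alpha * (values[i - 1] - 2 * values[i] + values[(i + 1) % n])
        for i in range(n)
    ]

grid = [0.0] * 11
grid[5] = 10.0  # one warm grid point in the middle of a cold ring
for step in range(1, 4):
    grid = diffusion_step(grid)
    print(step, [round(v, 2) for v in grid])
```

The coupling constant `alpha` is an arbitrary choice here; for this explicit update it must stay at or below 0.5, or the simulation becomes unstable.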

NARRATOR: And remarkably, the approach works.

GREG CARBIN: It's amazingly crazy that it works. It's remarkable how well it does work, given that we're making grand assumptions about the initial state, so to speak, or the beginning state of any forecast.

NARRATOR: And that initial state turns out to be absolutely crucial. In the early days of numerical forecasting, it seemed like a definitive weather prediction extending far into the future might soon be possible, but research in the 1960s showed that slight errors in measuring the initial state grow larger over time, leading predictions astray.

LOUIS UCCELLINI: So, as you step ahead in time, the forecast will become less accurate.
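That sensitivity is usually traced to the meteorologist Edward Lorenz, whose 1963 model of atmospheric convection is a standard demonstration of it. The Python sketch below integrates Lorenz's three equations, with his classic parameters, from two starting points that differ by one part in a hundred million, a stand-in for a tiny measurement error. The step size, starting point and helper names are illustrative choices.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def twin_runs(steps=4000, dt=0.01, error=1e-8):
    """Run two forecasts whose starting x differs by `error`; record the gap."""
    truth = (1.0, 1.0, 1.0)
    forecast = (1.0 + error, 1.0, 1.0)
    gaps = []
    for _ in range(steps):
        truth, forecast = rk4_step(truth, dt), rk4_step(forecast, dt)
        gaps.append(distance(truth, forecast))
    return gaps

gaps = twin_runs()
for t in (5, 10, 20, 30, 40):
    print(f"t = {t:>2}: gap = {gaps[t * 100 - 1]:.2e}")  # 100 steps per time unit
```

The gap starts out invisible and grows until the two runs are as different as two randomly chosen weather states, which is why forecasts lose accuracy as they step further ahead.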

NARRATOR: Ironically, that sensitivity to initial conditions also suggested a way to improve the accuracy of numerical weather forecasts. Thanks to the power of today's computers, forecasters can run their weather simulations not once but many times. For each run, they slightly alter the initial conditions to reflect the inherent error built into the measurements and the uncertainty in the model itself. The process is called ensemble forecasting, and the results are often displayed as "spaghetti plots."

GREG CARBIN: We're looking at about a hundred different forecasts, here, for the jet stream at about six days ago. We have the actual jet stream drawn as the white line on here, today, and you can see how most of the forecasts six days ago were well north of where we actually find the jet stream this morning.

And then we'll go to a five-day forecast and a four-day forecast and a three-day forecast, and then down to two days and the day of the event. And you can see how the model forecasts all converge on that solution, which is what you would expect them to do.

But when you go back to the six-day forecast, you can see the large spread in the ensemble solutions for this particular pattern.

NARRATOR: In the end, meteorologists turn to statistical tools to analyze weather forecasts, and often use probabilities to express the uncertainty in the results. That's the "40 percent chance of rain" you might hear from your local forecaster.
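Ensemble forecasting, and the way its output can become a "percent chance," can be sketched with a toy chaotic system. This Python example uses the logistic map, where x becomes 4x(1 - x) at each step, as an assumed stand-in for a weather model (real ensembles run full atmospheric simulations). It starts 100 members from nearly identical values, steps them forward, and reports how far apart they spread. The last lines count how many members end above an arbitrary threshold of 0.5, roughly the way an ensemble can be read as a probability.

```python
import random
from statistics import pstdev

def logistic(x):
    """One step of the chaotic logistic map, standing in for a weather model."""
    return 4.0 * x * (1.0 - x)

def run_ensemble(x0, members=100, perturbation=1e-6, steps=40, seed=0):
    """Start `members` runs from x0 plus small random errors; return the final
    states and the ensemble spread (standard deviation) at every step."""
    rng = random.Random(seed)
    states = [x0 + rng.uniform(-perturbation, perturbation) for _ in range(members)]
    spreads = [pstdev(states)]
    for _ in range(steps):
        states = [logistic(x) for x in states]
        spreads.append(pstdev(states))
    return states, spreads

final, spreads = run_ensemble(0.3)
for lead in (1, 5, 10, 20, 40):
    print(f"step {lead:>2}: spread {spreads[lead]:.2e}")

# Read the ensemble as a probability: the share of members above a threshold.
chance = sum(x > 0.5 for x in final) / len(final)
print(f"chance the toy variable ends above 0.5: {chance:.0%}")
```

Short lead times give a tight bundle of "spaghetti"; long ones give a wide spread, and the forecast can only be stated as a probability.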

GREG CARBIN: Meteorology is probabilistic at its very core, and I believe that the general public knows that there is uncertainty inherent in everything we say, but we're getting better. Our forecasts for three days out now are as accurate as one-day forecasts were about 10 years ago, and this continues to improve. So, the science has advanced beyond my wildest dreams, and it's hard to even see where it might go in the future.

NARRATOR: As in meteorology, across the rest of science the ultimate test of our understanding is our ability to make accurate predictions. On a grand scale, scientific theories like Einstein's general theory of relativity have to make predictions that can be tested to become accepted. In that case, it took four years before a total solar eclipse revealed that light passing near the sun curved just as Einstein's theory predicted, the first major confirmation that he was right that the sun's mass distorts the fabric of "space-time," what we experience as gravity.

In fact, the scientific method demands a hypothesis, which leads to a prediction of results from a carefully designed experiment that will test its claim. Surprisingly, it wasn't until the 1920s and '30s that a British scientist, Ronald A. Fisher, laid out guidelines for designing experiments using statistics and probability as a way of judging results.

As an example, he told the story of a lady who claimed to taste the difference between milk poured into her tea and tea poured into her milk. Fisher considered ways to test that. What if he presented her with just one cup to identify?

REBECCA GOLDIN (George Mason University): If she got it right one time, you'd probably say, "Well, yeah, but she, she had a 50-50 chance, just by guessing, of getting it right." So, you'd be pretty unconvinced that she has the skill.

NARRATOR: Fisher proposed that a reasonable test of her ability would be eight cups, four with milk into tea, four with tea into milk, each presented randomly. The lady then had to separate them back into the two groups.

Why eight? Because that produced 70 possible combinations of the cups, but only one with them separated correctly. If she got it right, that wouldn't "prove" she had a special ability, but Fisher could conclude if she was just guessing, it was an extremely unlikely result, a probability of just 1.4 percent.
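Fisher's count of 70 can be checked by brute force. This Python sketch labels the cups 0 through 7, assumes (arbitrarily) that cups 0 to 3 are the milk-first ones, and lists every way the lady could pick four cups as her milk-first group.

```python
from itertools import combinations
from math import comb

cups = range(8)
milk_first = {0, 1, 2, 3}  # arbitrary labels for the four milk-into-tea cups

# Every way to choose 4 of the 8 cups as the "milk first" group.
choices = list(combinations(cups, 4))
perfect = [c for c in choices if set(c) == milk_first]

print(len(choices), len(perfect))  # 70 1
print(f"chance of a perfect split by guessing: {len(perfect) / len(choices):.1%}")  # 1.4%
assert len(choices) == comb(8, 4)  # agrees with the binomial coefficient "8 choose 4"
```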

Thanks mainly to Fisher, that idea became enshrined in experimental science as the "p-value," "p" for "probability." If you assume your results were just due to chance, that what you were testing had no effect, what's the probability you would see those results or something even more rare?

REBECCA GOLDIN: If you assume that there's a process that is completely random, and you find that it's pretty unlikely to get your data, then you might be suspicious that something is happening. You might conclude, in fact, that it's not a random process, that this is interesting to look at what else might be going on, and it passes some kind of sniff test.

NARRATOR: Fisher also suggested a benchmark; only experimental results where the p-value was under .05, a probability of less than 5 percent, were worth a second look. In other words, if you assume your results were just due to chance, you'd see them less than one time out of 20; not very likely.
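Applying Fisher's 0.05 benchmark to the tea test shows why eight cups was a sensible design. The Python sketch below, a hand-rolled version of what is now called Fisher's exact test, computes the p-value for each possible number of milk-first cups the lady could identify correctly, assuming she is purely guessing. The function name is illustrative.

```python
from fractions import Fraction
from math import comb

def tea_p_value(correct, cups_per_group=4):
    """P(guessing alone gets at least `correct` of the milk-first cups right)
    when the taster must pick exactly cups_per_group of 2 * cups_per_group cups."""
    n = cups_per_group
    total = comb(2 * n, n)  # 70 equally likely choices when n = 4
    # k right cups and n - k wrong ones can be chosen comb(n, k) * comb(n, n - k) ways.
    as_good_or_better = sum(comb(n, k) * comb(n, n - k) for k in range(correct, n + 1))
    return Fraction(as_good_or_better, total)

for correct in range(5):
    p = tea_p_value(correct)
    verdict = "statistically significant" if p < 0.05 else "not significant"
    print(f"{correct} of 4 right: p = {p} ({float(p):.3f}), {verdict}")
```

Only a perfect score clears the bar: three correct out of four has a p-value of 17/70, about 0.24, because pure guessing does at least that well about one time in four.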

He called those results "statistically significant."

JORDAN ELLENBERG: "Statistically significant," now this is a terrible word. It could be quite insignificant. You could be detecting a very, very, very small effect, but it would be called, in the mathematical lingo, "significant."

NARRATOR: Since Fisher's day, p-values have been used as a convenient yardstick for success by many, including most scientific journals. Since they prefer to publish successes, and getting published is critical to career advancement, the temptation to massage and manipulate experimental data into a good p-value is enormous. There's even a name for it: "p-hacking."

REGINA NUZZO: P-hacking is when researchers consciously or unconsciously guide their data analysis to get the results that they want, and since .05 is kind of the, the bar for being able to publish and call something real and get all your grant money, it's usually guiding the results so that you arrive at that p of .05.
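A little arithmetic shows why p-hacking produces "significant" results so easily. If a researcher tries several different analyses of data in which there is truly no effect, each one has a 5 percent chance of crossing the 0.05 line by luck, and the chance that at least one does grows quickly. This Python sketch assumes the analyses are independent (real analyses of the same data usually are not, so treat the numbers as illustrative) and that, with no real effect, p-values are spread evenly between 0 and 1.

```python
import random

def chance_of_a_fluke(num_analyses, alpha=0.05):
    """Exact chance that at least one of num_analyses independent tests of a
    true null hypothesis comes out with p < alpha."""
    return 1 - (1 - alpha) ** num_analyses

def simulate_flukes(num_analyses, trials=20_000, alpha=0.05, seed=1):
    """Monte Carlo check: draw uniform p-values (no real effect) and count how
    often at least one analysis looks 'significant'."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(num_analyses))
        for _ in range(trials)
    )
    return hits / trials

for k in (1, 5, 20):
    print(f"{k:>2} analyses: exact {chance_of_a_fluke(k):.3f}, simulated {simulate_flukes(k):.3f}")
```

With 20 tries, the chance that at least one fluke passes the 0.05 bar is about 64 percent.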

NARRATOR: How much p-hacking really goes on is hard to know. What may be more important is to remember what was originally intended by a p-value.

JORDAN ELLENBERG: The p-value was always meant to be a detective, not a judge. If you do an experiment and find the result that is statistically significant, that is telling you, "that is an interesting place to look and research and understand further what's going on," not "don't study this anymore because the matter is settled."

NARRATOR: In a sense, a low p-value is an invitation to reproduce the experiment, to help validate the result, but that doesn't always happen. In fact, there are few career incentives for it. Journals and funders prefer novel research. There is no Nobel Prize for Replication.

Another solution to p-hacking and the overemphasis on p-values may simply be greater transparency.

TALITHIA WILLIAMS: More and more, what people are doing is publishing their data, and so, it's becoming harder and harder to lie with statistics, because people will just probe and say, "Well, give me the set you analyzed, and let me see how you got this result."

NARRATOR: Statistics continues to play a fundamental role in science, but really, anywhere data is collected, you'll find statisticians are at work, looking for patterns, drawing conclusions and often, making predictions, though they don't always work out.

The presidential election of 2016 was a tough one for pollsters, the folks who conduct and analyze opinion polls. Hillary Clinton was the overwhelming favorite to beat Donald Trump right up to Election Day.

MALE POLL ANALYST 1: Trump is headed for a historic defeat.

MALE POLL ANALYST 2: He's going to lose by a landslide.

FEMALE NEWS COMMENTATOR: I think that she's going to have a very good night.

NARRATOR: The New York Times put Trump's chances at 15 percent. One pollster, on election night, gave him one percent.

NEWS ANCHOR: A projection of a 99 percent chance of winning, is that correct?

MALE NEWS COMMENTATOR: The odds are overwhelming of a Hillary Clinton victory on Tuesday. I would be very surprised if anything else happened.

NARRATOR: And, of course, Trump won and Clinton lost.

MONA CHALABI: People were repeatedly told Hillary Clinton is the candidate most likely to win this election, and she didn't. And I think that really left people feeling almost lied to, almost cheated by these numbers.

NARRATOR: So what was going on with the polls? And exactly how do people predict elections? One way, is just by asking people who they'll vote for.

PATRICK MURRAY (Monmouth University Polling Institute): One of the great things about polling is that we don't have to talk to everybody in order to find out what the opinions are of everybody. We can actually select something called a sample.

NARRATOR: Sampling is a familiar idea. To see if the soup is right, you taste a teaspoon, not the whole pot. To test your blood, the doctor typically draws less than an ounce; they don't drain you dry. But in many circumstances, finding a representative sample is harder than it sounds.

TALITHIA WILLIAMS: Let's suppose that this is the population of about 1,000 people in a city and we want to know, "Are people for or against converting a park into a dog park?"

And so these green beads, down here, are going to represent people who are for it, and the red beads are folks who are against it.

NARRATOR: Talithia's first step is to take advantage of an unlikely ally in sampling, "randomness."

PATRICK MURRAY: The beauty of randomization is that as long as you throw everything from your population into one pot and randomly pull it out, you can be sure that you're within a certain percentage point of the actual value that's in that pot.

NARRATOR: So the plan is to randomly sample the beads, but how many? That depends on how much accuracy Talithia wants. One measure is "the margin of error," the maximum amount the result from the sample can be expected to differ from that of the whole population. It's the plus or minus figure, often a percentage, you see in the fine print in polls.

But there's also the "confidence level." There is always some uncertainty about whether a sample really represents the whole population, and the confidence level tells you how sure you can be about your result. A 90 percent confidence level means that if you repeated your poll or sample 100 times, on average about 90 of those results would land within the margin of error of the true value.

Talithia knows the total number of beads is a thousand. And she's settled on a plus or minus five percent margin of error at a 90 percent confidence level. That means she needs a sample size of at least 214 beads.
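The 214-bead figure can be reproduced with the standard sample-size formula for a proportion plus a finite population correction, which applies because the jar holds only 1,000 beads. This Python sketch assumes the worst-case split of 50 percent (which gives the largest required sample) and a normal approximation; the function name is illustrative, and the program doesn't say exactly which formula Talithia used.

```python
from math import ceil
from statistics import NormalDist

def sample_size(population, margin, confidence, p=0.5):
    """Smallest random sample giving +/- `margin` at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value, ~1.645 for 90%
    n0 = z**2 * p * (1 - p) / margin**2             # size needed for a huge population
    n = n0 / (1 + (n0 - 1) / population)            # finite population correction
    return ceil(n)

print(sample_size(1000, 0.05, 0.90))  # 214
```

Without the correction (an effectively unlimited population), the same settings call for 271 beads; the smaller jar lets Talithia get away with fewer.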

TALITHIA WILLIAMS: Here are the results. We got 103 red beads and 111 green, so about 48 percent of our population would vote against and about 52 percent would vote for. Now, remember that margin of error that we talked about, that plus or minus five percent? So, once you take that into account those numbers really aren't that different at all. So, I guess you could say this puppy is too close to call.

NARRATOR: In fact, within the margin of error, the "stats" got it right. There were an equal number of red and green beads in the jar.
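Talithia's "too close to call" verdict can be checked by building a 90 percent interval around the share of green beads. This Python sketch uses a normal approximation with a finite population correction for the 1,000-bead jar; the helper name is illustrative, and this is one standard way to compute such an interval, not necessarily the one used on the show.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, population, confidence=0.90):
    """Sample proportion and its confidence interval for a finite population."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    fpc = sqrt((population - n) / (population - 1))  # finite population correction
    moe = z * sqrt(p_hat * (1 - p_hat) / n) * fpc    # margin of error
    return p_hat, p_hat - moe, p_hat + moe

p_hat, low, high = proportion_interval(111, 214, 1000)
print(f"green: {p_hat:.1%}, 90% interval {low:.1%} to {high:.1%}")
```

The interval runs from roughly 47 percent to 57 percent, comfortably straddling the 50 percent that was actually in the jar.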

While the sampling error built in from the mathematics can be quantified, there are other errors that can't.

PATRICK MURRAY: The other parts of the error (how we word our questions, how the respondents feel that day, the responsibility to predict what their behavior is going to be somewhere down the line), all those sources of error are something that we can't calculate.

NARRATOR: And there's a catch to random sampling for polls, too.

A few decades back, when just about every household had a landline, finding a random sample meant randomly dialing phone numbers.

PATRICK MURRAY: Into the 1970s and the 1980s, we were getting, you know, 90 percent response rates. If we randomly chose a phone number, somebody on the other end of that phone would pick it up and would do the interview with us.

NARRATOR: Those days are over. Thanks to Caller I.D. and answering machines, people often don't answer their landlines anymore, if they even have one. Response rates are way down.

NATE SILVER (FiveThirtyEight): Only about 10 percent of people respond to polls. So, you're kind of crossing your fingers and hoping the people you reach are the same as the ones that are actually going to vote. For example, we found in 2016, pollsters were not reaching enough white voters without college degrees.

DICK DE VEAUX: If there's a bias in the data, you cannot recover from it, as we've seen from some recent elections.

NARRATOR: After Donald Trump's surprise win, many wondered if polling was broken. But if you look at the polls themselves, and not the headlines, on average, polls on the national and state level were off by historically typical amounts.

NATE SILVER: So, when I hear people say, "Oh, the polls were wrong," then it probably reflects people's interpretations about the polls being wrong, where people, for various reasons, looked at the polls and they said, "These numbers prove to me that Clinton's going to win." When we looked at the polls, we said, "These numbers certainly make her a favorite, but they point toward an election that's fairly close and quite uncertain, actually."

NARRATOR: And in 2016, the U.S. presidential election was just that close.

Trump's victory depended on fewer votes than the seating capacity of some college football stadiums, spread across three states: Pennsylvania, Wisconsin and Michigan. And there were some problems with the polls in those states that led to underestimating Trump's support, according to a postmortem by a consortium of pollsters.

Nate Silver, the founder of the website FiveThirtyEight, is one of the biggest names in polling, even though he doesn't generally conduct polls himself.

NATE SILVER: Our job is to take other people's polls and to translate that in terms of a probability, to say, basically, whether…who's ahead, which is usually pretty easy to tell, but then how certain or uncertain is the election is the more difficult part.

NARRATOR: Like a meteorologist, Nate presents his predictions as probabilities. On the morning of Election Day 2016, he gave Clinton about a 70 percent chance of winning and Trump about a 30 percent chance. That's like rolling a ten-sided die, with seven sides that are Clinton and three that are Trump.
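Nate's ten-sided-die comparison is easy to simulate. This Python sketch rolls a fair ten-sided die many times, counting faces 1 through 3 as Trump wins; the 70/30 split is the rounded figure from the narration, and the function name and seed are arbitrary. The point is how routinely a 30 percent outcome turns up.

```python
import random

def upset_rate(rolls, trump_sides=3, sides=10, seed=2016):
    """Roll a fair `sides`-sided die `rolls` times and return the fraction
    of rolls landing on one of the first `trump_sides` faces."""
    rng = random.Random(seed)
    upsets = sum(rng.randint(1, sides) <= trump_sides for _ in range(rolls))
    return upsets / rolls

print(f"underdog wins in {upset_rate(100_000):.1%} of simulated elections")
```

Roughly three simulated elections in ten go to the underdog, so a Trump win under that forecast was unusual but hardly shocking.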

NATE SILVER: People who make probabilistic forecasts, they're not saying that politics is intrinsically random, they're saying that we have imperfect knowledge of it, and that if you think you can be more certain than that, you're probably fooling yourself based on how accurate polls…other types of political data are.

NARRATOR: Ultimately interpreting a probability depends on the situation. While a 30 percent chance might seem slim, if you learned the flight you were about to board crashed three out of every 10 trips, would you get on the plane?

FLIGHT ATTENDANT: (Dramatization) As this plane only makes it to its destination seven out of 10 times, please pay attention to our short safety briefing.

NARRATOR: Or if a weather forecaster said there's only a 30 percent chance of rain and then it rained, would you care?

JORDAN ELLENBERG: If it does rain, no one demands to know, "Why did it rain? We have to get to the bottom of this." We can say, like, "It just did. It might have rained, it might not have rained; as it happened, it did." I do think there's a certain natural resistance to seeing things that we maybe care about more than whether it's going to rain or not, like elections, in that same way.

NARRATOR: As 2016 shows, predicting who will win the U.S. presidency, a one-time contest between two unique opponents, is far from easy. But in at least one field, there are literally decades of detailed statistics on how the contests played out: baseball.

Baseball has always been a game of numbers: box scores, batting averages, E.R.A.s, R.B.I.s. But while "stats" have always been part of baseball, in the last 20 years their importance has skyrocketed due to sports analytics, the use of predictive models to improve a team's performance.

BILLY BEANE (Executive Vice President, Baseball Operations, Oakland Athletics): To some extent, every business, not just sports, is really trying to predict the next event, whether on Wall Street or, if you're in the tech business, what's the new, new thing? And for us, it's future player performance.

NARRATOR: Billy Beane was one of the first to adopt the quantitative approach in the late '90s, when he was the general manager of the Oakland Athletics. Stuck with the low payroll of a small market team, he abandoned decades of subjective baseball lore and committed the organization to using statistical analyses to guide the team's decision-making.

BILLY BEANE: It very much became a mathematical equation, putting together a baseball team.

NARRATOR: Billy's stats-driven approach started to attract attention when the Oakland A's made the playoffs four years in a row and set a league record with 20 wins in a row.

Then it was lionized, and even given a name, in a best-selling book and movie, Moneyball.

Brad Pitt plays Billy.

BRAD PITT (As Billy Beane in Moneyball): If we win on our budget with this team, we'll have changed the game.

NARRATOR: While "moneyballing" didn't lead to a league championship for the Oakland A's, it did change the game. Today, every Major League Baseball team has a sports analytics department, trying to predict and enhance future player performance through data, analyzing everything from the angle and speed of the ball coming off the bat to which players should be brought up from the minor leagues or traded.

BILLY BEANE: I'll never pretend to be a math whiz. I just understand its powers and its application. Running a major league baseball team is a great job, and every kid dreams of doing it; I can tell you it's everything you've thought of. But when they ask me, "What do I have to do to do that?" my answer is always the same. I say, "Go study and get an A in math."

NARRATOR: Sports analytics has transformed baseball, and moneyballing has also found its way into many unrelated fields. Proponents of data-driven decision-making and prediction have applied the approach to areas as diverse as popular music and law enforcement.

Moneyballing has been enabled by the vast amounts of information gathered through the internet, so-called "big data." Our current output of data is roughly 2.5 quintillion bytes a day.

But what about the opposite situation, when there's very little data, yet actions need to be taken, for example, when searching for people lost at sea? How do you even begin to predict where they might be?

The U.S. Coast Guard's Sector Boston Command Center: from this secure set of rooms, the Coast Guard coordinates all operations in the Boston area, including national security, drug enforcement and search and rescue.

BRIAN FLEMING (United States Coast Guard Sector Boston): Good morning, Coast Guard Sector Boston Command Center, Mr. Fleming speaking.

PHONE CALLER: Good morning, sir. My name is…

NARRATOR: A caller reports that a friend went paddleboarding earlier in the morning, but is now overdue. The Coast Guard initiates a search with a 45-foot response boat…


NARRATOR: …out of Boston Harbor.


NARRATOR: Unfortunately, paddlecraft in trouble have become increasingly common.

EDWARD NYGREN: You are required to have a lifejacket on. One reason for that is that in 2015, I think, we had 625 deaths nationwide, and a number of the people who were recovered were recovered without a lifejacket.

NARRATOR: The Command Center also launches another boat, out of Station Point Allerton, in Hull.

COAST GUARD CREW MEMBER: Short tack disconnected.

LUKE SCHAFFER: Stand clear of lines.

NARRATOR: The caller said the missing person typically paddled between Nantasket Beach and Boston Light, about three miles away. But with all the unknowns (where he got into trouble and how he may have drifted), the search area could be as large as 20 square miles.

Search and rescue operations are often based on unique circumstances and require action despite incomplete information. To attack problems like that, statisticians turn to an idea that originates with an 18th-century English clergyman interested in probability, Thomas Bayes.

Imagine you are given a coin to flip, and you want to know if it is fair, 50-50 heads or tails, or weighted to land more on heads than tails. The traditional approach in statistics and science doesn't assume either answer and uses experiments to find out. In this case, that involves flipping the coin a lot. Or you could approach the problem like a Bayesian.

Unlike the traditional approach, that means starting with an initial probability based on what you already know. In this case, every coin you've come across in a lifetime of flipping coins has been fair, so this one is probably fair, too. Next, you flip the coin, updating the probability as you go. Let's say it starts off with several heads in a row. That might make you wonder, increasing your estimate of the probability that it's weighted. But as you flip it more times, those early heads start to look like chance. In the end, your best estimate is that it is probably a fair coin, but you remain open to any new information, like learning that it belongs to your uncle the con man, Crooked Larry.
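The coin-flipping story above maps directly onto Bayes' rule. The minimal Python sketch below compares just two hypotheses, a fair coin and a weighted one. The 75 percent heads rate for the weighted coin, the 5 percent starting probability (standing in for a lifetime of fair coins) and the flip sequence are all hypothetical choices made for illustration.

```python
def update(p_weighted: float, flip: str, p_heads_if_weighted: float = 0.75) -> float:
    """One application of Bayes' rule: P(weighted) before a flip -> P(weighted) after it."""
    # How likely this flip is under each hypothesis (0.75 is an assumed weighting)
    like_weighted = p_heads_if_weighted if flip == "H" else 1 - p_heads_if_weighted
    like_fair = 0.5
    # posterior = likelihood * prior / total probability of the observed flip
    numerator = like_weighted * p_weighted
    evidence = numerator + like_fair * (1 - p_weighted)
    return numerator / evidence

def track(flips: str, prior_weighted: float = 0.05) -> list:
    """Return P(weighted) after each flip in the sequence."""
    p = prior_weighted
    history = []
    for flip in flips:
        p = update(p, flip)
        history.append(p)
    return history

# Hypothetical data: an opening streak of five heads, then 10 heads and 10 tails
flips = "HHHHH" + "THHTHTTHTHTHHTTHTHTH"
history = track(flips)
print(f"after the opening streak: {history[4]:.2f}")       # 0.29
print(f"after all {len(flips)} flips: {history[-1]:.2f}")  # 0.02
```

Under these assumptions, the streak lifts the probability of a weighted coin from 5 percent to about 29 percent, and the balanced run that follows pushes it down to about 2 percent, below where it started.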

EDWARD NYGREN: Sector 659. Our estimated time of arrival is 1158.

NARRATOR: Bayesian inference provides a rigorous mathematical way to update probabilities as new information comes in, and it sits at the heart of the Coast Guard's Search and Rescue Optimal Planning System, SAROPS.

BRIAN FLEMING: He's been missing since 7:30 this morning, so I'm going to go ahead and do a SAROPS drift.

NARRATOR: SAROPS takes information about the last-known position of the object of the search…

BRIAN FLEMING: What's the direction of the wind?

NARRATOR: …along with readings of currents and winds, and combines them with information about how objects drift in the water to simulate thousands of possible paths the target may have taken. These are processed into probabilities, displayed as colors on a map, and turned into search plans for the crews to execute.
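The idea of turning thousands of simulated paths into probabilities can be sketched in a few lines of Python. This is a toy, not how SAROPS is implemented: the straight-line hourly drift, the 3 percent wind "leeway" factor, the noise level, the one-mile grid and the current and wind values in the usage lines are all hypothetical assumptions.

```python
import random
from collections import Counter

def simulate_drift(n_paths, hours, current, wind, leeway=0.03, noise=0.2, seed=0):
    """Return the end positions (x east, y north, in miles) of n_paths simulated drifts.

    Hypothetical model: each hour the object moves with the water current plus a
    small 'leeway' fraction of the wind, and random noise stands in for everything
    the planner does not know. current and wind are (east, north) speeds in miles
    per hour; the last known position is (0, 0).
    """
    rng = random.Random(seed)
    ends = []
    for _ in range(n_paths):
        x, y = 0.0, 0.0
        for _ in range(hours):
            x += current[0] + leeway * wind[0] + rng.gauss(0, noise)
            y += current[1] + leeway * wind[1] + rng.gauss(0, noise)
        ends.append((x, y))
    return ends

def probability_grid(ends, cell_size=1.0):
    """Bin end positions into square cells; each cell's share of paths is its probability."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in ends)
    return {cell: n / len(ends) for cell, n in counts.items()}

if __name__ == "__main__":
    ends = simulate_drift(5000, hours=4, current=(0.5, 0.1), wind=(10.0, 0.0))
    grid = probability_grid(ends)
    # Rank cells so the most probable ones can be searched first
    for cell, p in sorted(grid.items(), key=lambda kv: kv[1], reverse=True)[:3]:
        print(cell, round(p, 3))
```

Each cell's probability is simply the fraction of simulated paths that end there, which is what a color-coded probability map would show.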

LUKE SCHAFFER: SAROPS is really a workhorse for the Coast Guard. It does a lot of the calculations for us and provides us with a lot of valuable search patterns and search-planning options.

BOATER CALLING IN: I thought he was pretty far offshore, but you know, he said he was okay, so I kept going.

NARRATOR: Word of the search has spread. A boater calls in a sighting from earlier in the day.

BRIAN FLEMING: What I did is I went in and put that information into SAROPS, and it changed everything.

NARRATOR: SAROPS quickly recalculates all the probabilities and generates a new search plan. The area has shifted about three miles farther out to sea.
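A new sighting can be folded in with Bayes' rule by reweighting the simulated paths. The Python sketch below is hypothetical and is not SAROPS's method: each path drifts straight offshore at its own unknown speed drawn from a broad uniform prior, and the sighting is assumed to be accurate to about 0.3 miles. The technique shown is likelihood weighting of simulated paths, a basic form of Bayesian updating.

```python
import math
import random

def simulate_paths(n_paths, hours, seed=1):
    """Hypothetical drift model: the offshore speed (miles per hour) is unknown, so
    each path draws its own speed from a broad uniform prior and moves in a
    straight line. Returns each path's distance offshore at hours 0..hours."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        speed = rng.uniform(0.0, 1.5)
        paths.append([speed * h for h in range(hours + 1)])
    return paths

def sighting_weights(paths, hour, observed, sigma=0.3):
    """Bayes' rule with a Gaussian likelihood: each path's weight is proportional to
    how well it agrees with a sighting `observed` miles offshore at `hour`.
    (All paths start with equal prior weight, so the likelihood alone is the weight.)"""
    return [math.exp(-0.5 * ((p[hour] - observed) / sigma) ** 2) for p in paths]

def expected_final_distance(paths, weights):
    """Probability-weighted average of where the paths end up."""
    total = sum(weights)
    return sum(w * p[-1] for w, p in zip(weights, paths)) / total

if __name__ == "__main__":
    paths = simulate_paths(5000, hours=5)
    before = expected_final_distance(paths, [1.0] * len(paths))
    after = expected_final_distance(paths, sighting_weights(paths, hour=2, observed=2.6))
    print(f"expected distance offshore before the sighting: {before:.1f} miles")
    print(f"expected distance offshore after the sighting:  {after:.1f} miles")
```

With these made-up numbers, a sighting well offshore early in the day favors the faster drift speeds, so the weighted estimate of the target's current position moves farther out to sea.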

EDWARD NYGREN: We are on scene, commencing search pattern now.

LUKE SCHAFFER: Keep a good look out.


LUKE SCHAFFER: We're assessing the situation on scene.

Any object you see in the water, please take a closer look at.

Paddleboarder, portside.

VOICE OVER RADIO: Roger. We have located a paddleboarder with 01 person on board.

UNITED STATES COAST GUARD CREW: Off the port corridor.




NARRATOR: As it turns out, the search has been a drill.

Hours earlier, the paddleboard was placed in the water by another Coast Guard ship and allowed to drift. The instruments mounted on it are there to measure wind and record the path it's taken, information that will later be used to tweak the drift simulations in SAROPS, though the system performed quite well today.

LUKE SCHAFFER: The object was right in the middle of our search patterns. So SAROPS was actually dead-on accurate in predicting where we needed to search to find the missing paddleboarder.

BRIAN FLEMING: To be able to call a family and say your family and friends are coming home is absolutely a call that all of us should have the chance to make. And fortunately, because of stuff like this, we do get to make that call.

NARRATOR: The computational complexity of updating probabilities held the Bayesian approach back for most of the 20th century. But today's computing power has unleashed it on the world. It's in everything from your spam filter to the way Google searches work to self-driving cars.

Some even see, in the Bayesian embrace of probability, similarities to how we learn from experience. And they've built it into computers, making it part of a powerful new force: machine learning.

SEBASTIAN THRUN (Udacity/Stanford University): In the past, when we programmed computers, we tended to really write down, in excruciating detail, a set of rules that would tell the computer what to do in every single contingency.

NARRATOR: But there's another approach: treat the computer like a child learning to ride a bike. No one teaches a child to ride using a set of rules. There may be some tips, but ultimately it is trial and error; experience is the instructor.

SEBASTIAN THRUN: The new thing, the new kid on the block, is machine learning, specifically something called "deep learning." Here, we don't give the computer the rules; we teach it through examples. So, like a small child that falls down and learns from that experience, we just let the computer learn from examples.

NARRATOR: Suppose you want to train a computer to recognize pictures of cats. By scanning through thousands of labeled pictures, some cats, some not, the computer can develop its own guidelines for assessing the probability that a picture is a cat.
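Learning from labeled examples, rather than from written rules, can be shown at toy scale. Real image classifiers are deep neural networks that work on raw pixels; this Python sketch swaps in plain logistic regression trained by gradient descent, on two hand-made, hypothetical features per picture (a "pointy-ear score" and a "whisker score") and eight made-up labeled examples.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability that feature vector x is a cat, given weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(examples, labels, epochs=2000, lr=0.5):
    """Learn weights from labeled examples by gradient descent on the log-loss.
    No rule like 'pointy ears means cat' is written down; the weights are learned."""
    n = len(examples[0])
    m = len(examples)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(examples, labels):
            err = predict_proba(w, b, x) - y  # derivative of log-loss w.r.t. the score
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

# Hypothetical features for each picture: [pointy-ear score, whisker score], 0 to 1
examples = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95], [0.95, 0.7],  # labeled "cat"
            [0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.05, 0.3]]   # labeled "not a cat"
labels = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train(examples, labels)
print(round(predict_proba(w, b, [0.85, 0.85]), 2))  # a new, cat-like picture
print(round(predict_proba(w, b, [0.15, 0.2]), 2))   # a new, non-cat-like picture
```

The output is a probability, not a yes-or-no rule, which matches the narration's framing of assessing "the probability that a picture is a cat."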

And these days, computers are doing far more than just looking for cats.

SEBASTIAN THRUN: Some of the best computers now can learn how to beat the world's best Go champion, find the relevant documents in stacks of documents (work that highly paid lawyers normally do), or diagnose diseases.

At Stanford, we recently ran a study to understand whether a machine-learning algorithm can compete with top-notch Stanford-level board-certified dermatologists in spotting things like skin cancer. And lo and behold, we found that our machine-learning algorithm, our little box, is as good as the best human doctor in finding skin cancer.

NARRATOR: That raises a lot of questions. Should we trust software over our doctors? Or are diagnostic programs like Sebastian's the intelligent medical assistants of tomorrow, a new tool but not a substitute?

And there are other concerns. If you asked a person riding a bike exactly how they do it, they'd be hard pressed to put it into words. The same is true with so-called "black box" machine learning applications like Sebastian's: no one, including Sebastian, knows how it detects skin cancer. Like the bicyclist, it just does, which may be fine for diagnostic software, but not for other aspects of medicine, like treatment decisions.

DANIELA WITTEN (University of Washington): If what you're doing is deciding what dose of chemotherapy to give a patient, I think most people would be uncomfortable with that being a black box. People would want to understand where those predictions are coming from.

NARRATOR: The same can be true for evaluating who should get a home loan, who should get fired from their job for poor performance, or who gets paroled, all situations in which "black box" machine learning software is in use.

DANIELA WITTEN: These are algorithms that can have a big effect on people's lives. And we have to understand, as a society, what is going into those algorithms and what they're based on, in order to make sure that they're not perpetuating social problems that we already have.

NARRATOR: We live in an age when the fusion of data, computers, probability and statistics grants us more predictive power than we've ever known before. We can see the tangible benefits and some of the dangers, while also wondering where this will all go.

JORDAN ELLENBERG: We're really seeing a new science of statistics developing under our feet. That's exciting. And I think it must be a little bit like what it was like when the theory of probability was first being developed by Pascal and Fermat, and people around them, that people were sort of saying, like, "My god, these are questions that mathematics can really have something to say about." I think that must have been what it was like when statistics, in its traditional form, was being developed in the first part of the 20th century. And, suddenly, people were just asking whole new kinds of questions that they couldn't even have approached before, and I think we're having another moment like that now.

NARRATOR: While tomorrow will always remain uncertain, mathematics will continue to guide the way through the power of probability and Prediction by the Numbers.

Broadcast Credits

Daniel McCabe
Chris Schmidt
Cara Feinberg
Stephen McCarthy
Daniel McCabe
Jaro Savol
Ekin Akalin
Tony Kandalaft
Cathleen O'Connell
Jay O. Sanders
Austin de Besche
Nikki Bramley
Bob Burns
Andria Chamberlin
Aaron Frutman
Gary Henoch
Dana Kupper
Gilberto Nobrega
Jaro Savol
Zach Stauffer
Daniel Traub
Brett Wiley
Craig Capello
Jesse Kaltenbach
Zubeyir Mentese
Jose Araujo
Glenn Berkovitz
Steve Bores
Adriano Bravo
Dan Holden
Alex Jennings
Richard Pooler
Paul Schmitz
Tim Clarke
Hazim Muftic
Farley Crawford
Rod Fountain
Kat Williams
Jami Tennille
Brenda Coffey
Rafael Jaen
Pellet Productions
David Bigelow
Heart Punch Studio
Professor Alyssa Goodman
HarvardX/Harvard University
U.S. Coast Guard/DVIDS
ESA/ATG medialab
Fine Arts Museums of San Francisco
Getty Images
NASA/Goddard Space Flight Center, NOAA, Lockheed Martin
Science Photo Library
Shutterstock, Inc.
Joseph Blitzstein
Greg Carbin
Rebecca Goldin
Giles Hooker
Jeremy Klavans
Walter Piegorsch
Clifford Young
Art Allen
Zak Basch
Emery Brown
Emerson Costume Shop
Emerson Prop Warehouse
Anthony Deprimo
Matt Gould
Peter Kleppin
Brett Kuprel
Sheri Leone
Harvey Mudd College
Amanda Henderson
Tina Johnsen
Las Vegas Film Bureau
Linda Marshall
Lizzie Miller
Terry Moore
Monmouth University
Nichols House Museum
NOAA/National Weather Service
Oakland Athletics
Orange County Fair
Kimiko Peterson
Mike Petriello
Paul Roszkowski
Karen Ruccio
University of Wisconsin-Madison
United States Coast Guard
yU + co.
Walter Werzowa
John Luker
Musikvergnuegen, Inc.
Ray Loring
Rob Morsberger
The Caption Center
Lindsey Denault
Kristine Allington
Tim De Chant
Cory Allen
Eileen Campion
Eddie Ward
Jennifer Welsh
Dante Graves
Lindsey Chou
Linda Callahan
Vanessa Ly
Ariam McCrary
Sarah Erlandson
Janice Flood
Susan Rosen
Lauren Miller
Brian Kantor
Michael H. Amundson
Kevin Young
Nathan Gunner
Caitlin Saks
David Condon
Pamela Rosenstein
Elizabeth Benjes
Evan Hadingham
Melanie Wallace
Laurie Cahalane
Julia Cort
Paula S. Apsell

A NOVA Production by Big House Productions for WGBH Boston.

© 2018 WGBH Educational Foundation

All rights reserved

Additional Material © 2018 WGBH Educational Foundation

All rights reserved

This program was produced by WGBH, which is solely responsible for its content.

Original funding for this program was provided by Draper, the David H. Koch Fund for Science, the Simons Foundation, Margaret and Will Hearst and the Corporation for Public Broadcasting.


Image credit (dice)
© John Lund/Getty


Edward Nygren
USCG Sector Boston
Billy Beane
Exec. VP, Baseball Operations, Oakland Athletics
Greg Carbin
National Weather Service
Mona Chalabi
The Guardian US
Richard De Veaux
Williams College
Keith Devlin
Stanford University
Jordan Ellenberg
University of Wisconsin-Madison
Brian Fleming
USCG Sector Boston
Rebecca Goldin
George Mason University
Patrick Murray
Monmouth University Polling Institute
Regina Nuzzo
Gallaudet University
Isha Renta
National Weather Service
Luke Schaffer
USCG Station Point Allerton
Nate Silver
Louis Uccellini
Director, National Weather Service
Liberty Vittert
University of Glasgow
Talithia Williams
Harvey Mudd College
Daniela Witten
University of Washington

Preview | 0:30

Full Program | 53:06
