
LESSON: SAMPLING BIAS AND THE CALIFORNIA RECALL
Author Ryan Martine is a math educator at Preston Junior High School in Fort Collins, Colorado

Class Type: Approximately 9th grade level 2 integrated class (geometry level in a traditional program); appropriate for algebra I students as well.

Time: One 90-minute day or two 45-minute periods.

Objectives
Upon completing this lesson, students will be able to identify and differentiate between types of political samples, as well as select and use statistical and visual representations to describe a list of data. Furthermore, students will be able to identify sources of bias in samples and find ways of reducing and eliminating sampling bias.

Materials
1. Class set of TI-83 or equivalent calculators. (If unavailable, it is possible to complete this lesson using a spreadsheet program on a computer; however, the box-and-whisker plot is usually not an option in a spreadsheet program.)
2. Random Rectangles Sheet of 100 rectangles (printer-friendly pdf version)


Background

Californians are unhappy!! Many residents of California have become fed up with Governor Gray Davis, who was re-elected in November, 2002. Starting this past February, residents launched a recall petition, with the purpose of gathering enough signatures to force a special recall election, an election where citizens can force the Governor to leave office.

In July, these citizens, called the Recall Gray Davis Committee, turned in their petitions. They needed about 900,000 valid signatures. They obtained over 1.6 million signatures. Because the Committee obtained a sufficient number of signatures, an election date of October 7th, 2003, was set for this special recall election. For more information on the recall, see California Governor Gray Davis Faces Recall Vote at http://www.pbs.org/newshour/extra/features/july-dec03/davis_7-30.html.

During the season leading up to an election, news organizations attempt to quantify popular support of the candidates by employing polling organizations to assess the popular opinion. This is when you'll hear things like, "According to a recent poll, 48% of people support the president…"

The first major poll in the California recall race was conducted by the Gallup organization, in conjunction with USA Today and CNN on August 7-10. It polled 801 registered voters across California and asked whom they supported in the recall effort. A second major poll was released by the Los Angeles Times one week later. For more information, see The Recall of Gray Davis at http://www.pbs.org/newshour/local/davis_recall/.

Procedure

1. Give students the following background about polls. Use the definitions of terms that follow this background if necessary.

How are polls conducted?

The organization conducting the poll, such as Gallup, will buy a listing of phone numbers from a firm that keeps a list of all active phones in the region (or nation if it is a national poll). The phone numbers purchased will be proportionally divided among all counties within the polling region, with business phones eliminated. The organization will call the selected numbers and obtain opinion and demographic information about the person. After these are collected, the organization will ensure that men and women are equally represented, as well as racial, age, and educational groupings. Any imbalances are corrected by giving more mathematical weight to an underrepresented group. Formulas show that a poll of 600 randomly selected people out of 200 million will be accurate to within 4 percentage points of the true opinions 95% of the time. Checking this information against election results shows that this indeed is true. (Source: Jim Norman, column in USA Today, 9/2/98)
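The 4-point figure can be checked with the standard margin-of-error formula for a simple random sample. The sketch below is only an illustration, assuming a 95% confidence level (z of about 1.96), the worst case of an evenly split opinion (p = 0.5), and a population far larger than the sample. The function name margin_of_error is ours and is not taken from Gallup or the cited column.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Compare a few sample sizes, including the 600 from the text
# and the 801 voters in the Gallup recall poll.
for n in (100, 600, 801, 2400):
    print(f"n = {n:4d}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Notice that the size of the population does not appear in the formula, and that the sample must be four times as large to cut the margin of error in half.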

There are other types of polls, but their accuracy can be suspect. Many Web sites will have a survey that asks visitors to record their opinion, and then tabulates the results for those interested. However, these are very inaccurate, as only those people who feel strongly and/or happen by the Web site end up voting. Furthermore, most of these polls allow one person to vote multiple times, skewing results even more. Similarly, radio and television talk shows have call-in polls asking for people's opinions. Again, since only those who feel strongly will call (and frequently only those whose opinions tend to match the show's), the results are not an accurate measure of the entire population.
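A short simulation can make the difference between a random poll and a call-in poll concrete. Everything in this sketch is hypothetical: the population of 100,000, the 40% true support, and the assumption that recall supporters are five times as likely to call in are invented for illustration and do not come from any real poll.

```python
import random

random.seed(2003)  # fixed seed so the run is repeatable

# Hypothetical population: 40% favor the recall (True), 60% oppose it (False).
population = [True] * 40_000 + [False] * 60_000

# Random poll: every person has the same chance of being chosen.
random_poll = random.sample(population, 600)

# Call-in poll: people choose themselves. Assume (for illustration only) that
# supporters feel more strongly and are five times as likely to respond.
def responds(favors_recall):
    chance = 0.10 if favors_recall else 0.02
    return random.random() < chance

call_in_poll = [person for person in population if responds(person)]

def percent_in_favor(sample):
    return 100 * sum(sample) / len(sample)

print(f"True support: {percent_in_favor(population):.1f}%")
print(f"Random poll:  {percent_in_favor(random_poll):.1f}%")
print(f"Call-in poll: {percent_in_favor(call_in_poll):.1f}%")
```

Under these assumptions the call-in poll has far more respondents than the random poll, yet it lands much farther from the true figure. A larger sample does not repair a biased one.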

The formal, random polls are more significant in this election than usual. On September 17th, there will be a televised debate between the candidates for the governor's seat. To be included in the debate, a candidate must have at least 10% of the vote in one of the three major polls conducted state-wide for the election. If California used a poll that was inaccurate, candidates could be unfairly excluded from the debate, or undeserving candidates included. So a candidate has a large stake in seeing that the polls are indeed accurate, so that s/he can be included in this debate and gain greater publicity.

For further information, visit the following links:
NewsHour main page about the Davis recall: http://www.pbs.org/newshour/local/davis_recall/
135 candidates certified: http://www.pbs.org/newshour/updates/recall_08-14-03.html
Televised debate to exclude Davis, most candidates: http://www.pbs.org/newshour/media/media_watch/july-dec03/recall_08-15-03.html
Rules and info about the recall: www.RecallGrayDavis.com/QA.asp

Definition of terms

  • Bias (also sampling bias): a sample that has skewed results caused by over representing or under representing a portion of the population.

  • Poll: an estimate of public opinion based on a representative sample.

  • Sample (also sample population): the process of selecting a few members of a population to represent the opinions or beliefs of the whole. This can be done several ways: cluster, convenience, random, stratified random, and systematic. A cluster sample is taking a defined group out of the whole (ex: a neighborhood out of a city). A convenience sample is taking those members of the population that are easiest for the researcher to reach (ex: all of one's coworkers). A random sample is a commonly used sample, where everyone in a population has an equal chance of being selected. A stratified random sample is the most commonly used sample, where the population is divided into distinct categories, and then some members are randomly selected from each (ex: dividing the nation into racial categories, and taking 100 people from each). A systematic sample is taking an organized list of a population, and taking every 10th (etc.) person from it.

  • Statistical representation: showing or displaying the opinions or beliefs of a population using mean, median, quartiles, etc. Typically purely numerically based.

  • Visual representation: showing or displaying the opinions or beliefs of a population using graphical means. Common uses are scatter plots for two variable data and box-and-whisker plots for one variable data.

  • Box-and-whisker plot: a graph of data where the first ¼ of the information is on a line (whisker), the middle half is shown with a box (the median is also usually marked with a vertical line), and the upper ¼ of the information is in another whisker. (see picture)

  • Scatter plot: a graph of data where each individual point is an ordered pair, and the data is many individual points scattered throughout the graph.

  • Lower Quartile (first quartile): the number in a list of data which has ¼ of the data below it, and ¾ above it. The median of the first half.

  • Upper Quartile (third quartile): the number in a list of data which has ¾ of the data below it, and ¼ above it. The median of the second half.

  • Mean: the additive average. Calculated by adding up all numbers in a list and dividing by the total number of numbers.

  • Median: the middle number in a set of data arranged in order (for an even count of numbers, the mean of the two middle numbers).
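The five kinds of samples defined above can be sketched in a few lines of code. This is a minimal illustration on a made-up population of 100 numbered residents in 4 neighborhoods of 25; the neighborhoods, the sample sizes, and the choice of every 5th resident for the systematic sample are assumptions, not part of the lesson.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 100 residents, numbered 1-100, living in
# 4 neighborhoods of 25 residents each.
residents = list(range(1, 101))
neighborhoods = [residents[i:i + 25] for i in range(0, 100, 25)]

# Cluster: take one whole defined group (one neighborhood).
cluster = random.choice(neighborhoods)

# Convenience: take whoever is easiest to reach (here, the first 20 listed).
convenience = residents[:20]

# Random: every resident has an equal chance of being chosen.
simple_random = random.sample(residents, 20)

# Stratified random: choose 5 at random from each neighborhood.
stratified = [person for group in neighborhoods
              for person in random.sample(group, 5)]

# Systematic: every 5th resident, starting from a random position.
start = random.randrange(5)
systematic = residents[start::5]

for name, sample in [("cluster", cluster), ("convenience", convenience),
                     ("random", simple_random), ("stratified", stratified),
                     ("systematic", systematic)]:
    print(f"{name:12s} n={len(sample):2d}  {sorted(sample)}")
```

Printing the sorted samples side by side shows which methods reach every neighborhood and which leave whole parts of the population out.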

2. Compare the methodology of a poll with that of a petition drive. How are the samples of the population different? How does that affect the results and accuracy of each? Ask the following questions:

  • What type of sample is a petition drive? (A typical phone poll is a stratified random sample of the population)
  • Which one is more representative of the entire population's opinion? Why?
  • What does that say about why there is a recall election, instead of the petition immediately ousting Gov. Davis?
  • Discuss Internet polls (whoever happens by the Web site chooses to vote; is this representative?)

3. After discussion, direct the class into a sampling activity - Random Rectangles. Give students the sheet of 100 rectangles of various sizes and instruct them to find the "average rectangle."

a. First, have students scan the sheet for 5 seconds and guess the average area (each box represents one square unit). Collect their guesses in list one of the calculator (or column one of the spreadsheet). To do so on the TI-83 calculator, press STAT and choose EDIT. You should be able to enter numbers now. If your lists already have numbers in them, clear them first: press 2nd, + (this is MEM), choose option 4, ClrAllLists, and press ENTER.

b. Second, have each student in the class select five rectangles that appear to be representative of the whole, and find the mean of the areas. (Ex: a student chooses rectangles 26, 59, 74, 77, and 96. These have areas of 12, 8, 10, 3, and 18, respectively. The mean area would then be (12 + 8 + 10 + 3 + 18) / 5 = 51 / 5 = 10.2.) Collect these numbers in list 2 (just press the right arrow to get there).

c. Finally, have students use their random number generators: press the MATH button, choose PRB, then randInt(, and press ENTER. This randInt( should appear on the home screen. Complete the statement so that the screen reads randInt(1,100,5), which tells the calculator to choose 5 random integers between 1 and 100. The 5 numbers represent 5 rectangles. Students should find those five, determine each rectangle's area, and calculate the mean as before. Enter the class data into list 3.

**Teacher note: when doing this in class before, I have often had several kids all get the same 5 "random" numbers from the calculator. I usually just have students check with a few of their neighbors, and have any that have repeated selections tell the calculator to redo the 5 numbers.**
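For classes working on a computer instead of a calculator, steps b and c can be sketched as follows. This is an illustration, not a replacement for the calculator steps: random.sample stands in for randInt(1,100,5), and the seed values are hypothetical. The last part shows one likely explanation for the repeated "random" numbers in the teacher note: generators that start from the same seed produce the same sequence.

```python
import random
from statistics import mean

# Step b: the worked example from the lesson (rectangles 26, 59, 74, 77, 96).
chosen_areas = [12, 8, 10, 3, 18]
print(mean(chosen_areas))  # 51 / 5 = 10.2

# Step c: five random rectangle numbers between 1 and 100.
# random.sample never repeats a number within one draw.
print(random.sample(range(1, 101), 5))

# Teacher note: two generators started from the same seed
# give the same "random" numbers.
student_a = random.Random(0)
student_b = random.Random(0)
same = student_a.sample(range(1, 101), 5) == student_b.sample(range(1, 101), 5)
print(same)  # True

# Giving a generator its own seed (hypothetical: a student ID number)
# makes a matching list very unlikely.
student_c = random.Random(4417)
print(student_c.sample(range(1, 101), 5))
```

On the TI-83, the corresponding remedy is to store a different number in rand (for example, 4417, STO, then rand from the MATH PRB menu) before using randInt(.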

4. Once the information is gathered, have students press STAT, choose CALC, 1-Var Stats, ENTER. (1-Var Stats uses L1 unless told otherwise; for lists 2 and 3, enter the list name after 1-Var Stats, as in 1-Var Stats L2, before pressing ENTER.) The calculator then calculates the mean (labeled as x with an overline, called "x-bar"), standard deviation (which I don't use), median (Med), minimum (minX), lower quartile (Q1), upper quartile (Q3), and maximum (maxX). To see all this information, use your up and down arrows. Students should record this information for later use, including graphing the box-and-whisker plots.

5. It is possible to have the calculator complete the box-and-whisker plots, if desired. To do this, press 2nd, Y= (which is STAT PLOT), ENTER, turn plot 1 on, choose the 5th type by arrowing to the right, and have alternately L1, L2, and L3 as the Xlist. (To change the list, press 2nd, 1 for L1, etc.)
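If calculators are unavailable, the five numbers behind a box-and-whisker plot can be computed with a short program. The sketch below finds each quartile as the median of one half of the ordered data, which is how the definitions in this lesson describe them. The class data in list_two is invented for illustration, and five_number_summary is our own helper, not a calculator command.

```python
from statistics import mean, median

def five_number_summary(data):
    """Return (minX, Q1, Med, Q3, maxX), with quartiles as medians of each half."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]    # the half below the median position
    upper = values[-half:]   # the half above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Hypothetical class data: mean areas reported by nine students.
list_two = [10.2, 7.4, 12.0, 9.6, 8.8, 11.2, 6.4, 13.6, 9.0]

print("mean:", round(mean(list_two), 2))
print("minX, Q1, Med, Q3, maxX:", five_number_summary(list_two))
```

The whiskers of the plot run from minX to Q1 and from Q3 to maxX, and the box runs from Q1 to Q3 with a line at Med.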

Questions to ask:

1. List one was taken from your scanned guess of an average rectangle. Which does that best represent: an Internet poll, a collection of signatures, or a formal poll such as we read about? Why?

2. List two was taken from your five chosen rectangles. Which of our three does that best represent - Internet poll, a collection of signatures, or a formal poll?

3. List three, from the calculator's random list of numbers, represents which of the three options best? Explain.

4. Do you notice any difference in the box and whisker plots, or the values of the mean and median in each list? Explain.

5. (Give students the actual average size of the rectangles: mean = 7.42, median = 6.) How do the actual averages compare to the 3 samples we took? To your individual numbers? What patterns and differences do you notice? What might have affected these differences? How does that influence your reading of stories such as this one?
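For teachers who want to preview the discussion in question 5, the following simulation compares many random samples of five with many hand-picked samples of five. It is a sketch under stated assumptions: the list of areas is invented (it is NOT the real Random Rectangles sheet, so its mean differs from 7.42), and hand-picking is modeled by assuming that the chance of choosing a rectangle is proportional to its area, because large rectangles catch the eye.

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical sheet of 100 rectangle areas (not the real data):
# many small rectangles and a few large ones.
areas = ([1] * 16 + [2] * 4 + [3] * 8 + [4] * 18 + [5] * 8 + [6] * 6
         + [8] * 10 + [10] * 8 + [12] * 12 + [16] * 6 + [18] * 4)

def random_sample_mean():
    # Every rectangle is equally likely to be drawn.
    return mean(random.choices(areas, k=5))

def eye_catching_sample_mean():
    # Assumption: the chance of picking a rectangle is proportional to its area.
    return mean(random.choices(areas, weights=areas, k=5))

trials = 2000
random_means = [random_sample_mean() for _ in range(trials)]
biased_means = [eye_catching_sample_mean() for _ in range(trials)]

print(f"true mean area:            {mean(areas):.2f}")
print(f"average of random samples: {mean(random_means):.2f}")
print(f"average of chosen samples: {mean(biased_means):.2f}")
```

Individual random samples still vary quite a bit, but their averages center on the true mean. Under the size-bias assumption, the hand-picked samples center well above it.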

Assessment
Obtain a map of the United States with state boundaries. The student's job is to show how to select a random sample of states for the purpose of studying land use. How would you select the sample? Where might be a source of possible bias? If instead you were studying population trends, how could you select your sample?

Correlation to National Standards
NCTM Principles and Standards for School Mathematics
http://standards.nctm.org

Standards and Benchmarks (from National Council of Teachers of Mathematics)

The student should:

  • understand the differences among various kinds of studies and which types of inferences can legitimately be drawn from each;
  • know the characteristics of well-designed studies, including the role of randomization in surveys and experiments;
  • understand histograms, parallel box plots, and scatter plots and use them to display data;
  • for univariate measurement data, be able to display the distribution, describe its shape, and select and calculate summary statistics;
  • use simulations to explore the variability of sample statistics from a known population and to construct sampling distributions;
  • evaluate published reports that are based on data by examining the design of the study, the appropriateness of the data analysis, and the validity of conclusions.

Source: Random Rectangles is an activity taken from Key Curriculum's Activity-Based Statistics, www.keypress.com. Some "questions-to-ask" may have been influenced by their questions, as well.


Copyright © MacNeil-Lehrer Productions All Rights Reserved