LESSON:
SAMPLING BIAS AND THE CALIFORNIA RECALL
Author
Ryan Martine is a math educator at Preston Junior High School in Fort Collins,
Colorado.
Class Type:
Approximately 9th grade level 2 integrated class (geometry level in a traditional
program); appropriate for Algebra I students as well.
Time:
One 90-minute day or two 45-minute periods.
Objectives
Upon completing this lesson, students will be able to identify and differentiate
between types of political samples, as well as select and use statistical and
visual representations to describe a list of data. Furthermore, students will
be able to identify sources of bias in samples and find ways of reducing and eliminating
sampling bias.
Materials
1. Class set of TI-83 or equivalent calculators. (If unavailable, it is possible
to complete this lesson using a spreadsheet program on a computer; the box-and-whisker
plot is usually not an option in a spreadsheet program, however.)
2. Random Rectangles sheet of 100 rectangles
Background
Californians are unhappy!! Many residents of California have become fed up with Governor Gray
Davis, who was re-elected in November, 2002. Starting this past February, residents
launched a recall petition, with the purpose of gathering enough signatures to
force a special recall election, an election where citizens can force the Governor
to leave office.
In July, these citizens, called the Recall Gray Davis Committee, turned in their
petitions. They needed about 900,000 valid signatures. They obtained over 1.6 million
signatures. Because the Committee obtained a sufficient number of signatures,
an election date of October 7th, 2003, was set for this special recall election.
For more information on the recall, see California Governor Gray Davis Faces Recall Vote at http://www.pbs.org/newshour/extra/features/july-dec03/davis_7-30.html.
During the season leading up to an election, news organizations attempt to quantify popular
support of the candidates by employing polling organizations to assess the popular
opinion. This is when you'll hear things like, "According to a recent poll,
48% of people support the president..."
The first major poll in the California recall race was conducted by the Gallup organization,
in conjunction with USA Today and CNN on August 7-10. It polled 801 registered
voters across California and asked whom they supported in the recall effort. A
second major poll was released by the Los Angeles Times one week later.
For more information, see The Recall of Gray Davis at http://www.pbs.org/newshour/local/davis_recall/.
Procedure
1. Give students the following background about polls. Use the definition of terms
at the end of this lesson if necessary.
How are polls conducted?
The organization conducting the poll, such as Gallup, will buy a listing of phone
numbers from a firm that keeps a list of all active phones in the region (or nation
if it is a national poll). The phone numbers purchased will be proportionally
divided among all counties within the polling region, with business phones eliminated.
The organization will call the selected numbers and obtain opinion and demographic
information about the person. After these are collected, the organization will
ensure that men and women are equally represented, as well as racial, age, and
educational groupings. Any imbalances are corrected by giving more mathematical
weight to an underrepresented group. Formulas show that a poll of 600 randomly
selected people out of 200 million will be accurate to within 4 percentage points
of the true opinions 95% of the time. Checking this information against election
results shows that this indeed is true. (Source: Jim Norman, column in USA
Today, 9/2/98)
There are other types of polls, but their accuracy
can be suspect. Many Web sites will have a survey that asks visitors to record
their opinion, and then tabulates the results for those interested. However, these
are very inaccurate, as only those people who feel strongly and/or happen by the
Web site end up voting. Furthermore, most of these polls allow one person to vote
multiple times, skewing results even more. Similarly, radio and television talk
shows have call-in polls asking for people's opinions. Again, since only those
who feel strongly will call (and frequently only those whose opinions tend to
match the show's), the results are not an accurate measure of the entire population.
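The claim above that a random poll of 600 people is accurate to within 4 percentage points, 95% of the time, can be checked with the usual margin-of-error formula for a simple random sample. The sketch below is not from the cited column: it assumes a 95% confidence level (z of about 1.96), the worst case of a 50/50 split of opinion, and no weighting adjustments. The function name `margin_of_error` is made up for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a simple random sample of size n.

    p = 0.5 is the worst case (widest margin); z = 1.96 corresponds to 95%.
    Assumes the population is much larger than the sample.
    """
    return z * math.sqrt(p * (1 - p) / n)

# 600 is the sample size quoted above; 801 is the size of the Gallup recall poll.
for n in (600, 801):
    print(f"n = {n}: about +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Notice that the size of the population does not appear in the formula. For a large population, the margin depends almost entirely on how many people are sampled, which is why 600 randomly chosen people can stand in for 200 million.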
The formal, random polls are more significant in this election than normal.
On September 17th, there will be a televised debate between the candidates for
the governor's seat. To be included in the debate, a candidate must have at least
10% of the vote in one of the three major polls conducted state-wide for the election.
If California used a poll that was inaccurate, candidates could be unfairly excluded
from the debate, or undeserving candidates included. So a candidate has a large
stake in seeing that the polls are indeed accurate, so that s/he can be included
in this debate and gain greater publicity.
For further information, visit the following links:
- NewsHour main page about the Davis recall: http://www.pbs.org/newshour/local/davis_recall/
- 135 candidates certified: http://www.pbs.org/newshour/updates/recall_08-14-03.html
- Televised debate to exclude Davis, most candidates: http://www.pbs.org/newshour/media/media_watch/july-dec03/recall_08-15-03.html
- Rules and info about the recall: www.RecallGrayDavis.com/QA.asp
Definition of terms
- Bias (also sampling bias): skewed results in a sample, caused by overrepresenting
or underrepresenting a portion of the population.
- Poll: an estimate of public opinion based on a representative sample.
- Sample (also sample population): the process of selecting a few members
of a population to represent the opinions or beliefs of the whole. This can be
done several ways: cluster, convenience, random, stratified random, and systematic.
A cluster sample takes a defined group out of the whole (ex: a neighborhood
out of a city). A convenience sample takes those members of the population
that are easiest for the researcher to reach (ex: all of one's coworkers). A random sample
is a commonly used sample, where everyone in a population has an equal chance
of being selected. A stratified random sample is the most commonly used sample,
where the population is divided into distinct categories, and then some members are randomly
selected from each (ex: dividing the nation into racial categories, and taking
100 people from each). A systematic sample takes an organized list of a population
and selects every 10th (etc.) person from it.
- Statistical representation: showing or displaying the opinions or beliefs
of a population using mean, median, quartiles, etc. Typically purely numerically
based.
- Visual representation: showing or displaying the opinions or beliefs of
a population using graphical means. Common uses are scatter plots for two-variable
data and box-and-whisker plots for one-variable data.
- Box-and-whisker plot: a graph of data where the lowest ¼ of the data
is on a line (whisker), the middle half is shown with a box (the median is also
usually marked with a vertical line), and the upper ¼ of the data
is in another whisker. (see picture)
- Scatter plot: a graph of data where each individual point is an ordered
pair, and the data is many individual points scattered throughout the graph.
- Lower Quartile (first quartile): the number in a list of data which has ¼
of the data below it, and ¾ above it. The median of the lower half of the data.
- Upper Quartile (third quartile): the number in a list of data which has ¾
of the data below it, and ¼ above it. The median of the upper half of the data.
- Mean: the arithmetic average. Calculated by adding up all numbers in a list
and dividing by how many numbers are in the list.
- Median: the middle number in an ordered set of data.
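The sampling methods defined in the terms above can be compared side by side on a small made-up population. This is a minimal sketch, not part of the original lesson: the population of 100 numbered people, the four equal "regions," the sample sizes, and all function names are assumptions chosen only for illustration.

```python
import random

# Hypothetical population: people numbered 1-100, split into 4 regions of 25.
population = list(range(1, 101))
region_of = {person: (person - 1) // 25 for person in population}

def simple_random_sample(pop, k, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(pop, k)

def systematic_sample(pop, step, rng):
    """Take every step-th member of an ordered list, from a random start."""
    start = rng.randrange(step)
    return pop[start::step]

def stratified_random_sample(pop, strata, per_stratum, rng):
    """Randomly select the same number of members from each category."""
    chosen = []
    for label in sorted(set(strata.values())):
        members = [p for p in pop if strata[p] == label]
        chosen.extend(rng.sample(members, per_stratum))
    return chosen

def cluster_sample(pop, strata, rng):
    """Take one whole defined group (for example, one neighborhood)."""
    label = rng.choice(sorted(set(strata.values())))
    return [p for p in pop if strata[p] == label]

def convenience_sample(pop, k):
    """Take whoever is easiest to reach: here, the first k on the list."""
    return pop[:k]

rng = random.Random(2003)  # fixed seed so the run is repeatable
print("random:     ", sorted(simple_random_sample(population, 10, rng)))
print("systematic: ", systematic_sample(population, 10, rng))
print("stratified: ", sorted(stratified_random_sample(population, region_of, 3, rng)))
print("cluster:    ", cluster_sample(population, region_of, rng)[:5], "...")
print("convenience:", convenience_sample(population, 10))
```

The convenience sample always returns the same ten people, and the cluster sample draws everyone from a single region. Comparing those two outputs with the random and stratified ones is a quick way to see where sampling bias can enter.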
2. Compare the methodology of a poll with that of a petition drive. How are the samples
of the population different? How does that affect the results and accuracy of
each? Ask the following questions:
- What type of sample is a petition drive? (A typical phone poll is a stratified random
sample of the population.)
- Which one is more representative of the entire population's opinion? Why?
- What does that say about why there is a recall election, instead of the petition immediately
ousting Gov. Davis?
- Discuss Internet polls (whoever happens by the Web site chooses whether to vote; is this representative?)
3. After discussion, direct the class into a sampling activity: Random Rectangles. Give
students a sheet of 100 rectangles of various
sizes and instruct them to find the "average rectangle."
a. First, have students scan the sheet for 5 seconds and
guess the average area (each box represents one square unit). Collect their guesses
in list one of the calculator (or column one of the spreadsheet). To do so, on
the TI-83 calculator, press STAT, choose EDIT. You should be able to enter
numbers now. If your lists already have numbers in them, clear them by pressing 2nd,
+ (this is MEM), and choosing option 4, ClrAllLists. Press ENTER.
b. Second, have each student in the class select five rectangles that appear to
be representative of the whole, and find the mean of the areas (ex: a student
chooses rectangles 26, 59, 74, 77, and 96. These have areas of 12, 8, 10, 3, and
18, respectively. The mean area would then be (12 + 8 + 10 + 3 + 18)/5 = 51/5
= 10.2). Collect these numbers in list 2 (just press the right arrow to get there).
c. Finally, have students use their random number generators: press the MATH button, choose
PRB, select randInt(, and press ENTER. randInt( should appear on the home screen. Complete
the statement so that the screen reads randInt(1,100,5), which tells the calculator
to choose 5 random integers between 1 and 100. The 5 numbers represent 5 rectangles.
Students should find those five, determine the area of each, and
calculate the mean as before. Enter the class data into list 3.
**Teacher note: when doing this in class before, I have often had several kids all get the
same 5 "random" numbers from the calculator. I usually just have students
check with a few of their neighbors, and have any that have repeated selections
tell the calculator to redo the 5 numbers.**
4. Once the information is gathered, have students press STAT, choose CALC, 1-Var
Stats, ENTER. (1-Var Stats alone summarizes L1; to summarize another list, add its
name before pressing ENTER, for example 2nd, 2 for L2.) The calculator then calculates
the mean (labeled as x with an overline, called "x-bar"), standard deviation (which
I don't use), median (Med), minimum (minX), lower quartile (Q1), upper quartile (Q3),
and maximum (maxX). To see all this information, use the up and down arrows. Students should record
this information for later use, including graphing the box-and-whisker plots.
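For classes working on a computer instead of a calculator, the values reported in step 4 can be reproduced with a short script. This is a sketch, not the calculator's actual code: it finds the quartiles with the "median of each half" rule from the definition of terms (other software may use a slightly different quartile rule), and the list of class guesses is invented for illustration.

```python
from statistics import mean, median

def five_number_summary(data):
    """Return (minX, Q1, Med, Q3, maxX) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]        # values below the median position
    upper_half = values[(n + 1) // 2 :]  # values above the median position
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])

# Hypothetical class data (not from the lesson): nine guesses of the average area.
guesses = [4, 6, 7, 8, 8, 9, 10, 12, 15]

print("mean:", round(mean(guesses), 2))
print("minX, Q1, Med, Q3, maxX:", five_number_summary(guesses))
```

These five numbers are exactly what is needed to draw a box-and-whisker plot by hand. As a check against the example in step 3b, the areas 12, 8, 10, 3, and 18 give a median of 10 with quartiles of 5.5 and 15.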
5. It is possible to have the calculator complete the box-and-whisker plots,
if desired. To do this, press 2nd, Y= (which is STAT PLOT), ENTER, turn plot 1
on, choose the 5th type by arrowing to the right, and set the Xlist to L1, L2,
and L3 in turn. (To change the list, press 2nd, 1 for L1, etc.)
Questions to ask:
1. List one was taken from your scanned guess of an average rectangle. Which does
that best represent: an Internet poll, a collection of signatures, or a formal
poll such as we read about? Why?
2. List two was taken from your five chosen rectangles. Which of our three does
that best represent: an Internet poll, a collection of signatures, or a formal poll?
3. List three, from the calculator's random list of numbers, represents which of
the three options best? Explain.
4. Do you notice any difference in the box-and-whisker plots,
or the values of the mean and median in each list? Explain.
5. (Give students the actual average size of the rectangles: mean = 7.42, median = 6.) How
do the actual averages compare to the 3 samples we took? To your individual
numbers? What patterns and differences do you notice? What might have caused
these differences? How does that influence your reading of stories such as this
one?
Assessment
Obtain a map of the United States with state boundaries. The student's job is
to show how to select a random sample of states for the purpose of studying land
use. How would you select the sample? Where might be a source of possible bias?
If instead you were studying population trends, how could you select your sample?
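One possible approach to the assessment is to number the states and let a random number generator pick them, just as with the rectangles. The sketch below is only an illustration, not an answer key: the sample size of 10 and the seed are arbitrary assumptions, the function name `pick_states` is made up, and states are identified by number (for example, 1 through 50 in alphabetical order) and not by name.

```python
import random

def pick_states(k, seed=None):
    """Pick k distinct state numbers from 1-50, each equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, 51), k))

# A fixed seed makes the selection repeatable; omit it for a fresh draw.
print(pick_states(10, seed=7))
```

Here every state is equally likely to be chosen regardless of its land area or population, and no state can be picked twice. Whether "equally likely per state" is the right kind of random for a land-use study or a population study is a useful starting point for the bias discussion.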
Correlation to National Standards
NCTM Principles and Standards for School Mathematics
http://standards.nctm.org
Standards and Benchmarks (from National Council of Teachers of Mathematics)
The student should:
- understand the differences among various kinds of studies and which types of inferences can
legitimately be drawn from each;
- know the characteristics of well-designed studies, including the role of randomization
in surveys and experiments;
- understand histograms, parallel box plots, and scatter plots and use them to display data;
for univariate measurement data, be able to display the distribution, describe
its shape, and select and calculate summary statistics;
- use simulations to explore the variability of sample statistics from a known population
and to construct sampling distributions;
- evaluate published reports that are based on data by examining the design of the study,
the appropriateness of the data analysis, and the validity of conclusions.
Source:
Random Rectangles is an activity taken from Key Curriculum's Activity-Based Statistics,
www.keypress.com. Some "questions-to-ask" may have been influenced by
their questions, as well.