The mad rush to embrace high-stakes testing says to me that we are now reaping
what years of superficial indifference have sown. That is, for years educators
have not held themselves accountable, so now business leaders and politicians
are creating systems to hold schools accountable. As I will explain, the move
to create standards is out of synch, and we're now testing with a vengeance,
before the system has had time to get ready. ...
Behind the Standards
The push for standards began in force back in 1989, when President George Bush
called the nation's governors together for the first-ever National Education
Summit, held on the campus of the University of Virginia. Out of that largely
theatrical meeting came a set of national education goals, some of which were
actually written in the White House basement months later, some of which had
been decided upon beforehand.
Goals begot standards, and here the White House has taken a back seat. The
prime mover behind standards has been IBM's Louis V. Gerstner Jr., the
prominent businessman who has been an education reformer for more than 20
years. Gerstner was the principal organizer behind the National Education
Summit meetings in 1996 and 1999, meetings that involved nearly every state
governor, America's business leaders, and President Clinton. Gerstner is aware
of the growing backlash against high-stakes testing, but he's not backing off.
"We can't slow down, because we hurt everybody when we slow down."
I asked him about fears that some children are being put at a disadvantage when
new, tougher standards are suddenly imposed. He was visibly annoyed. "The
argument that we shouldn't put standards in because some children are going to
be hurt because they're not going to pass a test is fallacious. Children are
being hurt today because they're passing the tests, and the tests are
not recognizing that they cannot do what they need to do."
Reformer Ted Sizer is concerned not only about the speed with which the
standards movement is moving but also about the driving force behind it,
American business. "Business has put a lot of time into this, but in ways that
have been simplistic," Sizer says. "But why should we be surprised, because if
I were to presume that I could move in and run IBM, I'd get most of it wrong,
because I don't know enough." The process, Sizer says, has been arrogant and
costly, and the notion that there is one best curriculum to be decided upon by
a small group is dangerous. "Do we want a small group of people deciding what
is supposed to go into our children's heads? I don't think so."
Gerstner laughed when I told him about Sizer's concerns. "Forty-nine states are
setting standards, and Iowa is doing it in every local community. So we'll have
60 or 70 or 80 institutions in our society creating these standards, and every
one of them is different." Gerstner says that in every state parents,
educators, business leaders, and others are involved, including experts on
standards in other countries.
Voting for standards is a lot easier than actually creating them.
That task involves two types of standards: content standards and performance
standards. Some person or group must decide on content: what, for example,
eleventh-graders should master in English. Let's say the group agrees that
eleventh-graders must be able to present a complex argument persuasively and
must be familiar with drama, poetry, and fiction. Let's go further and say that
they also agree that eleventh-graders should read and be able to understand a
Shakespearean play. Assuming that they've gotten that far (and that's a big
assumption, considering the cultural climate we live in), that's only halfway
home. Now it's time to decide what levels of performance are "satisfactory,"
"outstanding," and "unsatisfactory." How much of that play does the
eleventh-grader have to grasp to meet the new standard, and what does a
satisfactory essay look like? These questions are neither trivial nor easy to answer.
We now enter the arbitrary process of standard-setting. Just what standards are
established depends on who is asked. Each expert will have an idea of what is
acceptable, outstanding, and insufficient. Are these ideas to be given
arbitrary numerical weights and then averaged? Somehow a number is arrived at,
and that number immediately takes on magical qualities -- it is what a student
must achieve to pass, or to be promoted to the next grade, or to graduate.
The next steps are not trivial or automatic either. The curriculum must be
adjusted, and then teachers have to be brought up to speed. Teachers who've
grown accustomed to teaching certain materials in set ways may have to, in
effect, start over. This may be beneficial for all concerned in the long run,
but it will not happen overnight.
Multiply that scenario by the number of grade levels, the number of subjects,
and the number of teachers (3,000,000), and you begin to understand the swamp
that educators, politicians, business leaders, and others have waded into. And
you also may begin to understand why testing has gotten ahead of developing and
then implementing standards in many places.
Gerstner is clear about what needs to be done, even as he acknowledges that
there will be casualties. "We need to make very significant investments now, to
protect the people that will get hurt, because we're imposing a new system of
high standards in an environment where there weren't any standards, and some
children are going to get caught in the middle. How do we help them? Massive
after school and summer training programs. We need to fund those. We need to
train teachers to develop ways to bring these students up quickly." ...
Today we are rushing headlong in search of the "Holy Grail" of rising test
scores. What seems to be happening is that the high-stakes testing movement has
picked up momentum and gotten well ahead of the slower process of developing
and implementing standards. Most policymakers are not as sophisticated as
Gerstner, and many unfortunate decisions are being made as pressure for
"accountability" overwhelms common sense. It's a whole lot easier to give a
test than to do the hard work of retraining teachers and preparing students.
What's a Passing Score Anyway?
When decisions are made on the basis of a single test, teacher judgment is
tossed out the window, along with a student's past performance. ... One Chicago
parent told me tearfully about her son having to go to summer school. "He's a B
student, but he missed the cutoff score by half a point on the test, because he
was so nervous," she said.
Setting cutoff scores ("cut scores" in the language of testing) is an inexact
science at best. That number, which seems so firm and final, may in fact be
wholly arbitrary and subjective. Why is a 65 passing, and a 64.5 failing? Who
made that decision, and on what basis? To George Madaus of Boston College,
these situations are "obscene." "We're just kidding ourselves," he told me.
"The technology is nowhere near being so precise that accurate decisions can be
made on the basis of one or two points, one way or the other."
This is not just test-bashing. As Bob Sexton of Kentucky's Prichard Committee
(responsible for monitoring that state's reforms) says, "Test bashing with no
alternative will likely lead to weaker not stronger public schools and give
those who are opposed to improvement (such as ideologues or resistant
educators) exactly what they want -- the status quo."
Having educational standards -- as opposed to not having them -- makes sense,
of course, and most of the public seem to be enthusiastically behind the drive
to create meaningful standards and curriculum that is aligned with those
standards. But an unofficial "coalition" of frustrated business leaders,
misguided politicians, short-sighted citizens, and ideologues is pushing us
headlong toward the dangerous practice of making decisions based on single
scores on tests that those taking them have not had the opportunity to prepare for.
In too many schools (like my daughter's), students and their teachers are not
given a choice: It's pass the high-stakes achievement test or suffer the
consequences. I believe that the trend toward high-stakes testing, and the
related mind-numbing drill-drill-drill that often accompanies it, is behind the
growth in private schools and home schooling.
Defenders of high-stakes tests argue that they are fair because students have
multiple opportunities to pass them. Often this is true, but it doesn't
change matters one iota. ... What we need are multiple measures, not multiple opportunities.
Behind the Muddle
The idea that student performance on standardized, norm-referenced,
machine-scored tests is the primary indicator of school quality, and the
principal measure of accountability, has been with us for about 40 years. It
shows few signs of going away. We've grown accustomed to international,
national, state, and local comparisons based on test scores, and we rarely look
into the "why" of a number. Some reformers talk bravely about using other
indicators of quality, such as attendance and dropout rates, college
attendance, and teacher turnover, but at the end of the day test scores seem to
push everything else aside.
American students are tested far more than their counterparts in other
industrialized nations. Our elementary and secondary school students took more
than 140 million standardized, machine-scored, multiple-choice tests in 1998,
and 42 states mandate standardized testing. Eighth-graders are tested most
often, with third- and fourth-graders just behind. Poor children face more of
these tests than middle-class kids, in part because federal programs mandate
testing. Monty Neill, director of an anti-testing group called FairTest, told
me about one city's excesses. "At one time the city of Newark was testing the
kids monthly. Every kid was tested virtually once a month."
The cruel irony is that more testing actually produces more reliable,
and therefore more valuable, information. That's because there can be so much
variability in individual test results, meaning your child's score may vary by
large margins from one day to the next. Stanford University statistician David
Rogosa has calculated that if the average fourth-grader were to take the widely
used Stanford Nine (sometimes called the SAT 9) twice, he would have a 43
percent chance of having scores that are more than 10 percentile points apart.
That is, he could be in the 75th percentile on one day and in the 60th the
next. "If you could give the test a lot of times and take the average score,
that would be approaching a gold-standard measurement," said Rogosa, who
recently published an accuracy guide to the Stanford Nine. "In testing, because
it's so expensive, we only get one shot."
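Rogosa's point about averaging can be illustrated with a rough simulation. This is only a sketch under stated assumptions, not his method or the Stanford Nine's actual error model: it assumes each sitting yields the student's true score plus independent, normally distributed noise, and the function name, the true score of 65, and the noise level of 8 points are all hypothetical.

```python
import random
import statistics


def spread_of_averaged_score(true_score, noise_sd, n_sittings,
                             n_students=5000, seed=0):
    """Return the standard deviation of the averaged score across many
    simulated students who all share the same (hypothetical) true score."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    averages = []
    for _ in range(n_students):
        # Each sitting = true score + independent measurement noise (assumed normal).
        sittings = [rng.gauss(true_score, noise_sd) for _ in range(n_sittings)]
        averages.append(statistics.fmean(sittings))
    return statistics.pstdev(averages)


# Compare one sitting with the average of several sittings.
for n in (1, 4, 16):
    spread = spread_of_averaged_score(true_score=65.0, noise_sd=8.0, n_sittings=n)
    print(f"{n:>2} sittings: spread of reported score is about {spread:.2f} points")
```

Under these assumptions the spread shrinks roughly with the square root of the number of sittings, which is why a single sitting is the noisiest measurement one can take.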
According to the same story, the Orange County School District has informed
parents that a student who ranks at the 50th percentile in reading actually
could belong somewhere from the 40th to the 60th percentile, for example.
Orange County spent at least $28 million last year to run the Stanford Nine
Richard Rothstein in the New York Times compared high-stakes testing to
evaluating a baseball player's season with his performance in a randomly chosen
week of the season, instead of his total performance. We don't do that in
athletics for the most part, so why are we willing to treat our kids that way?
A single number spit out by a machine is powerful and seductive (even if some
small portion of that test involves writing, which is graded by humans, not
machines). What's more, that number is easy to understand. The fact that it is
inevitably misleading does not seem to count for very much. ...
But the fundamental problem is that many schools and school districts use
standardized test results more for accountability than understanding or
diagnosis. I'm not blaming educators for this situation, because they're only
H. D. Hoover of the University of Iowa defends testing but agrees we've gone
overboard. He places the blame squarely on politicians. "They want quick fixes,
and they like tests because they're cheap. They mandate external tests because
to the public it looks like they're doing something about education when all
they're doing is actually a very inexpensive 'quick fix.'"
Hayes Mizell, the thoughtful director of the Program for Student Achievement at
the Edna McConnell Clark Foundation, has a different view. He says that
educators have come to expect others outside the schools to hold them
accountable, instead of taking the initiative and holding themselves
accountable. "They obsess over their students' performance on the state
test, rather than over what their students really know and can do." He argues
that educators ought to find and present school-based evidence, rather than
obsessing over state tests and allowing those standardized, multiple-choice,
machine-scored instruments to be the ultimate yardstick. He concludes, "Perhaps
it is unrealistic to think that public education can do better, but I worry
that if educators are focused more on their accountability to the state or
school district than on their accountability to their students, their internal
professionalism will wither."
Testing is not evil, of course. A primary purpose of school is academic
learning, and we must know whether, and how much, students are learning.
Well-made tests are an excellent way to measure learning and diagnose
weaknesses. Excellent teachers create good tests, grade them carefully, and get
them back to their students in a matter of days. Parents searching for
excellence would be wise to ask the better students to describe the kinds of
tests they take and the lag time between taking the test and getting it back.
Machine-scored, multiple-choice tests are rarely the best descriptive tool,
and, as noted earlier, they're usually not intended as such. George Madaus of
the National Board on Educational Testing and Public Policy at Boston College
sums up the situation this way. "There are only three ways to test people. I
can have you select an answer from a list -- that's a multiple-choice. Second,
I can ask you to produce an answer in essay form. Third, I can ask you to do
something -- fix a carburetor, or do a dive off a diving board, whatever -- and
I can rate you on it."
In our adult lives, most of us take the third kind of test, that is, we're
evaluated on performance. Some schools, notably those inspired by the work of
Theodore Sizer's Coalition of Essential Schools, require students to
demonstrate their mastery by standing up in front of a group of adults or
their own peers to "exhibit" what they have learned.
That's a far better way of evaluating, describing, and diagnosing, but it's
also time-consuming and expensive, which means that it's unlikely to ever be
more important than machine-scored, multiple-choice tests.
I am not criticizing standardized tests, because standardization is the
key to fairness. When a test is standardized, it simply means that everyone has
to take it under the same conditions. That is, you and I have to answer the
same questions, in the same amount of time. As H. D. Hoover of Iowa notes, "It
would not be fair to make comparisons if one student has three days to complete
the test, and another has only ten minutes. Or if one student has the test
questions read to him, while the other does not." Properly used, standardized
tests are a source of useful information that helps teachers do a better job.
Tests, whether standardized or teacher-made, must also be both valid and
reliable. Both adjectives are technical terms in testing. Valid tests
measure what they are supposed to. For example, actually performing a series of
dives would be a valid test of one's diving ability, while writing an essay
about how these dives are performed or taking a multiple-choice test about
diving would not. A test is reliable if it can be trusted (relied upon) to
produce the same score, or nearly so, when it is given to the same group or
The argument is against multiple-choice tests and their impact on the
curriculum. George Madaus supports testing, but he's well aware of the
weaknesses of multiple-choice questions. "The adults who write the questions
sometimes lose sight of the way kids will read those questions. There's a
standardized test question that shows a cactus in a pot, a rose in a pot, and a
cabbage, and the question is which needs the least amount of water. To the
item-writer 'cactus' was the right answer, but some kids pick the cabbage. And
the reason they gave was that the cabbage had been picked and so it didn't need
water anymore. That's a perfectly good answer, but the machine had been set to
score it as wrong." Students who get the "right" answer have demonstrated,
perhaps, that they think like an adult -- or like the test-maker. The kid who
thinks differently, or whose frame of reference is different, is marked down,
and perhaps eliminated from the competition.
As Sizer notes, the real world is not "a series of set, pre-digested answers"
but a set of questions. "Take the issue of cloning, an issue so difficult that
very few teachers want to talk about it, or know how to talk about it. Cloning
raises all sorts of difficult questions. What are the right answers? That can't
be put on a multiple choice test."
"At the worst," Sizer adds, "these standardized tests provoke a kind of
drilling mentality. It's a game. And so students learn the game. What they
learn is to hire people to teach you how to figure the test out. Not the
substance, but the test."
And that teaches cynicism. "The lesson learned is 'to get a high score, this is
what you have to do. If you want to get ahead in life, jiggle the system.' And
that's anti-intellectual and pernicious."
High-stakes tests now begin as early as first and second grades. Where
high-stakes tests are being imposed by states, they have thrown many teachers and
students into a state of anxiety. This is counter-productive, says E. D. Hirsch
Jr. "They start prepping for tests, cramming for tests, and teaching for tests,
and none of these things are educationally productive. My hope is that's a
transitional phase." Hirsch believes in testing but not in studying for the
test. "If the tests are good," he says, "the way to prepare for them is to have
a good education. High test scores are a by-product of a good education."
I've seen firsthand what an obsession with tests and test scores does to real
learning. In the early '90s I spent three years at Woodward High School in
Cincinnati, watching (and videotaping) as some teachers there attempted to
adopt the philosophy and practices of the reform known as "The Coalition of
Essential Schools." The ideas are easy to grasp: (1) "Less is more," meaning
that students will dig deeply into a small number of topics instead of taking a
broad survey approach; (2) for the most part teachers will not lecture but will
guide and encourage learning; (3) teachers will work together across
disciplines; and (4) students will be evaluated on the basis of their collected
work (portfolios) and public demonstration of their knowledge.
For nearly two years the reform effort proceeded like most reforms, two steps
forward and a step-and-a-half back, as students grudgingly learned that they
couldn't merely regurgitate whatever the teacher said and expect to pass.
In response to growing public dissatisfaction with school outcomes, the state
of Ohio had instituted its own exam, a high-stakes test that students had to
pass to graduate. The test did not set the bar very high, and students were
given at least eight chances to pass, beginning in tenth grade. Nevertheless,
as test day drew near, the reform simply stopped in its tracks. No longer were
teachers encouraging students to ask questions, to dig, to work together on
projects, and to stand in front of their classes demonstrating their knowledge.
Instead, class time was given over to drill. "How many branches of government
are there?" "Which branch institutes new laws?" "What is the role of the
Executive Branch?" If I'd listened long enough, I'm sure I would have heard
some teacher ask, "How many wives did Henry the Eighth have?"
The kids got the message: all this fancy talk about portfolios and
demonstrations is just so much gas. Coalition teachers grew dispirited, and
those Woodward High faculty members who preferred the old ways grew stronger in
their resolve to keep on doing things in the same old ways. ...
Some Sensible Approaches
The last word on accountability belongs to others. We need a clearer
understanding of accountability and we need more measures of school outcomes
that are not simply test-based. For example, apart from test scores,
shouldn't we also seek to know how many young people finish school and
graduate? How many continue their education after high school? How many
students are put into special education classes and never leave them? What are
the attendance rates for both teachers and students? The list of possible and
knowable outcome measures goes on and on, but instead we seem to be willing to
settle for that one simple test score.
Walt Haney of Boston College reminds us that accountability refers not only to
consequences but also to conduct, by which he means what actually happens
inside schools between children and teachers. These transactions
are tougher to measure; they don't lend themselves to easy numbers, but surely
Emphasizing testing may eventually drive parents away from public schools,
particularly from high-quality schools. Monty Neill of FairTest says he's
already hearing that from parents in Massachusetts, where his organization is
located. "This is not a major factor yet, but it could become one, as parents
who think they have quality schools (and often do) recognize the damage that is
done when schooling revolves around the search for higher test scores."
I may get in trouble here, because I want to suggest what constitutes
excellence in testing and assessment. First of all, an excellent testing policy
is transparent, that is, it is open for inspection by all who are
interested and it is presented in clear English. It is understandable and
defensible. It is connected to the curriculum and the goals of the school.
Excellent teachers have such policies. They explain in advance to students just
what is expected, how they will be assessed, and why.
Excellent schools do not -- repeat, do not -- attempt to evaluate, promote, or
hold back students on the basis of a single test, particularly a
machine-scored, multiple-choice exam. That is, they reject high-stakes testing
insofar as it is possible.
Teacher-made tests, constructed by excellent teachers, remain the best means of
assessing student progress and weakness. The best teacher I ever had routinely
tested his students by having us write short essays. He called them "2-8-2s"
because we were given a topic and two minutes to think about it, then exactly
eight minutes to write, followed by two minutes to make changes and
corrections. He made the rules very clear: A major error (such as a sentence
fragment) meant a grade of "zero." We could expect a 2-8-2 at least two or
three times a week, but at the end of the semester he would discard our 10
lowest grades. We would get the papers back the next day! Often we wrote on
some aspect of the play or poem we were studying, but he was just as likely to
give us an obscure quote and instruct us to reflect on it.
These were excellent tests, aligned with his curriculum and the goals of his
class. His policy was transparent. Without using any of today's jargon, he made
clear to us what his standards (of content and performance) were, he allowed no
lag time between the test and its return, and he used the results to diagnose
and correct our weaknesses. That is excellence at work, and it is what all
schools should strive for.
Holding schools or (especially) students accountable almost solely on the basis
of student scores on machine-scored tests establishes a "whips and chains"
system. When we do that, we're using tests as a weapon, nothing more.
Questions Worth Asking
- What is the district's policy on high-stakes testing? If there is no policy,
why not? How many machine-scored, multiple-choice tests will my child take each
year? Are these "high-stakes" tests, and if so what is at stake?
- Who mandates these tests (state, county, the National Assessment of
Educational Progress)?
- Do the results have an impact on specific children, or is it the school that
is being measured and rated?
- If individual students are not being evaluated, has any consideration been
given to testing only a sample of students? After all, political pollsters
question only a small sample of voters and predict results with uncanny
accuracy. Why not take the same approach with educational testing that measures
a school's or a system's health?
- How much time is devoted to test preparation and practice?
- How long does it take for the teachers to get the results, and are they
returned in usable form?
- How are the results used? Does the school share the results with students,
parents, and the community?
- Are the data carefully analyzed (disaggregated) to make sure that all
students are learning? (A few high scores by outstanding students can give a
misleading picture about the overall health of the system.)
- Are students doing better over time? That is, do the data indicate that the
longer a student goes to school, the more he or she learns (or the opposite)?
- Do teachers use a variety of assessments, including portfolios and
- How significant are machine-scored tests as a part of a student's semester
and final grades?
- How much money is the district spending on outside testing? Does this dollar
amount include the cost of the time that teachers and administrators spend on
the tests and test preparation?
- What are the district, the school, and individual teachers not doing because
of these tests?
- How does any particular test influence the curriculum? Has anyone in
authority explored the influences of tests on the curriculum? If not, why not?
- Do teachers rely on their own tests for the most part, or do they use
instruments created by others?
- Is the student-teacher ratio sufficiently low to allow teachers time to
create their own tests, grade them thoroughly, and discuss the results with
students?
- Regarding testing, how much can one child's score vary?
- What are the academic standards for each grade? (Some parents will be
astounded that some kids were reading and writing in kindergarten.)
Excerpted from Choosing Excellence: "Good Enough" Schools Are Not Good
Enough (Scarecrow Press,
2001) by John Merrow. © 2001 by John Merrow. All rights reserved.
Reprinted by permission of the author.
See the Merrow Report website for
information on how to order Choosing Excellence.
web site copyright 1995-2013 WGBH educational foundation