The Leaning (Toppling?) Tower of PISA: Facts and Doubts about International Comparisons in Education

Doubting International Test Comparisons

That the U.S., the world’s top economic performing country, was found to have schooling attainments that are only middling casts fundamental doubts about the value and approach of these [international] assessments.
—S. J. Prais, National Institute of Economic and Social Research, London

International comparisons in reading, math, and science have become the principal means of evaluating school quality in various nations. For school bashers and fearmongers, nothing proves the low state of American public schools better than these studies. But it’s hard not to notice that the professional disseminators of anxiety about education in this country (people like Roy Romer of Strong American Schools; Bob Wise, former governor of West Virginia; Bob Compton of the Two Million Minutes video; Bill Gates of Microsoft; and Craig Barrett of Intel) always talk about the Programme for International Student Assessment (PISA), seldom mention the Trends in International Mathematics and Science Study (TIMSS), and never speak of the Progress in International Reading Literacy Study (PIRLS).

Why is that? Could it be because the U.S. ranked low on PISA, at least among the nations of the Organization for Economic Cooperation and Development (OECD), but showed more improvement on the TIMSS 8th-grade assessment between 1995 and 2003 than all but Latvia, Lithuania, and Hong Kong, and ranked quite high on PIRLS?

Or could it be because PISA’s sponsor is the OECD, a neo-liberal outfit headquartered in Paris that gets 25 percent of its budget from the U.S., and as such, plumps for competition in the global economy, worships the gods of Science, Technology, Engineering, and Mathematics (STEM), and ignores the continuing Wal-Martization of American jobs?

In many ways, PISA is just the latest instrument that critics have used to bash public schools. This criticism has increased dramatically since the start of the Cold War. Many people today believe any news about the public schools as long as it is bad. The facts have never justified the attacks. For example, the schools took the blame for letting the Russians get into space first with Sputnik in 1957. However, the U.S. had a satellite-capable rocket in space over a year before Sputnik. President Eisenhower simply wanted the Russians to go first to establish a precedent that deep space was free and international.

In 1983, A Nation at Risk (ANAR) offered a treasury of spin and distorted statistics. It glorified Japan, whose economy collapsed a few years later, even as Japanese kids continued to ace tests in the international comparisons and just as the U.S. began a long economic expansion. If ANAR’s thesis had been correct, such growth would have been impossible. Nobody gave the schools any credit for that growth. Indeed, three months after the New York Times headlined a story “The American Economy, Back on Top,” IBM’s CEO, Louis Gerstner, penned a Times op-ed, “Our Schools Are Failing.”

The media reinforce these beliefs with scary headlines such as “Math + Test = Trouble for the U.S. Economy” (Christian Science Monitor) and “Economic Time Bomb” (Wall Street Journal). U.S. students scored 2 points lower on PIRLS in 2006 than they had on PIRLS 2001, a trivial difference. Both scores were well above the international average, which, you might think, would lead to positive reporting, but most of the media ignored the results, and the coverage in Education Week was headlined “America Idles on International Reading Test.” The schools can’t seem to score a good headline.

Simply ranking nations is often misleading because a country that scores a minuscule amount lower than another will still sit one full rank lower. Many countries bunch up with similar scores, but translating slight score differences into rankings makes it seem as if the countries are achieving at very different levels, even when only a trivial point or two separates one rank from the next.
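A small sketch makes the point concrete. The scores below are invented for illustration and are not real PISA, TIMSS, or PIRLS results; the country labels and the `rank_countries` helper are likewise hypothetical.

```python
# Hypothetical scores, invented purely for illustration (not real test data).
scores = {
    "Country A": 503,
    "Country B": 502,
    "Country C": 501,
    "Country D": 500,
    "Country E": 460,
}


def rank_countries(scores):
    """Return (rank, country, score) tuples, highest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


ranking = rank_countries(scores)
top_score = ranking[0][2]
for rank, name, score in ranking:
    # Show how far each country trails the leader in points, next to its rank.
    print(f"rank {rank}: {name} scored {score} ({top_score - score} points behind first)")
```

In this made-up table, first place and fourth place are 3 points apart, while fourth and fifth are 40 points apart. A bare list of ranks treats both steps as the same size.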

No one knows what PISA measures. TIMSS items, taken by 4th and 8th graders, have short questions geared to specific curricula. PISA questions ramble discursively and sometimes contain irrelevant information and factually incorrect material. PISA’s long questions, administered to 15-year-olds, mean that its assessment of science and math is hopelessly confounded with reading. The correlation between PISA’s math and reading assessments, for instance, is .77, indicating a great deal of overlap.
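One common way to put a number on that overlap, assuming a roughly linear relationship between the two scores, is to square the correlation and read the result as the share of variance the two measures have in common. The .77 is the figure reported above; the `shared_variance` helper is a name chosen for this sketch, and the squared-correlation reading is a standard textbook interpretation rather than something PISA itself reports here.

```python
# Correlation between PISA math and reading scores, as reported in the text.
r = 0.77


def shared_variance(correlation):
    """Proportion of variance two measures share, assuming a linear relationship."""
    return correlation ** 2


print(f"{shared_variance(r):.0%}")  # prints 59%
```

Under that assumption, well over half of the variation in the math scores moves together with the reading scores.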

Svein Sjøberg at the University of Oslo analyzed PISA items and found some to have confusing and erroneous material. For example, Sjøberg observed that the title of an article about cloning, “A Copying Machine for Living Beings,” was translated word for word into Norwegian, rendering the title incomprehensible. And the questions are supposed to be culturally neutral. Ha. In one reading item, a peasant and a scholar each claims a woman as his wife. The judge calls her to his private quarters and asks her to fill his inkwell, which she “cleaned quickly and deftly and filled with ink; therefore it was work she was accustomed to.” So she must be the scholar’s wife. Give PISA an award for that sexist gem.

Many PISA items are free-response (written-in answers instead of multiple-choice bubbles), but countries have refused to publish the answers, meaning that we can’t learn anything from them. All we can do is rank countries and make inappropriate inferences about why they scored the way they scored. Only Luxembourg has published some student answers to these free-response questions and the results show in some cases that the students simply didn’t understand what the item writers were trying to ask. Wrong answers would also contain a lot of information about students’ reasoning, but because those answers go unpublished, that source is not available for analysis.

Debating technical problems is not as glamorous as debating whether OECD has set out to rule the world, but the technical problems are critical. Sampling is not consistent across countries. There are no uniform criteria for what kind of and how many special needs students to exclude—the principal or other personnel in a school decide whom not to test. And there are persistent stories of certain students being told to stay home on test day. PISA items use the metric system, which few Americans know, yet we know irrefutably that people score better when tested on things they know.

Countries vary greatly in how they motivate kids to take the tests. Sjøberg says that most Scandinavian students have never seen a test like PISA and don’t take it seriously. That certainly would be true here as well. As testing approaches in Singapore, on the other hand, shops put up stands that contain painkillers (presumably to treat study-induced headaches) along with test-prep books. An observer in Taiwan noticed that on test day parents gathered with their children outside the school building, exhorting them to do well. Then they marched into the school to the national hymn. Inside, the principal delivered another exhortatory message. In Korea, as each child’s name was called to go to the testing center, that student rose to thunderous applause from classmates; such an honor to be chosen to bring glory to the nation!

Students in different countries use different test-taking tactics. Dutch students try to answer all items, which leads to a lot of guessing as they begin to run out of time. Austrian and German students skip many items so they are not pressed for time. In countries not familiar with American-style testing, students often fill in more than one bubble per item. Items in the various languages differ in length—German items are longer than English items and have more complex grammar.

In the U.S., educators and politicians have not paid much attention to PISA except as a general cudgel to beat schools over the head and instill fear (China and India don’t participate, so these two giants are not available for use as looming menaces). In Europe, PISA has had more impact on policymaking, perhaps because countries that didn’t do well, like Germany, were greeted with headlines like “Dummkopf,” “Are German Students Stupid,” and “School Makes You Dumb.”

Even if the tests were valuable sources of information, there would be the fact that among 21 developed nations, the U.S. had the highest poverty rate. If we’re number one in poverty, is it reasonable to expect that we would be number one in test scores? But, really, the tests aren’t informative. In fact, if you analyze the test scores by the poverty levels of schools, the top 30 percent of American kids score higher than the highest country in reading. And another 28 percent score high enough that if they constituted a nation, they’d rank fourth in the world (out of 35). But individuals like Bob Wise and Roy Romer are not interested in such analysis. They have the money and reason to peddle fear.

Worst of all, PISA uses a statistical technique called the “One Dimensional Item Response Theory.” Joachim Wuttke of Jülich Research Center in Munich contends that this is wholly inappropriate. “Items that did not fit into the idea that competence can be measured in a culturally neutral way on a one-dimensional scale were simply eliminated.” The eliminated items amounted to 65 percent of the items in the field tests. This corroborates the University of Oslo’s Rolf Olsen, who argues that “in PISA-like studies, the major portion of information is thrown away.”

Why does PISA use this one-dimensional model? Wuttke thinks it is because this is the only model that will yield unambiguous rankings, which is all anyone, even in Europe, pays any attention to. With a multi-dimensional model, one country might be on top in one dimension, another country might be number one in another, and so on. Then no one could claim “We’re number one!”
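The difference is easy to see with a toy example. Everything below is hypothetical: the three countries, the two unnamed dimensions, and the scores are invented, and a plain average stands in for a single scale. It is not the item response model PISA actually uses, only a sketch of what collapsing several dimensions into one number does to a ranking.

```python
# Hypothetical two-dimensional results, invented purely for illustration.
profiles = {
    "Country A": {"dimension_1": 540, "dimension_2": 490},
    "Country B": {"dimension_1": 500, "dimension_2": 535},
    "Country C": {"dimension_1": 515, "dimension_2": 524},
}


def top_country(profiles, dimension):
    """Return the country with the highest score on one dimension."""
    return max(profiles, key=lambda name: profiles[name][dimension])


def composite_ranking(profiles):
    """Rank countries by the plain average of their dimension scores.

    The average is an assumed stand-in for a single scale, not PISA's method.
    """
    means = {name: sum(dims.values()) / len(dims) for name, dims in profiles.items()}
    return sorted(means, key=means.get, reverse=True)


print(top_country(profiles, "dimension_1"))  # Country A
print(top_country(profiles, "dimension_2"))  # Country B
print(composite_ranking(profiles))           # ['Country C', 'Country B', 'Country A']
```

With these invented numbers, each dimension crowns a different country, and the single averaged scale crowns a third that leads on neither. Only the collapsed scale produces one tidy list.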

Stefan Hopmann of the University of Vienna underscores that there is no research underlying PISA’s enormous assumptions, namely, that PISA measures important knowledge and that PISA measures knowledge important to a nation’s economic future. As Iris Rotberg of George Washington University concluded in a convincingly researched paper, “The fact is test score comparisons tell us little about the quality of education in any country.” (Education Week, June 11, 2008)

If Rotberg is right, and I believe she is, why does anyone put much credence in international test scores? I think the answer is multifold, and the different groups involved are not necessarily independent or single-minded. Some people wish to continue the march to privatization begun in earnest with voucher programs and No Child Left Behind. Others wish to make a case for a particular kind of education in line with a neo-liberal organization like OECD. American business and industry have tried for well over a century to control curriculum and instruction. And yet a third group, typified by Bill Gates and Intel’s Craig Barrett, hopes that mediocre showings in test-score comparisons will lead people to believe that the import of foreign scientists and engineers (who just happen to be willing to work longer hours for less pay than American scientists and engineers) is so critical to our economic well-being that limits on visas for foreign workers should be lifted or, at least, eased.

There is one good result from PISA. Finland, a country whose education system stands our No Child Left Behind (NCLB) on its head, is ranked first. Students in Finland don’t start school until they’re seven. They get very little homework. They sit for virtually no standardized tests. Students work on their own without hovering adults from an early age. NCLB is the ultimate one-size-fits-all education reform (where “reform” just means “reshape,” not necessarily improve). In Finland, there are national standards, but teachers create lessons to fit their students, picking books and customizing lessons, whereas NCLB is hated at the school level because most teachers don’t want to teach the way NCLB forces them to (not to mention that for teachers NCLB is all stick and no carrot).

There are some other things about Finland that contribute directly or indirectly to academic achievement (we know this because reporters and observers descended in droves after the PISA results were published. Finland number one in education? Who would have guessed?). Finland’s universal health care prevents students from missing long periods of school time. Finns read newspapers and take books out of libraries at a higher rate, per capita, than any other nation. It’s hard to become a teacher in Finland, and the training is thorough. Teachers are not paid especially well, but the profession is respected, in part because of its selectivity: 90 percent of applicants to colleges of education are rejected. In addition, teacher training in Finland looks more like U.S. doctor training in teaching hospitals, with a lot of hands-on experience in schools.

When Congress gets around to reauthorizing No Child Left Behind, it should cancel a few junkets to glamorous, jazzy venues and hop over to Helsinki.

Gerald W. Bracey writes two monthly columns on educational research and policy (Phi Delta Kappan and Principal Leadership) and has written books with titles such as The War Against America’s Public Schools, Setting the Record Straight, and The Death of Childhood and the Destruction of Public Schools. He maintains the Education Disinformation Detection and Reporting Agency website and blogs several times a week for the Huffington Post.