Three Important Distinctions In How We Talk About Test Scores

In education discussions and articles, people (myself included) often say “achievement” when referring to test scores, or “student learning” when talking about changes in those scores. These words reflect implicit judgments to some degree (e.g., that the test scores actually measure learning or achievement). Every once in a while, it’s useful to remind ourselves that scores from even the best student assessments are imperfect measures of learning. But this is so widely understood - certainly in the education policy world, and I would say among the public as well - that the euphemisms are generally tolerated.

And then there are a few common terms or phrases that, in my personal opinion, are not so harmless. I’d like to quickly discuss three of them (all of which I’ve talked about before). All three appear many times every day in newspapers, blogs, and regular discussions. To criticize their use may seem like semantic nitpicking to some people, but I would argue that these distinctions are substantively important and may not be so widely acknowledged, especially among people who aren’t heavily engaged in education policy (e.g., average newspaper readers).

So, here they are, in no particular order.

Teachers And Their Unions: A Conceptual Border Dispute

One of the segments from “Waiting for Superman” that stuck in my head is the following statement by Newsweek reporter Jonathan Alter:

It’s very, very important to hold two contradictory ideas in your head at the same time. Teachers are great, a national treasure. Teachers’ unions are, generally speaking, a menace and an impediment to reform.

The distinction between teachers and their unions (as well as those of other workers) has been a matter of political and conceptual contention for a long time. On one “side,” the common viewpoint, as characterized by Alter’s slightly hyperbolic line, is “love teachers, don’t like their unions.” On the other “side,” criticism of teachers’ unions is often called “teacher bashing.”

So, is there any distinction between teachers and teachers’ unions? Of course there is.

Revisiting The "5-10 Percent Solution"

In a post over a year ago, I discussed the common argument that dismissing the “bottom 5-10 percent” of teachers would increase U.S. test scores to the level of high-performing nations. This argument is based on a calculation by economist Eric Hanushek, which suggests that dismissing the lowest-scoring teachers based on their math value-added scores would, over a period of around ten years (when the first cohort of students would have gone through the schooling system without the “bottom” teachers), increase U.S. math scores dramatically – perhaps to the level of high-performing nations such as Canada or Finland.*
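To make the mechanics concrete, here is a minimal back-of-the-envelope simulation in the spirit of this kind of calculation. To be clear, the parameters below (the spread of teacher effects, the dismissal threshold, full persistence of gains) are illustrative assumptions of mine, not Hanushek’s published figures:

```python
# A rough Monte Carlo sketch of the "dismiss the bottom X percent" logic.
# All parameters are illustrative assumptions, not Hanushek's published ones.
import numpy as np

rng = np.random.default_rng(0)

TEACHER_SD = 0.15    # assumed SD of teacher effects, in student-level SDs
DISMISS_FRAC = 0.08  # assumed dismissal threshold (the "bottom 5-10 percent")
YEARS = 10           # roughly one cohort's pass through the system
N_TEACHERS = 100_000

# Draw a population of teacher effects (mean zero by construction).
effects = rng.normal(0.0, TEACHER_SD, N_TEACHERS)

# Dismiss the bottom fraction and replace them with average (zero-effect) teachers.
cutoff = np.quantile(effects, DISMISS_FRAC)
effects_after = np.where(effects < cutoff, 0.0, effects)

annual_gain = effects_after.mean() - effects.mean()
print(f"Gain per year of schooling: {annual_gain:.3f} student SDs")

# If the gain persisted fully and accumulated across every grade, a cohort
# taught entirely by the post-dismissal workforce would end up roughly:
print(f"Cumulative gain after {YEARS} years: {YEARS * annual_gain:.2f} student SDs")
```

Even this toy version makes the key dependencies visible: the projected benefit scales directly with the assumed spread of teacher effects and with the assumption that annual gains persist and accumulate – which is exactly where the real-world uncertainty lies.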

This argument is, to say the least, controversial, and it provokes the full spectrum of reactions. In my opinion, it’s best seen as a policy-relevant illustration of the wide variation in test-based teacher effects, one that might suggest the potential of a course of action but can’t really tell us how it will turn out in practice. To highlight this point, I want to take a look at one issue mentioned in that previous post – that is, how the instability of value-added scores over time (which Hanushek’s simulation doesn’t address directly) might affect the projected benefits of this type of intervention, and how this in turn might modulate one’s view of the huge projected benefits.

One (admittedly crude) way to do this is to use the newly released New York City value-added data, and look at 2010 outcomes for the “bottom 10 percent” of math teachers in 2009.
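For illustration, here is roughly what that calculation looks like in code. This is a sketch only; the file and column names are hypothetical placeholders, and the actual NYC data release would require its own parsing and caveats:

```python
# Sketch of a year-to-year persistence check for value-added rankings.
# File and column names are hypothetical placeholders, not the real
# layout of the NYC Teacher Data Reports.
import pandas as pd

df = pd.read_csv("nyc_math_value_added.csv")  # one row per teacher-year

# Reshape to one row per teacher, keeping only those rated in both years.
wide = df.pivot(index="teacher_id", columns="year", values="va_score")
wide = wide.dropna(subset=[2009, 2010])

# Flag the bottom decile separately in each year.
bottom_2009 = wide[2009] <= wide[2009].quantile(0.10)
bottom_2010 = wide[2010] <= wide[2010].quantile(0.10)

stayed = (bottom_2009 & bottom_2010).sum() / bottom_2009.sum()
print(f"2009 bottom-decile teachers still in the bottom decile in 2010: {stayed:.0%}")
```

The share that stays put is a crude but direct gauge of instability: the lower it is, the more a one-year dismissal rule would sweep up teachers who would not have landed in the bottom decile the following year.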

Guessing About NAEP Results

Every two years, the release of data from the National Assessment of Educational Progress (NAEP) generates a wave of research and commentary trying to explain short- and long-term trends. For instance, there have been a bunch of recent attempts to “explain” an increase in aggregate NAEP scores during the late 1990s and 2000s. Some analyses postulate that the accountability provisions of NCLB were responsible, while more recent arguments have focused on the “effect” (or lack thereof) of newer market-based reforms – for example, looking to NAEP data to “prove” or “disprove” the idea that changes in teacher personnel and other policies have (or have not) generated “gains” in student test scores.

The basic idea here is that, for every increase or decrease in cross-sectional NAEP scores over a given period of time (both for all students and especially for subgroups such as minority and low-income students), there must be “something” in our education system that explains it. In many (but not all) cases, these discussions consist of little more than speculation. Discernible trends in NAEP test score data are almost certainly due to a combination of factors, and it’s unlikely that one policy or set of policies is dominant enough to be identified as “the one.” Now, there’s nothing necessarily wrong with speculation, so long as it is clearly identified as such and conclusions are presented accordingly. But I find it curious that some people involved with these speculative arguments seem a bit too willing to assume that schooling factors – rather than changes in cohorts’ circumstances outside of school – are the primary driver of NAEP trends.

So, let me try a little bit of illustrative speculation of my own: I might argue that changes in the economic conditions of American schoolchildren and their families are the most compelling explanation for changes in NAEP.

The Deafening Silence Of Unstated Assumptions

Here’s a thought experiment. Let’s say we were magically granted the ability to perfectly design our public education system. In other words, we were somehow given the knowledge of the most effective policies and how to implement them, and we put everything in place. How quickly would schools improve? Where would we be after 20 years of having the best possible policies in place? What about after 50 years?

I suspect there is much disagreement here, and that answers would vary widely. But, since there is a tendency in education policy to shy away from even talking realistically about expectations, we may never really know. We sometimes operate as though we expect immediate gratification - quick gains, every single year. When schools or districts don't achieve gains, even over a short period of time, they are subject to being labeled as failures.

Without question, we need to set and maintain high expectations, and no school or district should ever cease trying to improve. Yet, in the context of serious policy discussions, the failure to even discuss expectations in a realistic manner hinders our ability to interpret and talk about evidence, as it often means that we have no productive standard by which to judge our progress or the effects of the policies we try.

Do Teachers Really Come From The "Bottom Third" Of College Graduates?

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

The conventional wisdom among many education commentators is that U.S. public school teachers “come from the bottom third” of their classes. Most recently, New York City Mayor Michael Bloomberg took this talking point a step further, and asserted at a press conference last week that teachers are drawn from the bottom 20 percent of graduates.

All of this is supposed to imply that the U.S. has a serious problem with the “quality” of applicants to the profession.

Despite the ubiquity of the “bottom third” and similar arguments (which are sometimes phrased as massive generalizations, with no reference to actual proportions), it’s unclear how many of those who offer them know what specifically they refer to (e.g., GPA, SAT/ACT scores, college rank, etc.). This is especially important since so many of these measurable characteristics are not associated with future test-based effectiveness in the classroom, while those that are associated are only modestly so.

Still, given how often it is used, as well as the fact that it is always useful to understand and examine the characteristics of the teacher labor supply, it’s worth taking a quick look at where the “bottom third” claim comes from and what it might or might not mean.

Smear Review

A few weeks ago, the National Education Policy Center (NEPC) issued a review of the research on virtual learning. Several proponents of online education issued responses that didn't offer much substance beyond pointing out NEPC’s funding sources. A similar reaction ensued after the release last year of the Gates Foundation's preliminary report on the Measures of Effective Teaching Project. There were plenty of substantive critiques, but many of the reactions amounted to knee-jerk dismissals of the report based on pre-existing attitudes toward the foundation's agenda.

More recently, we’ve even seen unbelievably puerile schemes in which political operatives actually pretend to represent legitimate organizations requesting consulting services. They record the phone calls, and post out-of-context snippets online to discredit the researchers.

Almost all of the people who partake in this behavior share at least one fundamental characteristic: They are unable to judge research for themselves, on its merits. They can’t tell good research from bad, so they default to attacking substantive work based on nothing more than the affiliations and/or viewpoints of the researchers.

The Uncertain Future Of Charter School Proliferation

This is the third in a series of three posts about charter schools. Here are the first and second parts.

As discussed in prior posts, high-quality analyses of charter school effects show that there is wide variation in the test-based effects of these schools but that, overall, charter students do no better than their comparable regular public school counterparts. The existing evidence, though very tentative, suggests that the few schools achieving large gains tend to be well-funded, offer massive amounts of additional time, provide extensive tutoring services and maintain strict, often high-stakes discipline policies.

There will always be a few high-flying chains dispersed throughout the nation that get results, and we should learn from them. But there’s also the issue of whether a bunch of charter schools with different operators using diverse approaches can expand within a single location and produce consistent results.

Charter supporters typically argue that state and local policies can be leveraged to “close the bad charters and replicate the good ones.” Opponents, on the other hand, contend that successful charters can’t expand beyond a certain point because they rely on the selection of the best students into these schools (so-called “cream skimming”), as well as the exclusion of high-needs students.

Given the current push to increase the number of charter schools, these are critical issues, and there is, once again, some very tentative evidence that might provide insights.

The Evidence On Charter Schools

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post and here on the Huffington Post

This is the first in a series of three posts about charter schools. Here are the second and third parts.

In our fruitless, deadlocked debate over whether charter schools “work,” charter opponents frequently cite the so-called CREDO study (discussed here), a 2009 analysis of charter school performance in 16 states. The results indicated that overall charter effects on student achievement were negative and statistically significant in both math and reading, but both effect sizes were tiny. Given the scope of the study, it’s perhaps more appropriate to say that it found wide variation in charter performance within and between states – some charters did better, others did worse and most were no different. On the whole, the size of the aggregate effects, both positive and negative, tended to be rather small.
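The “significant but tiny” combination is worth pausing on, since large samples can make negligible differences statistically significant. Here is a quick simulated illustration; the sample size and effect below are arbitrary placeholders meant only to mimic the scale of a large multi-state study, not CREDO’s actual estimates:

```python
# Illustration: in very large samples, a substantively tiny effect can
# still be "statistically significant." Numbers are arbitrary placeholders,
# not CREDO's actual estimates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

N = 500_000  # order of magnitude of a large multi-state student sample
charter = rng.normal(-0.01, 1.0, N)  # assumed true effect: -0.01 student SDs
regular = rng.normal(0.00, 1.0, N)

t_stat, p_value = stats.ttest_ind(charter, regular)
print(f"Estimated effect: {charter.mean() - regular.mean():.3f} SDs")
print(f"p-value: {p_value:.2e}")  # tiny p-value despite a negligible effect
```

In other words, “statistically significant” speaks to detectability, not practical importance – which is why the effect sizes matter more here than the significance itself.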

Recently, charter opponents’ tendency to cite this paper has been called “cherrypicking.” Steve Brill sometimes levels this accusation, as do others. It is supposed to imply that CREDO is an exception – that most of the evidence out there finds positive effects of charter schools relative to comparable regular public schools.

CREDO, while generally well-done given its unprecedented scope, is a bit overused in our public debate – one analysis, no matter how large or good, cannot prove or disprove anything. But anyone who makes the “cherrypicking” claim is clearly unfamiliar with the research. CREDO is only one among a number of well-done, multi- and single-state studies that have reached similar conclusions about overall test-based impacts.

This is important because the endless back-and-forth about whether charter schools “work” – whether there is something about "charterness" that usually leads to fantastic results – has become a massive distraction in our education debates. The evidence makes it abundantly clear that that is not the case, and the goal at this point should be to look at the schools of both types that do well, figure out why, and use that information to improve all schools.

NAEP Shifting

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

Tomorrow, the education world will get the results of the 2011 National Assessment of Educational Progress (NAEP), often referred to as the “nation’s report card.” The findings – reading and math scores among a representative sample of fourth and eighth graders – will drive at least part of the debate for the next two years, until the next round comes out.

I’m going to make a prediction, one that is definitely a generalization, but is hardly uncommon in policy debates: People on all “sides” will interpret the results favorably no matter how they turn out.

If NAEP scores are positive – i.e., overall scores rise by a statistically significant margin, and/or there are encouraging increases among key subgroups such as low performers or low-income students – supporters of market-based reform will say that their preferred policies are working. They’ll claim that the era of test-based accountability, which began with the enactment of No Child Left Behind ten years ago, has produced real results. Market reform skeptics, on the other hand, will say that virtually none of the policies for which reformers are pushing, such as test-based teacher evaluations and merit pay, were in force in more than a handful of locations between 2009 and 2011. Therefore, they’ll claim, the NAEP progress shows that the system is working without these changes.

If the NAEP results are not encouraging – i.e., overall progress is flat (or negative), and there are no strong gains among key subgroups – the market-based crowd will use the occasion to argue that the “status quo” isn’t producing results, and they will strengthen their call for policies like new evaluations and merit pay. Skeptics, in contrast, will claim that NCLB and standardized test-based accountability were failures from the get-go. Some will even use the NAEP results to advocate for the wholesale elimination of standardized testing.