Do Teachers Really Come From The "Bottom Third" Of College Graduates?

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

The conventional wisdom among many education commentators is that U.S. public school teachers “come from the bottom third” of their classes. Most recently, New York City Mayor Michael Bloomberg took this talking point a step further, and asserted at a press conference last week that teachers are drawn from the bottom 20 percent of graduates.

All of this is supposed to imply that the U.S. has a serious problem with the “quality” of applicants to the profession.

Despite the ubiquity of the “bottom third” and similar arguments (which are sometimes phrased as sweeping generalizations, with no reference to actual proportions), it’s unclear how many of those who offer them know what specifically they refer to (e.g., GPA, SAT/ACT, college rank, etc.). This is especially important since many of these measurable characteristics are not associated with future test-based effectiveness in the classroom, while those that are associated are only modestly so.

Still, given how often it is used, as well as the fact that it is always useful to understand and examine the characteristics of the teacher labor supply, it’s worth taking a quick look at where the “bottom third” claim comes from and what it might or might not mean.

What Value-Added Research Does And Does Not Show

Value-added and other types of growth models are probably the most controversial issue in education today. These methods, which use sophisticated statistical techniques to attempt to isolate a teacher’s effect on student test score growth, are rapidly assuming a central role in policy, particularly in the new teacher evaluation systems currently being designed and implemented. Proponents view them as a primary tool for differentiating teachers based on performance/effectiveness.

Opponents, on the other hand, including a great many teachers, argue that the models’ estimates are unstable over time, subject to bias and imprecision, and that they rely entirely on standardized test scores, which are, at best, an extremely partial measure of student performance. Many have come to view growth models as exemplifying all that’s wrong with the market-based approach to education policy.

It’s very easy to understand this frustration. But it's also important to separate the research on value-added from the manner in which the estimates are being used. Virtually all of the contention pertains to the latter, not the former. Actually, you would be hard-pressed to find many solid findings in the value-added literature that wouldn't ring true to most educators.
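For readers unfamiliar with the mechanics, here is a deliberately toy sketch of the general idea behind one common value-added setup (a covariate-adjustment regression). The data, effect sizes, and simple OLS specification are all invented for illustration, and are far simpler than any model actually used in real evaluation systems:

```python
# Toy illustration (not any real system's model): a "covariate adjustment"
# value-added setup regresses current test scores on prior scores plus
# teacher dummies; each teacher's coefficient is then read as that
# teacher's "effect" on score growth, relative to a baseline teacher.
import numpy as np

rng = np.random.default_rng(0)
n_teachers, n_per = 5, 200
true_effects = rng.normal(0, 3, n_teachers)   # hypothetical teacher effects

teacher = np.repeat(np.arange(n_teachers), n_per)
prior = rng.normal(50, 10, n_teachers * n_per)            # last year's score
current = 5 + 0.9 * prior + true_effects[teacher] + rng.normal(0, 8, prior.size)

# Design matrix: intercept, prior score, teacher dummies (teacher 0 as baseline)
dummies = (teacher[:, None] == np.arange(1, n_teachers)).astype(float)
X = np.column_stack([np.ones_like(prior), prior, dummies])
coefs, *_ = np.linalg.lstsq(X, current, rcond=None)

# Estimated "value-added" of each teacher, relative to teacher 0
est_effects = np.concatenate([[0.0], coefs[2:]])
```

Even in this idealized setting with fake data, the estimates recover the true effects only with error; the real-world complications (non-random assignment of students, test measurement error, year-to-year instability) are precisely what the debate is about.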

Has Teacher Quality Declined Over Time?

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

One of the common assumptions lurking in the background of our education debates is that “quality” of the teaching workforce has declined a great deal over the past few decades (see here, here, here and here [slide 16]). There is a very plausible storyline supporting this assertion: Prior to the dramatic rise in female labor force participation since the 1960s, professional women were concentrated in a handful of female-dominated occupations, chief among them teaching. Since then, women’s options have changed, and many have moved into professions such as law and medicine instead of the classroom.

The result of this dynamic, so the story goes, is that the pool of candidates to the teaching profession has been “watered down." This in turn has generated a decline in the aggregate “quality” of U.S. teachers, and, it follows, a stagnation of student achievement growth. This portrayal is often used as a set-up for a preferred set of solutions – e.g., remaking teaching in the image of the other professions into which women are moving, largely by increasing risk and rewards.

Although the argument that “teacher quality” has declined substantially is sometimes taken for granted, its empirical backing is actually quite thin, and not as clear-cut as some might believe.

Smear Review

A few weeks ago, the National Education Policy Center (NEPC) issued a review of the research on virtual learning. Several proponents of online education issued responses that didn't offer much substance beyond pointing out NEPC’s funding sources. A similar reaction ensued after the release last year of the Gates Foundation's preliminary report on the Measures of Effective Teaching Project. There were plenty of substantive critiques, but many of the reactions amounted to knee-jerk dismissals of the report based on pre-existing attitudes toward the foundation's agenda.

More recently, we’ve even seen unbelievably puerile schemes in which political operatives actually pretend to represent legitimate organizations requesting consulting services. They record the phone calls, and post out-of-context snippets online to discredit the researchers.

Almost all of the people who partake in this behavior share at least one fundamental characteristic: They are unable to judge research for themselves, on its merits. They can’t tell the difference, so they default to attacking substantive work based on nothing more than the affiliations and/or viewpoints of the researchers.

The Categorical Imperative In New Teacher Evaluations

There is a push among many individuals and groups advocating new teacher evaluations to predetermine the number of outcome categories – e.g., highly effective, effective, developing, ineffective – that these new systems will include. For instance, a "statement of principles" signed by 25 education advocacy organizations recommends that the reauthorized ESEA law require “four or more levels of teacher performance." The New Teacher Project’s primary report on redesigning evaluations made the same suggestion.* For their part, many states have followed suit, mandating new systems with a minimum of four or five categories.

The rationale here is pretty simple on the surface: Those pushing for a minimum number of outcome categories believe that teacher performance must be adequately differentiated, a goal on which prior systems, most of which relied on dichotomous satisfactory/unsatisfactory schemes, fell short. In other words, the categories in new evaluation systems must reflect the variation in teacher performance, and that cannot be accomplished when there are only a couple of categories.

It’s certainly true that the number of categories matters – it is an implicit statement about the system’s ability to tease out the “true” variation in teacher performance. The number of categories an evaluation system employs should depend on how well it can differentiate teachers with a reasonable degree of accuracy. If a system is unable to pick up this “true” variation, then using several categories may do more harm than good, because it will provide faulty information. And, at this early stage, despite the appearance of certainty among some advocates, it remains unclear whether all new teacher evaluation systems should require four or more levels of “effectiveness.”
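To see why more categories can mean "providing faulty information," consider a deliberately artificial simulation (the noise level and category counts here are invented, not calibrated to any real evaluation system): when performance is measured with error, sorting teachers into more categories mechanically increases the share placed in the "wrong" one.

```python
# Hypothetical simulation: the more rating categories a system uses, the
# more a fixed amount of measurement error scrambles teachers across them.
import numpy as np

rng = np.random.default_rng(1)
true = rng.normal(size=100_000)                            # "true" performance
observed = true + rng.normal(scale=0.7, size=true.size)    # noisy estimate

def agreement(n_cats):
    """Share of teachers whose observed category matches their true category,
    using equal-sized categories defined by rank."""
    probs = np.linspace(0, 1, n_cats + 1)[1:-1]            # interior cutpoints
    cat_true = np.digitize(true, np.quantile(true, probs))
    cat_obs = np.digitize(observed, np.quantile(observed, probs))
    return (cat_true == cat_obs).mean()

two, four = agreement(2), agreement(4)   # two < categories < misclassification
```

Under these made-up parameters, a two-category scheme classifies a clearly larger share of teachers "correctly" than a four-category scheme does; the extra categories buy finer distinctions only at the cost of more misclassification.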

The Uncertain Future Of Charter School Proliferation

This is the third in a series of three posts about charter schools. Here are the first and second parts.

As discussed in prior posts, high-quality analyses of charter school effects show that there is wide variation in the test-based effects of these schools but that, overall, charter students do no better than their comparable regular public school counterparts. The existing evidence, though very tentative, suggests that the few schools achieving large gains tend to be well-funded, offer massive amounts of additional time, provide extensive tutoring services and maintain strict, often high-stakes discipline policies.

There will always be a few high-flying chains dispersed throughout the nation that get results, and we should learn from them. But there’s also the question of whether a bunch of charter schools with different operators using diverse approaches can expand within a single location and produce consistent results.

Charter supporters typically argue that state and local policies can be leveraged to “close the bad charters and replicate the good ones.” Opponents, on the other hand, contend that successful charters can’t expand beyond a certain point because they rely on selecting the best students into their schools (so-called “cream skimming”), as well as on excluding high-needs students.

Given the current push to increase the number of charter schools, these are critical issues, and there is, once again, some very tentative evidence that might provide insights.

Explaining The Consistently Inconsistent Results of Charter Schools

This is the second in a series of three posts about charter schools. Here is the first part, and here is the third.

As discussed in a previous post, there is a fairly well-developed body of evidence showing that charter and regular public schools vary widely in their impacts on achievement growth. This research finds that, on the whole, there is usually not much of a difference between them, and when there are differences, they tend to be very modest. In other words, there is nothing about "charterness" that leads to strong results.

It is, however, the exceptions that are often most instructive to policy. By taking a look at the handful of schools that are successful, we might finally start moving past the “horse race” incarnation of the charter debate, and start figuring out which specific policies and conditions are associated with success, at least in terms of test score improvement (which is the focus of this post).

Unfortunately, this question is also extremely difficult to answer – policies and conditions are not randomly assigned to schools, and it’s very tough to disentangle all the factors (many unmeasurable) that might affect achievement. But the available evidence at this point is sufficient to start drawing a few highly tentative conclusions about “what works."

The Evidence On Charter Schools

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post and here on the Huffington Post

This is the first in a series of three posts about charter schools. Here are the second and third parts.

In our fruitless, deadlocked debate over whether charter schools “work," charter opponents frequently cite the so-called CREDO study (discussed here), a 2009 analysis of charter school performance in 16 states. The results indicated that overall charter effects on student achievement were negative and statistically significant in both math and reading, but both effect sizes were tiny. Given the scope of the study, it’s perhaps more appropriate to say that it found wide variation in charter performance within and between states – some charters did better, others did worse and most were no different. On the whole, the size of the aggregate effects, both positive and negative, tended to be rather small.
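The "statistically significant but tiny" distinction is easy to illustrate with made-up numbers: with samples as large as those in multi-state studies, even a hundredth of a standard deviation can clear conventional significance thresholds. The figures below are invented for illustration and are not taken from the CREDO study itself.

```python
# Hypothetical numbers, chosen only to illustrate the distinction: with very
# large samples, a trivially small difference is "statistically significant,"
# so significance alone says nothing about practical importance.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000                                          # students per group
charter = rng.normal(loc=-0.01, scale=1.0, size=n)     # 0.01 SD lower, on average
regular = rng.normal(loc=0.0, scale=1.0, size=n)

diff = charter.mean() - regular.mean()                 # tiny difference in SDs
se = np.sqrt(charter.var(ddof=1) / n + regular.var(ddof=1) / n)
z = diff / se    # |z| far exceeds 1.96, i.e., "significant" at p < .05
```

A difference of a hundredth of a standard deviation is negligible by any practical benchmark, yet the test statistic comfortably rejects the null – which is why effect sizes, not p-values, should drive the interpretation.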

Recently, charter opponents’ tendency to cite this paper has been called “cherrypicking." Steve Brill sometimes levels this accusation, as do others. It is supposed to imply that CREDO is an exception – that most of the evidence out there finds positive effects of charter schools relative to comparable regular public schools.

CREDO, while generally well-done given its unprecedented scope, is a bit overused in our public debate – one analysis, no matter how large or good, cannot prove or disprove anything. But anyone who makes the “cherrypicking” claim is clearly unfamiliar with the research. CREDO is only one among a number of well-done, multi- and single-state studies that have reached similar conclusions about overall test-based impacts.

This is important because the endless back-and-forth about whether charter schools “work” – whether there is something about "charterness" that usually leads to fantastic results – has become a massive distraction in our education debates. The evidence makes it abundantly clear that that is not the case, and the goal at this point should be to look at the schools of both types that do well, figure out why, and use that information to improve all schools.

When The Legend Becomes Fact, Print The Fact Sheet

The New Teacher Project (TNTP) just released a "fact sheet" on value-added (VA) analysis. I’m all for efforts to clarify complex topics such as VA, and, without question, there is a great deal of misinformation floating around on this subject, both "pro-" and "anti-."

The fact sheet presents five sets of “myths and facts." Three of the “myths” seem somewhat unnecessary: that there’s no research behind VA; that teachers will be evaluated based solely on test scores; and that VA is useless because it’s not perfect. Almost nobody believes or makes these arguments (at least in my experience). But I guess it never hurts to clarify.

In contrast, the other two are very common arguments, but they are not myths. They are serious issues with concrete policy implications. If there are any myths, they're in the "facts" column.

The False Conflict Between Unionism and Professionalism

Some people have the unfortunate idea that unionism is somehow antithetical to or incompatible with being a professional. This notion is particularly salient within education circles, where phrases like “treat teachers like professionals” are often used as implicit arguments against policies associated with unions, such as salary schedules and tenure (examples here, here, here and here).

Let’s take a quick look at this "conflict," first by examining union membership rates among professionals versus workers in other types of occupations. As shown in the graph below, if union membership and professionalism don’t mix, we have a little problem: Almost one in five professionals is a union member. Actually, union membership is higher among professionals than among any other major occupational category except construction workers.