Why Teacher Evaluation Reform Is Not A Failure

The RAND Corporation recently released an important report on the impact of the Gates Foundation’s “Intensive Partnerships for Effective Teaching” (IPET) initiative. IPET was a very thorough and well-funded attempt to improve teaching quality in schools in three districts and four charter management organizations (CMOs). The initiative was multi-faceted, but its centerpiece was the implementation of multi-measure teacher evaluation systems and the linking of ratings from those systems to professional development and high stakes personnel decisions, including compensation, tenure, and dismissal. This policy, particularly the inclusion in teacher evaluations of test-based productivity measures (e.g., value-added scores), has been among the most controversial issues in education policy throughout the past 10 years.

The report is extremely rich and there are many interesting findings in there, so I would encourage everyone to read it themselves (at least the executive summary). The headline finding, however, was that IPET had no discernible effect on student outcomes, namely test scores and graduation rates, in the participating districts, vis-à-vis similar districts that did not participate. Given that IPET was so thoroughly designed and implemented, and that it was well-funded, it can potentially be viewed as a "best case scenario" test of the type of evaluation reform that most states have enacted. Accordingly, critics of these reforms, who typically focus their opposition on the high stakes use of evaluation measures, particularly value-added and other test-based measures, have portrayed the findings as vindication of their opposition.

This reaction has merit. The most important reason is that evaluation reform was portrayed by advocates as a means to immediate and drastic improvements in student outcomes. This promise was misguided from the outset, and evaluation reform opponents are (and were) correct in pointing this out. At the same time, however, it would be wise not to dismiss evaluation reform as a whole, for several reasons, a few of which are discussed below.

We Can't Graph Our Way Out Of The Research On Education Spending

The graph below was recently posted by U.S. Education Department (USED) Secretary Betsy DeVos, as part of her response to the newly released scores on the 2017 National Assessment of Educational Progress (NAEP), administered every two years and often called the “nation’s report card.” It seems to show a massive increase in per-pupil education spending, along with a concurrent flat trend in scores on the fourth grade reading version of NAEP. The intended message is that spending more money won’t improve testing outcomes. Or, in the more common phrasing these days, "we can't spend our way out of this problem."

Some of us call it “The Graph.” Versions of it have been used before. And it’s the kind of graph that doesn’t need to be discredited, because it discredits itself. So, why am I bothering to write about it? The short answer is that I might be unspeakably naïve. But we’ll get back to that in a minute.

First, let’s very quickly run through the graph. In terms of how it presents the data, it is horrible practice. The double y-axes, with spending on the left and NAEP scores on the right, are a textbook example of what you might call motivated scaling (and that's being polite). The NAEP scores plotted range from a minimum of 213 in 2000 to a maximum of 222 in 2017, but the score axis inexplicably extends all the way up to 275. In contrast, the spending scale extends from just below the minimum observation ($6,000) to just above the maximum ($12,000). In other words, the graph is deliberately scaled to produce the desired visual effect (increasing spending, flat scores). One could very easily rescale the graph to produce the opposite impression.
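Because the trick is entirely a matter of axis limits, it can be shown numerically without even drawing a chart. The apparent steepness of a line is just the fraction of the y-axis range the series traverses. The sketch below uses illustrative numbers that only roughly match those cited above, and the `visual_rise` helper is my own construction, not anything from the original graph:

```python
# A minimal numeric sketch of "motivated scaling": the same data can be
# made to look flat or steep purely by choosing the y-axis limits.
# Numbers are illustrative, loosely matching those cited in the post
# (NAEP 4th grade reading ~213-222; per-pupil spending ~$6,000-$12,000).

def visual_rise(values, axis_min, axis_max):
    """Fraction of the chart's vertical span that the series covers,
    i.e., how 'steep' the line looks regardless of the actual numbers."""
    return (max(values) - min(values)) / (axis_max - axis_min)

naep = [213, 217, 220, 221, 222]
spending = [6000, 8000, 9500, 10800, 12000]

# "The Graph": scores on an axis running all the way up to 275,
# spending on an axis hugging its own minimum and maximum.
flat_scores = visual_rise(naep, 0, 275)              # ~0.03: looks flat
steep_spending = visual_rise(spending, 5500, 12500)  # ~0.86: looks explosive

# The same data, rescaled: now scores soar and spending barely moves.
steep_scores = visual_rise(naep, 212, 223)           # ~0.82
flat_spending = visual_rise(spending, 0, 60000)      # ~0.10

print(f"scores cover {flat_scores:.0%} vs {steep_scores:.0%} of the axis")
print(f"spending covers {steep_spending:.0%} vs {flat_spending:.0%} of the axis")
```

The identical nine-point NAEP gain covers 3 percent of the vertical span in one version and over 80 percent in the other, which is the entire rhetorical content of the graph.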

What Happened To Teacher Quality?

Starting around 2005 and up until a few years ago, education policy discourse and policymaking were dominated by the issue of improving “teacher quality.” We haven't heard nearly as much about it over the past couple of years, however. One of the major reasons is that the vast majority of states have already enacted policies ostensibly designed to improve teacher quality.

Thanks in no small part to the Race to the Top grant program, and the subsequent ESEA waiver program, virtually all states reformed their teacher evaluation systems, the “flagship” policy of the teacher quality push. Many of these states also tied their new evaluation results to high stakes personnel decisions, such as granting tenure, dismissals, layoffs, and compensation. Predictably, the details of these new systems vary quite a bit, both within and between states. Many advocates are unsatisfied with how the new policies were designed, and one could write a book on all the different issues. Yet it would be tough to deny that this national policy effort was among the fastest shifts in recent educational history, particularly given the controversy surrounding it.

So, what happened to all the attention to teacher quality? It was put into practice. The evidence on its effects is already emerging, but this will take a while, and so it is still a quiet time in teacher quality land, at least compared to the previous 5-7 years. Even so, there are already many lessons out there, too many for one post. Looking back, though, one big picture lesson, and definitely not a new one, is how the evaluation reform effort stands out (in a very competitive field) for the degree to which it was driven by the promise of immediate, large results.

What Do Schools Fostering A Teacher “Growth Mindset” Look Like?

Our guest authors today are Stefanie Reinhorn, Susan Moore Johnson, and Nicole Simon. Reinhorn is an independent consultant working with school systems on Instructional Rounds and school improvement. Johnson is the Jerome T. Murphy Research Professor at the Harvard Graduate School of Education. Simon is a director in the Office of K-16 Initiatives at the City University of New York. The authors are researchers at The Project on the Next Generation of Teachers at the Harvard Graduate School of Education. This piece is adapted from the authors’ chapter in Teaching in Context: The Social Side of Education Reform, edited by Esther Quintero (Harvard Education Press, 2017).

Carol Dweck’s theories about motivation and development have become mainstream in schools since her book, Mindset, was published in 2006. It is common to hear administrators, teachers, parents, and even students talk about helping young learners adopt a “growth mindset”: expecting and embracing the idea of developing knowledge and skills over time, rather than assuming individuals are born with fixed abilities. Yet school leaders and teachers scarcely talk about how to adopt a growth mindset for themselves—one that assumes that educators, not only the students they teach, can improve with support and practice. Many teachers find it hard to imagine working in a school with a professional culture designed to cultivate their development, rather than one in which their effectiveness is judged and addressed with rewards and sanctions. However, these schools do exist.

In our research (see here, here, and here), we selected and studied six high-performing, high-poverty urban schools so that we could understand how these schools were beating the odds. Specifically, we wondered what they did to attract and develop teachers, and how teachers experienced working there. These schools, all located in one Massachusetts city, included: one traditional district school; two district turnaround schools; two state charter schools; and one charter-sponsored restart school. Based on interviews with 142 teachers and administrators, we concluded that all six schools fostered and supported a “growth mindset” for their educators.

The Social Side Of Capability: Improving Educational Performance By Attending To Teachers’ And School Leaders’ Interactions About Instruction

Our guest authors today are Matthew Shirrell, James P. Spillane, Megan Hopkins, and Tracy Sweet. Shirrell is an Assistant Professor of Educational Leadership and Administration in the Graduate School of Education and Human Development at George Washington University. Spillane is the Spencer T. and Ann W. Olin Professor in Learning and Organizational Change at the School of Education and Social Policy at Northwestern University. Hopkins is Assistant Professor of Education Studies at the University of California, San Diego. Sweet is an Assistant Professor in the Measurement, Statistics and Evaluation program in the Department of Human Development and Quantitative Methodology at the University of Maryland. This piece is adapted from the authors’ chapter in Teaching in Context: The Social Side of Education Reform edited by Esther Quintero (Harvard Education Press, 2017).

The last two decades have witnessed numerous educational reforms focused on measuring the performance of teachers and school leaders. Although these reforms have produced a number of important insights, efforts to measure teacher and school leader performance have often overlooked the fact that performance is not simply an individual matter, but also a social one. Theory and research dating back to the last century suggest that individuals use their social relationships to access resources that can improve their capability and, in turn, their performance. Scholars refer to such real or potential resources accessed through relationships as “social capital,” and research in schools has demonstrated the importance of this social capital to a variety of key school processes and outcomes, such as instructional improvement and student performance.

We know that social relationships are the necessary building blocks of this social capital; we also know that social relationships within schools (as in other settings) don’t arise simply by chance. Over the last decade, we have studied the factors that predict social relationships both within and between schools by examining interactions about instruction among school and school system staff. As suggested by social capital theory, such interactions are important because they facilitate access to social resources such as advice and information. Thus, understanding the predictors of these interactions can help us determine what it might take to build social capital in our schools and school systems. In this post, we briefly highlight two major insights from our work; for more details, see our chapter in Teaching in Context.

Promoting Productive Collaboration Through Inquiry: The Limits Of Policy Mandates

Our guest author today is Robert Shand, the Novice G. Fawcett Postdoctoral Researcher in Educational Studies at The Ohio State University. His research focuses on the economics of education, teacher collaboration and professional development, and how teachers and school leaders make decisions based on data and research to improve student outcomes.

In some ways, it is hard to dispute the traditional view that K-12 teaching is a professionally solitary activity. At the end of the day, most instruction still occurs with a single teacher standing in front of a classroom. When I tell folks that I study teacher collaboration for a living, some are puzzled – other than team teaching, what would teachers even collaborate about? Some former colleagues from my time as a middle and high school teacher even bristle at the growing demands by administrators that they collaborate. These former colleagues no doubt envision pointless meetings, contrived team-based scenarios, and freeloading colleagues trying to offload their work onto others.

Despite these negative preconceptions, there is growing evidence that meaningful work with colleagues can enhance teacher productivity, effectiveness, and professional growth, and even increase job satisfaction. Teachers can share ideas and instructional strategies, divide the work of developing curriculum, learn from colleagues, and analyze data and evidence to solve instructional problems and help meet diverse student needs. The evidence for the potential benefits of collaboration is so compelling, and collaborative work in education is becoming so pervasive, that the Every Student Succeeds Act legally redefines professional development to include “collaborative” as part of the definition.

A Closer Look At Our Report On Public And Private School Segregation In DC

Last week, we released our research brief on segregation by race and ethnicity in the District of Columbia. The analysis is unique insofar as it includes regular public schools, charter schools, and private schools, thus providing a comprehensive look at segregation in our nation’s capital.

Private schools serve only about 17 percent of D.C.’s students, but almost 60 percent of its white students. This means that any analysis of segregation in D.C. that excludes private schools may be missing a pretty big part of the picture. Our brief includes estimates of segregation, using different types of measures, within the private and public sectors (including D.C.’s large charter school sector). Unsurprisingly, we find high levels of segregation in both sectors, using multiple race and ethnicity comparisons. Yet, while segregation in both sectors is extensive, it is not substantially higher in one or the other.

But one of our most interesting findings, which we’d like to discuss here, is that between 25 and 40 percent of total citywide segregation is actually found between the public and private sectors, rather than within them. This is not a particularly intuitive finding to interpret, so a quick explanation may be useful.
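One common way to make a statement like "X percent of segregation is between sectors" precise is Theil's information-theory index H, which decomposes additively into a within-sector component and a between-sector component. The sketch below is an illustration of that decomposition under entirely hypothetical school data; H is just one of several measures such a brief might use, and this is not a reconstruction of the brief's actual method or data:

```python
# A sketch of decomposing citywide school segregation (Theil's H) into
# within-sector and between-sector components. All school data below are
# hypothetical; each school is (enrollment, white share of enrollment).
import math

def entropy(p):
    """Two-group entropy of proportion p (terms with share 0 contribute 0)."""
    return sum(q * math.log(1 / q) for q in (p, 1 - p) if q > 0)

def theil_H(units):
    """Theil segregation index H over (enrollment, white_share) units."""
    T = sum(n for n, _ in units)
    P = sum(n * p for n, p in units) / T      # overall white share
    E = entropy(P)
    return sum(n * (E - entropy(p)) for n, p in units) / (T * E)

def aggregate(units):
    """Collapse a list of schools into one (enrollment, white_share) unit."""
    n = sum(m for m, _ in units)
    return n, sum(m * p for m, p in units) / n

public = [(400, 0.02), (350, 0.05), (500, 0.10), (300, 0.30)]
private = [(200, 0.80), (150, 0.55), (250, 0.90)]

schools = public + private
T = sum(n for n, _ in schools)
E = entropy(sum(n * p for n, p in schools) / T)   # citywide entropy

H_total = theil_H(schools)
# Between-sector component: treat each whole sector as a single "school".
H_between = theil_H([aggregate(public), aggregate(private)])
# Within-sector component: each sector's own H, weighted by its size
# and internal diversity. Additivity: H_total == H_between + H_within.
H_within = 0.0
for sector in (public, private):
    n_s, p_s = aggregate(sector)
    H_within += (n_s * entropy(p_s)) / (T * E) * theil_H(sector)

print(f"between-sector share of total segregation: {H_between / H_total:.0%}")
```

The between-sector term captures exactly the situation described above: even if every school within each sector had that sector's average composition, the city would still be segregated because the two sectors differ so sharply from one another.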

The Theory And Practice Of School Closures

The idea of closing “low performing schools” has undeniable appeal, at least in theory. The basic notion is that some schools are so dysfunctional that they cannot be saved and may be doing irreparable harm to their students every day they are open. Thus, it is argued, closing such schools and sending their students elsewhere is the best option – even if students end up in “average” schools, proponents argue, they will be better off.

Such closures are very controversial, however, and for good reason. For one thing, given adequate time and resources, schools may improve – i.e., there are less drastic interventions that might be equally (or more) effective as a way to help students. Moreover, closing a school represents a disruption in students’ lives (and often, by the way, to the larger community). In this sense, any closure must offer cumulative positive effects sufficient to offset an initial negative effect. Much depends on how and why schools are identified for closure, and the quality of the schools that displaced students attend. In practice, then, closure is a fairly risky policy, both educationally and (perhaps especially) politically. This disconnect between the appeal of theoretical school closures and the actual risks, in practice, may help explain why U.S. educational policy has been designed such that many schools operate at some risk of closure, but relatively few ever end up shutting their doors.

Despite the always contentious debates about the risks and merits of closing “low performing schools,” there has not been a tremendous amount of strong evidence about effects (in part because such closures have been somewhat rare). A new report by the Center for Research on Education Outcomes (CREDO) helps fill the gap, using a very large dataset to examine the test-based impact of school closures (among other things). The results speak directly to the closure debate, in both specific and general terms, but interpreting them is complicated by the fact that this analysis evaluates what is at best a policy done poorly.

Where Do Achievement Gaps Come From?

For almost two decades now, educational accountability policy in the U.S. has included a focus on the performance of student subgroups, such as those defined by race and ethnicity, income, or special education status. The (very sensible) logic behind this focus is the simple fact that aggregate performance measures, whether at the state-, district-, or school levels, often mask large gaps between subgroups.

Yet one of the unintended consequences of this subgroup focus has been confusion among both policymakers and the public as to how to interpret and use subgroup indicators in formal school accountability systems, particularly when those indicators are expressed as simple “achievement gaps” or “gap closing” measures. This is not only because achievement gaps can narrow for undesirable reasons and widen for desirable reasons, but also because many gaps exist prior to entry into the school (or district). If, for instance, a large Hispanic/White achievement gap for a given cohort exists at the start of kindergarten, it is misleading and potentially damaging to hold a school accountable for the persistence of that gap in later grades – particularly in cases where public policy has failed to provide the extra resources and supports that might help lower-performing students make accelerated achievement gains every year. In addition, the coarseness of current educational variables, particularly those usually used as income proxies, limits the detail and utility of some subgroup measures.

A helpful and timely little analysis by David Figlio and Krzysztof Karbownik, published by the Brookings Institution, addresses some of these issues, and the findings have clear policy implications.

ESSA: An Opportunity For Research-Practice Partnerships To Support Districts And States

Our guest authors today are Bill Penuel, professor of learning sciences and human development in the School of Education at the University of Colorado Boulder, and Caitlin C. Farrell, director of the National Center of Research in Policy and Practice (NCRPP) at the University of Colorado Boulder. This piece is adapted from the authors’ chapter in Teaching in Context: The Social Side of Education Reform, edited by Esther Quintero (Harvard Education Press, 2017).

Many parts of the Every Student Succeeds Act (ESSA) call on schools, districts, and states to select “evidence-based programs.” The state plans now being developed include strategies for meeting these provisions of the law, and they vary widely. Some mainly pass responsibility for selecting evidence-based programs through to districts. Other states are considering ways to integrate continuous improvement research that would focus on studying the implementation of evidence-based programs.

Our book chapter in Teaching in Context: The Social Side of Education Reform presents a number of scenarios in which long-term research-practice partnerships (RPPs) have helped districts select, adapt, and design evidence-based programs. RPPs are mutually beneficial relationships between practitioners and researchers, organized around problems of practice. This promising strategy has been growing in popularity in recent years, and there is now even a network of RPPs to support exchange among them.