• A Few Quick Fixes For School Accountability Systems

    Our guest authors today are Morgan Polikoff and Andrew McEachin. Morgan is Assistant Professor in the Rossier School of Education at the University of Southern California. Andrew is an Institute of Education Science postdoctoral fellow at the University of Virginia.

    In a previous post, we described some of the problems with the Senate's Harkin-Enzi plan for reauthorizing the No Child Left Behind Act, based on our own analyses, which yielded three main findings. First, selecting the bottom 5% of schools for intervention based on changes in California’s composite achievement index resulted in remarkably unstable rankings. Second, identifying the bottom 5% based on schools' lowest performing subgroup overwhelmingly targeted those serving larger numbers of special education students. Third and finally, we found evidence that middle and high schools were more likely to be identified than elementary schools, and smaller schools more likely than larger schools.

    None of these findings was especially surprising (see here and here, for instance), and could easily have been anticipated. Thus, we argued that policymakers need to pay more attention to the vast (and rapidly expanding) literature on accountability system design.

  • Why Nobody Wins In The Education "Research Wars"

    ** Reprinted here in the Washington Post

    In a recent post, Kevin Drum of Mother Jones discusses his growing skepticism about the research behind market-based education reform, and about the claims that supporters of these policies make. He cites a recent Los Angeles Times article, which discusses how, in 2000, the San Jose Unified School District in California instituted a so-called “high expectations” policy requiring all students to pass the courses necessary to attend state universities. The reported percentage of students passing these courses increased quickly, causing the district and many others to declare the policy a success. In 2005, Los Angeles Unified, the nation's second largest district, adopted similar requirements.

    For its part, the Times performed its own analysis and found that the San Jose pass rate was no higher in 2011 than in 2000 (in fact, slightly lower for some subgroups), and that the district had overstated its early results by classifying students in a misleading manner. Mr. Drum, reviewing these results, concludes: “It turns out it was all a crock.”

    In one sense, that's true – the district seems to have reported misleading data. On the other hand, neither San Jose Unified's original evidence (with or without the misclassification) nor the Times analysis is anywhere near sufficient for drawing conclusions – "crock"-based or otherwise – about the effects of this policy. This illustrates the deeper problem here, which is less about one “side” or the other misleading with research than about something much more difficult to address: common misconceptions that make it hard to distinguish good evidence from bad.

  • Living In The Tails Of The Rhetorical And Teacher Quality Distributions

    A few weeks ago, Students First NY (SFNY) released a report, in which they presented a very simple analysis of the distribution of “unsatisfactory” teacher evaluation ratings (“U-ratings”) across New York City schools in the 2011-12 school year.

    The report finds that U-ratings are distributed unequally. In particular, they are more common in schools with higher poverty, more minorities, and lower proficiency rates. Thus, the authors conclude, the students who are most in need of help are getting the worst teachers.

    There is good reason to believe that schools serving larger proportions of disadvantaged students have a tougher time attracting, developing and retaining good teachers, and there is evidence of this, even based on value-added estimates, which adjust for these characteristics (also see here). However, the assumptions upon which this Students First analysis is based are better seen as empirical questions, and, perhaps more importantly, the recommendations they offer are a rather crude, narrow manifestation of market-based reform principles.

  • Value-Added As A Screening Device: Part II

    Our guest author today is Douglas N. Harris, associate professor of economics and University Endowed Chair in Public Education at Tulane University in New Orleans. His latest book, Value-Added Measures in Education, provides an accessible review of the technical and practical issues surrounding these models.

    This past November, I wrote a post for this blog about shifting course in the teacher evaluation movement and using value-added as a “screening device.”  This means that the measures would be used: (1) to help identify teachers who might be struggling and for whom additional classroom observations (and perhaps other information) should be gathered; and (2) to identify classroom observers who might not be doing an effective job.

    Screening takes advantage of the low cost of value-added and of the fact that the estimates are more accurate for making general assessments of performance patterns across teachers, while avoiding the weaknesses of value-added – especially that the measures are often inaccurate for individual teachers, and that teachers find them confusing and not very credible when used for high-stakes decisions.

    I want to thank the many people who responded to the first post. There were three main camps.

  • Making Sense Of Florida's School And Teacher Performance Ratings

    Last week, Florida State Senate President Don Gaetz (R – Niceville) expressed his skepticism about the recently released results of the state’s new teacher evaluation system. The senator was particularly concerned about his comparison of the ratings with schools’ “A-F” grades. He noted, “If you have a C school, 90 percent of the teachers in a C school can’t be highly effective. That doesn’t make sense.”

    There’s an important discussion to be had about the results of both the school and teacher evaluation systems, and the distributions of the ratings can definitely be part of that discussion (even if this issue is sometimes approached in a superficial manner). However, arguing that we can validate Florida’s teacher evaluations using its school grades, or vice-versa, suggests little understanding of either. Actually, given the design of both systems, finding a modest or even weak association between them would make pretty good sense.

    In order to understand why, there are two facts to consider.

  • The Cartography Of High Expectations

    In October of last year, the education advocacy group ConnCAN published a report called “The Roadmap to Closing the Gap” in Connecticut. This report says that the state must close its large achievement gaps by 2020 – that is, within eight years – and it uses data to argue that this goal is “both possible and achievable.”

    There is value in compiling data and disaggregating them by district and school. And ConnCAN, to its credit, doesn't use this analysis as a blatant vehicle to showcase its entire policy agenda, as advocacy organizations often do. But I am compelled to comment on this report, mostly as a springboard to a larger point about expectations.

    However, first things first – a couple of very quick points about the analysis. There are 60-70 pages of district-by-district data in this report, all of it portrayed as a “roadmap” to closing Connecticut’s achievement gap. But it doesn't measure gaps and won't close them.

  • Are Charter Schools Better Able To Fire Low-Performing Teachers?

    Charter schools, though they comprise a remarkably diverse sector, are quite often subject to broad generalizations. Opponents, for example, promote the characterization of charters as test prep factories, though this is a sweeping claim without empirical support. Another common stereotype is that charter schools exclude students with special needs. It is often (but not always) true that charters serve disproportionately fewer students with disabilities, but the reasons for this are complicated and vary a great deal, and there is certainly no evidence for asserting a widespread campaign of exclusion.

    Of course, these types of characterizations, which are also leveled frequently at regular public schools, don't always take the form of criticism. For instance, it is an article of faith among many charter supporters that these schools, thanks to the fact that relatively few are unionized, are better able to aggressively identify and fire low-performing teachers (and, perhaps, retain high performers). Unlike many of the generalizations from both "sides," this one is a bit more amenable to empirical testing.

    A recent paper by Joshua Cowen and Marcus Winters, published in the journal Education Finance and Policy, is among the first to take a look, and some of the results might be surprising.

  • Moving From Ideology To Evidence In The Debate About Public Sector Unions

    Drawing on a half century of empirical evidence, as well as new data and analysis, a team of scholars has challenged the substance of many of the attacks on public employees and their unions – urging political leaders and the research community to treat this “transformational” moment in the divisive, ideologically driven debate over the role of government and the value of public services as an opportunity to deepen their commitment to evidence-based policy ideas.

    These arguments were outlined in "The Great New Debate about Unionism and Collective Bargaining in U.S. State and Local Governments," published by Cornell University’s ILR Review. The authors – David Lewin (UCLA), Jeffrey Keefe (Rutgers), and Thomas Kochan (MIT) – point out that, with half a century of experience, there is now a wealth of data by which to evaluate public sector unionism and its effects.

    In that context, the authors spell out the history, arguments and empirical findings on three key issues: 1) Are public employees overpaid?; 2) Do labor-management dispute resolution procedures, which are part of many state and local government collective bargaining laws, enhance or hinder effective governance?; 3) Have unions and managers in the public sector demonstrated the ability to respond constructively to fiscal crises?

  • A Few Points About The Instability Of Value-Added Estimates

    One of the most frequent criticisms of value-added and other growth models is that they are "unstable" (or, more accurately, modestly stable). For instance, a teacher who is rated highly in one year might very well score toward the middle of the distribution – or even lower – in the next year (see here, here and here, or this accessible review).

    Some of this year-to-year variation is “real.” A teacher might get better over the course of a year, or might have a personal problem that impedes their job performance. In addition, there could be changes in educational circumstances that are not captured by the models – e.g., a change in school leadership, new instructional policies, etc. However, a great deal of the recorded variation is actually due to sampling error, or idiosyncrasies in student testing performance. In other words, there is a lot of “purely statistical” imprecision in any given year, and so the scores don’t always “match up” well between years. As a result, value-added critics, including many teachers, argue that it’s not only unfair to use such error-prone measures for any decisions, but that it’s also bad policy, since we might reward or punish teachers based on estimates that could be completely different the next year.
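
    To make the role of sampling error concrete, here is a minimal simulation sketch in Python. All of the quantities (the number of teachers, and the assumption that the error variance is roughly equal to the variance of true teacher effects) are illustrative assumptions, not parameters from any real value-added model; the point is simply that stable underlying quality plus independent year-specific noise produces estimates that correlate only modestly from one year to the next.

    ```python
    # A minimal simulation sketch: persistent "true" teacher effects plus
    # independent sampling error in each year. All quantities are illustrative
    # assumptions, not parameters from any real value-added model.
    import numpy as np

    rng = np.random.default_rng(0)
    n_teachers = 1000

    true_effect = rng.normal(0, 1, n_teachers)  # stable underlying teacher quality
    noise_sd = 1.0                              # assume error variance ~ signal variance

    # Estimated value-added in two consecutive years: same true effect, new noise.
    year1 = true_effect + rng.normal(0, noise_sd, n_teachers)
    year2 = true_effect + rng.normal(0, noise_sd, n_teachers)

    r = np.corrcoef(year1, year2)[0, 1]
    print(f"Year-to-year correlation of estimates: {r:.2f}")  # roughly 0.5

    # Even with teacher quality held perfectly constant, the estimates bounce
    # around from year to year: with this much noise, a teacher rated near the
    # top in one year can easily land near the middle of the distribution the
    # next, purely because of sampling error.
    ```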

    The concerns underlying these arguments are well-founded (and, often, casually dismissed by supporters and policymakers). At the same time, however, there are a few points about the stability of value-added (or lack thereof) that are frequently ignored or downplayed in our public discourse. All of them are pretty basic and have been noted many times elsewhere, but it might be useful to discuss them very briefly. Three in particular stand out.

  • When Growth Isn't Really Growth

    Let’s try a super-simple thought experiment with data. Suppose we have an inner-city middle school serving grades 6-8. Students in all three grades take the state exam annually (in this case, we’ll say that it’s at the very beginning of the year). Now, for the sake of this illustration, let’s avail ourselves of the magic of hypotheticals and assume away many of the sources of error that make year-to-year changes in public testing data unreliable.

    First, we’ll say that this school reports test scores instead of proficiency rates, and that the scores are comparable between grades. Second, every year, our school welcomes a new cohort of sixth graders that is the exact same size and has the exact same average score as preceding cohorts – 30 out of 100, well below the state average of 65. Third and finally, there is no mobility at this school. Every student who enters sixth grade stays there for three years, and goes to high school upon completion of eighth grade. No new students are admitted mid-year.

    Okay, here’s where it gets interesting: Suppose this school is phenomenally effective in boosting its students’ scores. In fact, each year, every single student gains 20 points. It is the highest growth rate in the state. Believe it or not, using the metrics we commonly use to judge schoolwide “growth” or "gains," this school would still look completely ineffective. Take a look at the figure below.
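
    The arithmetic behind that figure can be sketched in a few lines of Python, using only the hypothetical numbers set out above (entering sixth graders average 30, every student gains 20 points per year, cohorts are identical, and there is no mobility); the calendar years in the loop are arbitrary.

    ```python
    # A sketch of the post's hypothetical school. All numbers come from the
    # thought experiment above: entering 6th graders average 30, every student
    # gains 20 points per year, cohorts are identical in size, and there is no
    # mobility. The calendar years are arbitrary placeholders.
    ENTRY_SCORE = 30
    ANNUAL_GAIN = 20

    for year in range(2010, 2014):
        # Average score by grade on the beginning-of-year test: 6th graders just
        # arrived, 7th graders have one year of growth, 8th graders have two.
        grade_averages = {
            6: ENTRY_SCORE,
            7: ENTRY_SCORE + ANNUAL_GAIN,
            8: ENTRY_SCORE + 2 * ANNUAL_GAIN,
        }
        school_average = sum(grade_averages.values()) / len(grade_averages)
        print(year, grade_averages, "schoolwide average:", school_average)

    # Every year prints {6: 30, 7: 50, 8: 70} and a schoolwide average of 50.0.
    # Any "growth" measure that compares this year's grade-level or schoolwide
    # average with last year's therefore shows zero change, even though every
    # individual student gains 20 points per year.
    ```

    The schoolwide average never moves because each year-over-year comparison is really a comparison of different cohorts of students: the 20-point gains walk out the door with the graduating eighth graders, and a fresh cohort of sixth graders scoring 30 takes their place.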