Charter And Regular Public School Performance In "Ohio 8" Districts, 2010-11

Every year, the state of Ohio releases an enormous amount of district- and school-level performance data. Since Ohio has one of the largest charter school populations in the nation, the data provide an opportunity to examine performance differences between charters and regular public schools in the state.

Ohio’s charters are concentrated largely in the urban “Ohio 8” districts (sometimes called the “Big 8”): Akron, Canton, Cincinnati, Cleveland, Columbus, Dayton, Toledo, and Youngstown. Charter coverage varies considerably among the “Ohio 8” districts, but it is, on average, about 20 percent, compared with roughly five percent across the whole state. I will therefore limit my quick analysis to these districts.

Let’s start with the measure that gets the most attention in the state: Overall “report card grades.” Schools (and districts) can receive one of six possible ratings: Academic emergency; academic watch; continuous improvement; effective; excellent; and excellent with distinction.

These ratings represent a weighted combination of four measures. Two of them measure performance “growth,” while the other two measure “absolute” performance levels. The growth measures are AYP (yes or no) and value-added (whether schools meet, exceed, or come in below the growth expectations set by the state’s value-added model). The first “absolute” performance measure is the state’s “performance index,” which is calculated based on the percentage of a school’s students who fall into the four NCLB categories of advanced, proficient, basic, and below basic. The second is the number of “state standards” that schools meet as a percentage of the number of standards for which they are “eligible.” For example, the state requires 75 percent proficiency in all the grade/subject tests that a given school administers, and schools are “awarded” a “standard met” for each grade/subject in which three-quarters of their students score above the proficiency cutoff (state standards also include targets for attendance and a couple of other non-test outcomes).
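To make the arithmetic behind the two “absolute” measures concrete, here is a minimal sketch in Python. The category weights, the example figures, and the test names are placeholders for illustration only, not Ohio’s official values, and the non-test standards (attendance and the like) are left out.

```python
# Illustrative sketch of the two "absolute" measures described above.
# The performance-index weights below are placeholders, NOT Ohio's official values.

PLACEHOLDER_WEIGHTS = {
    "advanced": 1.2,     # hypothetical weight
    "proficient": 1.0,   # hypothetical weight
    "basic": 0.6,        # hypothetical weight
    "below_basic": 0.3,  # hypothetical weight
}

def performance_index(pct_by_category):
    """Weighted sum of the percentage of students in each performance category."""
    return sum(PLACEHOLDER_WEIGHTS[cat] * pct for cat, pct in pct_by_category.items())

def standards_met_rate(proficiency_by_test, threshold=0.75):
    """Share of eligible grade/subject tests in which >= 75% of students scored proficient."""
    met = sum(1 for rate in proficiency_by_test.values() if rate >= threshold)
    return met / len(proficiency_by_test)

# Hypothetical school: percentage of students in each category, and
# proficiency rates on each grade/subject test it administers.
school = {"advanced": 20, "proficient": 40, "basic": 25, "below_basic": 15}
tests = {"grade3_math": 0.78, "grade3_reading": 0.71, "grade4_math": 0.80}

print(performance_index(school))   # 83.5 with these placeholder weights
print(standards_met_rate(tests))   # 2 of 3 standards met -> ~0.67
```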

The graph below presents the raw breakdown in report card ratings for charter and regular public schools.

Our Annual Testing Data Charade

Every year, around this time, states and districts throughout the nation release their official testing results. Schools are closed and reputations are made or broken by these data. But this annual tradition is, in some places, becoming a charade.

Most states and districts release two types of assessment data every year (by student subgroup, school, and grade): average scores (“scale scores”), and the percentage of students scoring at each performance level (advanced, proficient, basic, and below basic). The latter – the rates – are of course derived from the scores; that is, they tell us the proportion of students whose scale score was above the minimum necessary to be considered proficient, advanced, etc.

Both types of data are cross-sectional. They don’t follow individual students over time, but rather give a “snapshot” of aggregate performance among two different groups of students (for example, third graders in 2010 compared with third graders in 2011). Calling the change in these results “progress” or “gains” is inaccurate; they are cohort changes, and might just as well be chalked up to differences in the characteristics of the students (especially when changes are small). Even averaged across an entire school or district, there can be huge differences in the groups compared between years – not only is there often considerable student mobility in and out of schools/districts, but every year, a new cohort enters at the lowest tested grade, while a whole other cohort exits at the highest tested grade (except for those retained).

For these reasons, any comparisons between years must be done with extreme caution, but the most common way – simply comparing proficiency rates between years – is in many respects the worst. A closer look at this year’s New York City results illustrates this perfectly.
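To see why, consider a toy simulation (entirely hypothetical numbers and cut score, not New York City’s data). Because proficiency rates count only the students who cross a single cutoff, a modest difference in scores between two different cohorts can register as a large jump in the rate.

```python
# Toy illustration (hypothetical numbers, not NYC data): a small difference in
# average scale scores between two cohorts can look like a big proficiency "gain"
# when many students sit just below the cut score.
import random

random.seed(0)
CUTOFF = 650  # hypothetical proficiency cut score

def proficiency_rate(scores, cutoff=CUTOFF):
    return sum(s >= cutoff for s in scores) / len(scores)

# Year 1 cohort: scores clustered just below the cutoff
year1 = [random.gauss(645, 20) for _ in range(1000)]
# Year 2 cohort: a different group of students, averaging only ~5 points higher
year2 = [random.gauss(650, 20) for _ in range(1000)]

mean_change = sum(year2) / len(year2) - sum(year1) / len(year1)
rate_change = proficiency_rate(year2) - proficiency_rate(year1)
print(f"mean score change:  {mean_change:+.1f} points")
print(f"proficiency change: {100 * rate_change:+.1f} percentage points")
```

In a run like this, an average difference of roughly five points (about a quarter of a standard deviation) typically shows up as a gain of around ten percentage points in the proficiency rate, simply because so many students sit near the cutoff; a rate change of that size could just as easily reflect a different mix of students as any real improvement.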

Melodramatic

At a press conference earlier this week, New York City Mayor Michael Bloomberg announced the city’s 2011 test results. Wall Street Journal reporter Lisa Fleisher, who was on the scene, tweeted Mayor Bloomberg’s remarks. According to Fleisher, the mayor claimed that there was a “dramatic difference” between his city’s testing progress from 2010 to 2011 and that of the rest of the state.

Putting aside the fact that the results do not measure “progress” per se, but rather cohort changes – a comparison of cross-sectional data that measures the aggregate performance of two different groups of students – I must say that I was a little astounded by this claim. Fleisher was also kind enough to tweet a photograph that the mayor put on the screen to illustrate the “dramatic difference” between the gains of NYC students and those of their non-NYC counterparts across the state. Here it is:

When It Comes To How We Use Evidence, Is Education Reform The New Welfare Reform?

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post.

In the mid-1990s, after a long and contentious debate, the U.S. Congress passed the Personal Responsibility and Work Opportunity Reconciliation Act of 1996, which President Clinton signed into law. It is usually called the “Welfare Reform Act,” as it effectively ended the Aid to Families with Dependent Children (AFDC) program (which is what most people mean when they say “welfare,” even though it was [and its successor is] only a tiny part of our welfare state). Established during the New Deal, AFDC was mostly designed to give assistance to needy young children (it was later expanded to include support for their parents/caretakers as well).

In place of AFDC was a new program – Temporary Assistance for Needy Families (TANF). TANF gave block grants to states, which were directed to design their own “welfare” programs. Although the states were given considerable leeway, their new programs were to have two basic features: first, for welfare recipients to receive benefits, they had to be working; and second, there was to be a time limit on benefits, usually 3-5 years over a lifetime, after which individuals were no longer eligible for cash assistance (states could exempt a proportion of their caseload from these requirements). The general idea was that time limits and work requirements would “break the cycle of poverty”; recipients would be motivated (read: forced) to work, and in doing so, would acquire the experience and confidence necessary for a bootstrap-esque transformation.

There are several similarities between the bipartisan welfare reform movement of the 1990s and the general thrust of the education reform movement happening today. For example, there is the reliance on market-based mechanisms to “cure” longstanding problems, and the unusually strong liberal-conservative alliance of the proponents. Nevertheless, while calling education reform “the new welfare reform” might be a good sound bite, it would also take the analogy way too far.

My intention here is not to draw a direct parallel between the two movements in terms of how they approach their respective problems (poverty/unemployment and student achievement), but rather in how we evaluate their success in doing so. In other words, I am concerned that the manner in which we assess the success or failure of education reform in our public debate will proceed using the same flawed and misguided methods that were used by many for welfare reform.

Settling Scores

In 2007, when the D.C. City Council passed a law giving the mayor control of public schools, it required that a five-year independent evaluation be conducted to document the law’s effects and suggest changes. The National Research Council (a division of the National Academies) was charged with performing this task. As reported by Bill Turque in the Washington Post, the first report was released a couple of weeks ago.

The primary purpose of this first report was to give “first impressions” and offer advice on how the actual evaluation should proceed. It covered several areas – finance, special programs, organizational structure, etc. – but, given the controversy surrounding Michelle Rhee’s tenure, the section on achievement results got the most attention. The team was only able to analyze preliminary performance data – the same data that are used constantly by Rhee, her supporters, and her detractors to judge her tenure at the helm of DCPS.

It was one of those reports that tells us what we should already know, but too often fail to consider.

A List Of Education And Related Data Resources

We frequently present quick analyses of data on this blog (and look at those done by others). As a close follower of the education debate, I often get the sense that people are hungry for high-quality information on a variety of different topics, but searching for these data can be daunting, which probably deters many people from trying.

So, while I’m sure that many others have compiled lists of data resources relevant to education, I figured I would do the same, with a focus on more user-friendly sources.

But first, I would be remiss if I didn’t caution you to use these data carefully. Almost all of the resources below have instructions or FAQs, most of them non-technical. Read them. Remember that improper or misleading presentation of data is one of the most counterproductive features of today’s education debates, and it occurs to the detriment of all.

That said, here are a few key resources for education and other related quantitative data. The list is far from exhaustive, so feel free to leave comments and suggestions if you think I missed anything important.

The Legend Of Last Fall

The subject of Michelle Rhee’s teaching record has recently received a lot of attention. While the controversy has been interesting, it could also be argued that it’s relatively unimportant. The evidence that she exaggerated her teaching prowess is, after all, inconclusive (though highly suggestive). A little resume inflation from a job 20 years ago might be overlooked, so long as Rhee’s current claims about her more recent record are accurate. But are they?

On Rhee’s new website, her official bio – in effect, her resume today (or at least her cover letter) – contains a few sentences about her record as chancellor of D.C. Public Schools (DCPS), under the header “Driving Unprecedented Growth in the D.C. Public Schools.” There, her test-based accomplishments are characterized as follows:

Under her leadership, the worst performing school district in the country became the only major city system to see double-digit growth in both their state reading and state math scores in seventh, eighth and tenth grades over three years.

This time, we can presume that the statement has been vetted thoroughly, using all the tools of data collection and analysis available to Rhee during her tenure at the helm of DCPS.

But the statement is false.

PISA For Our Time: A Balanced Look

Press coverage of the latest PISA results over the past two months has almost been enough to make one want to crawl under the bed and hide. Over and over, we’ve been told that this is a “Sputnik moment,” that the U.S. is among the lowest-performing nations in the world, and that we’re getting worse.

Thankfully, these claims are largely misleading. Since we’re sure to hear them repeated often over the next few years – at least until the next set of international results comes in – it makes sense to try to correct the record (also see here and here).

But, first, I want to make it very clear that U.S. PISA results are not good enough by any stretch of the imagination, and we can and should do a whole lot better. Nevertheless, international comparisons of any kind are very difficult, and if we don’t pay careful attention to what the data are really telling us, it will be more difficult to figure out how to respond appropriately.

This brings me to three basic points about the 2009 PISA results that we need to bear in mind.

Michelle Rhee's Testing Legacy: An Open Question

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post.

Michelle Rhee’s resignation and departure have, predictably, provoked a flurry of conflicting reactions. Yet virtually all of them, from opponents and supporters alike, seem to assume that her tenure at the helm of the D.C. Public Schools (DCPS) helped to boost student test scores dramatically. She and D.C. Mayor Adrian Fenty made similar claims themselves in the Wall Street Journal (WSJ) just last week.

Hardly anybody, regardless of their opinion about Michelle Rhee, thinks that test scores alone are an adequate indicator of student success. But, in no small part because of her own emphasis on them, that is how this debate has unfolded. Her aim was to raise scores and, with few exceptions (also here and here), even those who objected to her “abrasive” style and controversial policies seem to believe that she succeeded wildly in the testing area.

This conclusion is premature. A review of the record shows that Michelle Rhee’s test score “legacy” is an open question. 

There are three main points to consider:

The Cost Of Success In Education

Many are skeptical of the current push to improve our education system by means of test-based “accountability” – hiring, firing, and paying teachers and administrators, as well as closing and retaining schools, based largely on test scores. They say it won’t work. I share their skepticism, because I think it will.

There is a simple logic to this approach: when you control the supply of teachers, leaders, and schools based on their ability to increase test scores, then this attribute will become increasingly common among these individuals and institutions. It is called “selecting on the dependent variable,” and it is, given the talent of the people overseeing this process and the money behind it, a decent bet to work in the long run.
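As a rough illustration of that logic, here is a back-of-the-envelope simulation with made-up parameters: a workforce is repeatedly filtered on noisy measured score gains, and the underlying score-raising trait becomes steadily more common, whatever else it does or does not carry with it.

```python
# Back-of-the-envelope sketch of the selection dynamic described above.
# All parameters (noise, retention rate, workforce size) are made up for illustration.
import random

random.seed(1)

def new_teacher():
    # "true" ability to raise test scores, in arbitrary standardized units
    return random.gauss(0.0, 1.0)

workforce = [new_teacher() for _ in range(10_000)]

for year in range(1, 11):
    # measured gain = true score-raising ability + classroom-level noise
    measured = [(ability, ability + random.gauss(0.0, 1.0)) for ability in workforce]
    # keep the 80% with the highest measured gains; replace the rest with new hires
    measured.sort(key=lambda pair: pair[1], reverse=True)
    kept = [ability for ability, _ in measured[: int(0.8 * len(measured))]]
    workforce = kept + [new_teacher() for _ in range(len(measured) - len(kept))]
    avg = sum(workforce) / len(workforce)
    print(f"year {year}: mean score-raising ability = {avg:+.2f}")
```

The average climbs year after year even though the measure is noisy, which is exactly the point: the selection operates on the measured outcome, whether or not that outcome captures everything we care about.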

Now, we all know the arguments about the limitations of test scores. We all know they’re largely true. Some people take them too far; others are too casual in their disregard. The question is not whether test scores provide a comprehensive measure of learning or subject mastery (of course they don’t). The better question is the extent to which teachers (and schools) who increase test scores a great deal are imparting and/or reinforcing the skills and traits that students will need after their K-12 education, relative to teachers who produce smaller gains. And this question remains largely unanswered.

This is dangerous, because if there is an unreliable relationship between teaching essential skills and the boosting of test scores, then success is no longer success. And by selecting teachers and schools based on those scores, we will have deliberately engineered our public education system to fail in spite of success.

It may be only then that we truly realize what we have done.