The Thrill Of Success, The Agony Of Measurement

** Reprinted here in the Washington Post

The recent release of the latest New York State testing results created a little public relations coup for the controversial Success Academies charter chain, which operates over 20 schools in New York City, and is seeking to expand.

Shortly after the release of the data, the New York Post published a laudatory article noting that seven of the Success Academies had overall proficiency rates that were among the highest in the state, and arguing that the schools “live up to their name.” The Daily News followed up by publishing an op-ed that compares the Success Academies' combined 94 percent math proficiency rate to the overall city rate of 35 percent, and uses that to argue that the chain should be allowed to expand because its students “aced the test” (this is not really what high proficiency rates mean, but fair enough).

On the one hand, this is great news, and a wonderfully impressive showing by these students. On the other, decidedly less sensational hand, it's also another example of the use of absolute performance indicators (e.g., proficiency rates) as measures of school rather than student performance. They are not particularly useful for the former purpose since, among other reasons, they do not account for where students start out upon entry to the school. I personally don't care whether Success Academy gets good or bad press. I do, however, believe that how one gauges effectiveness, test-based or otherwise, is important, even if one reaches the same conclusion using different measures.

The Semantics of Test Scores

Our guest author today is Jennifer Borgioli, a Senior Consultant with Learner-Centered Initiatives, Ltd., where she supports schools with performance-based assessment design, data analysis, and curriculum design.

The chart below was taken from the 2014 report on student performance on the Grades 3-8 tests administered by the New York State Department of Education.

Based on this chart, which of the following statements is the most accurate?

A. “64 percent of 8th grade students failed the ELA test”

B. “36 percent of 8th graders are at grade level in reading and writing”

C. “36 percent of students meet or exceed the proficiency standard (Level 3 or 4) on the Grade 8 CCLS-aligned math test”

What Kindergartners Might Teach Us About Test-Based Accountability

There is an ongoing debate about widespread administration of standardized tests to kindergartners. This is of course a serious decision. My personal opinion about whether this is a good idea depends on several factors, such as how good the tests will be and, most importantly, how the results will be used (and I cannot say that I am optimistic about the latter).

Although the policy itself must be considered seriously on its merits, there is one side aspect of testing kindergartners that fascinates me: It would demonstrate how absurd it is to judge school performance, as does NCLB, using absolute performance levels – i.e., how highly students score on tests, rather than their progress over time.

Basically, the kindergarten tests would inevitably shake out the same way as those administered in later grades. Schools and districts serving more disadvantaged students would score substantially lower than their counterparts in more affluent areas. If the scores were converted to proficiency rates or similar cut-score measures, they would show extremely low pass rates in urban districts such as Detroit.
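
To make the distinction between absolute levels and progress concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the school names, the scores, and the cut score of 300 are invented for illustration, and a simple average gain is only a crude stand-in for the growth models that states actually use.

```python
from statistics import mean

CUT_SCORE = 300  # hypothetical proficiency cut score, not a real state value

# Hypothetical (entry_score, current_score) pairs, one per student.
schools = {
    "affluent_school": [(310, 320), (305, 312), (295, 304), (320, 326)],
    "high_poverty_school": [(250, 285), (260, 298), (270, 305), (240, 280)],
}


def proficiency_rate(students, cut=CUT_SCORE):
    """Absolute measure: share of students currently at or above the cut score."""
    return sum(current >= cut for _, current in students) / len(students)


def mean_gain(students):
    """Progress measure: average change from entry score to current score."""
    return mean(current - entry for entry, current in students)


summary = {
    name: {
        "proficiency_rate": proficiency_rate(students),
        "mean_gain": mean_gain(students),
    }
    for name, students in schools.items()
}

for name, stats in summary.items():
    print(
        f"{name}: {stats['proficiency_rate']:.0%} proficient, "
        f"mean gain {stats['mean_gain']:.1f} points"
    )
```

In this made-up example, the school whose students start out further behind posts the lower proficiency rate even though its students gain far more, which is the pattern one would expect absolute measures to produce as early as kindergarten.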

Contrarians At The Gates

Unlike many of my colleagues, I don’t have a negative view of the Gates Foundation's education programs. Although I will admit that part of me is uneasy with the sheer amount of resources (and influence) they wield, and there are a few areas where I don’t see eye-to-eye with their ideas (or grantees), I agree with them on a great many things, and I think that some of their efforts, such as the Measures of Effective Teaching (MET) project, are important and beneficial (even if I found their packaging of the MET results a bit overblown).

But I feel obliged to say that I am particularly impressed with their recent announcement of support for a two-year delay on attaching stakes to the results of new assessments aligned with the Common Core. Granted, much of this is due to the fact that I think this is the correct policy decision (see my opinion piece with Morgan Polikoff). Independent of that, however, I think it took intellectual and political courage for them to take this stance, given their efforts toward new teacher evaluations that include test-based productivity measures.

The announcement was guaranteed to please almost nobody.

A Moral Panic Over Real Accountability?

The late conservative British Prime Minister Margaret Thatcher was famous for declaring “there is no alternative” as she executed her laissez-faire economic policies of austerity and privatization, redistributing wealth and helping to concentrate power into the hands of that nation’s rich and powerful. The notion that current ideas and policies are inescapable, that there can be no feasible or desirable alternatives, became a staple of apologies for the status quo long before Thatcher’s declaration. Unfortunately, it has also become a common trope in discussions of U.S. education policy in the wake of No Child Left Behind and Race to the Top.

A few weeks ago, noted education scholar Linda Darling-Hammond and AFT President Randi Weingarten[i] appeared in the pages of the Huffington Post with an essay declaring that the current accountability regime in American education was badly broken, and that there was indeed an alternative, a system of real accountability, that should be adopted in its stead. What is needed, they reasoned, is nothing less than a paradigm shift from the current fixation on “test and punish” to a “support and improve” model.

Darling-Hammond and Weingarten argued that, after more than a decade of proliferating standardized exams, our curricula had narrowed and too many of our schools had been transformed into ‘test prep’ factories. The linking of these tests to high-stakes decisions about the future of students, educators and schools has created a culture of fear and anxiety that saps student and teacher morale, drains the joy out of teaching and learning and diminishes the quality of education. The mass closure of schools has negatively impacted communities that can least afford to lose their very few public institutions, without meaningfully improving the education of students living in poverty, students of color and immigrant students.

Expectations For Student Performance Under NCLB Waivers

A recent story in the Chicago Tribune notes that Illinois’ NCLB waiver plan sets lower targets for certain student subgroups, including minority and low-income students. This, according to the article, means that “Illinois students of different backgrounds no longer will be held to the same standards.” The article goes on to quote advocates who are concerned that this amounts to lower expectations for traditionally lower-scoring groups of children.

The argument that expectations should not vary by student characteristics is, of course, valid and important. Nevertheless, as Chad Aldeman notes, the policy of setting different targets for different groups of students has been legally required since the enactment of NCLB, under which states must “give credit to lower-performing groups that demonstrate progress.” This was supposed to ensure, albeit with exceedingly crude measures, that schools weren't punished due to the students they serve, and how far behind those students were upon entry into the schools.

I would take that a step further by adding two points. The first is quite obvious, and is mentioned briefly in the Tribune article, but too often is obscured in these kinds of conversations: Neither NCLB nor the waivers actually hold students to different standards. The cut scores above which students are deemed “proficient,” somewhat arbitrary though they may be, do not vary by student subgroup, or by any other factor within a given state. All students are held to the same exact standard.

Performance Measurement In Healthcare And Education

A recent story in the New York Times reports that, according to an Obama Administration-commissioned panel, the measures being used to evaluate the performance of healthcare providers are unfairly penalizing those that serve larger proportions of disadvantaged patients (thanks to Mike Petrilli for sending me the article). For example, if you’re grading hospitals based on simple, unadjusted readmission rates, it might appear as if hospitals serving high poverty populations are doing worse -- even if the quality of their service is excellent -- since readmissions are more likely for patients who can’t afford medication, or aren’t able to take time off from work, or don’t have home support systems.

The panel recommended adjusting the performance measures, which, for instance, are used for Medicare reimbursement, using variables such as patient income and education, as this would provide a fairer accountability system – one that does not penalize healthcare institutions and their personnel for factors that are out of their control.
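
As a rough illustration of what such an adjustment can do, the sketch below compares raw readmission rates with an observed-to-expected ratio computed by indirect standardization on a single income variable. This is not the panel's method or Medicare's actual approach; the hospitals, the strata, and all of the counts are hypothetical, and the technique was chosen only because it is easy to follow.

```python
# Hypothetical counts: hospital -> income stratum -> (patients, readmissions)
hospitals = {
    "hospital_a": {"low_income": (800, 160), "higher_income": (200, 20)},
    "hospital_b": {"low_income": (200, 44), "higher_income": (800, 88)},
}


def raw_rate(strata):
    """Unadjusted readmission rate across all of a hospital's patients."""
    patients = sum(n for n, _ in strata.values())
    readmissions = sum(r for _, r in strata.values())
    return readmissions / patients


def pooled_stratum_rates(all_hospitals):
    """Readmission rate within each income stratum, pooled over all hospitals."""
    totals = {}
    for strata in all_hospitals.values():
        for stratum, (n, r) in strata.items():
            pooled_n, pooled_r = totals.get(stratum, (0, 0))
            totals[stratum] = (pooled_n + n, pooled_r + r)
    return {stratum: r / n for stratum, (n, r) in totals.items()}


def observed_to_expected(strata, stratum_rates):
    """Observed readmissions divided by the number expected given patient mix."""
    observed = sum(r for _, r in strata.values())
    expected = sum(n * stratum_rates[s] for s, (n, _) in strata.items())
    return observed / expected


stratum_rates = pooled_stratum_rates(hospitals)
for name, strata in hospitals.items():
    print(
        f"{name}: raw rate {raw_rate(strata):.1%}, "
        f"observed/expected {observed_to_expected(strata, stratum_rates):.2f}"
    )
```

In these invented numbers, hospital_a has the higher raw readmission rate only because it serves far more low-income patients. Within each income group it readmits a smaller share than hospital_b, so its observed-to-expected ratio falls below 1 while hospital_b's rises above it.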

There are of course very strong, very obvious parallels here to education accountability policy, in which schools are judged in part based on raw proficiency rates that make no attempt to account for differences in the populations of students in different schools. The comparison also reveals an important feature of formal accountability systems in other policy fields.

"Show Me What Democracy Looks Like"

Our guest author today is John McCrann, a math teacher and experiential educator at Harvest Collegiate High School in New York City. John is a member of the America Achieves Fellowship, Youth Opportunities Program, and Teacher Leader Study Group. He tweets at @JohnTroutMcCran.

New York City’s third through eighth graders are in the middle of state tests, and many of our city’s citizens have taken strong positions on the value (or lack thereof) of these assessments. The protests, arguments and activism surrounding these tests remind me of a day when I was a substitute civics teacher during summer school. “I need help,” Charlotte said as she approached my desk. “What is democracy?”

On that day, my mind flashed to a scene I witnessed outside the White House in the spring of 2003. On one side of the fence, protestors shouted: “Show me what democracy looks like! This is what democracy looks like!” On the other side worked an administration that had invaded another country in an effort to “expand democracy.” Passionate, bright people on both sides of that fence believed in the idea that Charlotte was asking about, but came to very different conclusions about how to enact the concept.

Is Selective Admission A School Improvement Plan?

The Washington Post reports that parents and alumni of D.C.’s Dunbar High School have quietly been putting together a proposal to revitalize what the article calls "one of the District's worst performing schools."

Those behind the proposal are not ready to speak about it publicly, and details are still very thin, but the Post article reports that it calls for greater flexibility in hiring, spending and other core policies. Moreover, the core of the plan – or at least its most drastic element – is to make Dunbar a selective high school, to which students must apply and be accepted, presumably based on testing results and other performance indicators (the story characterizes the proposal as a whole with the term “autonomy”). I will offer no opinion as to whether this conversion, if it is indeed submitted to the District for consideration, is a good idea. That will be up to administrators, teachers, parents, and other stakeholders.

I am, however, a bit struck by two interrelated aspects of this story. The first is the unquestioned characterization of Dunbar as a “low performing” or “struggling” school. This fateful label appears to be based mostly on the school’s proficiency rates, which are indeed dismally low – 20 percent in math and 29 percent in reading.

ESEA Waivers And The Perpetuation Of Poor Educational Measurement

Some of the best research out there is a product not of sophisticated statistical methods or complex research designs, but rather of painstaking manual data collection. A good example is a recent paper by Morgan Polikoff, Andrew McEachin, Stephani Wrabel and Matthew Duque, which was published in the latest issue of the journal Educational Researcher.

Polikoff and his colleagues performed a task that makes most of the rest of us cringe: They read and coded every one of the over 40 state applications for ESEA flexibility, or “waivers.” The end product is a simple but highly useful presentation of the measures states are using to identify “priority” schools (the lowest-performing) and “focus” schools (those "contributing to achievement gaps"). The results are disturbing to anyone who believes that strong measurement should guide educational decisions.

There's plenty of great data and discussion in the paper, but consider just one central finding: How states are identifying priority (i.e., lowest-performing) schools at the elementary level (the measures are of course a bit different for secondary schools).