What You Need To Know About Misleading Education Graphs, In Two Graphs

There’s no reason why insisting on proper causal inference can’t be fun.

A few weeks ago, ASCD published a policy brief (thanks to Chad Aldeman for flagging it), the purpose of which is to argue that it is “grossly misleading” to make a “direct connection” between nations’ test scores and their economic strength.

On the one hand, it’s implausible to assert that better educated nations aren’t stronger economically. On the other hand, I can certainly respect the argument that test scores are an imperfect, incomplete measure, and the doomsday rhetoric can sometimes get out of control.

In any case, though, the primary piece of evidence put forth in the brief was the eye-catching graph below, which presented trends in NAEP versus those in U.S. GDP and productivity.
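The graph itself is not reproduced here, but the pitfall it trades on is easy to demonstrate. The sketch below (Python, with entirely synthetic data and invented numbers, not the brief’s actual figures) shows how the choice of y-axis range alone can make a slowly improving test-score series look perfectly flat next to a growing economic index.

```python
# Synthetic illustration of the axis-scaling pitfall: the same slow-growing
# "test score" series looks flat or clearly rising depending on the y-axis
# range chosen. All numbers here are invented for illustration only.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
years = np.arange(1978, 2013)
scores = 305 + 0.15 * (years - 1978) + rng.normal(0, 0.4, years.size)  # ~5-point gain
gdp = 100 * 1.028 ** (years - 1978)  # economic index growing ~2.8% per year

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: scores plotted on their full 0-500 scale next to the GDP
# index -- the score trend is visually crushed into a flat line.
left.plot(years, scores, label="score (0-500 scale)")
left.plot(years, gdp, label="GDP index (1978 = 100)")
left.set_ylim(0, 500)
left.set_title("Full-scale axis: scores look flat")
left.legend()

# Right panel: the same score series with a zoomed axis -- the steady
# gain is now plainly visible.
right.plot(years, scores)
right.set_ylim(300, 315)
right.set_title("Zoomed axis: same data, visible gain")

fig.tight_layout()
plt.show()
```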

Regular Public And Charter Schools: Is A Different Conversation Possible?

Uplifting Leadership, Andrew Hargreaves' new book with coauthors Alan Boyle and Alma Harris, is based on a seven-year international study, and illustrates how leaders from diverse organizations were able to lift up their teams by harnessing and balancing qualities that we often view as opposites, such as dreaming and action, creativity and discipline, measurement and meaningfulness, and so on.

Chapter three, Collaboration With Competition, was particularly interesting to me and relevant to our series, "The Social Side of Reform." In that series, we've been highlighting research that emphasizes the value of collaboration and considers extreme competition to be counterproductive. But is that always the case? Can collaboration and competition live under the same roof and, in combination, promote systemic improvement? Could, for example, different types of schools serving (or competing for) the same students work in cooperative ways for the greater good of their communities?

Hargreaves and colleagues believe that establishing this environment is difficult but possible, and that it has already happened in some places. In fact, Al Shanker was one of the first proponents of a model that bears some similarity. In this post, I highlight some ideas and illustrations from Uplifting Leadership and tie them to Shanker's own vision of how charter schools, conceived as idea incubators and, eventually, as innovations within the public school system, could potentially lift all students and the entire system, from the bottom up, one group of teachers at a time.

The Semantics of Test Scores

Our guest author today is Jennifer Borgioli, a Senior Consultant with Learner-Centered Initiatives, Ltd., where she supports schools with designing performance-based assessments, data analysis, and curriculum design.

The chart below was taken from the 2014 report on student performance on the Grades 3-8 tests administered by the New York State Education Department.

Based on this chart, which of the following statements is the most accurate?

A. “64 percent of 8th grade students failed the ELA test”

B. “36 percent of 8th graders are at grade level in reading and writing”

C. “36 percent of students meet or exceed the proficiency standard (Level 3 or 4) on the Grade 8 CCLS-aligned ELA test”

Lost In Citation

The so-called Vergara trial in California, in which the state’s tenure and layoff statutes were deemed unconstitutional, already has its first “spin-off,” this time in New York, where a newly formed organization, the Partnership for Educational Justice (PEJ), is among the groups spearheading the effort.

Upon first visiting PEJ’s new website, I was immediately (and predictably) drawn to the “Research” tab. It contains five statements (which, I guess, PEJ would characterize as “facts”). Each argument is presented in the most accessible form possible, typically accompanied by one citation (or two at most). I assume that the presentation of evidence in the actual trial will be a lot more thorough than that offered on this webpage, which seems geared toward the public rather than the more extensive evidentiary requirements of the courtroom (also see Bruce Baker’s comments on many of these same issues surrounding the New York situation).

That said, I thought it might be useful to review the basic arguments and evidence PEJ presents, not really in the context of whether they will “work” in the lawsuit (a judgment I am unqualified to make), but rather because they're very common. It has also been my observation that advocates on both “sides” of the education debate tend to be fairly good at using data and research to describe problems and situations, yet sometimes fall a bit short when it comes to evidence-based discussions of what to do about them (including the essential task of acknowledging when the evidence is still undeveloped). PEJ’s five bullet points, discussed below, are pretty good examples of what I mean.

The Language Of Teacher Effectiveness

There is a tendency in education circles these days, one that I'm sure has been discussed by others, and of which I myself have been “guilty” on countless occasions. The tendency is to use terms such as “effective/ineffective teacher” or “teacher performance” interchangeably with estimates from value-added and other growth models.

Now, to be clear, I personally am not opposed to the use of value-added estimates in teacher evaluations and other policies, so long as it is done cautiously and appropriately (which, in my view, is not happening in very many places). Moreover, based on my reading of the research, I believe that these estimates can provide useful information about teachers’ performance in the classroom. In short, then, I am not disputing that value-added scores can be treated as one useful proxy measure for teacher performance and effectiveness (and described as such), both formally and informally.
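For readers unfamiliar with what these estimates actually are, here is a deliberately minimal sketch of the basic idea, using simulated data. Real value-added models include many more controls (student demographics, classroom composition, multiple prior scores), so treat this only as an illustration of the "noisy proxy" point.

```python
# A minimal value-added-style model on simulated data: regress current
# scores on prior scores plus teacher indicators, and treat the teacher
# coefficients as "value-added" estimates.
import numpy as np

rng = np.random.default_rng(42)
n_teachers, class_size = 20, 25
teacher = np.repeat(np.arange(n_teachers), class_size)  # student -> teacher
true_effect = rng.normal(0, 0.2, n_teachers)            # simulated true effects
prior = rng.normal(0, 1, teacher.size)                  # prior-year score
score = 0.7 * prior + true_effect[teacher] + rng.normal(0, 0.5, teacher.size)

# Design matrix: prior score plus one dummy column per teacher (no intercept).
X = np.column_stack([prior, np.eye(n_teachers)[teacher]])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
estimates = beta[1:]  # the "value-added" estimates

# The estimates track the true effects, but imperfectly -- a useful
# proxy for performance, not the thing itself.
print(round(float(np.corrcoef(true_effect, estimates)[0, 1]), 2))
```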

Regardless of one's views on value-added and its policy deployment, however, there is a point at which our failure to define terms can go too far, and perhaps cause confusion.

Contrarians At The Gates

Unlike many of my colleagues, I don’t have a negative view of the Gates Foundation's education programs. Although I will admit that part of me is uneasy with the sheer amount of resources (and influence) they wield, and there are a few areas where I don’t see eye-to-eye with their ideas (or grantees), I agree with them on a great many things, and I think that some of their efforts, such as the Measures of Effective Teaching (MET) project, are important and beneficial (even if I found their packaging of the MET results a bit overblown).

But I feel obliged to say that I am particularly impressed with their recent announcement of support for a two-year delay on attaching stakes to the results of new assessments aligned with the Common Core. Granted, much of this is due to the fact that I think this is the correct policy decision (see my opinion piece with Morgan Polikoff). Independent of that, however, I think it took intellectual and political courage for them to take this stance, given their efforts toward new teacher evaluations that include test-based productivity measures.

The announcement was guaranteed to please almost nobody.

In Education Policy, Good Things Come In Small Packages

A recent report from the U.S. Department of Education presented a summary of three recent studies of differences in the effectiveness of teaching provided to advantaged and disadvantaged students (with effectiveness defined in terms of value-added scores, and disadvantage in terms of subsidized lunch eligibility). The brief characterizes the results of these reports in an accessible manner: the difference in estimated teaching effectiveness between advantaged and disadvantaged students varies quite widely between districts, but overall is about 4 percent of the achievement gap in reading and 2-3 percent in math.

Some observers were not impressed. They wondered why so-called reformers are alienating teachers and hurting students in order to address a mere 2-4 percent improvement in the achievement gap.

Just to be clear, the 2-4 percent figures describe the gap (and remember that it varies). Whether it can be narrowed or closed – e.g., by improving working conditions or offering incentives or some other means – is a separate issue. Nevertheless, let’s put aside all the substantive aspects surrounding these studies, and the issue of the distribution of teacher quality, and discuss this 2-4 percent thing, as it illustrates what I believe is among the most important tensions underlying education policy today: our collective failure to have a reasonable debate about expectations and the power of education policy.
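To make the distinction concrete, here is a back-of-the-envelope calculation. The 0.8 standard deviation gap below is an assumed, illustrative ballpark, not a figure from the report itself.

```python
# Back-of-the-envelope: what "4 percent of the achievement gap" means in
# effect-size terms. The 0.8 SD gap is an assumed, illustrative ballpark,
# not a number taken from the report.
reading_gap_sd = 0.80        # assumed achievement gap, in student-level SDs
share_from_teaching = 0.04   # the report's ~4 percent figure for reading

difference_sd = reading_gap_sd * share_from_teaching
print(f"Teaching-effectiveness difference: {difference_sd:.3f} SD")
# -> about 0.03 SD: even fully equalizing access to effective teaching
#    would, by this estimate, move only a few hundredths of an SD.
```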

Being Kevin Huffman

In a post earlier this week, I noted how several state and local education leaders, advocates and especially the editorial boards of major newspapers used the recently released NAEP results inappropriately – i.e., to argue that recent reforms in states such as Tennessee and D.C. are “working." I also discussed how this illustrates a larger phenomenon in which many people seem to expect education policies to generate immediate, measurable results in terms of aggregate student test scores, which I argued is both unrealistic and dangerous.

Mike G. from Boston, a friend whose comments I always appreciate, agrees with me, but asks a question that I think gets to the pragmatic heart of the matter. He wonders whether individuals in high-level education positions have any alternative. For instance, Mike asks, what would I suggest to Kevin Huffman, who is the head of Tennessee’s education department? Insofar as Huffman’s opponents “would use any data…to bash him if it’s trending down," would I advise him to forgo using the data in his favor when they show improvement?*

I have never held any high-level leadership position. My political experience and skills are (and I’m being charitable here) underdeveloped, and I have no doubt many more seasoned folks in education would disagree with me. But my answer is: yes, I would advise him to forgo using the data in this manner. Here’s why.

Immediate Gratification And Education Policy

A couple of months ago, Bill Gates said something that received a lot of attention. With regard to his foundation’s education reform efforts, which focus most prominently on teacher evaluations, but encompass many other areas, he noted, “we don’t know if it will work." In fact, according to Mr. Gates, “we won’t know for probably a decade."

He’s absolutely correct. Most education policies, including (but not limited to) those geared toward shifting the distribution of teacher quality, take a long time to work (if they do work), and the research assessing these policies requires a great deal of patience. Yet so many of the most prominent figures in education policy routinely espouse the opposite viewpoint: Policies are expected to have an immediate, measurable impact (and their effects are assessed in the crudest manner imaginable).

A perfect example was the reaction to the recent release of results of the National Assessment of Educational Progress (NAEP).

A Research-Based Case For Florida's Education Reforms

Advocates of the so-called “Florida Formula," a package of market-based reforms enacted throughout the 1990s and 2000s, some of which are now spreading rapidly in other states, traveled to Michigan this week to make their case to the state’s lawmakers, with particular emphasis on Florida's school grading system. In addition to arguments about accessibility and parental involvement, their empirical (i.e., test-based) evidence consisted largely of the standard, invalid claim that cross-sectional NAEP increases prove the reforms’ effectiveness, along with a bonus appearance of the argument that, since Florida started grading schools, the grades have improved, even though this is largely (and demonstrably) a result of changes in the formula.
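To see why cross-sectional comparisons of this kind are invalid as causal evidence, consider a toy example (all numbers invented): a statewide average can rise purely because cohort composition shifts, with no group of students improving at all.

```python
# Toy example (all numbers invented): the state average "improves" purely
# because the mix of students changes, even though neither group's
# performance moves at all.
group_means = {"group_a": 240.0, "group_b": 260.0}  # constant over time

def state_average(share_a: float) -> float:
    """Statewide mean under a given composition (share_a = share in group A)."""
    return share_a * group_means["group_a"] + (1 - share_a) * group_means["group_b"]

before = state_average(share_a=0.60)  # earlier cohort: 60% group A
after = state_average(share_a=0.40)   # later cohort: 40% group A
print(f"before = {before:.1f}, after = {after:.1f}")  # 248.0 -> 252.0
# A 4-point "gain" with zero change in any group -- cross-sectional
# increases alone cannot establish that a reform worked.
```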

As mentioned in a previous post, I continue to be perplexed at advocates’ insistence on using this "evidence," even though there is a decent amount of actual rigorous policy research available, much of it positive.

So, I thought it would be fun, though slightly strange, for me to try on my market-based reformer cap and see what it would look like if this kind of testimony about the Florida reforms were actually research-based (at least the test-based evidence). Here’s a very rough outline of what I came up with: