When Growth Isn't Really Growth

Let’s try a super-simple thought experiment with data. Suppose we have an inner-city middle school serving grades 6-8. Students in all three grades take the state exam annually (in this case, we’ll say that it’s at the very beginning of the year). Now, for the sake of this illustration, let’s avail ourselves of the magic of hypotheticals and assume away many of the sources of error that make year-to-year changes in public testing data unreliable.

First, we’ll say that this school reports test scores instead of proficiency rates, and that the scores are comparable between grades. Second, every year, our school welcomes a new cohort of sixth graders that is the exact same size and has the exact same average score as preceding cohorts – 30 out of 100, well below the state average of 65. Third and finally, there is no mobility at this school. Every student who enters sixth grade stays at the school for three years and moves on to high school after completing eighth grade. No new students are admitted mid-year.

Okay, here’s where it gets interesting: Suppose this school is phenomenally effective in boosting its students’ scores. In fact, each year, every single student gains 20 points. It is the highest growth rate in the state. Believe it or not, using the metrics we commonly use to judge schoolwide “growth” or "gains," this school would still look completely ineffective. Take a look at the figure below.
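
To make the arithmetic concrete, here is a minimal sketch in Python, using only the hypothetical numbers from the setup above (an entry score of 30 and a 20-point annual gain for every student):

```python
# A minimal sketch of the thought experiment above (all numbers hypothetical,
# taken from the setup: each entering cohort averages 30, every student gains
# 20 points per year, cohorts are equal in size, and there is no mobility).

ENTRY_SCORE = 30   # average score of each incoming 6th-grade cohort
ANNUAL_GAIN = 20   # points gained by every student, every year

def grade_averages():
    """Average scores by grade at the start of any given year."""
    return {
        "grade 6": ENTRY_SCORE,                    # just arrived: 30
        "grade 7": ENTRY_SCORE + ANNUAL_GAIN,      # one year of growth: 50
        "grade 8": ENTRY_SCORE + 2 * ANNUAL_GAIN,  # two years of growth: 70
    }

# The schoolwide average is identical every single year...
schoolwide = sum(grade_averages().values()) / 3
print(schoolwide)  # 50.0

# ...so the usual "growth" or "gains" metric -- the change in the schoolwide
# (or grade-level) average from one year to the next -- registers zero,
# even though every individual student gains 20 points per year.
```

The year-to-year comparison is between different groups of students, which is exactly why it misses the growth that every one of those students is making.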

A Simple Choice Of Words Can Help Avoid Confusion About New Test Results

In 1998, the National Institutes of Health (NIH) lowered the threshold at which people are classified as “overweight." Literally overnight, about 25 million Americans previously considered to be at a healthy weight were suddenly overweight. If, the next day, you saw a newspaper headline that said “number of overweight Americans increases," you would probably find that a little misleading. America’s “overweight” population didn’t really increase; the definition changed.

Fast forward to November 2012, when Kentucky became the first state to release results from new assessments aligned with the Common Core Standards (CCS). This led to headlines such as "Scores Drop on Kentucky’s Common Core-Aligned Tests" and "Challenges Seen as Kentucky’s Test Scores Drop As Expected." Yet these descriptions unintentionally misrepresent what happened. It's not quite accurate - or is at least highly imprecise - to say that test scores “dropped," just as it would have been wrong to say that the number of overweight Americans increased overnight in 1998 (strictly speaking, they’re not even scores; they’re proficiency rates). Rather, the state adopted different tests, with different content, a different design, and different standards by which students are deemed “proficient."
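
To see the distinction in miniature, here is a small hypothetical sketch in Python. The scores and cut scores are invented, and the example reduces the change to a single new cut score, whereas Kentucky actually adopted entirely different tests; the point is only that a "rate" can move without any change in underlying performance:

```python
# Hypothetical illustration: the same underlying performance yields very
# different "proficiency rates" under different definitions of proficient.

student_scores = [45, 52, 58, 61, 67, 72, 75, 81, 88, 93]  # invented scale scores

def proficiency_rate(scores, cut_score):
    """Percent of students at or above the proficiency cut score."""
    return 100 * sum(s >= cut_score for s in scores) / len(scores)

old_rate = proficiency_rate(student_scores, cut_score=60)  # 70.0
new_rate = proficiency_rate(student_scores, cut_score=75)  # 40.0
print(old_rate, new_rate)

# The "rate" falls from 70 percent to 40 percent without a single score
# changing. The students didn't get worse; the definition changed.
```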

Over the next 2-3 years, a large group of states will also release results from their new CCS-aligned tests. It is important for parents, teachers, administrators, and other stakeholders to understand what the results mean. Most of them will rely on newspapers and blogs, and so one exceedingly simple step that might help out is some polite, constructive language-policing.

Are Teachers Changing Their Minds About Education Reform?

** Reprinted here in the Washington Post

In a recent Washington Post article called “Teachers leaning in favor of reforms," veteran reporter Jay Mathews puts forth an argument that one hears rather frequently – that teachers are “changing their minds," in a favorable direction, about the current wave of education reform. Among other things, Mr. Mathews cites two teacher surveys. One of them, which we discussed here, is a single-year survey that doesn't actually look at trends, and therefore cannot tell us much about shifts in teachers’ attitudes over time (it was also a voluntary online survey).

His second source, on the other hand, is in fact a useful means of (cautiously) assessing such trends (though the article doesn't actually look at them). That is the Education Sector survey of a nationally representative sample of U.S. teachers, which the organization conducted in 2003, 2007 and, most recently, 2011.

This is a valuable resource. Like other teacher surveys, it shows that educators’ attitudes toward education policy are diverse. Opinions vary by teacher characteristics, context and, of course, by the policy being queried. Moreover, views among teachers can (and do) change over time, though, when looking at cross-sectional surveys, one must always keep in mind that observed changes (or lack thereof) might be due in part to shifts in the characteristics of the teacher workforce. There's an important distinction between changing minds and changing workers (which Jay Mathews, to his great credit, discusses in this article).*

That said, when it comes to many of the more controversial reforms happening in the U.S., those about which teachers might be "changing their minds," the results of this particular survey suggest, if anything, that teachers’ attitudes are actually quite stable.

A Case Against Assigning Single Ratings To Schools

The new breed of school rating systems, some of which are still getting off the ground, will co-exist with federal proficiency targets in many states, and they are (or will be) used for a variety of purposes, including closure, resource allocation and informing parents and the public (see our posts on the systems in IN, FL, OH, CO, and NYC).*

The approach that most states are using, in part due to the "ESEA flexibility" guidelines set by the U.S. Department of Education, is to combine different types of measures, often very crudely, into a single grade or categorical rating for each school. Administrators and media coverage usually characterize these ratings as measures of school performance - low-rated schools are called "low performing," while those receiving top ratings are characterized as "high performing." That's not accurate - or, at best, it's only partially true.

Some of the indicators that make up the ratings, such as proficiency rates, are best interpreted as (imperfectly) describing student performance on tests, whereas other measures, such as growth model estimates, make some attempt to isolate schools’ contribution to that performance. Both might have a role to play in accountability systems, but they're more or less appropriate depending on how you’re trying to use them.

So, here’s my question: Why do we insist on throwing them all together into a single rating for each school? To illustrate why I think this question needs to be addressed, let’s take a quick look at four highly-simplified situations in which one might use ratings.
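
As a purely hypothetical warm-up - the weights and the two schools below are invented, not drawn from any actual state system - here is a sketch of how a single composite rating can blur the two types of measures:

```python
# A purely hypothetical composite rating (invented weights and schools),
# meant only to illustrate how status and growth measures get blended.

def composite_rating(proficiency_rate, growth_estimate,
                     status_weight=0.6, growth_weight=0.4):
    """Crude single rating: a weighted average of a status measure
    (proficiency rate, 0-100) and a growth measure (rescaled to 0-100)."""
    return status_weight * proficiency_rate + growth_weight * growth_estimate

# School A: high-scoring students, but little measured growth.
# School B: low-scoring students, but very high measured growth.
school_a = composite_rating(proficiency_rate=90, growth_estimate=40)  # 70.0
school_b = composite_rating(proficiency_rate=40, growth_estimate=95)  # 62.0
print(school_a, school_b)

# School A gets the higher single rating, even though the measure that tries
# to isolate the school's own contribution (growth) favors School B. A parent
# choosing a school and a district deciding where to intervene arguably need
# different numbers, but the composite hands them the same one.
```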

When You Hear Claims That Policies Are Working, Read The Fine Print

When I point out that raw changes in state proficiency rates or NAEP scores are not valid evidence that a policy or set of policies is “working," I often get the following response: “Oh Matt, we can’t have a randomized trial or peer-reviewed article for everything. We have to make decisions and conclusions based on imperfect information sometimes."

This statement is obviously true. In this case, however, it's also a straw man. There’s a huge middle ground between the highest-quality research and the kind of speculation that often drives our education debate. I’m not saying we always need experiments or highly complex analyses to guide policy decisions (though, in general, they are preferable and sometimes necessary). The point, rather, is that we shouldn’t draw conclusions based on evidence that doesn't support those conclusions.

This, unfortunately, happens all the time. In fact, many of the more prominent advocates in education today make their cases based largely on raw changes in outcomes immediately after (or sometimes even before) their preferred policies were implemented (also see here, here, here, here, here, and here). In order to illustrate the monumental assumptions upon which these and similar claims ride, I thought it might be fun to break them down quickly, in a highly simplified fashion. So, here are the four “requirements” that must be met in order to attribute raw test score changes to a specific policy (note that most of this can be applied not only to claims that policies are working, but also to claims that they're not working because scores or rates are flat):

Are Stereotypes About Career And Technical Education Crumbling?

The stereotypes, bias, and misunderstanding that have for many decades surrounded and isolated Career and Technical Education (CTE) may slowly be crumbling.  A recent report by the National Research Center for Career and Technical Education (NRCCTE) argues that traditional CTE typology -- the way in which CTE students are identified and classified -- is obsolete.  The distinctions between “CTE” students and “academic” students are no longer useful. Today, nearly all high school students, including the highest achieving academic-track students, enroll in some CTE courses.

Moreover, a significant number of students complete “high intensity” CTE courses as well as academic courses, in patterns that cross SES lines. In order to understand the contemporary high school experience, these researchers argue, we need a new typology based on the reality of today’s classroom, students, and curricula.

The October 2012 study, “A Typology for Understanding the Career and Technical Education Credit-taking Experience of High School Students," proposes a new, more nuanced classification system --  one the authors believe would more accurately capture the high school experience and needs of today’s students. The researchers argue that these long-overdue changes could alter experts’ views of what students actually study in high school, break down the obsolete conceptual barriers that currently divide CTE and academic curricula, and help educators work with students to devise the most appropriate pathways to academic and career success.

Value-Added, For The Record

People often ask me for my “bottom line” on using value-added (or other growth model) estimates in teacher evaluations. I’ve written on this topic many times, and while I have in fact given my overall opinion a couple of times, I have avoided expressing it in a strong “yes or no” format. There's a reason for this, and I thought maybe I would write a short piece and explain myself.

My first reaction to the queries about where I stand on value-added is a shot of appreciation that people are interested in my views, followed quickly by an acute rush of humility and reticence. I know think tank people aren’t supposed to say things like this, but when it comes to sweeping, big picture conclusions about the design of new evaluations, I’m not sure my personal opinion is particularly important.

Frankly, given the importance of how people on the ground respond to these types of policies, as well as, of course, their knowledge of how schools operate, I would be more interested in the views of experienced, well-informed teachers and administrators than my own. And I am frequently taken aback by the unadulterated certainty I hear coming from advocates and others about this completely untested policy. That’s why I tend to focus on aspects such as design details and explaining the research – these are things I feel qualified to discuss.  (I also, by the way, acknowledge that it’s very easy for me to play armchair policy general when it's not my job or working conditions that might be on the line.)

That said, here’s my general viewpoint, in two parts. First, my sense, based on the available evidence, is that value-added should be given a try in new teacher evaluations.

The Educational Attainment Of Girls And Boys: Two Sides of the Same Coin

Last month, Malala Yousafzai, a 14-year-old Pakistani girl, was shot in the head in an attempted assassination by Taliban militants. Her “crime” was daring to advocate for girls’ education. In a New York Times column, Nicholas Kristof observes that we in the West find it “easy to dismiss such incidents as distant barbarities," and uses the example of sex trafficking to illustrate that we “have a blind spot for our own injustices." I agree. However, I am not sure we need to go so far to find domestic injustices.

How about a close look within this very area: The education of girls (and boys) in the U.S.? Stories about how girls have surpassed boys in educational attainment have become common, and are often linked to statements about how boys are forgotten and/or lost. This rhetoric is troubling for several reasons. First, it can be read to imply a zero-sum equation; that is, that the educational advancement of girls is the cause of boys’ educational neglect. Second, stories about girls’ “successes” and boys’ “failures” may obscure more than they reveal.

There are the "lost boys" of higher education and the "missing girls" of STEM. We worry about boys and reading, and about girls and math. Recurring questions include: Where are the women in technology? Are there enough novels that cater to boys? Women have sailed past men in obtaining college degrees but, importantly, continue to concentrate in different fields and need Ph.D.s to match men with bachelor’s degrees in the workplace.

When issues are addressed in this fragmented manner, it’s hard to tell if it’s girls or boys that we should be worrying about. Well, both and neither. What all these pieces of the puzzle really say is that – at least in this day, age, and nation – gender still matters.

The Structural Curve In Indiana's New School Grading System

The State of Indiana has received a great deal of attention for its education reform efforts, and it recently announced the details, as well as the first round of results, of its new "A-F" school grading system. As in many other states, the grades for elementary and middle schools are based entirely on math and reading test scores.

It is probably the most rudimentary scoring system I've seen yet - almost painfully so. Such simplicity carries both potential advantages (easier for stakeholders to understand) and disadvantages (school performance is complex and not always amenable to rudimentary calculation).

In addition, unlike the other systems that I have reviewed here, this one does not rely on explicit “weights” (i.e., specific percentages are not assigned to each component). Rather, there’s a rubric that combines absolute performance (passage rates) with proportions drawn from growth models (a few other states use similar schemes, but I haven't reviewed any of them).
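
As a rough sketch only - the cut points, bonus rules, and grade bands below are invented for illustration and are not Indiana's actual rubric - a scheme of this general type might look something like this:

```python
# An invented illustration of a rubric-style grading scheme that mixes
# absolute performance (passage rates) with growth-based adjustments.
# These cut points and bonus/penalty rules are NOT Indiana's actual rubric.

def base_points(passage_rate):
    """Points from absolute performance (percent of students passing)."""
    if passage_rate >= 90: return 4.0
    if passage_rate >= 80: return 3.0
    if passage_rate >= 70: return 2.0
    if passage_rate >= 60: return 1.0
    return 0.0

def growth_adjustment(share_high_growth, share_low_growth):
    """Bonus or penalty based on the share of students showing high or low growth."""
    adjustment = 0.0
    if share_high_growth >= 0.40: adjustment += 0.5
    if share_low_growth >= 0.40:  adjustment -= 0.5
    return adjustment

def letter_grade(points):
    """Convert total points to a letter grade using fixed bands."""
    for cutoff, grade in [(3.5, "A"), (2.5, "B"), (1.5, "C"), (0.5, "D")]:
        if points >= cutoff:
            return grade
    return "F"

# Example: 72% passage rate, 45% of students with high growth, 10% with low growth.
points = base_points(72) + growth_adjustment(0.45, 0.10)  # 2.0 + 0.5 = 2.5
print(letter_grade(points))  # "B"
```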

On the whole, though, it's a somewhat simplistic variation on the general approach most other states are taking -- but with a few twists.

Surveying The Teacher Opinion Landscape

I’m a big fan of surveys of teachers’ opinions of education policy, not only because of educators' valuable policy-relevant knowledge, but also because their views are sometimes misrepresented or disregarded in our public discourse.

For instance, the diverse set of ideas that might be loosely characterized as “market-based reform” faces a bit of tension when it comes to teacher support. Without question, some teachers support the more controversial market-based policy ideas, such as pay and evaluations based substantially on test scores, but most do not. The relatively low levels of teacher endorsement don’t necessarily mean these ideas are “bad," and much of the disagreement is less about the desirability of general policies (e.g., new teacher evaluations) than about the specifics (e.g., the measures that make up those evaluations). In any case, it's a somewhat awkward juxtaposition: a focus on “respecting and elevating the teaching profession” by means of policies that most teachers do not like.

Sometimes (albeit too infrequently) this tension is discussed meaningfully; at other times it is obscured - e.g., by attempts to portray teachers' disagreement as "union opposition." But, as mentioned above, teachers are not a monolith and their opinions can and do change (see here). This is, in my view, a situation always worth monitoring, so I thought I’d take a look at a recent report from the organization Teach Plus, which presents data from a survey that the organization collected itself.