The False Choice Of Growth Versus Proficiency
Tennessee is considering changing its school accountability system so that schools can choose to have their test-based performance judged by either status (how highly students score) or growth (how much progress students make over the course of the year). In other words, if a school does poorly on one measure, it is judged by the other (apparently, Texas already has a similar system in place).
As we’ve discussed here many times in the past, status measures, such as proficiency rates, are poor measures of school performance, since some students, particularly those living in poverty, enter their schools far behind their more affluent peers. As a result, schools serving larger proportions of poor students will exhibit lower scores and proficiency rates, even if they are very effective in compelling progress from their students. That is why growth models, which focus on individual student gains over time, are a superior measure of school performance per se.
This so-called “growth versus proficiency” debate has resurfaced several times over the years, and it was particularly prevalent while states were submitting proposals for their accountability systems under the reauthorization of the Elementary and Secondary Education Act. The policy that came out of these discussions was generally promising, as many states moved at least somewhat toward weighting growth model estimates more heavily.
At the same time, however, it is important to mention that the “growth versus proficiency” debate sometimes implies that states must choose between these two types of indicators. This is misleading. And the Tennessee proposal is a very interesting context for discussing this, since they are essentially using these two types of measures interchangeably. The reality, of course, is that both types of measures transmit valuable but different information, and both have a potentially useful role to play in accountability systems.
Remember that the primary purposes of accountability systems are: 1) to compel productive behavioral changes (e.g., motivating schools to improve); and 2) to direct assistance/action where it is needed. Both of these goals require accurate information, which means proper interpretation of measures is key.
Think about teacher accountability systems. We should never hold teachers responsible for how highly their students score on end-of-year tests, because we know that some teachers’ students were way behind at the beginning of the year, whereas other teachers’ students were ahead (or average). Rather, it makes more sense to judge teachers by how much progress their students make throughout the course of the year. The same basic principle applies to school accountability.
Growth measures, specifically estimates from value-added and other types of growth models, attempt to measure how effective schools (and districts) are in helping their students make progress. They are therefore a useful—albeit imperfect—measure of school performance. Status measures, on the other hand, gauge student performance. They can help you identify the schools that serve students who are most in need of catching up. This can be useful, for example, for resource allocation—i.e., providing assistance, such as additional funding or interventions, to schools serving students with lower average scores (or the more commonly used proficiency rates).
Consider, for example, schools that consistently fail to compel strong progress from their students. Using our simple “hybrid” approach, in which growth and status are both useful but interpreted properly, we will see that some of these low-growth schools serve higher-scoring students whereas others serve lower-scoring students. This is important information, largely lost to an approach that chooses between them. Resources are finite, and we might decide that the low growth schools serving lower-scoring students, which will also tend to be located in poorer neighborhoods, are more urgent targets for intervention and assistance than their counterparts serving higher-scoring students (see Polikoff and McEachin 2013 for a nice discussion of this type of matrix in the California context).
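The hybrid matrix described above can be sketched as a simple classification that keeps both signals separate. Everything here is illustrative: the school names, scores, and cutoffs are hypothetical, and real systems rely on model-based estimates over multiple years, not single numbers.

```python
# Illustrative sketch of a growth/status "hybrid" matrix.
# All thresholds and data are hypothetical, for exposition only.

def hybrid_cell(growth, status, growth_cut=0.0, status_cut=50.0):
    """Return a combined label that preserves both the growth
    signal and the status signal, rather than collapsing them."""
    g = "high growth" if growth >= growth_cut else "low growth"
    s = "high status" if status >= status_cut else "low status"
    return f"{g}, {s}"

schools = {
    "School A": (-0.4, 72.0),  # weak progress, higher-scoring intake
    "School B": (-0.3, 31.0),  # weak progress, lower-scoring intake
    "School C": (0.5, 28.0),   # strong progress, lower-scoring intake
}

for name, (growth, status) in schools.items():
    print(name, "->", hybrid_cell(growth, status))

# Schools A and B both show low growth, but only B serves low-scoring
# students and so might be the more urgent target for assistance;
# School C might be studied as a model for similar schools.
```

The point of the sketch is simply that the two-dimensional label carries information (which low-growth schools serve struggling students, which high-growth schools do) that any one-dimensional rating discards.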
We might also apply this approach to high growth schools. Schools that consistently compel progress from their students should be celebrated and copied, but those serving lower-scoring student populations are particularly noteworthy, since they are high-performing for students who need it most. They are also more likely than their high growth, high status counterparts to face challenges, including teacher recruitment/retention, inadequate facilities, troubled students, and other factors that are common in schools located in high-poverty areas. Approaches to improving schools serving large numbers of disadvantaged students may not be the same as those for improving schools in affluent areas, and any attempts to copy or learn from high-growth schools must consider that. Using both status and growth as distinct measures helps make these important distinctions.
So, where does Tennessee’s proposed system fit into this? It's an interesting context, since the proposal doesn't really treat growth versus status as a choice, but rather conceptualizes the two measures as interchangeable. In doing so, Tennessee will inevitably misclassify schools as effective when they are not – specifically, schools that serve high-performing students but do not compel progress from those students (low growth, high status). Under Tennessee’s system, these schools would be judged effective, since their high scores or rates would trump their low growth scores. At best, they would be getting a pass on their test-based ineffectiveness, thanks to the students they serve. At worst, these schools would be rewarded, chosen by parents, or copied by schools based on misinterpreted information.
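The either/or rule described above can be caricatured as taking whichever measure makes the school look better. A minimal sketch (again, the scores and cutoffs are hypothetical, not Tennessee's actual rubric) shows how the low growth, high status cell gets misclassified:

```python
# Hypothetical sketch of a "choose your better measure" rating rule.
# Thresholds and scores are illustrative, not any state's actual system.

def either_or_rating(growth, status, growth_cut=0.0, status_cut=50.0):
    """A school is judged effective if EITHER measure clears its bar."""
    if growth >= growth_cut or status >= status_cut:
        return "effective"
    return "ineffective"

# A low growth, high status school: weak progress, high-scoring intake.
print(either_or_rating(growth=-0.4, status=72.0))  # -> effective
# Its low growth is masked by its students' high scores. Only schools
# low on BOTH measures can ever be rated ineffective under this rule.
```

Note that the same rule does correctly rate a high growth, low status school as effective, which is the one advantage over a status-only system mentioned below.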
On a more positive note, it’s true that Tennessee’s proposed system would classify high growth, low status schools as effective, since the high growth would override the low status. This is the sole advantage of Tennessee’s proposed approach over a primitive, status-only system. That is a low bar, however. (And, by the way, this correct classification under Tennessee's system occurs somewhat accidentally.)
But, again, the problems with Tennessee’s approach aren’t just about classification errors. There are other, more subtle problems with treating growth and status as interchangeable measures of school performance—doing so sacrifices valuable, actionable information. For example, under Tennessee’s proposed system, high growth, low status schools would not necessarily be singled out as models for lower-performing, low status schools and districts to emulate, as they would receive the same rating as high growth, high status schools. Similarly, low growth schools serving low performing students might not be prioritized for assistance over their low growth counterparts serving higher-scoring students.
Also note that Tennessee’s proposal will judge as ineffective only schools that are both low growth and low status. That basically means that the vast majority of low-rated schools will be those serving lower-income students. This is a highly distorted, misleading portrait of school performance.
Of course, I should emphasize that the discussion of possible outcomes here is highly simplified, and intended to illustrate the issues, rather than single out any one state or approach. Tennessee has been a leader in using growth measures for accountability (which is one reason why this new proposal is so puzzling), and virtually all states use status in some form in their accountability systems. Moreover, Tennessee could, for example, make these growth/status distinctions in their report cards (e.g., noting that low growth, high status schools received a good rating but were still failing to compel growth).
In general, though, single school ratings don't lend themselves to nuanced interpretation. Assigning one summative rating to each school is inherently reductive. If the state gives a “school performance rating,” it will tend to be interpreted as such. Many parents, journalists, and other community members will take the ratings at face value. And, since these ratings often receive a fair amount of public attention, the stakes are high.
In short, the “choose your own measure” approach is virtually guaranteed to transmit misinformation as well as to sacrifice valuable information. And it does so for the same reason as a status-only system: the failure to exploit the different signals sent by growth and status measures. One need not choose between them, but neither should one treat them as interchangeable. They are both useful, but they must be used correctly.