Do Value-Added Models “Control For Poverty”?
There is some controversy over the fact that Florida’s recently announced value-added model (one of a class often called “covariate adjustment models”), which will be used to determine merit pay bonuses and other high-stakes decisions, doesn’t include a direct measure of poverty.
Personally, I support adding a direct income proxy to these models, if for no other reason than to avoid this type of debate (and to facilitate the disaggregation of results for instructional purposes). It does bear pointing out, however, that the measure that’s almost always used as a proxy for income/poverty – students’ eligibility for free/reduced-price lunch – is terrible as a poverty (or income) gauge. It tells you only whether a student’s family has earnings below (or above) a given threshold (usually 185 percent of the poverty line), and this masks most of the variation among both eligible and non-eligible students. For example, families with incomes of $5,000 and $20,000 might both be coded as eligible, while families earning $40,000 and $400,000 are both coded as not eligible. A lot of hugely important information gets ignored this way, especially when the vast majority of students are (or are not) eligible, as is the case in many schools and districts.
That said, it’s not quite accurate to assert that Florida’s model and similar ones “don’t control for poverty.” The model may not include a direct income measure, but it does control for prior achievement (a student’s test score in the previous year[s]). And a student’s test score is probably a better proxy for income than whether or not they’re eligible for free/reduced-price lunch.
Even more importantly, however, the key issue regarding bias is not whether the models “control for poverty,” but rather whether they control for the range of factors – school and non-school – that are known to affect student test score growth, independent of teachers’ performance. Income is only one part of this broader issue, which is relevant to all teachers, regardless of the characteristics of the students they teach.
First, let’s take a quick look at the issue of “controlling for poverty.”
An important, albeit somewhat counterintuitive, fact is that a statistical model doesn’t have to control for something directly in order to partially account for its influence – it need only control for a different variable that is highly correlated with it. If this is confusing, consider a simplified example: Let’s say I want to see whether a person’s age influences their political views. In doing so, I will obviously need to control for a bunch of factors that might also be associated with those views, such as gender. But let’s also say that I don’t have data on one very important factor – earnings.
I do, however, include a variable measuring education, which is highly correlated with earnings. It’s not a perfect association, but if I control for education (which, in this case, I should do anyway), I will be able to account for at least some portion of the variation in political views that is associated with earnings. There will inevitably be some error – not everyone with a lot of education makes a lot of money, and vice versa – but, over a large enough sample of individuals, much of this error will cancel out. In that sense, even though I’m not controlling directly for earnings, I can still pick up at least some of its effects.
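To see how this plays out numerically, here is a minimal simulation sketch in Python. All of the numbers – the correlation between education and earnings, the strength of the earnings effect – are illustrative assumptions, not estimates from any real survey data.

```python
# Illustrative simulation of the proxy-variable logic described above.
# The coefficients are assumptions chosen for clarity, not real estimates.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

earnings = rng.normal(size=n)                          # the unmeasured factor
education = 0.7 * earnings + 0.7 * rng.normal(size=n)  # proxy, correlated ~0.7
views = 2.0 * earnings + rng.normal(size=n)            # outcome driven by earnings

def r_squared(x, y):
    """R^2 from a simple one-variable OLS regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

print(f"R^2 controlling for earnings directly: {r_squared(earnings, views):.2f}")  # ~0.80
print(f"R^2 using education as a proxy:        {r_squared(education, views):.2f}")  # ~0.40
```

In this setup, the education proxy recovers roughly half of the variation that earnings itself would explain – imperfect, but far from nothing, which is exactly the point.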
The same thing goes for value-added models. Even though Florida doesn’t control for free/reduced-price lunch eligibility (they could do so, but have chosen not to), that doesn’t mean the models simply ignore income (or poverty). They do control for students’ prior achievement – i.e., they predict student testing gains as a function of a bunch of variables, the most important of which is a student’s actual score the previous year.
And, like education and earnings, there is a very strong correlation between family income and students’ absolute test scores, which means that controlling for the latter will “pick up” a lot of the variation in the former, especially over large samples of students.
This does not, however, mean that Florida shouldn’t control for lunch program eligibility directly. One could make a strong argument that the models should include everything possible that is associated with testing performance, especially when there are no costs to doing so. On the other hand, there is an empirical question here: do the estimates change a great deal when the variable is included? The evidence on this score depends on the type of model used, the years of data available, the properties of the test and other factors. Sometimes results are different overall, sometimes they’re quite similar, though in both cases, at least some individual teachers’ estimates are likely affected.*
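For readers who want to see the shape of that empirical check, here is a stylized sketch (in Python, on simulated data) of a covariate adjustment model estimated with and without a free/reduced-price lunch indicator. Everything here – the variable names, the effect sizes, the eligibility threshold – is a hypothetical assumption for illustration; it is not Florida’s actual specification.

```python
# Stylized covariate adjustment model on simulated data, estimated with
# and without a lunch-eligibility (frl) indicator. All parameters are
# illustrative assumptions, not any state's actual model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_students, n_teachers = 5_000, 50

df = pd.DataFrame({
    "teacher": rng.integers(0, n_teachers, n_students),
    "income": rng.normal(size=n_students),
})
df["frl"] = (df["income"] < -0.4).astype(int)   # coarse threshold indicator
df["prior_score"] = 0.8 * df["income"] + rng.normal(size=n_students)
teacher_effect = rng.normal(scale=0.2, size=n_teachers)
df["score"] = (0.6 * df["prior_score"] + 0.3 * df["income"]
               + teacher_effect[df["teacher"]] + rng.normal(size=n_students))

# Teacher estimates with and without the lunch-eligibility indicator.
m1 = smf.ols("score ~ prior_score + C(teacher)", data=df).fit()
m2 = smf.ols("score ~ prior_score + frl + C(teacher)", data=df).fit()

est1 = m1.params.filter(like="C(teacher)")
est2 = m2.params.filter(like="C(teacher)")
print("Correlation of teacher estimates:", np.corrcoef(est1, est2)[0, 1].round(3))
```

In this simulation, students are randomly assigned to teachers and the prior score already carries most of the income signal, so the two sets of teacher estimates come out nearly identical; in real data, where students sort into classrooms non-randomly, the gap between the two specifications can be larger.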
I’m not familiar with how Florida officials made their decision to exclude the lunch measure, but I do hope they will test regularly whether their results are different when it’s included.
In any case, the inclusion of prior test scores (and other variables) does account for income and poverty, at least to some extent, and the free/reduced-price lunch variable is so limited that it doesn’t always add much to the power of the models. It is therefore misleading to say that the “models don’t control for poverty.” **
It’s also an oversimplification to the point of being a distraction.
The proper question about bias is broader: Do the models adequately account for all the relevant factors that are outside of teachers’ control?
This is a bigger issue, and it pertains to teachers of high- and low-poverty students in all schools and districts, urban and rural, large and small. Now, it’s certainly true that many of the conditions that influence performance, such as parental involvement, oral language development, early childhood education, family stress, etc., are associated with income, but the relationship is imperfect. And many other important factors are only weakly related, or unrelated, to income (especially given the limitations of the lunch program variable). Child development is cumulative and multifaceted.
So, the answer to this more central question – whether a growth model accounts for non-teacher factors – is inherently a matter of degree. When using properly interpreted estimates from the best models with multiple years of data (which is not the case in many places using these estimates for decisions), it’s fair to say that a large proportion of the non-teacher-based variation in performance can be accounted for.
There will, however, always be bias, sometimes substantial, affecting many individual teachers, as there would be with any performance measurement, including classroom observations. Whether or not the bias is “tolerable” depends on one’s point of view, as well as how the estimates are used (the latter is especially important among cautious value-added supporters like myself). Furthermore, as I’ve argued many times, the bigger problem in many cases, one that can be partially addressed but is being largely ignored, is random error.
But that’s a separate discussion. For now, the main point is that “poverty” has assumed a role of unqualified importance in the debate over value-added. The issue is broader and more complicated than that. Reducing it to a poverty argument is likely to be unproductive in both the short and long run. It oversimplifies the potential problem of systematic bias, and it ends up ignoring the critical issues – implementation, random error, model specification, data quality, etc. – that can make all the difference.
- Matt Di Carlo
*****
* A related but somewhat technical issue with controlling for student demographic characteristics, which is sometimes used as a reason for excluding them, is the possibility that less effective teachers are concentrated in higher-poverty schools. If that’s the case, then, put simply, the models will “mistake” this for poverty effects (or those of other characteristics). I should also mention that there is a distinction here between controlling for individual students’ income/poverty, and classroom- and school-level poverty (e.g., the percent of students in a class or school who are eligible for free/reduced-price lunch). The latter measures help address factors such as peer effects. Whether or not classroom- or school-level characteristics have a substantial impact on results also varies by methods and data availability.
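As a purely hypothetical illustration of that last distinction, here is how one might construct a school-level poverty measure from the individual eligibility indicator; the column names are assumptions, not any district’s actual data layout.

```python
# Hypothetical sketch: aggregating a student-level lunch-eligibility flag
# into a school-level poverty measure, per the footnote's distinction.
import pandas as pd

roster = pd.DataFrame({
    "school": ["A", "A", "A", "B", "B"],
    "frl":    [1, 1, 0, 0, 1],   # 1 = eligible for free/reduced-price lunch
})
roster["school_pct_frl"] = roster.groupby("school")["frl"].transform("mean")
print(roster)
```

A measure like school_pct_frl can then enter the model as a contextual covariate, though, as noted, whether it substantially changes the results varies by method and data availability.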
** In case you’re missing the irony here, consider two facts: first, opponents of value-added often point out that the models do not account for student poverty, which is a strong predictor of testing performance; second, the models implicitly acknowledge the relationship between scores and poverty by controlling for the former to account for the latter.
Most everything you write here is equally true of indirect proxies for school dysfunctionality, and the effects of dysfunctional schools are also beyond a teacher's control. And yes, we should acknowledge upfront that high-poverty schools, which are more likely to be dysfunctional, can have greater concentrations of bad teachers. In my experience, the top 75% of my colleagues were just as good as their counterparts in great schools, but every year we hired more warm bodies who had no chance of success. Typically the warm bodies went to classrooms which previously had hopeless cases, and so on. And everyone knew which parts of the school disproportionately had the classes that served as dumping grounds.
Another indirect correlation is whether the principal was allowed to enforce the district's academic, attendance, and disciplinary policies. In my experience, "Fs," "No Credits" for nonattendance, and disciplinary consequences were like the photocopier budget. Every school got its quota. It didn't matter for schools that never approached their quota. In schools like mine, our quota was used up by October. Once the alternative schools were full, the principal lost the power to deal with chronic "hall walkers."
And this was doubly true for disruptive and violent kids whose behavior stemmed from their disability. For instance, a principal was not allowed to suspend an IEP student for carrying a knife with a blade shorter than 2-1/2 inches. That wasn't an issue in most schools, but in schools where 40% of our kids in regular classes were on IEPs, that was huge.
I'm curious if you'd find the same indirect correlation related to today's NYC news regarding credit recovery. Abuses of that can be a huge factor beyond the teacher's control, and I'd be curious if those abuses were more common in schools under the gun.
Every district is different. But I bet that most urban districts will have their own unwritten ways of empowering and disempowering different schools.
So, I'd take all of the qualifications you made about indirect estimates of student factors not under the control of teachers, and double it for the school factors not under the control of teachers and principals.