• A Dark Day For Educational Measurement In The Sunshine State

    Just this week, Florida announced its new district grading system. These systems have been popping up all over the nation, and given that designing one is a requirement for states applying for No Child Left Behind waivers, we are sure to see more.

    I acknowledge that the designers of these schemes have the difficult job of balancing accessibility and accuracy. Moreover, the latter requirement – accuracy – cannot be directly tested, since we cannot know “true” school quality. As a result, even to the degree that quality can be partially approximated using test scores, disagreements over which specific measures to include, and how to include them, are inevitable (see these brief analyses of Ohio and California).

    As I’ve discussed before, there are two general types of test-based measures that typically comprise these systems: absolute performance and growth. Each has its strengths and weaknesses. Florida’s attempt to balance these components is a near-total failure, and it shows in the results.

  • Performance And Chance In New York's Competitive District Grant Program

    New York State recently announced a new $75 million competitive grant program, which is part of its Race to the Top plan. In order to receive some of the money, districts must apply, and their applications receive a score between zero and 115. Almost a third of the points (35) are based on proposals for programs geared toward boosting student achievement, 10 points are based on need, and there are 20 possible points awarded for a description of how the proposal fits into districts’ budgets.

    The remaining 50 points – almost half of the total – are based on “academic performance” over the prior year. Four measures are used to produce the 0-50 point score: one is the year-to-year change (between 2010 and 2011) in the district’s graduation rate, and the other three are changes in the state “performance index” in math, English Language Arts (ELA) and science. The “performance index” in these three subjects is calculated using a simple weighting formula that accounts for the proportion of students scoring at levels 2 (basic), 3 (proficient) and 4 (advanced).
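    The post does not spell out the weights, so the sketch below assumes one plausible version of such an index: level 1 earns 0 points, level 2 earns 1 point, and levels 3 and 4 earn 2 points each, scaled to a 0-200 range. The function names and the district counts are hypothetical, chosen only to show how a change in the index is computed.

```python
def performance_index(counts):
    """Hypothetical 0-200 performance index from counts of students by level.

    ASSUMPTION: level 1 earns 0 points, level 2 earns 1 point, and levels 3
    and 4 earn 2 points each; the total is divided by the number tested and
    multiplied by 100. The post does not give the exact formula.
    """
    tested = sum(counts.values())
    if tested == 0:
        raise ValueError("no tested students")
    level_2 = counts.get(2, 0)
    levels_3_4 = counts.get(3, 0) + counts.get(4, 0)
    return 100 * (level_2 + 2 * levels_3_4) / tested


def index_change(before, after):
    """Year-to-year change in the index, the kind of quantity the grant scores."""
    return performance_index(after) - performance_index(before)


# Made-up counts of students at each level (1-4) in a hypothetical district.
counts_2010 = {1: 20, 2: 30, 3: 40, 4: 10}
counts_2011 = {1: 15, 2: 30, 3: 45, 4: 10}
print(performance_index(counts_2010))          # 130.0
print(performance_index(counts_2011))          # 140.0
print(index_change(counts_2010, counts_2011))  # 10.0
```

    Under this assumed formula, moving five of 100 students from level 1 to level 3 adds 10 points, while movement within a level, or from level 3 to level 4, adds nothing.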

    The idea of using testing results as a criterion in the awarding of grants is to reward those districts that are performing well. Unfortunately, due to the choice of measures and how they are used, the 50 points will be biased and, to no small extent, based on chance.
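    To illustrate the “chance” part of this argument, here is a minimal simulation (not the state’s method). Every hypothetical district is assumed to have the same true rate of students meeting a cutoff (50 percent) in both years, so any year-to-year change is purely the luck of which students happened to be tested. The function name and parameters are illustrative assumptions. Under these assumptions, theory puts the spread of changes near 14, 4.5 and 1.4 percentage points for 25, 250 and 2,500 tested students.

```python
import random
import statistics


def simulated_changes(n_students, true_rate=0.5, n_districts=1000, seed=1):
    """Year-to-year change (in percentage points) in the share of students
    meeting a cutoff, for hypothetical districts whose true rate never changes.

    ASSUMPTION: each tested student independently meets the cutoff with
    probability true_rate in both years, so every nonzero change is noise.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(n_districts):
        year_1 = sum(rng.random() < true_rate for _ in range(n_students)) / n_students
        year_2 = sum(rng.random() < true_rate for _ in range(n_students)) / n_students
        changes.append(100 * (year_2 - year_1))
    return changes


for size in (25, 250, 2500):
    spread = statistics.pstdev(simulated_changes(size))
    print(f"{size:>4} tested students: SD of change = {spread:.1f} points")
```

    Even though no simulated district improves or declines, small districts routinely post large “gains” and “losses”, which is why ranking districts on one-year changes can reward luck.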

  • Burden Of Proof, Benefit Of Assumption

    ** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

    Michelle Rhee, the controversial former chancellor of D.C. public schools, is a lightning rod. Her confrontational style has made her many friends as well as enemies. As is usually the case, people’s reactions to her approach depend in no small part on whether they support her policy positions.

    I try to be open-minded toward people with whom I don’t often agree, and I can certainly accept that people operate in different ways. Honestly, I have no doubt as to Ms. Rhee’s sincere belief in what she’s doing; and, even if I think she could go about it differently, I respect her willingness to absorb so much negative reaction in order to try to get it done.

    What I find disturbing is how she continues to try to build her reputation and advance her goals based on interpretations of testing results that are insulting to the public’s intelligence.

  • Income And Educational Outcomes

    The role of poverty in shaping educational outcomes is one of the most common debates going on today. It can also be one of the most shallow.

    The debate tends to focus on income. For example (and I’m generalizing a bit here), one “side” argues that income and test scores are strongly correlated; the other “side” points to the fact that many low-income students do very well and cautions against making excuses for schools’ failure to help poor kids.

    Both arguments have merit, but it bears mentioning that focusing on the relationship between income and achievement is a rather crude way of conceptualizing the importance of family background (and non-schooling factors in general) for educational outcomes. Income is probably among the best widely available proxies for these factors, insofar as it is correlated with many of the conditions that can hinder learning, especially during a child’s earliest years. These include (but are not at all limited to): peer effects; parental education; access to print and background knowledge; parental involvement; family stressors; access to healthcare; and, of course, the quality of neighborhood schools and their teachers.

    And that is why, when researchers try to examine school performance – while holding constant the effect of factors outside of schools’ control – income or some kind of income-based proxy (usually free/reduced price lunch) can be a useful variable. It is, however, quite limited.
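    One simple way such a proxy gets used is to compare each school with what would be predicted from its share of free/reduced-price lunch (FRL) students. The sketch below, a one-variable least-squares fit on made-up data, treats the residual as “adjusted” performance. It is an illustration under stated assumptions, not any particular state’s or researcher’s model, and it inherits the proxy’s limits: whatever the FRL share fails to capture ends up counted as school performance.

```python
from statistics import mean


def frl_adjusted_performance(scores, frl_shares):
    """Residuals from a one-variable least-squares fit of score on FRL share.

    A positive residual means a school scored above what its FRL share alone
    would predict; anything the proxy misses ends up in the residual too.
    """
    x_bar, y_bar = mean(frl_shares), mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in frl_shares)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(frl_shares, scores))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(frl_shares, scores)]


# Made-up school mean scores and shares of FRL-eligible students.
frl = [0.2, 0.4, 0.6, 0.8]
scores = [80, 70, 66, 50]
residuals = frl_adjusted_performance(scores, frl)
print([round(r, 1) for r in residuals])  # [-0.6, -1.2, 4.2, -2.4]
```

    Here the third school, with the second-lowest raw score, looks best once FRL share is held constant; whether that reflects the school or unmeasured differences in its students is exactly what the proxy cannot tell us.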

  • Trial And Error Is Fine, So Long As You Know The Difference

    It’s fair to say that improved teacher evaluation is the cornerstone of most current education reform efforts. Although very few people have disagreed on the need to design and implement new evaluation systems, there has been a great deal of disagreement over how best to do so – specifically with regard to the incorporation of test-based measures of teacher productivity (i.e., value-added and other growth model estimates).

    The use of these measures has become a polarizing issue. Opponents tend to adamantly object to any degree of incorporation, while many proponents do not consider new evaluations meaningful unless they include test-based measures as a major element (say, at least 40-50 percent). Despite the air of certainty on both sides, this debate has mostly been proceeding based on speculation. The new evaluations are just getting up and running, and there is virtually no evidence as to their effects under actual high-stakes implementation.

    For my part, I’ve said many times that I'm receptive to trying value-added as a component in evaluations (see here and here), though I disagree strongly with the details of how it’s being done in most places. But there’s nothing necessarily wrong with divergent opinions over an untested policy intervention, or with trying one. There is, however, something wrong with fully implementing such a policy without adequate field testing, or at least ensuring that the costs and effects will be carefully evaluated post-implementation. To date, virtually no states/districts of which I'm aware have mandated large-scale, independent evaluations of their new systems.*

    If this is indeed the case, the breathless, speculative debate happening now will only continue in perpetuity.

  • Beyond Anecdotes: The Evidence About Financial Incentives And Teacher Retention

    ** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

    Our guest author today is Eleanor Fulbeck, who earned her Ph.D. in education policy from the University of Colorado at Boulder in 2011, and is currently a post-doctoral fellow at the University of Pennsylvania.

    A couple of weeks ago, an article in the New York Times, written by reporter Sam Dillon, took a look at the new incentive program being used by the District of Columbia Public Schools (DCPS). Under this plan (called “Impact Plus”), teachers rated “highly effective” by the district’s new evaluation system are eligible for large cash bonuses and/or permanent salary increases.

    Dillon notes that, “The profession is notorious for losing thousands of its brightest young teachers within a few years, which many experts attribute to low starting salaries and a traditional step-raise structure that rewards years of service and academic degrees rather than success in the classroom." He also profiles several teachers who received the bonuses, most of whom say it played a role in their decision to remain in the classroom.

    Putting aside these anecdotes and characterizations of “experts’” views, whether financial incentives – such as bonuses for performance or for teaching in hard-to-staff schools – are key to boosting teacher retention is a complex empirical question, and an open one at that.

  • Are Americans Exceptional In Their Attitudes Toward Government's Role In Reducing Inequality?

    As discussed in a previous post, roughly half of Americans believe that government should take some active role in reducing income differences between rich and poor, though, as one would expect, this view is less prevalent among Republicans and among more educated and higher-earning survey respondents.

    These data, however, lack a frame of reference. That is, they don’t tell us whether American support for government redistribution is “high” or “low” compared with that in other nations. The conventional wisdom in this area is that Americans generally prefer a more limited government, especially when it comes to things like income redistribution.

    It might therefore be interesting to take a quick look at how the U.S. stacks up against other nations in terms of these redistributive preferences.

  • The Persistence Of Both Teacher Effects And Misinterpretations Of Research About Them

    In a new National Bureau of Economic Research working paper on teacher value-added, researchers Raj Chetty, John Friedman and Jonah Rockoff present results from their analysis of an incredibly detailed dataset linking teachers and students in one large urban school district. The data include students’ testing results between 1991 and 2009, as well as proxies for future student outcomes, mostly from tax records, including college attendance (whether they were reported to have paid tuition or received scholarships), childbearing (whether they claimed dependents) and eventual earnings (as reported on the returns). Needless to say, the actual analysis includes only those students for whom testing data were available, and who could be successfully linked with teachers (with the latter group of course limited to those teaching math or reading in grades 4-8).

    The paper caused a remarkable stir last week, and for good reason: It’s one of the most dense, important and interesting analyses on this topic in a very long time. Much of the reaction, however, was less than cautious, particularly in the manner in which the research findings were interpreted as supporting concrete policy implications (also see Bruce Baker’s excellent post).

    What this paper shows – using an extremely detailed dataset and sophisticated, thoroughly-documented methods – is that teachers matter, perhaps in ways that some didn’t realize. What it does not show is how to measure and improve teacher quality, which are still open questions. This is a crucial distinction, one which has been discussed on this blog numerous times (also here and here), as it is frequently obscured or outright ignored in discussions of how research findings should inform concrete education policy.

  • New Report: Does Money Matter?

    Over the past few years, due to massive budget deficits, governors, legislators and other elected officials have had to slash education spending. As a result, incredibly, there are at least 30 states in which state funding for 2011 is actually lower than in 2008. In some cases, including California, the amounts are over 20 percent lower.

    Only the tiniest slice of Americans believe that we should spend less on education, while a large majority actually supports increased funding. At the same time, however, there’s a concerted effort among some advocates, elected officials and others to convince the public that spending more money on education will not improve outcomes, while huge cuts need not do any harm.

    Often, their evidence comes down to some form of the following graph:

  • Is California's "API Growth" A Good Measure Of School Performance?

    California calls its “Academic Performance Index” (API) the “cornerstone” of its accountability system. The API is calculated as a weighted average of the proportions of students meeting proficiency and other cutoffs on the state exams.
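    The weighted-average idea can be sketched in a few lines. The band weights below (on a 200-1000 scale) are an assumption for illustration and should be checked against official California Department of Education documentation; the real index also combines results across several tested subjects, which this single-test sketch ignores. The school’s band proportions are made up.

```python
# ASSUMPTION: illustrative band weights on a 200-1000 scale; check the
# California Department of Education's API documentation for official values.
API_WEIGHTS = {
    "advanced": 1000,
    "proficient": 875,
    "basic": 700,
    "below_basic": 500,
    "far_below_basic": 200,
}


def api_score(proportions):
    """Weighted average of the proportions of students in each performance band."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("band proportions must sum to 1")
    return sum(API_WEIGHTS[band] * share for band, share in proportions.items())


# Hypothetical school: share of tested students in each band (sums to 1).
school = {
    "advanced": 0.20,
    "proficient": 0.30,
    "basic": 0.30,
    "below_basic": 0.10,
    "far_below_basic": 0.10,
}
print(round(api_score(school), 1))  # 742.5
```

    Because only band membership enters the formula, a student’s progress counts only when it carries him or her across a cutoff.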

    It is a high-stakes measure. “Growth” in schools’ API scores determines whether they meet federal AYP requirements, and it is also important in the state’s own accountability regime. In addition, toward the middle of last month, the California Charter Schools Association called for the closing of ten charter schools based in part on their (three-year) API “growth” rates.

    Putting aside the question of whether the API is a valid measure of student performance in any given year, using year-to-year changes in API scores in high-stakes decisions is highly problematic. The API is a cross-sectional measure – it doesn’t follow students over time – and so one must assume that year-to-year changes in a school’s index do not reflect a shift in demographics or other characteristics of the cohorts of students taking the tests. Moreover, even if the changes in API scores do in fact reflect “real” progress, they do not account for all the factors outside of schools’ control that might affect performance, such as funding and differences in students’ backgrounds (see here and here, or this Mathematica paper, for more on these issues).
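    Because the API is built from the proportions of students reaching cutoffs, a simplified sketch with a single cutoff shows how a change in cohort composition alone can move a cross-sectional measure. The two subgroups, their sizes and their rates are hypothetical, and the rates are held fixed across years, so no group of students performs any better in year 2.

```python
def share_meeting_cutoff(group_sizes, group_rates):
    """Share of all tested students meeting a cutoff, given subgroup sizes and rates."""
    tested = sum(group_sizes.values())
    return sum(group_sizes[g] * group_rates[g] for g in group_sizes) / tested


# Hypothetical subgroups whose performance is identical in both years.
rates = {"group_a": 0.70, "group_b": 0.40}

year_1 = share_meeting_cutoff({"group_a": 50, "group_b": 50}, rates)
year_2 = share_meeting_cutoff({"group_a": 70, "group_b": 30}, rates)

# Neither subgroup improved, yet the school-wide share rises by 6 points
# because the year-2 cohort contains more students from the higher-scoring group.
print(f"{year_1:.2f} -> {year_2:.2f}")  # 0.55 -> 0.61
```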

    Better data are needed to test these assumptions directly, but we might get some idea of whether changes in schools’ API scores are good measures of school performance by examining how stable they are over time.
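    One way to run such a stability check is to correlate each school’s API change in one period with its change in the next. The sketch below uses a hand-rolled Pearson correlation; the five schools and their scores are invented for illustration. A correlation near zero would suggest growth reflects noise more than persistent school performance, and negative values can arise when a noisy middle year pushes one change up and the next one down.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def yearly_growth(scores):
    """Year-to-year changes in one school's API, e.g. three years -> two changes."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


# Invented API scores for five schools over three consecutive years.
schools = [
    [700, 720, 715],
    [650, 655, 675],
    [800, 790, 800],
    [720, 740, 735],
    [610, 600, 615],
]
first_change = [yearly_growth(s)[0] for s in schools]
second_change = [yearly_growth(s)[1] for s in schools]
print(round(pearson(first_change, second_change), 2))  # -0.76
```

    With real data, one would compute this across many schools and several pairs of years; five invented schools only demonstrate the mechanics.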