DC School Growth Scores And Poverty

As noted in a nice little post over at Greater Greater Washington's education blog, the District of Columbia Office of the State Superintendent of Education (OSSE) recently started releasing growth model scores for DC’s charter and regular public schools. These models, in a nutshell, assess schools by following their students over time and gauging their testing progress relative to similar students (they can also be used for individual teachers, but DCPS uses a different model in its teacher evaluations).

In my opinion, producing these estimates and making them available publicly is a good idea, and definitely preferable to the district’s previous reliance on changes in proficiency, which are truly awful measures (see here for more on this). It’s also important to note, however, that the model chosen by OSSE – a “median growth percentile,” or MGP, model – produces estimates that have been shown to be at least somewhat more heavily associated with student characteristics than other types of models, such as value-added models proper. This does not necessarily mean the growth percentile models are “inaccurate” – there are good reasons, such as fewer resources and greater difficulty with teacher recruitment/retention, to believe that schools serving poorer students might be less effective, on average, and it’s tough to separate “real” effects from bias in the models.
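For readers unfamiliar with how these scores are built, here is a highly simplified sketch in Python. It is for illustration only: the model OSSE actually uses is typically estimated via quantile regression over students' full score histories, whereas this version just bins students by a single prior score. The function and column names are hypothetical.

```python
# Highly simplified sketch of a median growth percentile (MGP), for illustration
# only -- not the actual OSSE model, which is typically estimated via quantile
# regression on students' prior score histories.
import pandas as pd

def median_growth_percentile(students, n_bins=20):
    """students: one row per student, with columns school, prior_score, current_score."""
    df = students.copy()
    # Group students with similar prior achievement ("academic peers")
    df["peer_bin"] = pd.qcut(df["prior_score"], q=n_bins, duplicates="drop")
    # Student growth percentile: percentile rank of the current score among peers
    df["sgp"] = df.groupby("peer_bin")["current_score"].rank(pct=True) * 100
    # A school's MGP is the median of its students' growth percentiles
    return df.groupby("school")["sgp"].median()
```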

That said, let’s take a quick look at this relationship using the DC MGP scores from 2011, with poverty data from the National Center for Education Statistics.
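As a rough illustration of the kind of comparison involved, a minimal sketch is below. The file and column names are hypothetical stand-ins for the OSSE MGP scores and the NCES free/reduced-price lunch data mentioned above.

```python
# Hypothetical file and column names; assumes one row per school with its 2011
# MGP and its NCES free/reduced-price lunch (FRL) rate.
import pandas as pd

schools = pd.read_csv("dc_mgp_frl_2011.csv")  # columns: school, mgp_math, pct_frl
print(schools[["mgp_math", "pct_frl"]].corr(method="pearson"))
# A visibly negative correlation would suggest MGP scores are associated with
# poverty, though, as noted above, that alone cannot distinguish bias in the
# model from "real" differences in school effectiveness.
```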

Data-Driven Instruction Can't Work If Instructors Don't Use The Data

In education today, data, particularly testing data, are everywhere. One of many potentially valuable uses of these data is helping teachers improve instruction – e.g., identifying students’ strengths and weaknesses, etc. Of course, this positive impact depends on the quality of the data and how they are presented to educators, among other factors. But there’s an even more basic requirement – teachers actually have to use them.

In an article published in the latest issue of the journal Education Finance and Policy, economist John Tyler takes a thorough look at teachers’ use of an online data system in a mid-sized urban district between 2008 and 2010. A few years prior, this district invested heavily in benchmark formative assessments (four per year) for students in grades 3-8, and an online “dashboard” system to go along with them. The assessments’ results are fed into the system in a timely manner. The basic idea is to give these teachers a continual stream of information, past and present, about their students’ performance.

Tyler uses web logs from the district’s online system, as well as focus groups with teachers, to examine the extent and nature of teachers’ data usage (as well as a few other things, such as the relationship between usage and value-added). What he finds is not particularly heartening. In short, teachers didn’t really use the data.
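To give a concrete (and purely hypothetical) sense of the sort of usage measures one might build from system logs, here is a minimal sketch. The log format and field names are invented; this is not Tyler's actual data or method.

```python
# Hypothetical log layout: one row per page view, with teacher_id, timestamp, page.
import pandas as pd

logs = pd.read_csv("dashboard_weblogs.csv", parse_dates=["timestamp"])

# Crude usage measures per teacher: total page views and distinct days of use
usage = logs.groupby("teacher_id").agg(
    page_views=("page", "size"),
    active_days=("timestamp", lambda t: t.dt.date.nunique()),
)
print(usage.describe())  # heavily right-skewed usage would echo the "didn't really use it" finding
```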

It's Test Score Season, But Some States Don't Release Test Scores

** Reprinted here in the Washington Post

We’ve entered the time of year during which states and districts release their testing results. It’s fair to say that the two districts that get the most attention for their results are New York City and the District of Columbia Public Schools (DCPS), due in no small part to the fact that both enacted significant, high-profile policy changes over the past 5-10 years.

The manner in which both districts present annual test results is often misleading. Many of the issues, such as misinterpreting changes in proficiency rates as “test score growth” and chalking up all “gains” to recent policy changes, are quite common across the nation. These two districts are just among the more aggressive in doing so. That said, however, there’s one big difference between the test results they put out every year, and although I’ve noted it a few times before, I’d like to point it out once more: Unlike New York City/State, DCPS does not actually release test scores.

That’s right – despite the massive national attention to their “test scores,” DCPS – or, specifically, the Office of the State Superintendent of Education (OSSE) – hasn’t released a single test score in many years. Not one.

The Ever-Changing NAEP Sample

The results of the latest National Assessment of Educational Progress long-term trend tests (NAEP-LTT) were released last week. The data compare the reading and math scores of 9-, 13-, and 17-year-olds at various points since the early 1970s. This is an important way to monitor how these age cohorts’ performance changes over the long term.

Overall, there is ongoing improvement in scores among 9- and 13-year-olds, in reading and especially math, though the trend is inconsistent and increases have been somewhat slow in recent years. The scores for 17-year-olds, in contrast, are relatively flat.

These data, of course, are cross-sectional – i.e., they don’t follow students over time, but rather compare children in the three age groups with their predecessors from previous years. This means that changes in average scores might be driven by differences, observable or unobservable, between cohorts. One of the simple graphs in this report, which doesn't present a single test score, illustrates that rather vividly.
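To see why composition matters, consider a small worked example (with invented numbers): if a lower-scoring subgroup grows as a share of the tested population, the overall average can fall even while every subgroup improves.

```python
# Invented numbers, purely to illustrate how changing sample composition can
# mask subgroup gains in a cross-sectional average.
def overall_mean(shares, means):
    return sum(s * m for s, m in zip(shares, means))

# Earlier cohort: group A is 80% of the sample scoring 300, group B 20% scoring 260
# Later cohort: both groups score 5 points higher, but group B is now 45% of the sample
print(overall_mean([0.80, 0.20], [300, 260]))  # 292.0
print(overall_mean([0.55, 0.45], [305, 265]))  # 287.0 -- lower despite gains in both groups
```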

A Few Points About The New CREDO Charter School Analysis

A new report from CREDO on charter schools’ test-based performance received a great deal of attention, and rightfully so – it includes 27 states, which together serve 95 percent of the nation's charter students.

The analysis as a whole, like its predecessor, is a great contribution. Both its sheer scope and a few of its specific components (e.g., the examination of trends) are new and important. And most of the findings serve to reaffirm the core conclusions of the existing research on charters' estimated test-based effects. Such an interpretation may not be particularly satisfying to charter supporters and opponents looking for new ammunition, but the fact that this national analysis will not settle anything in the contentious debate about charter schools once again suggests the need to start asking a different set of questions.

Along these lines, as well as others, there are a few points worth discussing quickly. 

No Presentation Without Representation

I tend to comment on newly released teacher surveys, primarily because I think the surveys are important and interesting, but also because teachers' opinions are sometimes misrepresented in our debate about education reform. So, last year, I wrote about a report by the advocacy organization Teach Plus, in which they presented results from a survey focused on identifying differences in attitudes by teacher experience (an important topic). One of my major comments was that the survey was "non-scientific" – it was voluntary, and distributed via social media, e-mail, etc. This means that the results cannot be used to draw strong conclusions about the population of teachers as a whole, since those who responded might be different from those who did not.

I also noted that, even if the sample was not representative, this did not preclude finding useful information in the results. Rather, my primary criticism was that the authors did not even mention the issue, or make an effort to compare the characteristics of their survey respondents with those of teachers in general (which can give a sense of the differences between the sample and the population).

Well, they have just issued a new report, which also presents the results of a teacher survey, this time focused on teachers’ attitudes toward the evaluation system used in Memphis, Tennessee (called the “Teacher Effectiveness Measure," or TEM). In this case, not only do they raise the issue of representativeness, but they also present a little bit of data comparing their respondents to the population (i.e., all Memphis teachers who were evaluated under TEM).

What Should The Results Of New Teacher Evaluations Look Like?

In a previous post, I discussed the initial results from new teacher evaluations in several states, and the fact that states with implausibly large proportions of teachers in the higher categories face a difficult situation – achieving greater differentiation while improving the quality and legitimacy of their systems.

I also expressed concern that pre-existing beliefs about the "proper" distribution of teacher ratings -- in particular, how many teachers should receive the lowest ratings -- might inappropriately influence the process of adjusting the systems based on the first round of results. In other words, there is a risk that states and districts will change their systems in a crude manner that lowers ratings simply for the sake of lowering ratings.

Such concerns of course imply a more general question: How should we assess the results of new evaluation systems? That’s a complicated issue, and these are largely uncharted waters. Nevertheless, I'd like to offer a few thoughts as states and districts move forward.

Charter School Authorization And Growth

If you ask a charter school supporter why charter schools tend to exhibit inconsistency in their measured test-based impact, there’s a good chance they’ll talk about authorizing. That is, they will tell you that the quality of authorization laws and practices -- the guidelines by which charters are granted, renewed and revoked -- drives much and perhaps even most of the variation in the performance of charters relative to comparable district schools, and that strengthening these laws is the key to improving performance.

Accordingly, a recently announced campaign by the National Association of Charter School Authorizers aims to get authorizers to step up the rate at which they close “low-performing” schools and to be more selective in allowing new schools to open. In addition, a recent CREDO study found (among other things) that charter middle and high schools’ performance during their first few years is more predictive of future performance than many people may have thought, thus lending support to the idea of opening and closing schools as an improvement strategy.

Below are a few quick points about the authorization issue, which lead up to a question about the relationship between selectivity and charter sector growth.

Relationship Counseling

A correlation between two variables measures the strength of the linear relationship between them. Put simply, two variables are positively correlated to the extent that individuals with relatively high (or low) values on one measure tend to have relatively high (or low) values on the other, and negatively correlated to the extent that high values on one measure tend to go with low values on the other.

Correlations are used frequently in the debate about teacher evaluations. For example, researchers might assess the relationship between classroom observations and value-added measures, which is one of the simpler ways to gather information about the “validity” of one or the other – i.e., whether it is telling us what we want to know. In this case, if teachers with higher observation scores also tend to get higher value-added scores, this might be interpreted as a sign that both are capturing, at least to some extent, "true" teacher performance.
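Here is a minimal sketch of how such a correlation is computed, using simulated data (all numbers invented); both measures are modeled as noisy versions of an unobserved "true" performance signal, which is why they end up only modestly correlated.

```python
# Simulated data, purely to illustrate computing the correlation described above.
import numpy as np

rng = np.random.default_rng(0)
true_performance = rng.normal(size=500)                              # unobserved "true" teacher quality
observation = true_performance + rng.normal(scale=1.0, size=500)     # noisy measure 1
value_added = true_performance + rng.normal(scale=1.5, size=500)     # noisier measure 2

r = np.corrcoef(observation, value_added)[0, 1]
print(round(r, 2))  # a modest positive correlation, roughly 0.3-0.4 with these settings
```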

Yet there seems to be a tendency among some advocates and policy makers to get a little overeager when interpreting correlations.

Unreliable Sources: Education Revenue During The Recession

For the better part of the past century, U.S. public education revenue has come predominantly from state and local sources, with the federal government contributing only a relatively small share. For most of this time, local revenue (primarily property taxes) comprised the largest proportion, but this began to shift gradually during the 1970s, to the point where state funds constituted a slightly larger share of overall revenue.

As you can see in the simple graph below, which uses data from the U.S. Census Bureau, this situation persisted throughout the 1990s and most of the 2000s. During this period, states provided roughly 50 percent of total revenue, localities about 45 percent, and the federal government approximately 5-8 percent. Needless to say, these overall proportions varied quite a bit by state. Vermont represents one of the most extreme examples, where, as a result of a 1997 State Supreme Court decision, education funding comes almost entirely from the state. Similarly, since Hawaii’s education system consists of a single statewide district, revenue on paper is dominated by state sources (though, in Hawaii's case, you might view the state and local levels as the same).
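For readers who want to reproduce shares like those in the graph, here is a minimal sketch of the arithmetic; the file layout and column names are hypothetical stand-ins for the Census Bureau revenue-by-source data.

```python
# Hypothetical layout: one row per year with national revenue totals by source.
import pandas as pd

rev = pd.read_csv("census_education_revenue.csv")  # columns: year, local, state, federal
totals = rev[["local", "state", "federal"]].sum(axis=1)
shares = rev[["local", "state", "federal"]].div(totals, axis=0).mul(100)
print(pd.concat([rev["year"], shares.round(1)], axis=1))  # percent of total revenue by source
```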

That said, the period of 2008 to 2010 was a time of pretty sharp volatility in the overall proportions contributed by each level of government.