• The Test-Based Evidence On New Orleans Charter Schools

    Charter schools in New Orleans (NOLA) now serve over four out of five students in the city – the largest market share of any big city in the nation. As of the 2011-12 school year, most of the city’s schools (around 80 percent), charter and regular public, are overseen by the Recovery School District (RSD), a statewide agency created in 2003 to take over low-performing schools, which assumed control of most NOLA schools in Katrina’s aftermath.

    Around three-quarters of these RSD schools (50 out of 66) are charters. The remainder of NOLA’s schools are overseen either by the Orleans Parish School Board (which is responsible for 11 charters and six regular public schools, and holds taxing authority for all parish schools) or by the Louisiana Board of Elementary and Secondary Education (which is directly responsible for three charters, and also supervises the RSD).

    New Orleans is often held up as a model for the rapid expansion of charter schools in other urban districts, based on the argument that charter proliferation since 2005-06 has generated rapid improvements in student outcomes. There are two separate claims potentially embedded in this argument. The first is that the city’s schools perform better than they did pre-Katrina. The second is that NOLA’s charters have outperformed the city’s dwindling supply of traditional public schools since the hurricane.

    Although I tend strongly toward the viewpoint that whether charter schools "work" is far less important than why - e.g., specific policies and practices - it might nevertheless be useful to quickly address both of the claims above, given all the attention paid to charters in New Orleans.

  • The Allure Of Teacher Quality

    Those following education know that policy focused on "teacher quality" has been by far the dominant paradigm for improving schools over the past few years. Some (but not nearly all) components of this all-hands-on-deck effort are perplexing to many teachers, and have generated quite a bit of pushback. No matter one’s opinion of this approach, however, what drives it is the tantalizing allure of variation in teacher quality.

    Fueled by the ever-increasing availability of detailed test score datasets linking teachers to students, the research literature on teachers’ test-based effectiveness has grown rapidly, in both size and sophistication. Analysis after analysis finds that, all else being equal, the variation in teachers’ estimated effects on students' test growth – the difference between the “top” and “bottom” teachers – is very large. In any given year, some teachers’ students make huge progress, others’ very little. Even if part of this estimated variation is attributable to confounding factors, the discrepancies are still larger than those of almost any other measured "input" within the jurisdiction of education policy. The underlying assumption here is that “true” teacher quality varies to a degree that is at least somewhat comparable in magnitude to the spread of the test-based estimates.
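    The relationship between the spread of estimates and the spread of "true" quality can be illustrated with a quick simulation. The numbers below (standard deviations of 0.15 for both true effects and measurement error) are purely hypothetical, chosen only to show the mechanism: because error variance adds to true variance, the estimated spread will always overstate the true spread.

```python
import random
import statistics

random.seed(42)

# Hypothetical parameters (not from the research cited in the post):
# "true" teacher effects on test-score growth, and the measurement
# error attached to each year's estimate, both in student-level SDs.
TRUE_SD, ERROR_SD = 0.15, 0.15
N_TEACHERS = 10_000

true_effects = [random.gauss(0, TRUE_SD) for _ in range(N_TEACHERS)]
# Each estimate is the true effect plus independent noise.
estimates = [t + random.gauss(0, ERROR_SD) for t in true_effects]

sd_true = statistics.stdev(true_effects)
sd_est = statistics.stdev(estimates)

# The estimated spread is inflated: roughly sqrt(0.15^2 + 0.15^2) = 0.21
print(f"true SD: {sd_true:.3f}, estimated SD: {sd_est:.3f}")
```

    Under these made-up assumptions, the spread of estimates is about 40 percent wider than the spread of true effects, which is why "the variation is huge" and "true quality varies hugely" are related but distinct claims.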

    Perhaps that's the case, but it does not, by itself, help much. The key question is whether and how we can measure teacher performance at the individual level and, more importantly, influence the distribution – that is, raise the ceiling, the middle and/or the floor. The variation hangs out there like a drug to which we’re addicted, but haven’t really figured out how to administer. If there were some way to harness it efficiently, the potential benefits could be considerable. The focus of current education policy is in large part an effort to do anything and everything to try to figure this out. And, as might be expected given the enormity of the task, progress has been slow.

  • Value-Added Versus Observations, Part Two: Validity

    In a previous post, I compared value-added (VA) and classroom observations in terms of reliability – the degree to which they are free of error and stable over repeated measurements. But even the most reliable measures aren’t useful unless they are valid – that is, unless they’re measuring what we want them to measure.

    Arguments over the validity of teacher performance measures, especially value-added, dominate our discourse on evaluations. There are, in my view, three interrelated issues to keep in mind when discussing the validity of VA and observations. The first is definitional – in a research context, validity is less about a measure itself than the inferences one draws from it. The second point might follow from the first: The validity of VA and observations should be assessed in the context of how they’re being used.

    Third and finally, given the difficulties in determining whether either measure is valid in and of itself, as well as the fact that so many states and districts are already moving ahead with new systems, the best approach at this point may be to judge validity in terms of whether the evaluations are improving outcomes. And, unfortunately, there is little indication that this is happening in most places.

  • Becoming A 21st Century Learner

    Think about something you have always wanted to learn or accomplish but never did, such as speaking a foreign language or playing an instrument. Now think about what stopped you. There are probably a variety of factors, but chances are those factors have little to do with technology.

    Electronic devices are becoming cheaper, easier to use, and more intuitive. Much of the world’s knowledge is literally at our fingertips, accessible from any networked gadget. Yet, sustained learning does not always follow. It is often noted that developing digital skills/literacy is fundamental to 21st century learning, but is that all that’s missing? I suspect not. In this post I take a look at university courses available to anyone with an internet connection (a.k.a. massive open online courses, or MOOCs) and ask: What attributes or skills make some people (but not others) better equipped to take advantage of this and similar educational opportunities brought about by advances in technology?

    In the last few months, Stanford University’s version of MOOCs has attracted considerable attention (also here and here), leading some to question the U.S. higher education model as we know it – and even envision its demise. But, what is really novel about the Stanford MOOCs? Why did 160,000 students from 190 countries sign up for the course “Introduction to Artificial Intelligence”?

  • Jobs And Freedom: Why Labor Organizing Should Be A Civil Right

    Our guest authors today are Norman Hill and Velma Murphy Hill. Norman Hill, staff coordinator of the historic 1963 March on Washington for Jobs and Freedom, is president emeritus of the A. Philip Randolph Institute. Velma Hill, a former vice president of the American Federation of Teachers (AFT), is also the former civil and human rights director for the Service Employees International Union (SEIU). They are currently working on a memoir, entitled Climbing Up the Rough Side of the Mountain.

    Richard D. Kahlenberg and Moshe Z. Marvit have done a great service by writing Why Labor Organizing Should Be a Civil Right: Rebuilding a Middle-Class Democracy by Enhancing Worker Voice, an important work with the potential to become the basis for a strong coalition on behalf of civil rights, racial equality and economic justice.

    In the United States, worker rights and civil rights have a deep and historic connection. What is slavery, after all, if not the abuse of worker rights taken to its ultimate extreme? A. Philip Randolph, the founder and president of the Brotherhood of Sleeping Car Porters, recognized this link and, as far back as the 1920s, spoke passionately about the need for a black-labor alliance. Civil rights activist Bayard Rustin, Randolph’s protégé and an adviser to Martin Luther King, Jr., joined his mentor as a forceful, early advocate for a black-labor coalition.

  • Value-Added Versus Observations, Part One: Reliability

    Although most new teacher evaluations are still in various phases of pre-implementation, it’s safe to say that classroom observations and/or value-added (VA) scores will be the most heavily weighted components of teachers’ final scores, depending on whether teachers are in tested grades and subjects. One gets the general sense that many - perhaps most - teachers strongly prefer the former (observations, especially peer observations) over the latter (VA).

    One of the most common arguments against VA is that the scores are error-prone and unstable over time - i.e., that they are unreliable. And it's true that the scores fluctuate between years (also see here), with much of this instability due to measurement error, rather than “real” performance changes. On a related note, different model specifications and different tests can yield very different results for the same teacher/class.

    These findings are very important, and often too casually dismissed by VA supporters, but the issue of reliability is, to varying degrees, endemic to all performance measurement. Actually, many of the standard reliability-based criticisms of value-added could also be leveled against observations. Since we cannot observe “true” teacher performance, it’s tough to say which is “better” or “worse,” despite the certainty with which both “sides” often present their respective cases. And, the fact that both entail some level of measurement error doesn't by itself speak to whether they should be part of evaluations.*

    Nevertheless, many states and districts have already made the choice to use both measures, and in these places, the existence of imprecision is less important than how to deal with it. Viewed from this perspective, VA and observations are in many respects more alike than different.

  • There's No One Correct Way To Rate Schools

    Education Week reports on the growth of websites that attempt to provide parents with help in choosing schools, including rating schools according to testing results. The most prominent of these sites is GreatSchools.org. Its test-based school ratings could not be more simplistic – they are essentially just percentile rankings of schools’ proficiency rates as compared to all other schools in their states (the site also provides warnings about the data, along with a bunch of non-testing information).
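    A percentile ranking of proficiency rates is about as simple as a school rating gets. The sketch below shows the basic idea with made-up data; the helper function and the sample rates are hypothetical, not the actual GreatSchools.org formula, which the post does not spell out in detail.

```python
from bisect import bisect_right

def percentile_rank(rate: float, state_rates: list[float]) -> int:
    """Percent of schools in the state whose proficiency rate is at or
    below this school's rate (hypothetical helper, illustrative only)."""
    ranked = sorted(state_rates)
    return round(100 * bisect_right(ranked, rate) / len(ranked))

# Made-up proficiency rates (percent proficient) for ten schools:
state_rates = [32, 45, 51, 58, 63, 70, 74, 81, 88, 95]

# A school at 63 percent proficient: 5 of 10 schools at or below it.
print(percentile_rank(63, state_rates))  # -> 50
```

    Note what such a ranking cannot capture: it compares raw proficiency snapshots, so a school serving a disadvantaged population that produces strong growth can still land near the bottom.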

    This is the kind of indicator that I have criticized when reviewing states’ school/district “grading systems.” And it is indeed a poor measure, albeit one that is widely available and easy to understand. But it’s worth quickly discussing the fact that such criticism is conditional on how the ratings are employed - there is a difference between the use of testing data to rate schools for parents versus for high-stakes accountability purposes.

    In other words, the utility and proper interpretation of data vary by context, and there's no one "correct way" to rate schools. The optimal design might differ depending on the purpose for which the ratings will be used. In fact, the reasons why a measure is problematic in one context might very well be a source of strength in another.

  • The Challenges Of Pre-K Assessment

    In the United States, nearly 1.3 million children attend publicly-funded preschool. As enrollment continues to grow, states are under pressure to prove these programs serve to increase school readiness. Thus, the task of figuring out how best to measure preschoolers’ learning outcomes has become a major policy focus.

    First, it should be noted that researchers are almost unanimous in their caution about this subject. There are inherent difficulties in the accurate assessment of very young children’s learning in the fields of language, cognition, socio-emotional development, and even physical development. Young children’s attention spans tend to be short and there are wide, natural variations in children’s performance in any given domain and on any given day. Thus, great care is advised in both the design and implementation of such assessments (see here, here, and here for examples). The question of whether and how to use these student assessments to determine program or staff effectiveness is even more difficult and controversial (for instance, here and here). Nevertheless, many states are already using various forms of assessment to oversee their preschool investments.

    It is difficult to react to this (unsurprising) paradox. Sadly, in education, there is often a disconnect between what we know (i.e., research) and what we do (i.e., policy). But, since our general desire for accountability seems to be here to stay, a case can be made that states should, at a minimum, expand what they measure to reflect learning as accurately and broadly as possible.

    So, what types of assessments are better for capturing what a four- or five-year-old knows? How might these assessments be improved?

  • Still In Residence: Arts Education In U.S. Public Schools

    There is a somewhat common argument in education circles that the focus on math and reading tests in No Child Left Behind has had the unintended consequence of generating a concurrent deemphasis on other subjects. This includes science and history, of course, but among the most frequently-mentioned presumed victims of this trend are art and music.

    A new report by the National Center for Education Statistics (NCES) presents some basic data on the availability of arts instruction in U.S. public schools between 1999 and 2010.

    The results provide only mixed support for the hypothesis that these programs are less available now than they were prior to the implementation of NCLB.

  • Measuring Journalist Quality

    Journalists play an essential role in our society. They are charged with informing the public, a vital function in a representative democracy. Yet, year after year, large pockets of the electorate remain poorly-informed on both foreign and domestic affairs. For a long time, commentators have blamed any number of different culprits for this problem, including poverty, education, increasing work hours and the rapid proliferation of entertainment media.

    There is no doubt that these and other factors matter a great deal. Recently, however, there is growing evidence that the factors shaping the degree to which people are informed about current events include not only social and economic conditions, but journalist quality as well. Put simply, better journalists produce better stories, which in turn attract more readers. On the whole, the U.S. journalist community is world class. But there is, as always, a tremendous amount of underlying variation. It’s likely that improving the overall quality of reporters would not only result in higher quality information, but it would also bring in more readers. Both outcomes would contribute to a better-informed, more active electorate.

    We at the Shanker Institute feel that it is time to start a public conversation about this issue. We have requested and received datasets documenting the story-by-story readership of the websites of U.S. newspapers, large and small. We are using these data in statistical models that we call “Readers-Added Models,” or “RAMs.”