The Magic Of Multiple Measures

Our guest author today is Cara Jackson, Assistant Director of Research and Evaluation at the Urban Teacher Center.

Teacher evaluation has become a contentious issue in the U.S. Some observers see the primary purpose of these reforms as the identification and removal of ineffective teachers; the popular media as well as politicians and education reform advocates have all played a role in the framing of teacher evaluation as such. But, while removal of ineffective teachers was a criterion under Race to the Top, so too was the creation of evaluation systems to be used for teacher development and support.

I think most people would agree that teacher development and improvement should be the primary purpose, as argued here. Some empirical evidence supports the efficacy of evaluation for this purpose (see here). And given the sheer number of teachers we need, declining enrollment in teacher preparation programs, and the difficulty disadvantaged schools have retaining teachers, school principals are probably none too enthusiastic about dismissing teachers, as discussed here.

Of course, to achieve the ambitious goal of improving teaching practice, an evaluation system must be implemented well. Fans of Harry Potter might remember when Dolores Umbridge from the Ministry of Magic takes over as High Inquisitor at Hogwarts and conducts “inspections” of Hogwarts teachers in Book 5 of J.K. Rowling’s series. These inspections pretty much demonstrate how not to approach classroom observations: she dictates the timing, fails to provide any indication of what aspects of teaching practice she will be evaluating, interrupts lessons with pointed questions and comments, and evidently does no pre- or post-conferencing with the teachers.

If you wanted teacher evaluation to accomplish the purpose of helping teachers improve, you might start by avoiding Umbridge’s approach; a helpful guide to conducting observations as the basis for improving instruction can be found here.

Additionally, though, you might take advantage of the move towards the use of multiple measures. These measures typically include some form of observation rubric, and many evaluation systems have begun using student surveys. Let’s imagine for a minute that Hogwarts has adopted two widely used tools, Charlotte Danielson’s Framework for Teaching and Ronald Ferguson’s Tripod Survey (you have already imagined a magical world in which people fly broomsticks, so imagining this cannot be much harder). The Danielson Framework focuses on four domains: Planning and Preparation, Classroom Environment, Instruction, and Professional Responsibilities. Tripod captures students’ perspectives on the “7Cs”: Care, Confer, Captivate, Clarify, Consolidate, Challenge, and Control.

So let’s consider what the evaluations of Hogwarts teachers might tell us.

Professor Snape, Potions. Rated highly in the Planning and Preparation as well as Instruction domains. He knows his content really well, creates cognitively engaging activities for his classes, and communicates clear expectations for learning and directions for activities. Snape receives mixed scores for the Classroom Environment domain; though not always respectful or fair, he does establish a culture for learning. Results on student surveys are widely divergent: Slytherins love him, Gryffindors hate him. As a result, his student survey ratings in Care are low but not rock-bottom. He scores moderately high in Captivate and Challenge. Overall, a decent example of how someone can be kind of awful to some students, and yet a reasonably effective teacher.
Professor Binns, History of Magic. He gets credit for content knowledge under the Planning and Preparation domain, but this is offset by a complete lack of knowledge of his students. He fails to use any questioning and discussion techniques or engage students in learning, resulting in low ratings in the Instruction domain. Survey ratings are low in Confer, Captivate, and Clarify. In another setting, he might score low in Control too, but the students are content to treat the class as a period for dozing or daydreaming.
Professor Hagrid, Care of Magical Creatures. Okay, nobody ever calls Hagrid “professor.” He’s a bit of a caricature of a novice teacher: well-intentioned and enthusiastic, but his lessons don’t always go as intended. Rated “developing” on most sections of the Framework. For example, though he knows his subject, his tendency to underestimate the dangers posed by biting books and Blast-Ended Skrewts lowers his ratings in both the Planning and Preparation and Classroom Environment domains. On the student survey, Hagrid gets moderately high scores in Captivate, but scores low on Control. He offers a good example of the possibility of gaming on student surveys – just think what Malfoy, Crabbe, and Goyle would have rated him.

And, as for Umbridge herself, she’d score low across the board. It’s not even clear what, if any, content knowledge she possesses, as her instruction consists of telling students what chapter to read. I suppose the students might give her high marks for Control.

What of their value added scores? Well, they don’t have any; Hermione’s Arithmancy teacher is the only Hogwarts teacher I can think of who might, and we don’t hear much about that class. This is not specific to the Wizarding world. Among Muggle teachers, a relatively small subset (about 15% in DCPS) have value added scores: those teaching reading or math in grades 4 through 8, and sometimes one high school grade. In theory we could expand the number of teachers with such scores by increasing testing, but that seems unlikely given the current political climate around testing. Alternative student growth metrics are another option, but evidence on their reliability and validity remains limited.

The Framework domains and the 7Cs have been found to be positively related to value added as well as other valued outcomes such as happiness in class and interest in college (Ferguson and Danielson 2014). Thus, the lack of student growth metrics does not prevent us from using observation and student survey data to provide feedback to teachers that could improve the quality of instruction, and ultimately, student outcomes.

If I were Headmistress of Hogwarts, and already had enough trouble recruiting and retaining Defense Against the Dark Arts teachers, I’d make every effort to use this evidence to improve the quality of instruction in other courses rather than replace those teachers. I’d ask Snape to treat students a bit more respectfully and fairly. Binns might be given release time to observe other classes, and would be encouraged to develop more engaging lesson plans and allow for more discussions in class. Hagrid could be given a mentor, and asked to reflect on the appropriateness of his classroom materials. I’d keep them all, honestly. Well, except for Umbridge.
