What’s the problem with formative assessment?
I happen to agree with the pedagogical principles outlined in Black and Wiliam’s Theory of Formative Assessment.[i] As a teacher, I much prefer the work of facilitating student learning to evaluating it. And assessment for learning promised to be a panacea for improving student achievement. Governments around the world raced to implement reforms calling for the sort of formative assessment espoused by ‘Inside the Black Box: Raising Standards through Classroom Assessment’ (Black & Wiliam, 1998a), published in Phi Delta Kappan, the journal of educational taste-making. In this article, Paul Black and Dylan Wiliam summarized their large review of empirical studies on disparate practices of classroom formative assessment,[ii] claiming standards could be raised if teachers helped students understand what success looks like for a given task or learning goal, took note of students’ learning processes to make instructional adjustments, and provided non-evaluative feedback before summative grading so that students could refine their performance along the way.
Somehow, some way, this concrete assertion of common-sense pedagogy touched a nerve. Maybe the needle wasn’t moving far enough after the education accountability movement of the 80s and 90s swept the globe; maybe assessment researchers were looking for a Trojan horse to breach the fortress that large-scale standardized testing had built around the hearts and minds of policy makers. Ballooning to mythic proportions, the review became a ‘meta-analysis’; the effect sizes reported in the literature morphed from large (0.4) to mind-blowing (1.0).[iii] As Randy Bennett (2011) points out, even the more modest claim of a 0.4 effect size would mean roughly double the typical gains U.S. elementary students show on standardized tests in a year. Taking stock, Bennett (2011, p. 10) called out “respected testing experts” Popham and Stiggins for exaggerating the claims and effect sizes; he also critiqued the inclusion process and findings of the original review, equating the entire research narrative around formative assessment to urban legend:
In their review article, then, Black and Wiliam report no meta-analysis of their own doing, nor any quantitative results of their own making. The confusion may occur because, in their brief pamphlet and Phi Delta Kappan position paper, Black and Wiliam (1998a, 1998b) do, in fact, attribute a range of effect sizes to formative assessment. However, no source for those values is ever given. As such, these effect sizes are not the ‘quantitative result’, meta-analytical or otherwise, of the 1998 Assessment in Education review but, rather, a mischaracterisation that has essentially become the educational equivalent of urban legend (Bennett, 2011, p. 12).
But ‘Inside the Black Box’ had become the little black dress of the education world, and ‘assessment for learning’ an easy fix to dress up any research or policy initiative. Rick Stiggins is often credited with coining the terms assessment for, as, and of learning to help teachers over the logical hurdle of how and when to use judgements of student work both formatively and summatively. The term assessment literacy also took on a life of its own when Stiggins and colleagues (2006) said there was a ‘right’ way to do assessment for learning.[iv] Take this claim with a grain of salt – as Morrisette (2011) points out, research claiming that teachers are failing at formative assessment or any other practice often has more to do with seeing how teachers measure up to idealized norms in interviews and on surveys than with studying what they are actually doing.
Two decades on, the staking out of research territory on ways to improve assessment for learning continues[v] – along with a rather cranky commentary by Paul Black, noting that if people had attended to his program of research after 1998, they would have realized the claims were overstated in the first place:
The 1998 review was too optimistic where it said there was enough evidence to justify applying the research findings to practical action (Black, 2015, p. 163).
There are lessons here for teachers, system leaders, and policy makers about how formative assessment is understood. Black argues that he and his collaborators always made the case that improving student achievement went hand in hand with improving teacher pedagogy: a messy, fragile and contingent process (2015, p. 163). Formative ‘assessment’ was never really about assessment at all in the usual sense of judging students, nor was it intended to replace evaluation. It was about developing teacher judgement in the collection and use of evidence of student learning to guide further instruction.[vi] And judgement, in the messy, contingent work of teaching, cannot be boiled down to one-size-fits-all strategies (Bennett, 2011). There is another side to the story, too. Despite the dominant narrative, there is more than one approach to formative assessment. Forms of teacher collaborative inquiry such as pedagogical documentation, learning stories, and video inquiry[vii] can also be considered genres of formative assessment.
But the legend of formative assessment is also a morality tale. How should I, as a researcher, conduct myself? How should we, as users of research, respond to claims? The critiques of formative assessment by Bennett and Black remind us to go back to the basics:
- Don’t jump off the bridge because everyone else is
- Learn from the past but stay in the present
- Don’t make promises you can’t deliver
- Consider the source
- Show your work
- Question everything