Putting evidence to work

With the resurgence of interest in evidence-based research in education, whether arising from randomised controlled trials conducted in classrooms or from cognitive science, there’s an ongoing question about how we can get this evidence into the hands of the people who can best make use of it: teachers. This issue, sometimes called the knowledge mobilisation problem, was the topic of a recent piece of research conducted by Teach First.

‘Putting Evidence to Work’ involved interviews and consultations with a range of academics, researchers and practitioners from education, psychology and related fields. I was really struck by the sheer generosity with which our respondents gave up their time to contribute to our thinking, so I’d like to say a big public ‘thank you’ to all of them:

Becky Allen, Tom Bennett, Daisy Christodoulou, Rob Coe, Kevan Collins, Philippa Cordingley, Caroline Creaby, Becky Francis, Ben Goldacre, Jonathan Haslam, Jennifer van Heerde-Hudson, Gary Jones, Paul Kirschner, Julie Nelson, Ben Riley, Jonathan Sharples, Phil Stock, John Sweller, Alex Quigley, Yana Weinstein, David Weston, Dylan Wiliam, Dan Willingham.

Their insights were combined with a short review of literature examining the barriers and ways forward for supporting teachers and school leaders to make the best use of research evidence. Whilst I hope the report accurately represents the many great ideas they contributed, I should make clear that our conclusions and recommendations do not necessarily reflect their views.

The report – which can be accessed on the Teach First website – lists some great resources for teachers and examines how these might be introduced over time to help teachers exercise a ‘professionally sceptical’ stance towards the kind of research evidence they will come across over the course of their careers. It also discusses a process of applying evidence to classroom practice which I hope is relevant to teachers and school leaders.

The gap between emerging research evidence and classroom practice is a difficult one to bridge – as I’ve reflected on myself in this blog over the years. Hopefully the report adds something to the discussion about how we can better achieve this – so that the best available evidence can inform, challenge and refine our professional judgement as teachers. If you read the report and have comments or ideas you’d like to share about it, I’d be keen to hear your thoughts.


No, don’t forget everything we know about memory

With a renewed interest in cognitive science within teaching, are we at risk of “conflating hypothetical models with proven neuroscience since accepted facts can quickly become ‘neuro-myths’ when new research contradicts popular theories” as Ellie Mulcahy warns in “Forgetting everything we know about memory”, her recent blog post for LKMco?

As evidence for this concern, she relates a new piece of neuroscientific research examining the formation of memory engrams in genetically-modified mice. There are two particularly interesting things about this study: first, the technique the researchers developed to activate groups of neurons using light signals; second, that their examination of engram formation in the hippocampus and pre-frontal cortex appears to challenge a fairly long-standing neurological theory called the ‘Multiple Trace model’. In essence, neuroscientists thought that for engrams to be formed in the pre-frontal cortex, multiple retrievals of that engram were required from the hippocampus. This new study (though yet to be replicated) found that for an aversive stimulus (electric shock), the engrams formed in the pre-frontal cortex at the same time as those in the hippocampus.

However, after 1 day the engrams were not naturally retrievable (the engram didn’t appear to be activated when placed in the chamber where the mouse had received an electric shock), but could be activated using the light signals to directly stimulate the neurons in the pre-frontal cortex. After 14 days, the pattern was reversed – natural recall occurred and the engram in the prefrontal cortex was active, but the engram in the hippocampus remained dormant (but could be artificially activated using light signals).

Despite the fact that it is “not yet clear what the implications for teaching, learning and pupils’ memory are”, Mulcahy argues that this new research throws “everything we thought we knew in to question”. Of cognitive theories of memory, she says that this new research “serves to demonstrate that unproven models should not be taken at face value” and that we “risk charging headlong into the territory of new neuromyths and VAK revisited”.

Five claims and counterarguments

Mulcahy makes a series of very strong and, in my opinion, unwarranted and in some cases unscientific claims in her post:

  • Neuroscience represents ‘proven’ facts.
  • This study contradicts popular cognitive theories of memory.
  • We should be ‘cautious’ of ‘unproven’ cognitive models of memory.
  • This study has implications for teaching and learning (which are not yet clear).
  • The findings of this study illustrate the risk of charging headlong into the territory of new neuromyths when applying cognitive psychology to the classroom.

These arguments, I contend, are erroneous: I’ll try to explain why in this post.

Does neuroscience represent ‘proven’ facts?

Well no – and without wishing to be unkind, it’s not the sort of statement I would expect from someone who has made any great study of psychology. Even if you weren’t aware of the many methodological and theoretical issues within neuroscience generally, its contingent (scientific) status is self-evident from Mulcahy’s post. Neuroscientists used to believe that engrams were formed first in the hippocampus and only later in the pre-frontal cortex. This was based on neurological studies – but it turns out that this theory may not be accurate. In essence, a neurological theory about memory formation has been challenged by new evidence: neuroscience is as contingent and uncertain as the rest of science – it does not represent some inviolable set of facts about the world (any more or less than cognitive science). The fact that this study represents a new finding and hasn’t yet been replicated might also lead us to wonder why Mulcahy considers it a ‘proven’ fact rather than merely an interesting new piece of evidence.

Does this study contradict cognitive theories of memory?

It’s difficult to say from Mulcahy’s post as the only cognitive finding she relates it to is retrieval practice (though it might be more accurate to call retrieval practice an observed behavioural finding rather than a theory of learning). Mulcahy explicitly relates the neurological study to retrieval practice, implying that its status depends upon the multiple trace model (which the study contests). However, this simply isn’t the case:

The first behavioural evidence relating to retrieval practice is often taken to be a study by Arthur Gates in 1917, so the science is more-or-less 100 years old this year. The effect has been observed and replicated many, many times over the intervening years – so it represents a highly reliable observation about learning. Is it ‘proven’ though? Well, this is a philosophical question about science – and my answer would be ‘no’ – because like all scientific ideas it is contingent upon evidence. However, is it the sort of finding you can bet the house on? Well, yes I think it is.

That’s not to say that cognitive scientists won’t refine or improve our understanding of retrieval practice. The key application within this branch of cognitive science isn’t simply that opportunities for practice produce better results, but that retrieval practice produces better results than other forms of studying. Judge the evidence to support this view for yourself – here are a few examples (from decades of research in this field!):

Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in cognitive sciences, 15(1), 20-27.

Dunlosky et al. (2013). Improving Students’ Learning With Effective Learning Techniques: Promising Directions From Cognitive and Educational Psychology. Psychological Science in the Public Interest, 14(1), 4-58.

Adesope, et al. (2017). Rethinking the Use of Tests: A Meta-Analysis of Practice Testing. Review of Educational Research, 0034654316689306.

Does the neurological study reported in Mulcahy’s post contradict this evidence? Well no – the study casts new light on the brain mechanisms involved in forming memory, but this is essentially irrelevant to the status of retrieval practice as a secure behavioural finding. Indeed, a new neurological model of learning which directly contradicted the secure observation of retrieval practice would be a neurological model in considerable difficulty!

Is cognitive science based on ‘unproven theories’? 

Unfortunately, Mulcahy makes this claim but doesn’t specify which theories are ‘unproven’ (I’m going to treat this as meaning “untested or lacking reliable evidence” from here on). I wouldn’t call retrieval practice (the only cognitive finding mentioned explicitly) a cognitive theory of memory; the Working Memory Model, the New Theory of Disuse, or Cognitive Load Theory would have a better claim to that status in my opinion. The model of memory that has perhaps been written about most in recent discussion of teaching is the Working Memory Model; so, in the context of the post, is this an example of the sort of ‘unproven’ theory we should be concerned about?

The Working Memory Model was essentially constructed based on behavioural evidence from laboratory studies and ‘real-life’ settings rather than neurological studies. Much is sometimes made of the fact that Baddeley reviewed a great number of brain injury cases when formulating the theory. However, the evidence base isn’t restricted to the behavioural deficits apparent in these cases: the theory has also been rigorously tested in an enormous number of experimental studies. There’s a great clip of Alan Baddeley talking about the development of Working Memory if you’re interested.

The model has undergone many minor revisions, but the basic architecture (e.g. as described in Willingham’s ‘Why don’t students like school?’) has remained remarkably robust in the face of experimental and – yes – neurological findings. There’s a good review of some of the neuroscientific evidence relating to working memory – and I’d argue that the Working Memory Model isn’t some untested hypothesis, but a robust scientific theory:

D’Esposito, M., & Postle, B. R. (2015). The cognitive neuroscience of working memory. Annual Review of Psychology, 66, 115-142.

That isn’t to say that there aren’t further questions and refinements of the model to come: How is the focus of attention organised? How can we better account for capacity limitations in visual STM? What is the role of other neurotransmitters and hormones, in addition to dopamine, in working memory function?

With the technology involved in imaging getting cheaper and new techniques being developed like the one described in Mulcahy’s post, neuroscience has the potential to make a phenomenal contribution to cognitive science by helping to explicitly test the process models describing behavioural observations. However, what neuroscience won’t be doing is re-writing the behavioural observations upon which cognitive theories are based. For example, we know there are capacity limitations to visual STM from behavioural level observations; the question is how can we explain these? A new neurological study is extremely unlikely to overturn the evidence that we have limited visual STM in the first place.

All scientific theories remain contingent, but sometimes the evidence base supporting them is strong enough that we can have good confidence that some new observation won’t happen along to upset them anytime soon. The heliocentric model of the solar system, evolution through natural selection, and so on still hold theoretical status, but flat Earthers and creationists are not exercising reasonable or scientific scepticism when they deny them. Likewise, working memory represents a theory from cognitive science in which we can have strong confidence.

Does this study have implications for teaching and learning (which are not yet clear)?

No. In fact, with all the advances in neuroimaging and our understanding of the brain, it is highly unlikely that we’ll see many applications for classroom practice arising from neuroscience. The reasons for this are explained really well in these three articles:

Willingham, D. Neuroscience applied to education: Mostly unimpressive.

Bishop, D. What is Educational Neuroscience?

Bowers, J. S. (2016). The practical and principled problems with educational neuroscience. Psychological review, 123(5), 600-612.

Bishop, who is a neuroscientist specialising in child development, argues that whilst there are a few instances of neuroscience being applied to techniques like neurofeedback, neuropharmacology and brain stimulation, in the main we should focus on cognitive and behavioural evidence to understand teaching:

“If our goal is to develop better educational interventions, then we should be directing research funds into well-designed trials of cognitive and behavioural studies of learning, rather than fixating on neuroscience”

Bowers argues that neuroscience is often conflated with cognitive psychology and takes credit for its applications. There’s perhaps a reasonable degree of evidence to support this accusation – as even a recent EEF news report mislabelled research into sleep and spaced retrieval practice as neuroscience. In terms of whether neuroscience will give rise to useful applications for teachers:

“More importantly, regarding the assessment of instruction, the only relevant issue is whether the child learns, as reflected in behavior.”

Willingham, now Professor of Psychology at the University of Virginia, was formerly involved in cognitive neuroscience research looking at brain mechanisms involved in learning. He makes a couple of useful points to bear in mind when assessing the connection between neuroscience and classroom practice.

Firstly, neuroscience isn’t an appropriate level of description for understanding learning in the classroom. Neurological explanations of teaching or learning would be examples of ‘greedy reductionism’. The distance between the actions of a group of neurons in the brain and a group of children in a classroom means that trying to pin classroom behaviours to neurological foundations essentially skips whole levels of theory and description in between. Willingham describes this as the ‘vertical’ problem of educational neuroscience.

Secondly, and in my opinion this represents a deeper problem for Mulcahy’s post, Willingham describes the ‘horizontal’ problem of educational neuroscience:

“Consider that in schools, the outcomes we care about are behavioral; reading, analyzing, calculating, remembering. These are the ways we know the child is getting something from schooling. At the end of the day, we don’t really care what her hippocampus is doing, so long as these behavioral landmarks are in place.”

Cognitive theories are formulated and applied based on these behavioural landmarks. For neuroscience to have useful implications for classroom practice we’d need to translate findings from this behavioural side, to the neural side, and then back to behaviour again. Mulcahy appears to conflate cognitive models of memory based on the behaviour of people with neurological models based on the behaviour of neurons. The former is a promising source of implications for practice, the latter very rarely.

Do the findings of this study illustrate the risk of charging headlong into new neuro-myths when applying cognitive psychology to the classroom?


Unlike neuro-myths like ‘right and left brained learners’ or ‘only using 10% of our brains’ – cognitive models of memory have been tested against a lot of evidence produced from decades of research. Cognitive theories typically applied in teaching, most notably working memory, aren’t ‘flash-in-the-pan’ untested ideas, but well-evidenced theories which (in their applicable form) are extremely unlikely to be overturned any time soon: We can have reasonable confidence in the status of these theories when trying to think how we might make use of them in our teaching.

Now, of course, it’s possible that some future finding will cause significant revision to one or more cognitive theories of memory. I’m not convinced this is true of the study reported by Mulcahy. However, it’s a general truism for any branch of science (indeed, something that separates scientific ideas from neuro-myths). A degree of uncertainty is situation normal in science.

However, it’s also possible that neurological evidence will provide additional support for behavioural models (further increasing the confidence we can have in them); or neurological evidence may provide new understanding for how cognitive models are instantiated in the brain. But it’s extremely unlikely that neurological evidence will fundamentally change the sort of applications arising from behavioural models of learning for the classroom. I’m hugely in favour of teachers developing their professional scepticism and in this age of ‘alternative facts’ there’s more reason than ever to apply it. However, rational scepticism isn’t about making hyperbolic claims or misrepresenting the scientific status of theories.

No one – perhaps least of all the writers listed at the start of Mulcahy’s blog – would suggest that teachers look to psychology credulously or unquestioningly. Cognitive psychological models of memory aren’t a ‘magic bullet’ or the answer to all (or indeed that many) problems in the classroom. They have some useful applications in teaching – but the potential benefits are not instantaneous or automatic. Teachers interested in helping their students develop more effective independent learning or looking to implement low-stakes quizzes in their lessons shouldn’t forget everything we know about memory. Instead, they should apply the ideas thoughtfully and use informed professional judgement to check whether they are having the effects intended in practice.

[edit 30/4: minor edits for clarity]

Eliminating unnecessary workload

The ‘Workload Challenge’ consultation ran between 22 October and 21 November 2014. In February 2015 the analysis of this survey was published. The survey asked three main questions about workload:

  1. Tell us about the unnecessary and unproductive tasks which take up too much of your time. Where do these come from?
  2. Send us your solutions and strategies for tackling workload – what works well in your school?
  3. What do you think should be done to tackle unnecessary workload – by government, by schools or by others?

The consultation received 43,832 responses in total, but only 16,820 respondents answered all three open-ended questions about workload. A systematic sample of 10% of these full responses was selected for detailed analysis, equating to 1,685 survey respondents.

Of course, many of the things teachers do outside of teaching classes aren’t entirely ‘unnecessary’ and ‘unproductive’ – the analysis noted that many tasks were identified by teachers as related to essential parts of working within a school, but that the time and volume of the tasks were so great that they were unable to complete them even when working much longer than their contracted hours.

The report identified unwarranted ‘detail’, ‘duplication’ and ‘bureaucracy’ as key elements of excessive workload. These related most often to lesson planning, assessment (including marking) and reporting administration (82%).

Teachers reported that the key drivers of workload were perceived requirements of Ofsted / accountability (53%) and tasks set by school leaders (51%).

In response the DFE set up an Independent Teacher Workload Review Group. This group has recently released three reports looking at Data Management, Planning and Resources, and Marking. There are some interesting discussions about the causes of excessive workload in each of these areas – but I’ll simply list some of the key recommendations first:

Eliminating unnecessary workload associated with data management

The report recommends that only data which is ‘purposeful, valid and reliable’ should be collected. One of the issues it identifies is what they call ‘gold-plating’ – collecting everything ‘just in case’ it is needed for accountability purposes. Where the collection of data becomes an end in itself the report suggests it is not simply unnecessary, it is damaging. It also recommends that summative data should not normally be collected more than three times a year per pupil.

Poor data, for example tracking based on formative assessment, can provide a ‘false comfort’. It gives the false impression that a numerical measure of pupil progress can be tracked and used to draw a wide range of conclusions about pupil and teacher performance when the data are flawed (i.e. neither reliable nor valid).

Schools should use data in the format available – Ofsted, for example, does not require data in any particular format – and the report suggests that school leaders actively avoid asking for the duplicate collection of data. In short ‘collect once, use many times.’

The report suggests that schools should analyse the cumulative impact on workload of new initiatives before implementing new data collection systems. It also suggests that schools should be prepared to stop an activity which is time-consuming and has limited value: i.e. not assume that collection or analysis must continue just because it always has.

Eliminating unnecessary workload around planning and teaching resources

The report argues that teachers spend too long planning and resourcing lessons. It makes a key distinction between ‘the daily lesson plan’ and ‘lesson planning’. It suggests that school leaders requiring the production of ‘daily written lesson plans’ are using them as proxy evidence for an accountability ‘paper trail’ rather than an effective process of planning for pupil progress and attainment. The report identifies the reaction of school leaders to the real and perceived demands made by Government and Ofsted as a principal cause of this. This unnecessary accountability paperwork often becomes a fairly pointless ‘box-ticking’ exercise and creates a ‘false comfort’ of purpose (the mere appearance of ‘doing something’ to raise school standards).

Perhaps the main issue with resourcing is teachers having to ‘reinvent the wheel’ in the absence of good quality textbooks and fully-resourced schemes of work. Once collaboratively developed schemes of work are in place, the report suggests that individual teachers should be free to teach in a way informed by their professional judgement and experience.

The report discusses some of the cultural mistrust of textbooks in many English schools. This cultural bias is one factor driving increased workload and the report argues that a cultural shift is required – one where high-quality textbooks are seen as part of a recipe – a useful base but still requiring the flair of the chef.

A great deal of the recent escalation in workload here is probably due to the rapid changes in curriculum and specifications over the past few years. The report suggests that the DFE should commit to providing sufficient lead-in times for changes for which the sector will have to undertake significant planning to implement.

Eliminating unnecessary workload around marking

The report argues that providing written feedback on pupils’ work has become disproportionately valued by schools and unnecessarily burdensome for teachers. It uses the term ‘deep marking’ to sum up some of this unnecessary practice, which covers varieties such as dialogic marking, triple marking and quality marking. It suggests decisions by school leaders regarding marking have been made in response to distorted ideas about Assessment for Learning and to Ofsted reports which praised particular methods of marking.

The report suggests that ineffective marking looks like this:

  • It usually involves an excessive reliance on the labour intensive practices under our definition of deep marking, such as extensive written comments in different colour pens, or the indication of when verbal feedback has been given by adding ‘VF’ on a pupil’s work.
  • It can be disjointed from the learning process, failing to help pupils improve their understanding. This can be because work is set and marked to a false timetable, and based on a policy of following a mechanistic timetable, rather than responding to pupils’ needs.
  • It can be dispiriting, for both teacher and pupil, by failing to encourage and engender motivation and resilience.
  • It can be unmanageable for teachers, and teachers forced to mark work late at night and at weekends are unlikely to operate effectively in the classroom.

It also makes the point that there is little robust evidence to support the current widespread practice of extensive written comments (an EEF review is looking at existing evidence on marking and identifying gaps in research – and should be published fairly soon). They recommend that school leaders should also challenge emerging ‘fads’ that indirectly impose excessive marking practices on schools.

The report suggests that effective marking is ‘meaningful’, ‘manageable’ and ‘motivating’:

  • Meaningful: marking varies by age group, subject, and what works best for the pupil and teacher in relation to any particular piece of work. Teachers are encouraged to adjust their approach as necessary and trusted to incorporate the outcomes into subsequent planning and teaching.
  • Manageable: marking practice is proportionate and considers the frequency and complexity of written feedback, as well as the cost and time-effectiveness of marking in relation to the overall workload of teachers. This is written into any assessment policy.
  • Motivating: Marking should help to motivate pupils to progress. This does not mean always writing in-depth comments or being universally positive: sometimes short, challenging comments or oral feedback are more effective. If the teacher is doing more work than their pupils, this can become a disincentive for pupils to accept challenges and take responsibility for improving their work.

What causes excessive workload for teachers?

The report identifies many plausible reasons why workload has become so unmanageable for school teachers over recent years, for example:

  • Rapid changes to curriculum, exam specification and school structures arising from the DFE
  • Historical demands from Ofsted and the perception of ‘what Ofsted wants’
  • School leaders ‘gold-plating’ the evidence they think they ‘might need’ to justify their decisions

However, in my opinion it perhaps misses a key driver of this workload. In ‘What’s driving workload in schools?’, I suggest that excessive demands upon teachers arise from the uncertainty created for school leaders and teachers when high-stakes judgements about their effectiveness or capability rest on measures of school or teacher performance which lack validity and reliability. The education white paper (Educational Excellence Everywhere) makes a proposal which might help: to remove the separate Ofsted judgements for Teaching and Learning from future inspections. This seems a pragmatic move – given T&L grades tend to correlate with student achievement anyway, I’m not sure the separate grade often tells school leaders very much – and it might further undermine the distortions to planning, marking and assessment which are so very often given spurious justification by ‘what Ofsted wants’.


Lesson observations: Would picking a top set get you a better grading?

Lesson observations: Approach with caution!

For any measure of teaching effectiveness to be useful, it needs to be valid. To be valid, a measure also needs to be reliable. Reliability represents the consistency of a measure. A measure is said to have a high reliability if it produces similar results each time – for example if two observers independently rate the same lesson, those ratings should agree with one another. Validity represents the extent to which a measurement corresponds to what it aimed to measure. So a valid observation would measure genuine learning gains, rather than be subject to bias.

We’ve known for some time that classroom observations lack the reliability for high-stakes judgements of teacher effectiveness. For example, the MET project – which spent millions of dollars to produce robust observation protocols – found that even these carefully constructed observation instruments had reported reliabilities ranging from 0.24 to 0.68. Rob Coe gives a great example of why such low reliability represents a problem for teacher appraisal.

“One way to understand these values is to estimate the percentage of judgements that would agree if two raters watch the same lesson. Using Ofsted’s categories, if a lesson is judged ‘Outstanding’ by one observer, the probability that a second observer would give a different judgement is between 51% and 78%.”

Classroom observation: it’s harder than you think
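To get a feel for how a reliability figure turns into disagreement between observers, here is a minimal simulation sketch. It is not Coe’s calculation and will not reproduce his exact percentages: it assumes each rater’s score is the lesson’s true quality plus independent rater error (so the correlation between two raters equals the reliability), and it uses made-up shares for the four grades. The names `GRADE_SHARES`, `grade_cutoffs`, `to_grade` and `disagreement_given_top_grade` are all hypothetical.

```python
import random
from statistics import NormalDist

# Hypothetical share of lessons in each grade, worst to best (an assumption,
# not a figure from the MET study or from Coe's article).
GRADE_SHARES = [0.10, 0.30, 0.40, 0.20]


def grade_cutoffs(shares):
    """Turn grade shares into cut-points on a standard normal score scale."""
    cutoffs, cumulative = [], 0.0
    for share in shares[:-1]:
        cumulative += share
        cutoffs.append(NormalDist().inv_cdf(cumulative))
    return cutoffs


def to_grade(score, cutoffs):
    """Return 0 for the lowest grade up to len(cutoffs) for the highest."""
    return sum(score > c for c in cutoffs)


def disagreement_given_top_grade(reliability, n_lessons=50_000, seed=1):
    """Estimate P(rater B gives another grade | rater A gave the top grade)."""
    rng = random.Random(seed)
    cutoffs = grade_cutoffs(GRADE_SHARES)
    top = len(cutoffs)
    shared = reliability ** 0.5        # weight on the lesson's true quality
    noise = (1 - reliability) ** 0.5   # weight on each rater's own error
    top_by_a = differ = 0
    for _ in range(n_lessons):
        quality = rng.gauss(0, 1)
        score_a = shared * quality + noise * rng.gauss(0, 1)
        score_b = shared * quality + noise * rng.gauss(0, 1)
        if to_grade(score_a, cutoffs) == top:
            top_by_a += 1
            differ += to_grade(score_b, cutoffs) != top
    return differ / top_by_a


for r in (0.24, 0.68):
    share = disagreement_given_top_grade(r)
    print(f"reliability {r:.2f}: second rater differs in {share:.0%} of cases")
```

Under these assumptions, the lower the reliability, the more often the second observer lands on a different grade; the exact figures depend heavily on the assumed grade shares.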

Evidence of poor validity has also been around for a while. For example, Strong et al (2011) asked participants to watch videos of teaching and asked them to rate whether the teacher was ‘effective’ or ‘ineffective’. These ratings were compared to the value-added scores for the students of those teachers. Where the observers had not received specific training on observations, even experienced teachers and head teachers matched ‘effective’ teaching to high value-added less than 50% of the time. Again, Rob Coe gives an example of what this would mean for classroom teachers being graded in the UK.

“Fewer than 1% of lessons judged inadequate and only 4% of lessons judged outstanding produce matching learning gains. Overall, 63% of judgements will not correspond with value-added.”

Classroom observation: it’s harder than you think

Why is observation such a problematic measure of effective teaching?

There are lots of possible reasons why observations may not be a valid or reliable measure of teaching. One problem is that learning is invisible – it takes place in a student’s head, and anything that we can see in the classroom is merely a proxy for that learning. Another problem is that observers are likely to have strong ideas about what ‘good practice’ looks like – whether those practices lead to learning gains is another matter. Teaching is also based on a natural ability – something humans have evolved to do – therefore even experienced teachers will find it really difficult to explicitly describe what it is they do.

However, there also appears to be another, quite simple, reason why observations used to make high-stakes judgements about teaching tend to lack validity: it seems the prior attainment of students in a class biases the ratings of an observer.

Steinberg, M. P., & Garrett, R. (2016). Classroom Composition and Measured Teacher Performance: What Do Teacher Observation Scores Really Measure? Educational Evaluation and Policy Analysis, 0162373715616249.

Steinberg and Garrett used data from the MET study to explore the extent to which the class a teacher is timetabled to teach might influence observation measures of that teacher’s performance. They review a number of previous studies in this area, relating other factors which appear to influence the outcome of observation ratings. For example, observation scores tend to be lower for teachers whose students come from more disadvantaged backgrounds. They also note the problem that teachers are not randomly assigned to teaching groups in schools – and that often inexperienced teachers are allocated to more disadvantaged students, while more experienced teachers tend to work with higher achieving students.

To examine whether the prior attainment of students influenced observation ratings, they used the data from the MET study. The MET study was carried out over 2 years and across six districts in the US. One of the advantages of the MET data was the fact that the project randomised the allocation of teachers to classes prior to the second year of the study. They used this random allocation of 834 teachers to classes (Grades 4-9) to generate estimates of the effect of prior achievement on measured teacher performance for that second year. Their conclusion suggests that teachers of lower-ability groups may be unfairly rated as relatively ineffective, even when very strict observation protocols involving considerable training are used:

“In this article, we find that the incoming achievement of a teacher’s students significantly and substantively influences observation-based measures of teacher performance. Indeed, teachers working with higher achieving students tend to receive higher performance ratings, above and beyond that which might be attributable to aspects of teacher quality that are fixed over time.” Page 20

Interestingly, the study found that the influence of prior attainment was greater for teachers of ELA (English Language Arts) than for maths teachers, and for subject-specialist teachers (common in secondary in the UK) compared to generalist teachers (more like primary). Another interesting finding was that prior attainment appeared to particularly influence measures of teaching related to ‘classroom climate’ – suggesting that teachers of higher-performing students may be judged better at behaviour management than they actually are.

This study has significant implications for schools which use high-stakes (let alone graded) observations as the basis for appraisal. If a teacher’s measured effectiveness is, in part, determined by which groups they are allocated to teach, then withholding a pay rise or placing a teacher on capability based on observations of teaching becomes potentially unmerited and inequitable.

How can teachers know their impact?

Observations of teaching can (and I’d say should) provide teachers with useful feedback they can use to develop their professional practice – but if observations lack validity, then they won’t help provide useful formative feedback (let alone summative judgement). Once again, Rob Coe has some suggestions about how schools might approach observations:

There’s a great video of Rob Coe presenting some of the problems and possible ways forward at Teach First: What is the future of lesson observation in our schools? (Part 1) (Jan 2014)

  • Stop assuming that untrained observers can either make valid judgements or provide feedback that improves anything
  • Apply a critical research standard and the best existing knowledge to the process of developing, implementing and validating observation protocols
  • Ensure that good evidence supports any uses or interpretations we make for observations. It follows that appropriate caveats around the limits of such uses should be clearly stated and the use should not go beyond what is justified
  • Undertake robustly evaluated research to investigate how feedback from lesson observation might be used to improve teaching quality (EEF already has one such study underway).

Other than observations, value-added data and student survey feedback might be used to help provide teachers with more valid feedback on their teaching.

The MET study, for example, found that VA data reasonably correlates with a teacher’s long-term success. However, VA data tends to come too infrequently (and too late) in a school cycle to help identify where things might be going well or need to improve. It also doesn’t provide ‘fine-grain’ detail – i.e. it can tell you that students did well, but can’t really tell you what it was the teacher was doing well, or what they should be doing to improve. There are also some other issues with VA scores – for example, one study tested VA modelling techniques by seeing what effect teachers had on their students’ heights. In their analysis, they found that teachers appeared to influence the height of their students almost as much as their English and maths scores.

MET predictors of success

Testing Teachers: What works best for teacher evaluation and appraisal

Student surveys are another method used by the MET. I’ve used these within a coaching context to help teachers identify areas they might work on – and used follow-up surveys to see whether students felt the changes had any impact. You can read a bit about this here: Investigating teaching using a student survey

Ultimately, though, the problem for teachers arises when high-stakes judgements rely on any form of measure which lacks the reliability and validity to form a reasonable basis for such judgement. I suspect part of the issue has been the impression that school leaders needed to have such observation data to support their judgements of the quality of teaching in their schools to Ofsted. One of the proposals in the recent white paper (Educational Excellence Everywhere) was to remove the separate Ofsted judgements for Teaching and Learning from future inspections. On the basis of the evidence this seems like a very good idea indeed!

Perhaps once out of Ofsted’s shadow, schools will be able to think about how to use observations much more constructively – perhaps as a coaching tool to help teachers improve their impact rather than a sword of Damocles to hang over their heads three times a year.


Attachment Theory: Why teachers shouldn’t get too excited about it.

John Bowlby: Attachment theory

The British psychologist John Bowlby is fairly synonymous with attachment theory. From his clinical work with ‘juvenile delinquents’ over the course of World War II he began formulating ideas about the role of early and prolonged separation from parents and caregivers in the development of problems in those children’s social and emotional development.

The core of his theory is that attachment is an evolutionary adaptation which is characterised by a child seeking proximity to a caregiver when that child perceives a threat or suffers discomfort. Given the intense needs of human infants, it is perhaps unsurprising that the formation of a “deep and enduring emotional bond that connects one person to another across time and space” evolved to improve the chances of an infant’s survival.

Over the first year of life, an infant begins to develop attachments with parents or carers. As these attachments form we tend to see characteristic behaviour in infant interactions with their attachment figure:

  • Stranger Anxiety – the infant responds with fear or distress to the arrival of a stranger.
  • Separation Anxiety – when separated from a parent or carer the infant shows distress and, upon that attachment figure’s return, seeks proximity for comfort.
  • Social Referencing – the infant looks at the parent or carer to see how they respond to something novel in the environment. The infant looks at the facial expressions of the parent or carer (e.g. smiling or fearful) which influences how they behave in an uncertain situation.

Attachment figures aren’t simply individuals who spend a lot of time with the infant, or the ones who feed them, but typically the individuals who respond most sensitively, for example by often playing and communicating with the infant. For many infants the principal attachment figure is their mother, but fathers, grandparents or siblings may also fulfil this role. By about 18 months, most infants enjoy multiple attachments, though these may be somewhat hierarchical with a primary attachment figure of particular importance. The behaviour relating to attachment develops over early childhood: for example, babies tend to cry because of fear or pain, whereas by about two years old they may cry to beckon their caregiver (and cry louder or shout if that doesn’t work!).

Bowlby believed these early experiences of attachment formed an internal ‘working model’ which the child used to form relationships with secondary attachment figures, later friendships with peers and eventually romantic and parenting relationships in adult life.

Mary Ainsworth: Types of attachment

There are individual differences in the behaviour related to attachment. Famous observation studies by Mary Ainsworth (who worked with John Bowlby during the 1950s) identified that in normal children there was a range of attachment types:

Secure attachment: The majority of infants, across different cultures, tend to have an attachment style typified by strong stranger and separation anxiety along with enthusiastic proximity seeking with the parent upon reunion.

Insecure-avoidant: Slightly more common in western cultures, an insecure-avoidant attachment tends to be characterised by avoiding or ignoring the caregiver and showing little emotion (whilst experiencing inward anxiety) when the caregiver leaves the room, and displaying little enthusiasm when the caregiver returns.

Insecure-resistant: Perhaps more common in ‘collectivist cultures’, an insecure-resistant (sometimes also called insecure-ambivalent) attachment tends to be characterised as showing intense distress during separation, and being difficult to comfort when the caregiver returns. Infants with this attachment type may also show some rejection or resentment towards the caregiver after a separation.

Disorganised attachment: Added in the 1990s, infants with a disorganised attachment tend to show no consistent pattern in behaviour towards their caregiver. For example, they may show intense proximity seeking behaviour one moment, then avoid or ignore the caregiver the next.

If you’re interested in some of the history and the origins of attachment theory, the work of John Bowlby and Mary Ainsworth is a good place to start. There’s a nice summary here – Bretherton, I. (1992). The origins of attachment theory: John Bowlby and Mary Ainsworth. Developmental psychology, 28(5), 759.

Many children may display behaviour suggesting an ‘insecure’ attachment type which may make it harder to form peer friendships, and this likely underlies an association between insecure and disorganised attachment and higher levels of behaviour problems. However, it’s not certain that differences in attachment are specifically the cause of behaviour problems. For example, a meta-analysis by Fearon et al. (2010) found that socio-economic status accounted for a considerable portion of the variance in behaviour problems in childhood.

Fearon, R. P., Bakermans‐Kranenburg, M. J., Van IJzendoorn, M. H., Lapsley, A. M., & Roisman, G. I. (2010). The significance of insecure attachment and disorganization in the development of children’s externalizing behavior: a meta‐analytic study. Child development, 81(2), 435-456.

So, whilst there’s reasonable evidence to suggest that these individual differences in attachment correlate with differences in behaviour within school, it is very important to note that these differences are not ‘pathological’ in a clinical sense. Given that about 30-35% of representative populations have an ‘insecure’ attachment, NICE suggests that it is unhelpful to view insecure attachment as an ‘attachment problem’.

Reactive Attachment Disorder

A popular misconception about attachment is a conflation between the ‘types of attachment’ that children possess and an ‘attachment disorder’. CoramBAAF, a leading charity working within adoption and fostering, suggests that even when used by those trained to do so, attachment classifications cannot be equated with a clinical diagnosis of disorder. While the insecure patterns may indicate a risk factor in a child’s development, they do not by themselves identify disorders. The term ‘attachment disorder’ refers to a highly atypical set of behaviours indicative of children who experience extreme difficulty in forming close attachments. NICE suggests that the prevalence of attachment disorders in the general population is not well established, but is likely to be low. However, there are substantially higher rates among young children raised in institutional care or who have been exposed to abuse or neglect. The Office for National Statistics (2002) report for the Department of Health estimated that somewhere between 2.5% and 20% of looked after children had an attachment disorder (depending on whether a broad or narrow definition was used).

There is a broad distinction between two classifications of RAD:

Inhibited attachment disorders: Characterised by significant difficulties with social interactions, such as extremely detached or withdrawn behaviour – usually attributed to early and severe abuse from ‘attachment figures’ such as parents.

Disinhibited attachment disorders: Characterised by diffuse attachments, as shown by indiscriminate familiarity and affection without the usual selectivity in choice of attachment figures – often attributed to frequent changes of caregiver in the early years.

Reactive Attachment Disorder is a psychiatric condition and is often accompanied by other psychiatric disorders. CoramBAAF argues that the lack of clarity about the use of attachment concepts in describing children’s relationship difficulties can create confusion and advises extreme caution. A diagnosis of an attachment disorder can only be undertaken by a psychiatrist.

Unfortunately, there is also no widely applicable, evidence-based set of therapies for RAD. However, there has been significant concern expressed about some therapies. One example is “Holding therapy”, which involves holding a child in a position which prevents escape whilst engaging in an intense physical and emotional confrontation. CoramBAAF argues there is nothing in attachment theory to suggest that holding therapy is either justifiable or effective for the treatment of attachment disorders. Less controversial therapies involve counselling to address the issues that are affecting the carer’s relationship with the child and teaching parenting skills to help develop attachment.

What should teachers be doing?

This is why I don’t really understand all the apparent excitement about attachment theory at the moment: there’s nothing a teacher should be doing that they shouldn’t already be doing.

Firstly, given the relationship between attachment disorders and abusive or neglectful relationships, perhaps some teachers are worried that they need to know about attachment disorder in order to fulfil their statutory safeguarding responsibilities. However, it’s important to note that whilst some children with RAD have suffered abuse or neglect, that doesn’t mean that problematic behaviour is evidence of such. The teacher isn’t in a position either to make the clinical judgement or to investigate the cause of problematic behaviour they suspect may relate to a safeguarding concern. If a student is behaving in a way which concerns you, then report that concern to your designated member of SLT (as you would any safeguarding concern). Whether or not you might think a child has an insecure attachment or a disordered attachment isn’t really your professional call.

Secondly, it may be that some teachers feel they need to know more about attachment in order to support students with behaviour problems in school. However, the advice for working with RAD students isn’t really any different to good behaviour management generally. Teachers should not confuse their role in loco parentis with being the primary caregiver for a child. For example, the Center for Family Development is an attachment centre based in New York specializing in the treatment of adopted and foster families with trauma and attachment disorder. In their ‘Overview of Reactive Attachment Disorder for Teachers’ they point out that, as a teacher, you are not the primary caregiver for a child you teach.

“You cannot parent this child. You are the child’s teacher, not therapist, nor parent. Teachers are left behind each year, its normal.  These children need to learn that lesson.”

They recommend approaching behaviour through explicit teaching of consequences: that there’s a consequence associated with good behaviour and there’s a consequence for poor behaviour.

Further suggestions include:

  • Creating a structured environment with extremely consistent rules
  • Being consistent and specific when giving praise or confronting poor behaviour
  • Providing the child with choices, but choices provided by you, the teacher.
  • Maintaining your professional boundaries (avoid attempting to create ‘friendship’ or ‘intimacy’ with the child).
  • Keeping calm and avoiding losing your temper; communicating directly, positively, and firmly.
  • Remaining unemotional when implementing consequences, and assuming a tone that says, effectively, “That’s just the way business is done – nothing personal.”

In short, there’s nothing here that teachers shouldn’t already be doing when working with any student with challenging behaviour. Whether the challenging behaviour stems from a problem with attachment isn’t really the issue.

In Summary

Whilst there’s a relationship between insecure attachment and behaviour problems in the classroom, teachers are not qualified to ‘diagnose’ a student’s attachment type nor engage in any kind of ‘therapy’ with that student. There is a condition called ‘Reactive Attachment Disorder’ which has a higher incidence within ‘looked after’ students. Again, teachers are not qualified to make this psychiatric diagnosis.

There is an important difference between the professional role of a teacher and the role of a primary caregiver, and it’s vital that recent interest in attachment theory within the profession doesn’t blur that line. Where teachers are concerned that behaviour presented in the classroom might indicate abuse or neglect, then they are already obliged by law to report these concerns (but not investigate them or try to involve themselves in resolving them).

In terms of managing the behaviour of students with attachment problems, so that they can overcome the difficulties of their family background and experience success within school, the guidance suggests things like a structured environment, consistent rules, professional distance and focusing feedback on behaviour not the child: advice that forms the basis of good behaviour management regardless of the cause of problematic behaviour.

It may be the case that different strategies will help specific children with RAD achieve in school. However, that’s also the case for any student with SEND. Perhaps what is important for teachers is not specific ‘training’ in attachment theory to help them ‘diagnose’ attachments, but a clear understanding of their school’s SEND system and time to read, implement and work with SEND coordinators to ensure any specific strategies suggested by an educational psychologist or child psychiatrist are employed effectively.


Germane load: The right kind of mental effort?

Despite our vast capacity to hold information in long-term memory, our working memory is extremely limited and becomes overloaded very easily. Greater insight into these problems and some practical ideas about what to do about them comes from the research of John Sweller. Sweller was interested in how teachers could structure their lessons in order to minimise this problem of overload. From the results of numerous experiments, he developed Cognitive Load Theory (CLT) which explains how teachers might manage the ‘load’ they place on working memory and help students learn more readily. The theory divides up the different kinds of loading on working memory:

Intrinsic load represents the inherent difficulty of the material and is related to its level of element interactivity. Working memory is limited to between three and five items. There’s not much we can do about this as teachers (multiplying 5×8 will always be easier than 5×8×3). However, for some materials, it may be possible to break the material up into simpler sub-components which can be tackled separately at first and recombined later.

Not all material is equally intrinsically difficult. Where materials are related to what David Geary calls ‘biologically primary knowledge’ the load on working memory appears to be greatly reduced. Our brains are adapted to solve complex problems related to survival and reproduction (e.g. reasoning tasks related to social cheating are much easier than formal syllogistic logic).

Another way we can ‘cheat’ working memory limitations is by exploiting the fact that visual and auditory information can be processed simultaneously without creating additional load. For example, Sweller, Van Merrienboer and Paas (1998) report that where material has high intrinsic load, using visual/audio presentations was far more effective than using text and explanation together (which both require verbal processing).

Intrinsic load is also reduced where individuals have a strong background of prior knowledge. Familiar information is said to be organised in our long-term memory as a schema – essentially allowing us to work with a sizeable ‘chunk’ of information as if it were one item. Having automatic access to these schemas allows us to overcome some of the limitations of our working memory. This is why, for instance, many people argue for the memorisation of multiplication tables. For example, if the student doesn’t have to mentally calculate 5×8 this will reduce the load on working memory and they will find 5×8×3 easier to ‘hold in mind’.

Extraneous load is the load generated by the way that material is presented to the learner. For example, Kirschner, Sweller and Clark (2006) suggest that where the intrinsic load of material is high, presenting new material through minimally-guided activities like problem-solving creates an additional, unhelpful load on working memory. One of the issues is that when faced with a novel problem, students tend to use a ‘processing intensive’ general strategy called means-end analysis in order to find a solution. Sweller, Van Merrienboer and Paas (1998) suggest that ‘goal-free’ problems can avoid this issue by forcing the student to rely upon strategies other than the load-intensive means-end approach. A second strategy to overcome means-end searches discussed in that paper is the use of worked examples as a substitute for solving problems.

A further source of extraneous load is attention switching. For example, Mousavi, Low, and Sweller (1995) suggest that rather than having labels alongside a diagram – which requires the student to switch attention between the text and the visual image – placing the labels at appropriate locations on the diagram can dramatically facilitate learning. In essence, we should seek to minimise extraneous cognitive load in order to best facilitate learning.

Just taking these two types of cognitive load, the implication might appear to be that eliminating extraneous load and organising instruction so that sub-components of a complex task are automated would be sufficient for the learning of new material.

However, Sweller, Van Merrienboer and Paas (1998) reported that encouraging learners to engage in conscious cognitive processing that is directly relevant to the construction of schemas benefits learning. For example, varying the conditions of practice appears to have beneficial effects upon learning, despite the fact that the presence of that variety would raise the loading on working memory. They called this germane cognitive load.

Germane cognitive load

Van Merrienboer, Kester and Paas (2006) suggest that whilst load-reducing methods, such as low variability and explicit guidance and feedback, are effective in producing high retention of the material, these techniques hinder the transfer of learning. They argue that there is a need to vary the conditions of practice and only give limited guidance and feedback in order to induce germane cognitive load and improve transfer.

It’s tempting to connect this to Robert Bjork’s ideas about ‘desirable difficulties’. Bjork makes the argument that things that make learning ‘easy’ during instruction do not always lead to long-term learning. He argues that creating conditions which are difficult and appear to impede immediate performance leads to greater long-term retention and better transfer. David Didau summarises the idea like this:

“I love Bjork’s coining, ‘desirable difficulties’ because it gets to the very heart of the counter intuitive nature of learning. It turns out that making it more difficult for students to learn means that they actually learn more!”

There are lots of examples across psychology where introducing additional difficulty appears to facilitate learning. For example, it has been shown that making a font more difficult for the learner to read improves memory performance. Solving anagrams involves more effort than simply copying words, but this additional effort appears to facilitate recall (for easy anagrams at least).

It may seem that there isn’t a problem. Perhaps Bjork’s desirable difficulties are merely examples of germane cognitive load. However, it does create an issue for the theory – mainly because there’s no easy way to experimentally measure each type of load and this risks making the theory impossible to falsify. Debue and van de Leemput (2014) explain the problem:

“In the absence of reliable measurements for each load, the CLT cannot ever be refuted because it is always possible to attribute variation in the overall cognitive load to a source that corroborates the initial assumptions. For example, assuming that the overall load is kept constant, a decrease in performance will be attributed to a rise in extraneous load that impairs germane cognitive processes. Conversely, if the performance increases it will be attributed to a germane load enhancement made possible by a drop in extraneous load.”

Perhaps the solution is simply to get rid of the notion of germane load. However, it seems that cognitive load theory needs some sort of component which represents the fact that some kinds of mental effort lead to improved long-term memory for material. Unless there’s a way to measure this (and I suggest self-report measures are unlikely to convince critics of the theory), it risks making the theory effectively unfalsifiable.

What is the right sort of mental effort?

The relationship between some sort of mental effort and learning isn’t terribly controversial. For example, most readers of this blog will recognise the quotes below:

“Memory is the residue of thought” – Dan Willingham

“Learning happens when people have to think hard” – Rob Coe

But what does thinking or ‘thinking hard’ mean? Is it just the quantity of thinking or some aspect of the quality of thinking which leads to learning?

Well, one way of thinking about the ‘right kind of thinking’ might be to borrow the concept of ‘depth of processing’ first posited by Craik and Lockhart (1972). I describe a bit about their ideas in more detail here. In brief, they suggested that mental effort might comprise shallower or deeper processing.

“For example, in shallow processing, the subject answered questions concerning the word’s typeface (for example, is the word “HOUSE” written in capital letters?); in intermediate processing, the subject answered questions about rhyme (for example, does the word “house” rhyme with “pencil”?); and in deep processing, the questions were directed toward the word’s semantic content (for example, does the word “house” fit into this sentence: “The ______ has a beautiful window”?).”

They suggested that retention in long-term memory depends on the depth to which new information is analysed. However, they argued that the system stops processing the information once the analysis relevant to the task has been carried out, so if a task merely requires shallow processing of the material then deeper processing will not occur.

A simple way to illustrate this is to consider the difference between two fairly common classroom activities – the word search and the crossword – when familiarising students with new terminology. Word search puzzles are a great example of ‘structural processing’ – they can be completed with no understanding of the key words, simply by pattern matching the first few letters. Although such an activity might require mental effort (e.g. some of the words are presented as anagrams or are arranged diagonally or backwards in the grid) it’s not the right sort of mental effort for effective learning. A better ‘quiz’ type activity might be to use a crossword – perhaps with the definitions of words as the clues – as at least any mental effort expended will lead students to attend to the deeper, semantic properties of the key terms.

This may help explain why testing is a more effective method of encouraging reliable recall than restudying (the ‘testing effect’). Testing encourages ‘semantic searches’ in order to retrieve information from long-term memory and that sort of mental effort facilitates future attempts. It’s interesting to note that the testing effect disappears where there is no mental effort involved in the retrieval. A recent study by Endres and Renkl (2015) examined the testing effect under a range of conditions and concluded:

“Overall, our findings on mental effort and non-tested items support the elaborative retrieval hypothesis, including the interpretation of mental effort as an indicator of semantic elaboration.” …

“Our results suggest that testing tasks should be used that require learners to invest substantial mental effort. A more difficult task leads to more elaboration as long as it can be solved (more or less) successfully.”

We can also relate this to the benefits of spacing – the ‘spacing effect’ – where practice is spread over time rather than condensed over a short period. We’ve known since Ebbinghaus that information is lost fairly rapidly from memory – but that reviewing the material periodically (e.g. through a quiz) leads to better recall over time. It seems plausible that the period of delay increases the semantically focused mental effort required to retrieve the information, whereas immediate testing when the information is freshly retained is too effortless to promote much learning.

This might also help explain why varying the conditions of practice (arguably the key component of germane load), whilst more difficult in the short term, tends to lead to better long-term recall. When talking about the problems associated with assessment rubrics, Greg Ashman makes the point that students focus exclusively on the elements required by the rubric and ignore the deeper structure related to the problem.

Greg Ashman Rubrics


Again, might this be argued to be a problem related to shallow processing? By varying the conditions of practice, the student is encouraged to engage in deeper semantic processing rather than rely upon fairly superficial automatic recall.

However, does recasting germane load as mental effort related to semantic processing solve the problem of measurement? Well, not yet – but as brain imaging becomes cheaper and more available to psychologists, it’s a possibility. There are certainly studies looking for these neural correlates – for example, Otten, Henson and Rugg (2001) report the results of an fMRI study examining the neural correlates of memory encoding.

Fifteen volunteers were presented with a series of 280 words and (depending on a pre-stimulus cue) had to make a decision based on either a semantic process (was it alive?) or a non-semantic process (position in the alphabet). Afterwards they were presented with a recognition task, where they had to pick out the words they had seen (mixed in with 140 others they had not). They found there was anatomical overlap in the fMRI scans for semantically and non-semantically processed items, but the non-semantic items appeared to activate a sub-set of the regions activated by the semantically processed ones. They conclude:

“The overlap between regions activated by the depth of processing and deep subsequent memory effects implies the existence of cognitive operations that are engaged differentially both by semantic versus non-semantic processing and by effective versus less effective episodic encoding in a semantic task.”

It’s a small-scale study – like many involving neuroimaging – but might it eventually provide a way to anchor a concept of germane load by relating it to semantic processing?

Goodbye Mr Chips: can research tell teachers how to teach?

Back in October, I took part in a debate at the Battle of Ideas.

Hosted by Kevin Rooney and featuring Professor Frank Furedi, Jack Marwood and Munira Mirza, the discussion focused on the relevance of research to classroom practice.

The video of that session is available here: WORLDBytes

Details of the session are here: Battle for the Classroom


Psychology of behaviour management (part 3)

In the last two posts, I briefly examined some of the key ideas and limitations of offering rewards and sanctions, and of restorative approaches. Both of these tackle the issue of behaviour at an individual level; in this post I want to examine group-level strategies which utilise our propensity to conform to social norms.

Social norms

Humans are social animals and benefit enormously from shared resources and protection, and the ability to engage in acts of reciprocal altruism with reduced risks of exploitation within social groups. Conversely, exclusion from a group tends to have a highly detrimental effect on an individual’s capacity to survive and reproduce. Therefore, humans have evolved a complex range of strategies for maintaining our membership and status within social groups.

One approach to behaviour in schools is encouraging adherence to social norms. Social norms are the (often unwritten) rules about how we behave in social contexts. One of the functions of social norms is to distinguish who is part of our group and who is an outsider. Behaving in accordance with the norms of our group, especially when there is a ‘cost’ attached, signals our membership of that group. Breaking social norms carries with it a risk of exclusion from the group.

It’s hard to see how society could function at all if we didn’t conform to some fairly predictable set of rules about how we behave. Some of these norms become enshrined as formal laws, like driving on the left in the UK. However, many involve unspoken arrangements, merely triggering disapproval from others if we break them, e.g. the rules of queuing, or saying ‘please’ and ‘thank you’. Like all cultural institutions, schools possess social norms regarding the behaviour of students. Some of these are explicitly communicated through ‘school rules’, but many are based on the unspoken expectations of the teachers and students who make up the school.

Normative influence

The ‘power’ of this desire to ‘fit in’ with a group was demonstrated by Solomon Asch in a famous series of experiments conducted in the 1950s. He asked groups of students to make a series of comparative judgements about the length of a line:

Asch lines

However, only one member of the group was a genuine participant. What the participant didn’t know is that the other people in the group were actually ‘confederates’ of the experimenter, instructed to deliberately give wrong answers on certain critical trials. What Asch was investigating was the extent to which the participant would conform to the rest of the group by also giving the wrong answer. He found that 25% of participants would disregard the wrong answer given by the rest of the group and give the correct answer every time. However, 75% of the participants gave at least one wrong answer and 5% of the participants followed the group in giving the wrong answer on every occasion. For Asch, this demonstrated a strong human instinct to fit in, even with a group of strangers and when the task involved unambiguously wrong answers. There’s a short clip illustrating the procedure here.

Asch went on to use this experimental technique to examine the key variables which strengthen and weaken normative influence. He found that when participants could give their answers in private (by writing them down) they were less likely to conform to the group. He also found that the strength of normative influence was greatly diminished by a lack of unanimity; the presence of a ‘fellow dissenter’ making it much easier to act against the behaviour of the rest of the group.

Further insight into the factors which appear to underlie normative influence comes from the research of Robert Cialdini. For example, Cialdini and Goldstein (2004) identify three major components of social influence: Accuracy, Affiliation and Maintaining a positive self-concept.

The Goal of Accuracy represents an individual’s motivation to be right thinking or possess the correct information when making a decision. They make the point that individuals often look to social norms to gain an accurate understanding of and effectively respond to social situations, especially during times of uncertainty. The example I use when teaching is when I went to a posh wedding in my youth and was confronted by more cutlery than I knew what to do with. I found myself immediately looking around at which knife, or fork, or spoon other people were using for each course.

The Goal of Affiliation represents an individual’s motivation to create and maintain good relationships with others. In essence we tend to adopt the behaviour of others so they will be more likely to like us. Quite superficial characteristics tend to trigger this kind of behaviour; for example physical attractiveness, perceived similarity (e.g. a shared birthday or the same name), ingratiation (e.g. remembering a person’s name or mild flattery – though it’s worth noting that whilst the target tends to develop more positive feelings towards the person, on-lookers tend not to), and reciprocation (the obligation to repay others for what we have received from them).

An interesting aside to this idea of bolstering affiliation through reciprocation is the ‘Franklin effect’. The Franklin effect exploits cognitive dissonance by getting someone who doesn’t like you to do a small favour for you. As a result, that person often develops more positive feelings towards you. A ‘top tip’ that exploits this might be to ask a challenging student to carry some books to another classroom for you, for instance.

The Goal of Maintaining a Positive Self-Concept represents our tendency to maintain our concept of self through behaving consistently with past “actions, statements, commitments, beliefs, and self-ascribed traits”. Where we have behaved in a particular way in the past, or expressed strong views about a situation, there is a motivation to behave in a way consistent with that in the future. Again, I suspect cognitive dissonance plays a strong role in what I sometimes describe when teaching as a ‘homeostasis of the self’. If we have done something a certain way for a long time, then we tend to believe that those behaviours were correct.

Applying normative influence

Psychologists have attempted to apply normative influence in order to promote pro-social behaviour. For example, Schultz et al (2008) used normative messages in order to encourage hotel guests to conserve energy. An example of one of these messages:

Schultz towels

This study is interesting as it suggests that merely trying to change attitudes (by providing information about the importance of energy conservation) had little effect on behaviour. The presence of a ‘normative message’ along with this information appeared to have a much stronger effect on the behaviour of guests.

Normative messages have also been used to try to reduce alcohol consumption amongst US students. For example, Borsari and Carey (2001) reviewed various social influence strategies used to encourage moderation in drinking and reported that in some cases normative messages about drinking led to reduced self-reported alcohol consumption. They suggest there is a range of cognitive factors related to perceived norms which can influence behaviour.

  • Descriptive and injunctive norms: “a student will match the drinking they perceive other students doing (descriptive norm) and approving of (injunctive norm)”
  • Pluralistic ignorance: “individuals assume that their own private attitudes are more conservative than are those of other students, even though their public behavior is identical”
  • Attribution theory: “the student observes others drinking heavily, it is assumed that such excessive use is typical, resulting in elevated norms”

A combination of these processes leads to exaggerated norms for drinking, which then perpetuate themselves when new students observe others drinking heavily. This has led to researchers attempting to use messages based on descriptive and/or injunctive norms to try to correct this exaggerated view of acceptable drinking. In the review, Borsari and Carey point to a number of successful attempts to reduce self-reported alcohol consumption using descriptive and injunctive normative messages.

Applying normative influence in schools

To a great extent, schools have always tried to create social norms within their institutions to support a positive classroom climate. Either through explicit messages like ‘school rules’ or through implicit mechanisms like ‘ethos’ or ‘traditions’ – schools attempt to separate their institutions from the ‘mundane world’ outside their gates.

For example, Martin Robinson is one education writer who explicitly makes this observation. In Practise Teaching, Teaching Practice: Ritual for instance, Martin writes:

“Whether the atmosphere you create in your classroom is like that of a church where children worship at the altar of knowledge or nearer to that of a high powered office where children come to work efficiently on administrative tasks, the ritual of the classroom is something that is unique to your teaching and the children’s experience of studying with you.”

Long established schools, whether in the state or independent sector, are often remarkable for their extensive lineage of school traditions and small rites and practices which mark the ‘other worldliness’ of their institutions. Some private schools provide an almost ‘cloistered’ atmosphere (quite literally in some cases given their historical origins) which helps create the impression that you are entering a world that in some ways is very separate from everyday life. Schools use a wide variety of techniques to create a strong sense of social norms specific to their institution: School uniforms are perhaps the most common and most visible strategy.

In social learning theory, Albert Bandura suggests that whilst we learn through vicarious reinforcement (e.g. observing others being rewarded and imitating that behaviour) we also form a set of ‘mental representations’ of acceptable behaviour specific to a social environment which regulate how we act. It seems likely that these traditions, small rituals and changes in dress all act as cues which facilitate behaving within a set of pro-social normative influences within the environment of the school.

One of the difficulties for many schools is how to create this strong sense of pro-social norms within the institution so that anti-social behaviours (e.g. bullying) are not imitated. Indeed, it’s possible that instances of low-level disruption in lessons are similarly occasions where that set of desirable norms has failed to inhibit unhelpful behaviour. There’s not much empirical evidence looking specifically at this question, but there is some support from a recent study of a successful anti-bullying programme:

Paluck, Shepherd and Aronow (2015) relate a study which attempted to test the idea that children attend to the behaviour of their peers to build a sense of what is socially normative and modify their own behaviour in response. They randomly allocated an anti-conflict intervention across 56 schools with 24,191 students – but what’s really interesting is that they measured every school’s social network, before randomly selecting ‘seed groups’ of students and assigning them to an intervention that encouraged a public stance against conflict at school. They found that treatment schools reported fewer disciplinary problems compared to the control group. Furthermore, the effect was stronger where these ‘seed groups’ contained more socially connected students.

They concluded that students pay particular attention to the behaviour of certain individuals in their community, as they infer which behaviours are socially normative and adjust their own behaviours accordingly. This offers some interesting ways forward with research examining how behavioural climates are produced and changed.

Classroom routines

Another example, I propose, of where normative influence has been exploited to improve behaviour comes from Doug Lemov’s observations of effective teachers. In ‘Teach like a Champion’, Lemov identifies a set of classroom routines which, he suggests, work together to create a positive classroom culture.

To me, the genius of this is that rather than try to promote a positive culture through psychological or social manipulation of attitudes or beliefs (cf. Growth Mindset), Lemov focuses on creating a strong set of social norms based on simple, visible behaviour routines. Schools often try to sell education through trying to change attitudes, for example through inspirational talks or aspirational values, but whilst these messages may be effective for some students, many will merely ‘talk the talk’ rather than ‘walk the walk’. By encouraging a uniform set of simple behavioural rituals, I suspect cognitive dissonance does the rest – ‘If I SLANT in a lesson, it’s because learning is really important to me’.

The success of Lemov’s system probably stems from its simplicity and uniformity. However, therein also lies the controversy. For some teachers, it denies practitioners the chance to discover effective systems for themselves which reflect their unique personality and approach to practice. For others, the concern is that the uniformity of behaviour threatens to suppress behaviours vital to normal mental and physical development. For example, from Sue Cowley:

“But it is what I can’t see that really worries me, because these are children. Where is the choice, the fun, the flitting, the wriggling, the laughter, the joy, the sensitivity, the nuance, the playful interactions, the movement, the gradually developing self-regulation?”

Personally, I find it difficult to believe that even very uniform behavioural expectations would have a negative impact on children – after all, school forms only part of a child’s life and there are many opportunities in everyday life to wriggle and muck about like children. Proponents of these systems might also reasonably argue that the purpose is not to suppress creative or imaginative teaching – but to allow teachers to focus that creativity on their actual teaching rather than battling for control of the classroom.

It seems quite likely that using uniform behavioural routines will promote a strong normative influence to support a positive classroom culture. However, I do think there are interesting questions arising from this debate – is it using a sledgehammer to crack a nut? Some empirical questions for me are:

  • Are some of these routines doing more ‘work’ than others?
  • Are all of these routines strictly necessary?
  • Is the degree of uniformity, whilst clearly effective, necessary?
  • Are there effective (perhaps even more effective) alternative routines to the ones Lemov suggests?

Teasing out what it is about these routines which makes them effective is an important research task, in my opinion.

The psychology of behaviour management (part 2)

A frequent observation in schools is that the same children tend to end up in detention over and over again. The belief that ‘punitive’ approaches to school discipline were proving ineffective or even counter-productive has led to an interest in ‘restorative’ practice approaches. These approaches appear strongly influenced by ‘positive psychology’ and frequently also import ideas from a variety of therapeutic disciplines like cognitive behavioural therapy (CBT).

Part 2: Restorative practice approaches

The roots of this behaviour management strategy are ‘restorative justice’ programmes arising from criminology. Difficult to define and frequently implemented under a variety of different names, restorative justice is sometimes typified as a compromise position in the ‘rehabilitation vs retribution’ debate. A meta-analysis by Latimer, Dowden and Muise (2005) offered the following definition:

“Restorative justice is a process whereby all the parties with a stake in a particular offence come together to resolve collectively how to deal with the aftermath of the offence and its implications for the future”

The focus of these approaches is to repair the harm caused by the criminal act, so that the victim and the offender have an opportunity to discuss the event and decide appropriate reparations for the offence. In the meta-analysis, the authors find that victim and offender satisfaction tends to be higher using this approach than when using the traditional justice system, and that offenders are more likely to complete restitution agreements and less likely to reoffend.

The reported success of these programmes led to similar systems, often influenced by therapeutic models, being imported into schools. Once again, the principles behind ‘restorative practice’ are difficult to define and operate under a wide variety of names, but are often typified as a compromise position between authoritarian and laissez-faire disciplinary systems.

Social discipline window

Source of image: based on McCold and Wachtel (2003)

The International Institute for Restorative Practices offers the following as a ‘unifying hypothesis’ of restorative practices:

“human beings are happier, more cooperative and productive, and more likely to make positive changes in their behavior when those in positions of authority do things with them, rather than to them or for them.”

Positive psychology

Positive psychology arose out of the ‘Humanistic approach’ developed by psychologists like Abraham Maslow and Carl Rogers who developed theories around human happiness and helping people to thrive or reach their potential. Positive psychology was a term probably coined by Maslow, but has become strongly associated with the work of Martin Seligman – its philosophy essentially the same, to understand the nature of human happiness and well-being.

Applied within education, this approach tends to focus upon how schools can promote positive emotions and relationships, engagement and a meaningful sense of purpose, and positive goals leading to accomplishment. Seligman suggests these form five distinct elements – summarised by the acronym PERMA:

  • P Positive Emotion
  • E Engagement
  • R Positive Relationships
  • M Meaning and Purpose
  • A Accomplishment

We see the influence of positive psychology in all sorts of areas of education: For example, the idea of ‘teaching for happiness’ or ‘teaching mindfulness’ and many of the ideas underpinning ‘character education’.  There appears to be a clear influence of positive psychology in opposition to more behaviourist ideas within restorative practices applied within schools. For example, Hendry, Hopkins and Steele summarise the differences in Restorative Approaches in Schools in the UK:

Restorative approach vs authoritarian approach

The goals are identified as developing positive relationships between the teacher and student; encouraging empathy and creating a sense of safety and trust where both parties can express their thoughts, feelings and needs; encouraging self-actualisation and optimistic beliefs about personal development; and supporting individual and shared responsibility. The main empirical claim appears to be:

“Schools that consciously focus the bulk of their effort on building and maintaining relationships will find that fewer things will go wrong and so there will be fewer occasions when relationships need to be repaired.”

However, within academic psychology the ‘positive psychology’ approach has faced significant criticism. For example, the abstract for Alistair Miller’s (2008) paper “A Critique of Positive Psychology— or ‘The New Science of Happiness’” summarises many of the problems:

“This paper argues that the new science of positive psychology is founded on a whole series of fallacious arguments; these involve circular reasoning, tautology, failure to clearly define or properly apply terms, the identification of causal relations where none exist, and unjustified generalisation. Instead of demonstrating that positive attitudes explain achievement, success, well-being and happiness, positive psychology merely associates mental health with a particular personality type: a cheerful, outgoing, goal-driven, status-seeking extravert.”

Cognitive-behavioural therapy

An alternative psychological foundation for restorative practice has been cognitive behavioural therapy (CBT) often combined with elements of other therapeutic programmes (e.g. Solution-Focused Therapy).

The focus in CBT is to identify and change patterns of thinking or beliefs which underlie behaviours which are unhelpful to the individual. It’s often typified as a problem-solving therapeutic approach – finding ways to better cope with ‘here-and-now’ practical problems (rather than say childhood experiences).

Albert Ellis developed some of the core principles involved in CBT back in the 1950s and 60s. His Rational Emotive Behavioural Therapy (REBT) emphasises the role of ‘faulty thinking’ (an individual’s interpretation or view of an event or situation) which gives rise to emotional distress and subsequent unhelpful behaviours (e.g. avoidance coping). This makes some intuitive sense to many teachers. For example, a student faced with an impending exam may believe that they will fail regardless of what they do, so they find ways to distract themselves from this anxiety (e.g. procrastination) and fail to revise for the exam.

In restorative practice, these elements of CBT tend to involve encouraging the student to relate their offending behaviours to the thoughts and feelings which caused them. By exploring alternatives to the way the student interpreted an event and emotionally reacted to it, the idea is that the student finds better ways to respond to these events in future. For example, Writing Wrongs is a restorative approach for use in schools which explicitly draws upon ideas based on CBT to encourage students to reflect upon the causes and consequences of their behaviour.

Whilst optimistic claims were initially made for the efficacy of CBT as a treatment for mental illness, much of the empirical evidence supporting these has come into question – not least because, along with other forms of psychotherapy, there’s no easy way to create a ‘double-blind’ arrangement within randomised controlled trials and this means that results may be influenced by bias. A recent meta-analysis suggests that effect sizes for CBT outcomes have been steadily declining since the 1970s, implying that sources of bias may have given a distorted view of its efficacy.

Do restorative approaches work?

It is almost impossible to give an empirical answer to this question. Case studies appear to provide very positive evaluations for programmes. For example, Littlechild and Sender (2010) found evidence from interviews that students and staff at four residential homes for young people with developmental and physical disabilities gave very positive evaluations of restorative justice.  However, data from police call-outs was more mixed. They note that one unit had an increase in call-outs and caution that the decrease in call-outs at the three other units was not necessarily due to the introduction of restorative justice.

One area where more systematic evidence is available is the success of anti-bullying programmes, many of which use restorative justice principles. For example, restorative approaches are commonly used in conjunction with sanctions within secondary schools to tackle bullying. A report for the DFE “The Use and Effectiveness of Anti-Bullying Strategies in Schools” (Thompson and Smith, 2011) examined the range of practices used in schools and attempted some evaluation of their effectiveness. They broadly defined restorative approaches as:

“Restorative approaches work to resolve conflict and repair harm. They encourage those who have caused harm to acknowledge the impact of what they have done and give them an opportunity to make reparation. They offer those who have suffered harm the opportunity to have their harm or loss acknowledged and amends made.”

They found that over two-thirds of schools used some form of restorative practice in tackling bullying and that these approaches were recommended by the majority of local authorities above the use of sanctions. The survey reported that 97% of both primary and secondary schools rated restorative approaches as effective in reducing bullying, with high proportions of both school types rating them as cost effective and easy to implement. Small group discussions (circles) were the most common approach in primary schools (96%) whereas some form of restorative discussion was the most common in secondary schools (90%).

So, these kinds of anti-bullying programmes are popular and perceived to be effective. Beyond case studies, however, is there much evidence to support their adoption in schools? The fact that restorative programmes tend to be mixed in with sanctions makes it difficult to pick apart whether these programmes are effective as practised in schools. Historically, the evidence supporting the general effectiveness of anti-bullying programmes is mixed. For example, a meta-analysis by Ferguson et al (2007) examined the effectiveness of school-based anti-bullying programmes. One issue they report with the available research was publication bias (sometimes called the ‘file drawer effect’) where studies which obtain some statistical significance are more likely to be published than studies which are non-significant. Thus, while the meta-analysis yielded an overall ‘significant effect’, the very small overall effect sizes led them to conclude that “school-based anti-bullying programs are not practically effective in reducing bullying or violent behaviors in the schools”.

More positive outcomes were reported in a meta-analysis by Ttofi and Farrington (2011). They suggested that significant reductions in bullying tended to be associated with more intensive programs, programs including parent meetings, firm disciplinary methods, and improved playground supervision. However, work with peers (including things like peer mediation, peer mentoring, and encouraging bystander intervention) was associated with an increase in victimization. They recommend that work with peers (arguably a central feature of restorative practice models) should not be used.

Despite mixed and sometimes disappointing evidence of effectiveness with regard to bullying, the popular perception of restorative practice has led some schools to implement these sorts of programmes as whole-school behaviour management systems. These approaches are hard to define, but typically they involve facilitated discussion between the teacher and student about low-level disruption in lessons in place of – though sometimes in addition to – a direct sanction. Again, there are many case studies reporting positive effects for these programmes, but systematic quantitative evidence is thin on the ground. For example:

“‘We’ve shown in case study after case study that schools that adopt this approach report significant changes in their cultures,’ said Dr. Paul McCold, researcher and founding faculty member of the International Institute for Restorative Practices (IIRP) graduate school. ‘What’s needed now is solid quantitative research.’”

There are evidently many problems when trying to implement restorative practice programmes in schools. David Didau identifies this problem in his list of psychological principles for teachers:

“The biggest problem with restorative justice is that it often becomes a blunt and clumsy stick. The culprit’s needs are often placed over those of the victim. A victim may not want a relationship to be restored and this should never be imposed.”

This becomes even more of an issue when such programmes are used for issues of low-level disruption. In my experience, it can sometimes be successful (e.g. where the student genuinely accepts they were in the wrong and is keen to make amends). However, I suspect the same students who ended up in detention all the time simply end up in endless ‘conflict resolution discussions’ instead. I’ve experienced many occasions where the student isn’t prepared to accept any responsibility or – more difficult still – tries to manipulate the discussion to appear the ‘victim’. If a student merely goes through the motions and isn’t really interested in taking responsibility for their actions, there’s the risk that such systems may inadvertently undermine good behaviour.

Lastly, there are some psychologists who are deeply concerned about the therapeutic frameworks being imported from positive psychology and CBT into schools. In ‘The Dangerous Rise of Therapeutic Education’, Kathryn Ecclestone warns that these approaches risk developing students into anxious and self-preoccupied individuals, undermine parental and teacher authority, and represent a diminished view of human potential.

The psychology of behaviour management (part 1)

The topic of behaviour management and the problems teachers face in dealing with disruption to lessons continues to evoke strong argument within the profession. The extent of the problem was explored in a 2014 paper by Terry Haydn which argued that whilst ‘official’ reports like Ofsted inspections appeared to rate behaviour as at least ‘satisfactory’ in the majority of schools, there was evidence that deficits in classroom climate continue to be a serious and widespread problem. Examples of blogs detailing the sorts of issues in school approaches to behaviour are plentiful (an excellent example from Andrew Old can be found here).

Systems of rewards and punishments have long been the norm in schools but, perhaps because of a growing feeling that behaviour has become increasingly difficult to manage, behaviour management has become the focus of experimentation. Some schools have started looking for novel solutions to the problem of disruption in lessons (e.g. Kilgarth school in Birkenhead was recently reported to have ‘banned’ punishment altogether), whereas others believe that proportionate sanctions need to be available to teachers as a deterrent (e.g. Tom Bennett urging “schools to bring back detention”). In June last year, the government set up a working party, led by Tom Bennett, to develop better training for new teachers and showcase effective practices in schools. For an example of Tom’s approach, there’s a nice practical guide to managing difficult behaviour recently published by Unison.

One controversial approach has been to move schools away from systems of reward and punishment towards a ‘Restorative Justice’ approach. Originally developed within the context of police work, the idea of restorative practice involves conversations between ‘offender’ and ‘victim’ or the teacher and student to give an opportunity to discuss how they have been affected by events and to decide what should be done to move forward. There are claims that this approach can improve behaviour and results, but critics argue that such policies are making schools less safe. Whilst not always explicitly linked, many of the processes appear to draw upon techniques used in cognitive behavioural therapy (CBT). For example, ‘Restorative Thinking’ is a team that work with schools to implement school restorative practices that make the link to CBT and other forms of therapy explicit.

Another controversial approach has come from Doug Lemov’s ‘Teach Like a Champion’. Lemov’s approach involves using standardised routines to create a positive classroom climate. The system has sparked considerable interest in the UK, but has also attracted many critics. Perhaps most notable amongst these critics is Sue Cowley, author of ‘Getting the buggers to behave’, who recently condemned* an example of this approach as “a kind of ‘Pavlov’s Dogs’ approach to education”.

(*Edit – However see Sue’s comment below)

Most teachers likely already use some combination of these various approaches, but teachers may not be aware of the psychological theories and practices which they are (implicitly or explicitly) based upon. Over a short series of blogs, I want to briefly explore these psychological underpinnings in the hope they help explain some of the advantages and limitations of each system.

Part 1: Behaviourism

“Behaviourist” is sometimes used in a pejorative way when describing behaviour management systems, but schools using some sort of system for rewarding or sanctioning behaviour are implicitly using a behaviourist approach.

Behaviourism was a term coined by John Watson in an article published in 1913, but its roots go back to the famous studies by Ivan Pavlov (who discovered Classical conditioning as an accidental side-line to his Nobel Prize winning research on digestion). However, the behaviourist most associated with education is B. F. Skinner. Much misunderstood, and often unfairly maligned, his theory of operant conditioning continues to influence schools to this day.

[Image: B. F. Skinner]

Drawing on the earlier work of Edward Thorndike, Skinner developed his theory of operant conditioning by placing animals like rats and pigeons in an apparatus (often referred to as a ‘Skinner box’) which exposed them to carefully controlled stimuli and recorded their responses. Skinner identified a variety of techniques which could be used to shape animal behaviour and wrote about how these might be applied to human behaviour (and education specifically).

The core ideas within operant conditioning are reinforcement and punishment. Very simply, when an animal receives reinforcement after performing a behaviour, it is more likely to repeat that behaviour. Conversely, receiving a punishment after performing a behaviour makes the animal less likely to repeat that behaviour in future. Skinner further described reinforcements and punishments as being ‘positive’ or ‘negative’ in character.

[Image: reinforcement and punishment grid]
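For readers who find a few lines of code easier to hold in mind than a grid, here is a minimal sketch of the four combinations. It assumes the standard textbook reading of Skinner's terms: 'positive' means a stimulus is added, 'negative' means one is taken away, reinforcement makes a behaviour more likely and punishment makes it less likely. The names `Stimulus`, `Change` and `classify` are my own illustrative labels, not anything from Skinner.

```python
from enum import Enum


class Stimulus(Enum):
    PLEASANT = "pleasant"      # something the learner wants
    UNPLEASANT = "unpleasant"  # something the learner would rather avoid


class Change(Enum):
    ADDED = "added"      # 'positive': a stimulus is presented
    REMOVED = "removed"  # 'negative': a stimulus is taken away


def classify(change: Change, stimulus: Stimulus) -> str:
    """Name the quadrant of the grid for a given consequence."""
    sign = "positive" if change is Change.ADDED else "negative"
    # Adding something pleasant, or removing something unpleasant, makes the
    # behaviour more likely (reinforcement). The other two combinations make
    # it less likely (punishment).
    strengthens = (change is Change.ADDED) == (stimulus is Stimulus.PLEASANT)
    kind = "reinforcement" if strengthens else "punishment"
    return f"{sign} {kind}"


if __name__ == "__main__":
    # Hypothetical classroom examples, chosen only to illustrate each quadrant.
    examples = {
        "merit awarded": (Change.ADDED, Stimulus.PLEASANT),
        "nagging stops once work begins": (Change.REMOVED, Stimulus.UNPLEASANT),
        "telling-off given": (Change.ADDED, Stimulus.UNPLEASANT),
        "merits taken away": (Change.REMOVED, Stimulus.PLEASANT),
    }
    for label, (change, stimulus) in examples.items():
        print(f"{label}: {classify(change, stimulus)}")
```

The point worth noticing is that 'negative' never means 'bad' here: negative reinforcement still strengthens a behaviour, which is why it is so often confused with punishment.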


Skinner’s rather harsh reputation means that many teachers are surprised to discover that he was very much against the use of punishment in schools. Skinner believed that one of the major disadvantages of punishment is that, even where it is consistently applied, it merely temporarily suppresses an undesirable behaviour.

“Severe punishment unquestionably has an immediate effect in reducing a tendency to act in a given way. This result is no doubt responsible for its widespread use. We “instinctively” attack anyone whose behavior displeases us —perhaps not in physical assault, but with criticism, disapproval, blame, or ridicule. Whether or not there is an inherited tendency to do this, the immediate effect of the practice is reinforcing enough to explain its currency. In the long run, however, punishment does not actually eliminate behavior from a repertoire, and its temporary achievement is obtained at tremendous cost in reducing the over-all efficiency and happiness of the group.”

Science and Human Behavior, p. 190.

Contrary to his rather cold, clinical popular reputation, Skinner was a compassionate humanitarian (he won The American Humanist Association’s “Humanist of the Year” award in 1972) who wanted science to help shape a better society by utilising rewards rather than punishment in order to promote pro-social behaviour. I suspect he’d have approved of Kilgarth school’s decision to ‘ban’ punishment, for instance.

However, the issue around the effectiveness of punishment is rather more complex than Skinner believed. For example, a fascinating meta-analysis by Balliet and Van Lange (2013) examined whether punishment was more effective at promoting cooperation in high or low trust societies. They reviewed 83 studies involving 7,361 participants across 18 societies and reached a rather surprising conclusion: punishment appears to promote cooperation more effectively in societies with high trust. In essence, they argue that where there is a great deal of trust, members of a society adhere to norms that encourage both cooperation and the punishment of those who defy cooperative social norms. Punishment is less effective in societies where there is a lack of trust: they argue that social norms may be less strongly shared and enforced there.

“A willingness to pay a cost to punish others, especially noncooperative others, is likely to be viewed as a strong concern with collective outcomes. At the same time, such benevolent views of costly punishment may be more likely to occur in societies that contain higher amounts of trust in others, which we conceptualized earlier in terms of beliefs about benevolence toward the self and others.”

An important question for future research is whether ‘benevolent punishment’ is as effective at an organisational level (e.g. a school) as it appears to be at a society level. However, the implication would be that in benevolent, high-trust environments the proportionate use of punishment to support cooperative social norms can be effective.

Another reason why punishment may be effective is a phenomenon called ‘loss aversion’. The work of Tversky and Kahneman suggests that there is an asymmetry between the effects of positive reinforcement and negative punishment: where people weigh up similar gains and losses, they tend to prefer avoiding losses to making gains. For example, Hackenberg (2014), in ‘Token Reinforcement: A Review And Analysis’, reports an experiment where a loss was worth approximately three times as much as an equivalent gain. It seems highly likely that this effect might also apply to the sorts of token reward systems employed in schools, suggesting that negative punishment (e.g. loss of merits) may be more motivating than opportunities to gain merits.
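To make the arithmetic of that asymmetry concrete, here is a rough sketch. It assumes, purely for illustration, that the felt value of merits is linear and that a loss weighs exactly three times as much as an equal gain (borrowing the approximate figure from the experiment mentioned above). Whether school merits actually behave this way is an open question; `LOSS_WEIGHT` and `subjective_value` are hypothetical names.

```python
LOSS_WEIGHT = 3.0  # assumption: a loss counts about three times an equal gain


def subjective_value(merit_change: int, loss_weight: float = LOSS_WEIGHT) -> float:
    """Felt value of gaining (positive) or losing (negative) merits."""
    if merit_change >= 0:
        return float(merit_change)
    # Losses are scaled up to reflect loss aversion.
    return loss_weight * merit_change


# A student gains two merits and then loses one during a lesson.
net_merits = 2 - 1
net_feeling = subjective_value(2) + subjective_value(-1)
```

Under these assumptions the student ends the lesson one merit up (`net_merits` is 1) but feeling one merit down (`net_feeling` is -1.0), which is one way of seeing why removing a merit might carry more motivational weight than awarding one.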


Skinner believed that rewards were the most effective way of shaping behaviour and focused a great deal of his research attempting to find out the most effective patterns of reinforcement. In his ‘Skinner box’ experiments, he was able to carefully control the ‘schedule of reinforcement’ and measure the concomitant changes in the desired behaviour.

[Image: schedules of reinforcement]

Intuitively, teachers see the need for consistency where punishments are applied and I’ve sometimes heard teachers argue that rewards should be given with equal consistency. However, Skinner’s work on ‘schedules of reinforcement’ appears to show that such systems tend to be relatively ineffective. The problem with systems seeking high consistency in rewarding students is that whilst the student’s behaviour may be swiftly modified, the desirable behaviour may become highly contingent upon the presence of the reward. The odd thing about rewards is that they appear to work better when they are slightly unpredictable. A simple summary of these differences:

[Image: schedules of reinforcement, summary of differences]

In Skinner’s experiments, the extinction rate (the rate at which the desired behaviour stopped being performed) was highest where there was continuous reinforcement (i.e. a reward given every time the behaviour was performed). Where there was variability in the time interval or ratio, the behaviour persisted for longer in the absence of reinforcement. Skinner believed this represents the ‘power’ of the slot machine: the fact that playing it is unpredictably rewarded by a pay-out encourages the person to continue playing, even when they hit a long losing streak.
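The difference between these schedules is easier to see when written out. The sketch below shows only how rewards are handed out under continuous, fixed-ratio and variable-ratio schedules; it does not attempt to model how an animal (or a student) responds, and it leaves out the interval schedules, which depend on elapsed time rather than on a count of responses. The function names are illustrative, and the variable-ratio schedule is approximated by rewarding each response with probability 1/n, which is a simplifying assumption.

```python
import random
from typing import Callable, List

Schedule = Callable[[], bool]  # call once per response; True means 'rewarded'


def continuous() -> Schedule:
    """Every response is rewarded."""
    return lambda: True


def fixed_ratio(n: int) -> Schedule:
    """Exactly every n-th response is rewarded."""
    count = 0

    def respond() -> bool:
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n: int, rng: random.Random) -> Schedule:
    """On average one response in mean_n is rewarded, unpredictably."""
    return lambda: rng.random() < 1.0 / mean_n


def rewards(schedule: Schedule, responses: int) -> List[bool]:
    """Record which of a run of responses were rewarded."""
    return [schedule() for _ in range(responses)]
```

For example, `rewards(fixed_ratio(3), 6)` rewards the third and sixth responses and nothing else, whereas `rewards(variable_ratio(3, random.Random(0)), 6)` hands out roughly the same number of rewards over a long run but gives no clue as to which response will pay off. That unpredictability is the slot-machine property described above.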

In schools, these reward systems sometimes take on the structure of a ‘token economy’ (a system also used in prisons and psychiatric units, where individuals earn tokens for ‘good behaviour’ which can be used to purchase privileges). However, whilst explicit reward schedules have been used with children (e.g. children with ADD or autism), reward systems have a number of problems which often undermine their use in schools.

One issue is ‘satiation’: older children in particular rapidly lose interest in the tokens (e.g. merit stickers) or even primary reinforcers (e.g. sweets) that teachers hand out for desirable behaviour. I recall a student teacher handing out sweets to reward year 10 students for answering questions in class. Many of the students took part, but I noticed one lad sat there scowling with his arms crossed. Chatting to him, it was clear he knew many of the answers so I asked why he wasn’t putting his hand up – he said, “What’s the point? I can just buy my own sweets if I want them”. This problem often leads into what I call ‘reward inflation’, as teachers either have to constantly find novel rewards or end up handing out more and more tokens to elicit the same desirable behaviour.

Another issue is that reinforcement can have negative effects. It’s devilishly hard in a class of 30 students to accurately assess how much effort students have genuinely put into their class or homework. Giving praise or a merit for work which actually required little effort may inadvertently imply that you have low expectations of that student.

Lastly, children aren’t stupid. They rapidly learn when they are being manipulated by a reward system and sometimes manage to turn the tables on the teacher by learning to manipulate the criteria used to elicit a reward. I knew one teacher who, in an attempt to tame a particularly difficult class, had managed to trap themselves into handing out 4 or 5 merits to a number of the most naughty children every lesson.

Two great articles by Daniel Willingham further explore some of these problems: Should Learning Be Its Own Reward? and How Praise Can Motivate—or Stifle. At the end of this second article, Willingham summarises the way a teacher’s most common form of positive reinforcement – praise – might best be utilised:

“Praise should be sincere, meaning that the child has done something praiseworthy. The content of the praise should express congratulations (rather than express a wish of something else the child should do). The target of the praise should be not an attribute of the child, but rather an attribute of the child’s behavior.”

In summary

Whilst the term ‘behaviourist’ is used in a pejorative way by some teachers, Skinner wanted his research to be used to create societies where reinforcement, rather than punishment, was used to encourage people to do the right thing. There’s an enormous amount schools could potentially learn from the classic works on operant conditioning and on ways to run token economies (the form most school reward systems tend to take).

However, there are some interesting reasons why some of Skinner’s ideas may need updating. ‘Benevolent’ punishment and negative punishment (which may tap into our innate loss-aversion bias) may in some cases be equally or more effective than rewards (so long as they are deserved but a little unpredictable). Both potentially can be used to effectively support behaviour in schools.

In the next post in this series, I’m going to take a similar look at the topic of ‘restorative practices’ and some of the ideas from cognitive-behavioural therapy which underlie many of the systems used in schools.

