What really improves teacher quality?

In an interesting article for the Fabian Society, Andrew Old discusses the problems inherent in the desire to raise the quality of teaching.

“‘Focusing on teacher quality’ … sounds agreeable, but much of the detail will be impossibly difficult to work through.
… there is serious ideological disagreement within education as to what a high-quality teacher looks like.”

He identifies two major pitfalls: First, that the bar for subject knowledge is set too low and there are insufficient rewards to encourage individuals with excellent subject knowledge into comprehensive schools. Second, that politicians underestimate the extent to which ideological arguments about the values and purpose of education confound attempts to train teachers in effective pedagogy.

Any attempt to reward good teachers, or increase training for teachers, raises questions about who will judge what a “good teacher” is and what teachers should be trained in.

As it happens, the day I read his comment piece, another article caught my eye which appeared to underline his points.

“Studies Highlight Complexities of Using Value-Added Measures”, published in Education Week, discussed several recently published papers that pointed to the problems of assessing teacher quality. I’m going to comment on the findings of three of them:

Would we know quality teaching if it bit us … ?

The first paper, Instructional Alignment as a Measure of Teaching Quality (Polikoff and Porter, 2014), was particularly interesting to me, as I’ve been piloting one of the measures they used (the MET Tripod student survey) as a coaching tool this year.

The study reported an attempt to measure the extent to which instructional alignment (broadly – how well curriculum, instruction and assessment match up) and teaching quality (as measured by a combination of the Tripod survey and structured observations) correlated with value-added scores (on State and supplementary tests).

The results are quite surprising:

“Overall, the results are disappointing. Based on our obtained sample, we would conclude that there are very weak associations of content alignment with student achievement gains and no associations with the composite measure of effective teaching.”

They point to three possible interpretations for these disappointing results:

“One interpretation of this conclusion is instructional alignment and pedagogical quality are not as important as standards-based reform theory suggests for affecting student learning.”

Which suggests that trying to improve teachers’ instructional methods doesn’t improve student learning. This implies that ‘good teaching’ doesn’t especially improve grades!

“A second interpretation is that instructional alignment and pedagogical quality are as important as previously thought, but that the SEC and FFT do not capture the important elements of instruction.”

This isn’t quite as damning as the first – but it represents an enormous blow to accountability systems. It implies that (fairly reliable and structured) observations and (reliable and validated) student surveys simply can’t capture the things about teaching that improve student progress.

“A third interpretation of our findings is that the tests used for calculating VAM are not particularly able to detect differences in the content or quality of classroom instruction.”

Which is probably the most comforting conclusion of the three, as it suggests that we can identify something about quality teaching but can’t measure it using value-added scores. This has ramifications for teacher appraisal systems in the UK. If fine-grained VAM scores cannot establish the quality of a teacher, then how can the relatively simplistic targets based on 3 or 4 levels of progress possibly be fair?

Grow, damn it child … grow!

The second paper, Teacher Effects on Student Achievement and Height: A Cautionary Tale (Bitler et al.), was the most amusing article I’ve read since fMRI got slapped with a dead fish.

As a test of VA modelling techniques they decided to use such models to see what effect teachers had on their students’ heights. What was shocking was that teachers appeared to influence the height of their students almost as much as English and maths scores.

“We find that—simply due to chance—teacher effects can appear large, even on outcomes they cannot plausibly affect. The implication is that many value-added studies likely overstate the extent to which teachers differ in their effectiveness, although further research is needed. Furthermore, users of [value-added measures] should take care to ensure their estimates reflect … teacher effectiveness and are not driven by noise.”

This seems strong evidence that even fairly sophisticated value-added data are too noisy to judge teacher quality reliably – and this makes assessing or rewarding teacher quality through the use of data appear arbitrary. Not a great way to attract or retain the best teachers!
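To see how chance alone can produce apparent “teacher effects”, here is a minimal sketch in Python. It is not the model used in the paper: the function name, the class size of 25, the 50 teachers and the spread of 7 units are all assumptions chosen purely for illustration. Every simulated student’s “height gain” is drawn from the same distribution regardless of who teaches them, so any difference between class averages is pure noise.

```python
import random
import statistics


def simulate_class_means(n_teachers=50, class_size=25, sd_height=7.0, seed=0):
    """Return the mean 'height gain' per class when teachers have NO true effect.

    All parameter values are illustrative assumptions, not figures from the paper.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_teachers):
        # Every student draws from the same distribution, whoever teaches them.
        gains = [rng.gauss(0.0, sd_height) for _ in range(class_size)]
        means.append(statistics.fmean(gains))
    return means


means = simulate_class_means()

# Spread of the class averages: a naive reader might call this the
# "variation in teacher effectiveness", yet no teacher affects anything here.
spread = statistics.pstdev(means)

# Chance alone predicts a spread of roughly sd / sqrt(class_size).
expected = 7.0 / 25 ** 0.5

print(f"spread of apparent teacher effects: {spread:.2f}")
print(f"spread predicted by chance alone:   {expected:.2f}")
```

The class averages differ from one another even though no teacher has any influence in this toy setup, which is the kind of noise the authors warn about. Smaller classes make the spurious spread larger.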

You’re Fired!

The last article, Teacher heterogeneity, value-added and education policy (Condie et al), looks at two simulations of policies intended to raise teaching quality in schools.

In the first model, they simply fired the bottom 10% of teachers (as measured by value-added scores). Average student test scores rose as expected. However, then they looked at what would happen if, instead of firing the ‘failing teachers’ in the system, they matched teachers to the subjects and groups of students where they obtained the best value-added scores. They found that average test scores rose … in fact, the increase far exceeded the first model.

“Our results suggest that employers might realize greater gains by increasing the specialization of their employees’ tasks rather than attempting to replace them with hypothetically better employees.”

This suggests that performance-related pay (PRP) and heavy-handed accountability – whilst they sound lovely and tough-talking in the papers – are not the best ways to raise the quality of teaching. Especially since:

“past research suggests that the pool of potential replacements is, at least on average, of lower quality than the pool of current teachers.”

Indeed, the harsh new accountability system for judging teacher quality might actually lower the standards in schools – especially those schools in vulnerable areas which have trouble recruiting.
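The contrast between the two simulated policies can be illustrated with a toy model in Python. This is only a sketch under loud assumptions, not the authors’ simulation: each hypothetical teacher is given two independent “true” effects (one for each of two settings, A and B), replacements for fired teachers are assumed to be exactly average (effect 0), and the effects are known without measurement error. The names `simulate` and `replacement_effect` are invented for this example.

```python
import random
import statistics


def simulate(n_teachers=100, replacement_effect=0.0, seed=1):
    """Compare 'fire the bottom 10%' with 'match teachers to their best setting'.

    Assumption: each teacher's effects in settings A and B are independent
    draws from a standard normal distribution. Real effects are likely to be
    correlated, which would shrink the gain from matching.
    """
    rng = random.Random(seed)
    effects = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_teachers)]
    half = n_teachers // 2

    # Baseline: an arbitrary timetable. First half teach A, the rest teach B.
    baseline = [e[0] if i < half else e[1] for i, e in enumerate(effects)]

    # Policy 1: replace the lowest-scoring 10% with assumed-average newcomers.
    order = sorted(range(n_teachers), key=lambda i: baseline[i])
    fired = set(order[: n_teachers // 10])
    after_firing = [
        replacement_effect if i in fired else b for i, b in enumerate(baseline)
    ]

    # Policy 2: keep everyone, but give setting A to the half of the staff
    # with the largest comparative advantage in A (effect in A minus effect in B).
    # With two equally sized settings this is the best possible balanced timetable.
    by_advantage = sorted(
        range(n_teachers), key=lambda i: effects[i][0] - effects[i][1], reverse=True
    )
    teach_a = set(by_advantage[:half])
    after_matching = [
        effects[i][0] if i in teach_a else effects[i][1] for i in range(n_teachers)
    ]

    return {
        "baseline": statistics.fmean(baseline),
        "firing": statistics.fmean(after_firing),
        "matching": statistics.fmean(after_matching),
    }


result = simulate()
for policy, mean_effect in result.items():
    print(f"{policy:>9}: mean effect {mean_effect:+.3f}")
```

Under these assumptions, matching tends to produce the larger gain, but that outcome is driven by the choice of independent effects across settings. Setting `replacement_effect` below zero mimics the point in the quote above that replacements may, on average, be weaker than current teachers.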

Improving the quality of teaching in schools

Andrew Old is right that we need better ways of defining what ‘quality’ is before we can realistically set about improving it. The first paper appears to imply that either we still don’t know enough about what drives quality teaching and/or we simply can’t measure it.

The second paper should be a further warning about the use of data as an accountability stick with which to beat teachers. How can PRP (let alone licensing) work in a system where a teacher’s performance in helping children do well in exams is almost indistinguishable from a teacher’s influence on their height?

The last paper actually leaves us with a positive way forward. It suggests that if we want to raise the quality of schools, we might be better off finding a best fit between a teacher and their subject / KS cohort specialism than heaping tougher accountability processes onto teachers and schools. Developing teachers’ specialisms – finding where they have a positive impact and developing their subject knowledge – might really improve the quality of teaching.


6 Responses to What really improves teacher quality?

  1. You make such excellent points here! As with so many other educational measures (what proficiency means, how much education actually costs), policy makers often assume that teaching can be measured as easily as the bottom line in an accounting book. However, those truly familiar with the difficulties of translating policy into practice understand that discerning discrete outcomes is rarely so simple in education. Thank you for highlighting yet another way that these complexities affect teaching and teacher training.


    • Yes, if fairly sophisticated and objective measures like VAM cannot reliably discern quality of teaching, we need to think about improving the system in less mechanical ways. Thanks for the kind words.


  2. dodiscimus says:

    Those are really interesting papers you have drawn our attention to – thank you. Possibly reading the third paper would answer this question but shortage of time leads me to cheekily ask you instead. Does the matching to best results analysis take account of the need for someone to teach every class without anyone being overloaded? If the majority of teachers are most effective with top sets, for example, that’s not going to work since there won’t be enough to go round. Also, is there any evidence that some schools already do this well, and benefit from better VA as a result?


    • Good questions. Yes, you’re right to highlight the likely practical difficulties involved in trying to manage teaching load and help teachers specialise. Especially in small schools this might be extremely difficult. Simulations of what happens to the data are certainly not the same as trying to actually implement the suggestion, but I’d say it supports the need for schools to work in close cooperation with one another – something large, urban academy chains likely have as an advantage over fairly isolated schools.

      In terms of generating high VA, you’ve bumped into the limits of my understanding of how VA scores are calculated in the US. I know they take prior attainment into account a lot more sensitively than the 3 or 4 levels of progress measures used in the UK – and the issue of who teaches ‘top set’ is therefore a much larger issue here. Henry Stewart makes this case well:


      The problem isn’t simply that top sets are much more likely to make ‘good progress’ – but also that children who come with KS2 levels of 5c, 4c or 3c are significantly less likely to make 3 levels than children who attained a or b sub-levels. As I said in the blog – if the more nuanced VA scores cannot reliably measure teacher quality, using the deeply flawed 3 levels of progress as a yardstick is immensely unfair.


