“‘Focusing on teacher quality’ … sounds agreeable, but much of the detail will be impossibly difficult to work through.
… there is serious ideological disagreement within education as to what a high-quality teacher looks like.”
He identifies two major pitfalls: First, that the bar for subject knowledge is set too low and there are insufficient rewards to encourage individuals with excellent subject knowledge into comprehensive schools. Second, that politicians underestimate the extent to which ideological arguments about the values and purpose of education confound attempts to train teachers in effective pedagogy.
Any attempt to reward good teachers, or increase training for teachers, raises questions about who will judge what a “good teacher” is and what teachers should be trained in.
As it happens, the day I read his comment piece, another article caught my eye which appeared to underline his points.
“Studies Highlight Complexities of Using Value-Added Measures”, published in Education Week, discussed several recently published papers pointing to the problems of assessing teacher quality. I’m going to comment on the findings of three of them:
Would we know quality teaching if it bit us … ?
The first paper, Instructional Alignment as a Measure of Teaching Quality (Polikoff and Porter, 2014), was particularly interesting to me as I’ve been piloting one of the measures they used (the MET Tripod student survey) as a coaching tool this year.
The study reported an attempt to measure the extent to which instructional alignment (broadly – how well curriculum, instruction and assessment match up) and teaching quality (as measured by a combination of the Tripod survey and structured observations) correlated with value-added scores (on state and supplementary tests).
The results are quite surprising:
“Overall, the results are disappointing. Based on our obtained sample, we would conclude that there are very weak associations of content alignment with student achievement gains and no associations with the composite measure of effective teaching.”
They point to three possible interpretations for these disappointing results:
“One interpretation of this conclusion is instructional alignment and pedagogical quality are not as important as standards-based reform theory suggests for affecting student learning.”
This would suggest that trying to improve teachers’ instructional methods doesn’t improve student learning – in other words, that ‘good teaching’ doesn’t especially improve grades!
“A second interpretation is that instructional alignment and pedagogical quality are as important as previously thought, but that the SEC and FFT do not capture the important elements of instruction.”
This isn’t quite as damning as the first – but it represents an enormous blow to accountability systems. It implies that we simply can’t measure the things about teaching that improve student progress through (fairly reliable and structured) observations and (reliable and validated) student surveys.
“A third interpretation of our findings is that the tests used for calculating VAM are not particularly able to detect differences in the content or quality of classroom instruction.”
This is probably the most comforting of the three interpretations, as it suggests that we can identify something about quality teaching but we can’t measure it using value-added scores. It has ramifications for teacher appraisal systems in the UK. If fine-grained VAM scores cannot establish the quality of a teacher, then how can the relatively simplistic targets based on 3 or 4 levels of progress possibly be fair?
Grow, damn it child … grow!
The second paper, Teacher Effects on Student Achievement and Height: A Cautionary Tale (Bitler et al.), was the most amusing article I’ve read since fMRI got slapped with a dead fish.
As a test of VA modelling techniques, they decided to use such models to see what effect teachers had on their students’ heights. What was shocking was that teachers appeared to influence the height of their students almost as much as they influenced their English and maths scores.
“We find that—simply due to chance—teacher effects can appear large, even on outcomes they cannot plausibly affect. The implication is that many value-added studies likely overstate the extent to which teachers differ in their effectiveness, although further research is needed. Furthermore, users of [value-added measures] should take care to ensure their estimates reflect … teacher effectiveness and are not driven by noise.”
This seems strong evidence that even using fairly sophisticated value-added data is too noisy to reliably judge teacher quality – and this makes assessing / rewarding teacher quality through use of data appear arbitrary. Not a great way to attract or retain the best teachers!
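The mechanism behind the height result is easy to demonstrate. Here is a minimal sketch of my own (not the authors’ code, and with made-up numbers): heights are drawn from one common population, so the true teacher effect is zero by construction, yet a naive value-added-style estimate still attributes visible “effects” to teachers, purely through class-level sampling noise.

```python
import random
import statistics

random.seed(42)

# Toy placebo test: 100 teachers, each with a class of 25 students.
# Every height is drawn from the SAME population, so the true
# "teacher effect" on height is exactly zero.
N_TEACHERS, CLASS_SIZE = 100, 25
POP_MEAN, POP_SD = 150.0, 7.0  # cm; plausible for a school-age cohort

class_means = [
    statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(CLASS_SIZE))
    for _ in range(N_TEACHERS)
]

# A naive value-added-style estimate: each teacher's class mean minus
# the grand mean. With only 25 students per class, sampling noise alone
# makes these "effects" look non-trivial.
grand_mean = statistics.mean(class_means)
effects = [m - grand_mean for m in class_means]
spread = statistics.stdev(effects)

print(f"SD of estimated 'teacher effects' on height: {spread:.2f} cm")
# The spread is driven entirely by chance - roughly POP_SD / sqrt(CLASS_SIZE).
```

With these numbers the apparent spread of teacher “effects” on height is over a centimetre, despite no teacher having any influence at all – exactly the kind of noise the authors warn can be mistaken for genuine differences in effectiveness.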
The last article, Teacher heterogeneity, value-added and education policy (Condie et al.), looks at two simulations of policies intended to raise teaching quality in schools.
In the first model, they simply fired the bottom 10% of teachers (as measured by value-added scores). Average student test scores rose as expected. They then looked at what would happen if, instead of firing the ‘failing teachers’ in the system, they matched teachers to the subjects and groups of students where they obtained the best value-added scores. They found that average test scores rose again – in fact, the increase far exceeded that of the first model.
“Our results suggest that employers might realize greater gains by increasing the specialization of their employees’ tasks rather than attempting to replace them with hypothetically better employees,”
This suggests that PRP and heavy-handed accountability – whilst they sound lovely and tough-talking in the papers – are not the best ways to raise the quality of teaching. Especially since:
“past research suggests that the pool of potential replacements is, at least on average, of lower quality than the pool of current teachers.”
Indeed, the harsh new accountability system for judging teacher quality might actually lower standards in schools – especially those in vulnerable areas that already have trouble recruiting.
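The intuition behind the two policies can be sketched with a toy simulation (my own illustration, with hypothetical numbers, not the paper’s model): give each teacher a different effectiveness in two subjects, then compare firing the weakest 10% against simply letting each teacher take the subject where they are stronger.

```python
import random

random.seed(0)

# Toy sketch: each teacher has an effectiveness score in two subjects,
# drawn independently. Everyone currently teaches subject A.
N = 200
teachers = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
current = [a for a, b in teachers]
baseline = sum(current) / N

# Policy 1: fire the bottom 10% by measured VA and replace them with
# average-quality hires (effectiveness 0).
cut = sorted(current)[int(0.1 * N)]
after_firing = [v if v >= cut else 0.0 for v in current]
fire_gain = sum(after_firing) / N - baseline

# Policy 2: keep everyone, but reassign each teacher to whichever
# subject they are stronger in.
after_matching = [max(a, b) for a, b in teachers]
match_gain = sum(after_matching) / N - baseline

print(f"firing bottom 10%: gain of {fire_gain:.3f}")
print(f"best-fit matching: gain of {match_gain:.3f}")
```

Under these (entirely invented) assumptions, matching beats firing by a wide margin, for the same reason the paper suggests: heterogeneity across subjects is a resource that replacement policies throw away.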
Improving the quality of teaching in schools
Andrew Old is right that we need better ways of defining what ‘quality’ is before we can realistically set about improving it. The first paper appears to imply that either we still don’t know enough about what drives quality teaching, or we simply can’t measure it.
The second paper should be a further warning about the use of data as an accountability stick with which to beat teachers. How can PRP (let alone licensing) work in a system where a teacher’s performance in helping children do well in exams is almost indistinguishable from a teacher’s influence on their height?
The last paper actually leaves us with a positive way forward. It suggests that if we want to raise the quality of schools, we might do better to focus on finding a best fit between a teacher and their subject / KS cohort specialism than to heap tougher accountability processes onto teachers and schools. Developing that specialism of teachers – finding where they have a positive impact and developing their subject knowledge – might really improve the quality of teaching.