Meta-analysis in education: Some cautionary tales from other disciplines

Whilst meta-analysis is a potentially powerful tool, it’s not without its limitations. A good summary of some of the general issues with meta-analysis can be found here.

Within education, meta-analysis has recently become highly prominent. John Hattie’s work is probably the best known, though all such work draws upon the idea that by examining the effect sizes across a number of studies we can provide a summary of ‘what works’ within education. There’s little doubt that the methodology is here to stay; if medicine is anything to go by, its use will become more common over time. However, we should treat the results of meta-analysis with a healthy dose of professional scepticism.

Indeed, something of that scepticism is already starting to filter into the debate. Most notably, perhaps, Dylan Wiliam identifies a number of issues with Hattie’s analysis that are worth keeping in mind. There’s some good exploration of these criticisms in recent blogs by The Learning Spy and OllieOrange2. It seems a sensible suggestion to divide analysis more clearly into age bands. As Dylan Wiliam noted, effect sizes decline with the age group investigated: an effect size of 0.3 might be unimpressive at primary age, but a fairly strong effect at secondary.

However, there are further issues with meta-analysis within medicine and psychology which, whilst I’ve not yet seen them much discussed within education, could potentially become issues in the future. Here are a handful of recent articles relating some of these problems, along with suggestions as to how the field of education might learn from them.

Bias due to lack of blinding:

Within medicine, evidence has started to emerge that bias has had a distorting effect on the evaluation of treatment outcomes. One example of this is the particular problem associated with studies where the outcomes are not easy to measure objectively. For example, a recent meta-epidemiological study reported:

“Since bias associated with lack of adequate allocation concealment or lack of blinding is less for trials with objectively assessed outcomes than trials with subjectively assessed outcomes, efforts to minimise bias are particularly important when objective measurement of outcomes is not feasible

“Authors of systematic reviews, and those critically appraising trials, should routinely assess the risk of bias in results associated with the way each trial was done. Such assessments should be outcome specific. The Cochrane Collaboration has recently formulated detailed guidance on how to do this (see

“Systematic reviewers should present meta-analyses restricted to trials at low risk of bias for each outcome, either as the primary analysis or in conjunction with less restrictive analyses”

Many, if not most, education studies involve at least some subjectivity in their outcome measures. It would therefore seem wise to consider the same advice: excluding trials which rely on subjective outcome measures without blinding. Furthermore, it would seem sensible to routinely assess the risk of bias using something similar to the Cochrane guidance.

The problem of small numbers:

As noted in this Scientific American blog, the strength of evidence varies from outcome to outcome:

“This is the most common trap people fall into about meta-analyses. They say (or think), “A meta-analysis of 65 studies with more than 738,000 participants found x, y and z.” But the answer to x may have come from 2 huge, high-quality studies with a lot of good data on x. The answer to y could have come from 48 lousy little studies with varying quality data on y. And the answer to question z….well, you get the picture.”

In medicine this has recently been identified as a problem, as trials with smaller numbers of patients tend to produce larger treatment effects than larger studies. For example, this BMJ article examining meta-analyses of osteoarthritis trials reports:

“In this meta-epidemiological study of 13 meta-analyses of 153 osteoarthritis trials, we found larger estimated benefits of treatment in small trials with fewer than 100 patients per trial arm compared with larger trials.”

It’s interesting that Hattie finds no relationship between the number of studies within each meta-analysis and effect size, but it’s possible he asked the wrong question of the data. Are small-scale studies (in terms of the number of pupils involved) more likely to produce larger effect sizes in education research as well?
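One way to ask that question of the data is to pool small and large studies separately and compare the results. The sketch below is only an illustration: the study list is invented, the cutoff of 100 per arm simply mirrors the BMJ quote above, and `d_variance`, `pooled_effect` and `split_by_size` are hypothetical helper names rather than anything used by Hattie or the BMJ authors. It assumes two equal-sized arms per study and a simple fixed-effect (inverse-variance) pooling.

```python
# Hypothetical (effect size d, pupils per arm) pairs, invented purely for illustration.
STUDIES = [(0.80, 20), (0.65, 30), (0.55, 45), (0.30, 150), (0.25, 400), (0.20, 900)]


def d_variance(d, n_per_arm):
    """Approximate sampling variance of a standardised mean difference (equal arms)."""
    return 2.0 / n_per_arm + d * d / (4.0 * n_per_arm)


def pooled_effect(rows):
    """Fixed-effect (inverse-variance weighted) mean of the effect sizes."""
    weights = [1.0 / d_variance(d, n) for d, n in rows]
    return sum(w * d for w, (d, _) in zip(weights, rows)) / sum(weights)


def split_by_size(rows, cutoff=100):
    """Split studies into those below and those at or above `cutoff` pupils per arm."""
    small = [r for r in rows if r[1] < cutoff]
    large = [r for r in rows if r[1] >= cutoff]
    return small, large


small, large = split_by_size(STUDIES)
print("small studies:", round(pooled_effect(small), 2))
print("large studies:", round(pooled_effect(large), 2))
```

A clear gap between the two pooled figures would not prove bias on its own, but it would be a reason to look harder at the smaller studies.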

The file drawer effect:

A major issue in both medical and psychological meta-analysis research has been publication bias: the ‘file drawer effect’. Not every study conducted makes it to publication, for a variety of reasons, and this causes problems for the methodology. As the Cochrane Collaboration explains:

“Since published studies are more likely to be included in a meta-analysis than their unpublished counterparts, there is a legitimate concern that a meta-analysis may overestimate the true effect size.”

There are a number of statistical techniques, such as funnel plots, which can be used to assess whether the outcomes of a meta-analysis have been distorted by publication bias. Given that Hattie discovered so many of the studies he included had positive results, it might be argued that publication bias (of one sort or another) may be a significant issue in education research. Therefore, it seems reasonable that education meta-analyses assess and report whether this is the case.
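A funnel plot is judged by eye, but its asymmetry can also be summarised numerically. The sketch below swaps in Egger's regression, a common companion to the funnel plot, as one possible check: each study's effect divided by its standard error is regressed on the inverse of the standard error, and an intercept well away from zero hints at asymmetry. The function name `egger_intercept` and the example numbers are hypothetical, and a full test would also need a standard error and significance test for the intercept, which are omitted here.

```python
def egger_intercept(effects, std_errors):
    """Intercept of Egger's regression: (effect / SE) regressed on (1 / SE)."""
    y = [d / se for d, se in zip(effects, std_errors)]
    x = [1.0 / se for se in std_errors]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Ordinary least squares for a single predictor.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x


# Invented standard errors: small values are large studies, large values small ones.
ses = [0.05, 0.10, 0.20, 0.40]

# Same effect regardless of precision: a symmetric funnel.
symmetric = [0.3 for _ in ses]
# Effects that grow as studies get less precise: an asymmetric funnel.
asymmetric = [0.2 + 2.0 * se for se in ses]

print("symmetric intercept:", egger_intercept(symmetric, ses))
print("asymmetric intercept:", egger_intercept(asymmetric, ses))
```

In the symmetric case the intercept sits at (or within rounding error of) zero; in the asymmetric case it moves clearly away from zero, which is the pattern publication bias would tend to produce.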

Digging through the data:

A related issue to the file drawer effect is the failure to pre-register studies before they are conducted. Although it does not concern a meta-analysis, this critique of the UK Resilience project gives an example of the problems this can cause.

“Confirmation bias is common in presentation of results of test of interventions, especially when one of the developers of the intervention is among the authors or a consultant. To reduce the risk of bias, investigators are commonly required to preregister their design, including their plans for analysis of data. This commits investigators to a particular choice of outcomes and assessment points for evaluating the intervention.”

Whether or not the researchers pre-planned all the analysis they conducted is unfortunately hard to know. However, this problem isn’t confined to psychological interventions. In medicine, there is an effort under way to ensure that the data from all past trials are published and that researchers pre-register all future clinical trials. Here’s a quick list of what those behind it want to see:

“What trial information needs to be registered and reported?
1. Registration
2. Summary results reporting
3. A full report
4. Individual patient data”

Given how easy it is to generate spurious findings by digging through the data until you find something ‘significant’, and the amount of money that can be wasted by investing in education claims that subsequently turn out to be hogwash, it would seem sensible to insist on a similar convention within education research. This would also help cut down on the problems created by the file drawer effect; we’d at least know how many studies are out there, even if they don’t end up being published.
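To see why digging through the data is so risky, consider how quickly the chance of a false positive grows with the number of comparisons. The sketch below assumes every test is independent and that there is no real effect anywhere (both simplifications), and `familywise_error_rate` is just an illustrative name.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 20):
    print(k, "tests:", round(familywise_error_rate(k), 2))
```

Under those assumptions, twenty unplanned comparisons at the usual 0.05 threshold give roughly a 64% chance of at least one ‘significant’ result by luck alone, which is exactly what pre-registration is meant to guard against.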

Non-disclosure of financial interests:

Another recent article by PLOS related grave concerns that a recent meta-analysis of ‘triple P parenting’ contained significant flaws. What’s troubling are the apparently large financial interests involved, which were not declared:

“… this is a thoroughly flawed meta-analysis conducted by persons with substantial but undeclared, financial interests in portraying triple P parenting as “evidence-supported” and effective

“Journals have policies requiring disclosures of conflict of interest under the circumstances, but at least in the psychological literature, you can find only few examples that these policies are enforced.”

Education is highly vulnerable to this sort of problem. There is undoubtedly a large sum of money that could be siphoned out of schools by organisations claiming to offer ‘evidence-based’ solutions to our problems. At the moment, however, what requirement or enforcement is there to disclose financial interests in education research?

The power of belief:

Before I became a teacher I was a post-graduate researcher working in the field of anomalistic psychology. The potential bias introduced into research by belief has long been apparent in this field; it’s even got a name: ‘the sheep-goat effect’.

This has led to meta-analyses producing some rather spectacular ‘proofs’ of things like ESP (e.g. telepathy) and psychokinesis (mind over matter). An example of this is the famous (or infamous) paper by Bem et al. (2014), which apparently establishes the existence of precognition.

Unless you work at Hogwarts, this might not seem immediately applicable to education research. However, I think the two fields share some very similar problems – namely, the prevalence of pseudoscience and the potentially strong bias introduced into research by evangelistic fervour for one teaching philosophy or another.

One major reason that many are cautious of Bem’s conclusion is the issue of methodological controls. Edzard Ernst relates one such problem with a meta-analysis of the effectiveness of acupuncture as a treatment for IBS. He suggests that the evidence shows effectiveness, but only if one is very liberal with the truth.

“The largest RCT included in this meta-analysis was neither placebo-controlled nor double blind; it was a pragmatic trial with the infamous ‘A+B versus B’ design. Here is the key part of its methods section: 116 patients were offered 10 weekly individualised acupuncture sessions plus usual care, 117 patients continued with usual care alone. Intriguingly, this was the ONLY one of the 6 RCTs with a significantly positive result!”

Education research is also likely to suffer from these issues, especially where studies are lacking a control group, random allocation or any form of blinding. Thus it would seem sensible to establish whether weaker quality studies are producing the larger effect sizes and distorting the outcomes of a meta-analytical study; for example, by seeing whether there is a correlation between an ordinal measure of study quality and effect size.
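As a rough illustration of that last check, the sketch below computes a Spearman rank correlation between a study-quality score and effect size. The quality scale (1 = weakest, 4 = strongest), the data and the helper names are all invented for the example; the rank correlation is used because the quality measure is ordinal, and ties are handled with average ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical quality scores and effect sizes for seven studies.
quality = [1, 1, 2, 2, 3, 3, 4]
effects = [0.90, 0.70, 0.60, 0.50, 0.30, 0.35, 0.10]

print("Spearman correlation:", round(spearman(quality, effects), 2))
```

A strongly negative coefficient, as in this made-up data, would suggest that the weaker studies are contributing the larger effect sizes.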

For all its flaws, meta-analysis is a potentially powerful and useful tool; it isn’t going away! What we should try to ensure is that we avoid some of the pitfalls uncovered in other disciplines. Otherwise, we risk falling into some of the ‘same old problems’ where evangelical beliefs, financial interests, weak methodology and lack of transparency undermine the practical and theoretical value of education research.


7 Responses to Meta-analysis in education: Some cautionary tales from other disciplines

  1. TheOtherDrX says:

    Interesting and thought-provoking post. See link below for a very recent meta-analysis study on active vs non-active learning in STEM subjects in HE. It’s important for the sole fact that all scientists respect PNAS as a journal of the highest calibre, but the same is not true for most education journals…
    This important meta-analysis does have many of the safeguards for good meta-analyses (funnel plots etc.) but I do suspect the file drawer effect at play here (can you imagine trying to publish that lectures are better than active learning, even if that was observed…). My main concerns are that active learning might represent just 10%, or as much as 100%, active learning (the limits of which are not defined) and that, with hundreds of studies, a direct quantitative correlation between the % of active and outcomes vs non-active could not be demonstrated. Does this mean that 100% active is worse than say 50-50? We just don’t know, which is a pity. Either way it’s a useful study in my sector if only to discourage lecturers from ‘going at it non-stop’ for 2 hours.


    • I agree, it looks like a genuinely thorough analysis. The results clearly indicate that lecturing on its own isn’t as effective – though you’re right that the next question becomes what mix of ‘active learning’ was in there. Like you seem to imply, I think it’s likely to be a non-linear relationship (likely varying by domain / prior attainment / etc) between the proportion of time spent in active learning and success rates.

      I thought this particularly telling: “If the experiments analyzed here had been conducted as randomized controlled trials of medical interventions, they may have been stopped for benefit ..” Certainly a discouragement!


      • TheOtherDrX says:

        Yes, a discouragement certainly, but missing the crucial point that good lectures probably have a place in STEM. I would really like to see some analysis of this data of mixed methods; however, I don’t think this message fits with the author’s agenda, given the funding he has had… The author tells a good story and he has a TED talk out there, which made me laugh as the reason he changed from lecture to active was that he suddenly realised from tests that high performing students couldn’t apply knowledge. Clearly not just the fault of the lecture but also his assessment strategy!

        It was also published in a week of ‘lecture bashing’ such as this re-hash of an old blog post which appeared in The Guardian recently. It’s fair to say it got a strong reaction!
