Last year I wrote that a day may come, when we have evidence about masks that’s so strong, so convincing, only the most extremely biased will still be arguing about it. It is, still, not this day. But digging into a major, wide-ranging new review by Trish Greenhalgh and colleagues clarified several issues for me, and left me optimistic that we will get there.
There already is a stronger scientific consensus about the effectiveness of masks than you might think. It’s hard to see, though, because the amount of literature is overwhelming, and the field is clouded by smoke created by a few bombs.
Firstly there was the chaos caused by several national public health officials discouraging mask use early in the pandemic, driven to some extent by concern about inadequate supply, and wrong assumptions about the way the new disease was transmitted.
Then a couple of trials landed that fuelled enormous debate – the first of them blew past all other scientific articles to take the top spot in Altmetric’s list of the most discussed in 2021. In addition, there were controversial updates of the Cochrane systematic review by Tom Jefferson and colleagues. Based solely on the results of randomized trials, that review has been widely misrepresented as proving that masks don’t work – including by its lead author. It was weaponized by critics of masks and pandemic interventions generally, while taking a beating from people defending mask effectiveness – in The New York Times and Scientific American, for example.
One of the results of all this has been excessive claims about the necessity of paying attention only to randomized trials from mask critics on the one hand, and about the weaknesses of trials from mask defenders on the other. One opinion piece in the latter camp even had the headline that trials “are the worst way to answer the question.”
I have argued since the start of the pandemic that we need to consider more than randomized trials on mask effectiveness. I don’t agree, though, that trials are fundamentally unsuitable for the range of questions about masks, which includes the relative effectiveness of different types of them. Even if randomized trials weren’t all that useful, they can’t simply be ignored: They now constitute canon for mask critics. Totally dismissing trials at a conceptual level isn’t going to bring us closer to broad consensus on the evidence about masks.
What’s more, it in effect cedes the ground to a problematic interpretation of the trials. At the heart of the debate set off by the Jefferson review is meta-analysis. Meta-analysis involves statistical methods to combine the results of the trials. There isn’t just a single way to do it, though. The process involves many choices, from which studies to include, to which of various analysis methods to use.
After all that, there is the subjective process of drawing conclusions based on the summary of effectiveness produced by meta-analysis, alongside any trials that aren’t in the meta-analysis: Just because some trials were similar enough to combine, it doesn’t mean every single one was.
I haven’t seen a thorough analysis of all the systematic reviews on masks, but I’ve looked at a lot of them. Unlike Jefferson’s, almost all of them conclude, more or less, that the trials suggest that masks work – as does the new Greenhalgh review. The Greenhalgh review includes meta-analysis of randomized trials, and that provided a great opportunity to dig into the weeds. I took one comparable meta-analysis from each of the Jefferson and Greenhalgh reviews as a way into understanding how the Jefferson analysis led to such an outlier result.
Note: I’ve been deeply involved in the systematic review world for decades, so I know some of the authors of both reviews – disclosures below.
The specific meta-analysis I used from each review was for the effect of community use of masks (as opposed to use by healthcare professionals in hospitals etc) on one of the outcomes: influenza- or Covid-like illness. These are the combined results for that question in the respective reviews:
Jefferson review | Greenhalgh review |
---|---|
RR 0.95 [0.84, 1.09] | RR 0.89 [0.87, 0.91] |
That’s not an enormous difference, but it’s a critical one. The Jefferson estimate shows only a small difference (5%), and it’s not a definite one because the upper end of its confidence interval (1.09) is above 1. Whereas the Greenhalgh estimate is a bit stronger, and it crosses the line into “this seems to have worked” territory because all the numbers are below 1. If the numbers were all above 1, then the estimate would have slipped over the line into “this didn’t seem to work.”
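As a rough illustration of that reading rule, here is a minimal Python sketch. The function name `summarize` and the wording of its verdicts are assumptions made for this example, not anything taken from either review; the numbers are the pooled estimates from the table above.

```python
def summarize(rr, lower, upper):
    """Turn a risk ratio and its 95% confidence interval into a plain reading."""
    # Percentage reduction in risk implied by the point estimate.
    reduction_pct = round((1 - rr) * 100)
    if upper < 1:
        verdict = "whole interval below 1: this seems to have worked"
    elif lower > 1:
        verdict = "whole interval above 1: this didn't seem to work"
    else:
        verdict = "interval includes 1: not a definite difference"
    return reduction_pct, verdict


# Pooled estimates quoted in the table above: (RR, lower, upper).
estimates = {
    "Jefferson": (0.95, 0.84, 1.09),
    "Greenhalgh": (0.89, 0.87, 0.91),
}

for review, (rr, lower, upper) in estimates.items():
    reduction_pct, verdict = summarize(rr, lower, upper)
    print(f"{review}: {reduction_pct}% lower risk; {verdict}")
```

The point estimate gives the size of the effect, while the position of the whole interval relative to 1 decides which side of the line the result falls on.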
The Greenhalgh estimate doesn’t show a large reduction in risk for this combination of diseases (11%). That figure is likely diluted, though: the analysis includes all the people in the mask groups who didn’t wear them consistently, as well as the people in the control groups who did wear masks – a pretty common problem in these trials. And for a public health intervention, even an effect of that size could make a real difference, especially over time.
So why do these meta-analyses come down on opposite sides of the ledger? There are differences between these meta-analyses in which trials are included, and in the calculations for the included trials as well. I didn’t try to assemble all the data myself to work out for sure where the biggest impact came from, but I think it was probably to do with a specific trial that Greenhalgh has in the meta-analysis and Jefferson doesn’t.
Before we get to that, though, I’ve put the data from the trials they both included in a table below, just to show you that even when the trials in a meta-analysis are the same, the analysis can be different. (For background, I have a post of tips on understanding data in meta-analysis here.)
Trial | Jefferson review | Greenhalgh review |
---|---|---|
Abaluck 2022 | 0.87 [0.81, 0.94] | 0.89 [0.87, 0.91] |
Aiello 2012 | 1.10 [0.88, 1.38] | 0.85 [0.59, 1.24] |
Cowling 2008 | 1.00 [0.34, 2.27] | 1.00 [0.54, 1.84] |
MacIntyre 2009 | 1.11 [0.64, 1.91] | 1.26 [0.69, 2.31] |
Suess 2012 | 0.61 [0.20, 1.87] | 0.51 [0.21, 1.25] |
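To give a feel for how much the choice of analysis method alone can matter, here is a hedged Python sketch that pools the five shared trials from the table above in two common ways: fixed-effect inverse-variance weighting, and a DerSimonian-Laird random-effects model. This is an illustration only, not a reconstruction of either review's actual analysis: the standard errors are approximated from the rounded confidence intervals in the table, both reviews pooled more trials than these five, and the function names are assumptions for this example.

```python
import math

Z = 1.96  # normal quantile for a 95% confidence interval


def log_rr_and_se(rr, lower, upper):
    """Recover the log risk ratio and an approximate standard error from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z)
    return math.log(rr), se


def pool(trials, random_effects=False):
    """Inverse-variance pooling on the log scale; returns (RR, lower, upper)."""
    effects = [log_rr_and_se(*trial) for trial in trials]
    weights = [1 / se ** 2 for _, se in effects]
    if random_effects:
        # DerSimonian-Laird estimate of the between-trial variance (tau squared).
        total = sum(weights)
        fixed = sum(w * y for w, (y, _) in zip(weights, effects)) / total
        q = sum(w * (y - fixed) ** 2 for w, (y, _) in zip(weights, effects))
        c = total - sum(w ** 2 for w in weights) / total
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
        weights = [1 / (se ** 2 + tau2) for _, se in effects]
    total = sum(weights)
    estimate = sum(w * y for w, (y, _) in zip(weights, effects)) / total
    se = math.sqrt(1 / total)
    return tuple(math.exp(v) for v in (estimate, estimate - Z * se, estimate + Z * se))


# (RR, lower, upper) for the five shared trials, as reported in each column of the table.
jefferson = [(0.87, 0.81, 0.94), (1.10, 0.88, 1.38), (1.00, 0.34, 2.27),
             (1.11, 0.64, 1.91), (0.61, 0.20, 1.87)]
greenhalgh = [(0.89, 0.87, 0.91), (0.85, 0.59, 1.24), (1.00, 0.54, 1.84),
              (1.26, 0.69, 2.31), (0.51, 0.21, 1.25)]

for label, trials in (("Jefferson data", jefferson), ("Greenhalgh data", greenhalgh)):
    for method, flag in (("fixed-effect", False), ("random-effects", True)):
        rr, lower, upper = pool(trials, random_effects=flag)
        print(f"{label}, {method}: RR {rr:.2f} [{lower:.2f}, {upper:.2f}]")
```

A random-effects model spreads weight more evenly across trials when their results disagree, so a single large trial dominates less and the pooled interval can only get wider, never narrower, than the fixed-effect one.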
There are a few differences, though, in which trials are included at all. If you want to dig into the different inclusions, there is a table on this at the end of this post. Generally, it comes down to two things: Greenhalgh analyzing source control trials separately (some trials were based on people already diagnosed with influenza), and Jefferson classifying the outcome “clinical respiratory illness” as if it is the same as “influenza-like illness.”
The exception that I think could be having a particularly strong influence on the result is an influenza trial in the Greenhalgh analysis (Aiello 2010). It’s an included trial in the Jefferson review, too, but the authors wrote that they didn’t include it in the meta-analysis “since we did not consider ‘randomisation’ of three clusters to three arms to be a proper randomised trial.” This is odd to say the least, for a few reasons.
There isn’t a category of “proper” randomized trials. It can be a hard call to decide if a trial is randomized, but it is a call you have to make. If it’s not a randomized trial, it shouldn’t be an included study at all. There’s no justification in the methods of this particular review for excluding it from a meta-analysis, and Cochrane guidance points to doing sensitivity analyses for studies with concerns. That means you include it in the meta-analysis, and then re-run without it to see what difference it makes.
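The "include it, then re-run without it" idea can be sketched in a few lines of Python. This post doesn't report the numbers for Aiello 2010, so the example below can't show that particular sensitivity analysis. Instead, as an assumption for illustration, it runs a leave-one-out check on the five shared trials using the Greenhalgh column of the table above, with simple fixed-effect inverse-variance pooling and standard errors approximated from the rounded confidence intervals. The function names are hypothetical, and this is not either review's actual method.

```python
import math

Z = 1.96  # normal quantile for a 95% confidence interval


def pool_fixed(trials):
    """Fixed-effect inverse-variance pooled (RR, lower, upper) from per-trial 95% CIs."""
    total_weight = 0.0
    weighted_sum = 0.0
    for rr, lower, upper in trials.values():
        # Approximate standard error of the log risk ratio from the CI width.
        se = (math.log(upper) - math.log(lower)) / (2 * Z)
        weight = 1 / se ** 2
        total_weight += weight
        weighted_sum += weight * math.log(rr)
    estimate = weighted_sum / total_weight
    se_pooled = math.sqrt(1 / total_weight)
    bounds = (estimate, estimate - Z * se_pooled, estimate + Z * se_pooled)
    return tuple(math.exp(v) for v in bounds)


def leave_one_out(trials):
    """Re-run the pooled analysis once per trial, each time with that trial removed."""
    return {
        name: pool_fixed({k: v for k, v in trials.items() if k != name})
        for name in trials
    }


# Greenhalgh column of the table above: (RR, lower, upper).
greenhalgh_shared = {
    "Abaluck 2022": (0.89, 0.87, 0.91),
    "Aiello 2012": (0.85, 0.59, 1.24),
    "Cowling 2008": (1.00, 0.54, 1.84),
    "MacIntyre 2009": (1.26, 0.69, 2.31),
    "Suess 2012": (0.51, 0.21, 1.25),
}

full = pool_fixed(greenhalgh_shared)
print(f"All five trials: RR {full[0]:.2f} [{full[1]:.2f}, {full[2]:.2f}]")
for name, (rr, lower, upper) in leave_one_out(greenhalgh_shared).items():
    print(f"Without {name}: RR {rr:.2f} [{lower:.2f}, {upper:.2f}]")
```

Reporting both versions lets readers see how much a conclusion rests on any one trial, rather than having to take the reviewers' decision on trust.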
The authors decided not to include the trial, without reporting what difference it makes to the analysis. Many other reviewers do include it. Indeed, the authors of the Jefferson review themselves had included this trial in their meta-analysis in a preprint version of its first update in the pandemic – when they still recommended the use of masks (in combination with hand hygiene etc).
That said, not everyone agrees with the Jefferson and Greenhalgh groups that this body of trials should be combined at all – for example, combining influenza and Covid pandemic trials. Some believe even within those disease categories, the differences in the trials are too great for combination.
On the other hand, I think there’s a strong scientific consensus that the complications and weaknesses of trials on this question mean we have to consider their results in the context of strong non-randomized studies that have methods for considering confounding variables. Before the pandemic, so did the Jefferson group. The rationale they gave for dropping non-randomized studies in 2020 was that, “for this update there were sufficient randomised studies to address our study aims.” The trials weren’t sufficient to provide a clear answer, though. Having multiple trials for an analysis is not the same as having enough of them.
So where does all this leave us? There’s another pair of systematic reviews that I think are critical, both of which analyzed non-randomized studies. The first, by Derek Chu and colleagues, came early in the pandemic. It included the question of masks, for Covid, the original SARS, and MERS. They concluded face masks could reduce the risk of infection, from low-certainty evidence.
The second was a living, rapid review for Covid only, by Roger Chou and colleagues, originally funded by AHRQ, the US Agency for Healthcare Research and Quality, with a final update in May 2023. This included the Covid trials, but the authors didn’t think they were suitable for meta-analysis. Chou concludes, on the basis of the trials and non-randomized studies, that community mask use reduces the risk of Covid, though we still need stronger evidence. They rated the evidence as low to moderate.
There are more trials underway, and more non-randomized studies are sure to be in the pipeline too. Even if the evidence is still moderate at best, I think we need to have a systematic review from a highly trusted source – and it has to be so well-executed, so objective, and so respected that only contrarians could dismiss it.
As valuable as the Greenhalgh review is for many reasons, it’s not a full systematic review of the randomized trials and very strong non-randomized studies. And one of the authors (MacIntyre) is the leader of several of the influenza trials. I’ve written before about why I think this is a conflict of interest that leaves systematic reviews open to being discredited. That’s not to dismiss the value of this sweeping review – which, along with other strengths, has the best section on the potential harms I’ve read.
A forensic analysis of the trials and best non-randomized studies could also provide guidance on the best bets for feasible studies that could lift this body of evidence to strong. Also on my wish list: Guidance on the kinds of trial and/or non-randomized studies that should be ready to roll in a pandemic of a new airborne disease.
The ideal review may be too much to hope for, though I still have my fingers crossed for something at least close. Our experiences in this pandemic should leave no doubt that the stakes are so high, it’s worth investing in the knowledge we need to navigate outbreaks of airborne disease.
~~~~
Disclosures: I wrote about masks in WIRED and here on this blog in 2020 and in 2021, and in my newsletter in 2023. I discuss masks in another few posts (in 2020, in 2021 and in 2024). I’m one of the founders of the Cochrane Collaboration and participated in the development of its methods, and I studied them as part of my PhD. In the early days, I was a Coordinating Editor of a Cochrane Review Group for a few years. These days, I advise Cochrane on some controversial reviews, including chairing independent stakeholder groups for reviews on HPV vaccines, and exercise for people with ME/CFS. I know several of the authors of reviews I analyze in this post, including the lead authors, Trish Greenhalgh and Tom Jefferson. My PhD supervisors were both co-authors of the Cochrane review (Paul Glasziou and Chris Del Mar) referred to in this post, and I have butted heads with Tom Jefferson on other issues in the past.
Note: Disclosures corrected on June 11, with thanks to Eleanor Rees for her comment on Mastodon. And on June 12, I realized I had alluded to masks in a Statistically-Funny post in 2020, and added that.
The cartoons are my own (CC BY-NC-ND license). (More cartoons at Statistically Funny.)
Table: Comparison of studies in meta-analyses in the Jefferson (Cochrane) and Greenhalgh reviews on influenza- or Covid-like illness in randomized trials of masks versus no masks in the community
All studies | Jefferson | Greenhalgh | Reason for discordance |
---|---|---|---|
Abaluck 2022* | Yes | Yes | |
Aiello 2010 | No | Yes | Excluded from the meta-analysis in Jefferson |
Aiello 2012 | Yes | Yes | |
Alfelali 2020 | Yes | No | Clinical respiratory illness counted as influenza-like illness in Jefferson; separated in Greenhalgh |
Barasheed 2014 | Yes | No | Source control trial; not eligible for Greenhalgh |
Canini 2010 | Yes | No | Source control trial; not eligible for Greenhalgh |
Cowling 2008 | Yes | Yes | |
MacIntyre 2009 | Yes | Yes | |
MacIntyre 2016 | Yes | No | Source control trial; not eligible for Greenhalgh |
Suess 2012 | Yes | Yes | |