Sometimes, a dispute with a consumer movement comes along that has profound implications for far more than the people in it. I think the dramatic clash between the ME/CFS patient community and a power base in the evidence community is one of those. It points to weaknesses in research methodology and practice that don’t get enough critical attention. It raises uncomfortable questions about the relationship between researchers and policy communities. And it pushes the envelope on open data in clinical trials, too.
All of that underscores the importance of consumer critique of research. Yet the case also shows how conditional researcher acceptance of consumers can be.
ME/CFS is Myalgic Encephalomyelitis (ME) / Chronic Fatigue Syndrome (CFS), a serious, debilitating condition with a complex set of symptoms and a lot of unknowns. The argument about exercise is rooted in competing views about the condition itself.
A biopsychosocial camp has contended since the 1990s that regardless of how it starts, people with ME/CFS dig themselves into a hole with their attitudes and behaviors, leading to physical deconditioning (sort of the opposite of getting fit). They argue that people can dig themselves out of it by changing their beliefs and behaviors with the help of cognitive behavioral therapy (CBT) and graded exercise therapy (GET).
One of the key people from this camp is Peter White, who explained GET this way with Lucy Clark back in 2005:
A vicious circle of increased exercise avoidance and symptoms occurs, which serves to perpetuate fatigue, and therefore CFS….Patients can be released from their self-perpetuating cycle of inactivity if the impairments that occur due to inactivity and their physiological deconditioning can be reversed. This can occur if they are willing to gradually exceed their perceived energy limits, and recondition their bodies through GET.
ME/CFS groups see it differently. For them, the condition is more complex than fatigue and fitness, and believing you can recover won’t overcome it. Based on their experience and on surveys since 2001, they report that GET in practice causes relapses, worsening symptoms for most people; that CBT doesn’t change the condition’s symptoms; and that self-management “pacing” does help. The UK group Action for ME describes pacing this way:
Taking a balanced, steady approach to activity counteracts the common tendency to overdo things. It avoids the inevitable ill effects that follow. Pacing gives you awareness of your own limitations which enables you positively to plan the way that you use your energy, maximising what you can do with it. Over time, when your condition stabilises, you can very gradually increase your activities to work towards recovery.
By the early 2000s, the biopsychosocial camp could claim a few small trials in support of their position, but it was a very weak evidence base. They saw their treatment approach as a good news story for patients, though, and they had a lot of supporters. Exercise is something of a sacred cow to many – including in the evidence community. So there was a constituency that wasn’t going to be as critical of studies claiming advantages of exercise as they might be for other treatments.
There was another powerful camp that was attracted to the idea that cheap short interventions could get rid of ME/CFS, if only the person was willing to make an effort: insurance and welfare stakeholders. It was in their interests to reduce treatment expenditure and income dependency for this fairly common condition. The close associations disclosed by ME/CFS researchers in the biopsychosocial camp with these policy communities have the potential to be conflicts of interest – and they are certainly seen that way by many.
For people with ME/CFS, as George Faulkner explained, “unreasonable expectations of recovery” based on weak research were “presenting policies which reduce their options and income as benevolent and empowering interventions”. Yet even the most optimistic estimates show the overwhelming majority don’t benefit. And believing ME/CFS is a condition that can be cured by attitude and effort is stigmatizing, no matter how carefully you try to frame it. That’s harmful, both for individuals who blame themselves (or get blamed) for their suffering and for the collective of people deeply affected by a condition in effect tagged as “all in the mind”.
A small evidence base means the pool of researchers focusing on the topic is small, and the field can be relatively under-developed. White’s group, who were convinced CBT and GET were beneficial, had tried, unsuccessfully, to get a UK trial of the 2 therapies funded in 1998. That same year, the UK’s Chief Medical Officer established a working group to advise on ME/CFS. Its report in 2002 was followed by the establishment of a Medical Research Council (MRC) research advisory group that recommended larger treatment trials – and stressed the importance of consumer involvement.
White’s group then brought the consumer group, Action for ME, on board, adding arms to study pacing and specialist medical care alone, plus a greater focus on adverse effects. (This development is described by White’s group here, and by Action for ME here.) The alliance wouldn’t survive the coming storm.
White and colleagues’ proposal became the PACE trial in 2005, after the MRC awarded it the biggest grant ever for an ME/CFS trial. Ultimately, the MRC contributed £2,779,361 (US$3.6 million), a substantial chunk of which came from the Scottish Government’s Chief Scientist Office, the National Institute for Health Research (NIHR), the Department of Health, and the Department for Work and Pensions. Previous trials each had at most about 40 participants comparing 2 treatments. The PACE trial was shooting for more than 600 people who weren’t severely affected, comparing 4 options. So right or wrong, that trial’s result was going to dominate the evidence base.
The academics responsible for the trial were Peter White, Trudie Chalder, and Michael Sharpe. Chalder was the lead author of the 1993 paper introducing the Chalder Fatigue Scale, one of the trial’s primary outcome measures; and Sharpe was the lead author of the 1991 paper setting out “the Oxford criteria”, which were used to determine who was eligible for the trial.
Let’s stop right there. Those 2 research instruments are pivotal for this trial. And the intellectual investment in them by these co-authors doomed this trial before the first participant was ever recruited. Why?
To rely on the trial’s results for decision making in ME/CFS, for starters you would need to know that the people in it really had ME/CFS. The Oxford criteria for ME/CFS used to choose who was included can’t deliver that reliably. This is what the US Agency for Healthcare Research and Quality (AHRQ) concluded in a systematic review of diagnosis and treatment for ME/CFS in 2014, several years after the PACE trial:
None of the current diagnostic methods have been adequately tested to identify patients with ME/CFS when diagnostic uncertainty exists.
AHRQ dug into this further in 2016, after an NIH meeting agreed with them, concluding:
The Oxford (Sharpe, 1991) case definition is the least specific of the definitions and less generalizable to the broader population of patients with ME/CFS. It could identify individuals who have had 6 months of unexplained fatigue with physical and mental impairment, but no other specific features of ME/CFS such as post-exertional malaise which is considered by many to be a hallmark symptom of the disease. As a result, using the Oxford case definition results in a high risk of including patients who may have an alternate fatiguing illness or whose illness resolves spontaneously with time. In light of this, we recommended in our report that future intervention studies use a single agreed upon case definition, other than the Oxford (Sharpe, 1991) case definition. If a single definition could not be agreed upon, future research should retire the use of the Oxford (Sharpe, 1991) case definition. The National Institute of Health (NIH) panel assembled to review evidence presented at the NIH Pathways to Prevention Workshop agreed with our recommendation, stating that the continued use of the Oxford (Sharpe, 1991) case definition “may impair progress and cause harm.”
When AHRQ analyzed what this means for the evidence on treatment for people with ME/CFS, the result was devastating:
Blatantly missing from this body of literature are trials evaluating effectiveness of interventions in the treatment of individuals meeting case definitions for ME or ME/CFS.
AHRQ argued that you need specific ME/CFS symptoms in your criteria – like experiencing post-exertional malaise. The PACE trial did measure that specific symptom, as a secondary outcome. So how many of the trial participants had it before the trial treatments started? The baseline data show it ranged from only 82% in the GET group to 87% in the treatment-as-usual group. In other words, somewhere between 13% and 18% of each group didn’t report this hallmark symptom. That underscores AHRQ’s point about the risk of the Oxford criteria, doesn’t it?
The second blow in this one-two punch is the validity of the Chalder fatigue scale, which they used as a primary outcome. It’s a patient-reported outcome measure (PRO or PROM) – a questionnaire for participants. In 2011, a guideline for how to reliably assess the validity of PROMs was released, called COSMIN for short. And in 2012, Kirstie Haywood and colleagues followed it in their systematic review of PROMs for ME/CFS.
They paint a dismal picture of the quality of PROMs for ME/CFS, with development and evaluation that weren’t robust enough, including for the Chalder fatigue scale and for the first version of the trial’s other PROM primary outcome, the SF-36 physical function scale (PACE used version 2). Neither is strong enough to lean on for ME/CFS. The consequences for the evidence on ME/CFS treatment, they believe, are as devastating as AHRQ’s conclusions on diagnosis:
[F]ailure to measure genuinely important patient outcomes suggests that high quality and relevant information about treatment effect is lacking.
Lisa Whitehead did a pre-COSMIN systematic review in 2009: she came to the same conclusion about the lack of adequate evidence to support the Chalder fatigue scale.
Depressing, isn’t it? But it’s not surprising, either. These kinds of weak links in the science of doing research are common, fueling confusion and controversies. When the trial’s results were published in 2011, consumers studying the trial had a wealth of methodological weaknesses to zero in on – and plenty of information on the science of doing research to point the way.
I think this brings us to one of the important lessons from this controversy: involving a consumer organization isn’t likely to be enough, on its own, to prepare for wading into a controversial area where there is a strongly networked community with a lot at stake. If your methods aren’t in very sound scientific territory and your findings aren’t welcome, you will get hammered.
That could, and should, have a positive impact on research standards. “[Taking] sides in pre-existing methodological disputes” is one of the main ways HIV activists came to have such an influence on the conduct of clinical trials, according to Steven Epstein’s analysis.
Let’s jump forward to the current battleground with consumers over the Cochrane review that included the PACE trial, supporting its conclusions about benefits of CBT and GET and lack of effectiveness of its pacing intervention. Systematic reviews like the ones from Cochrane, AHRQ, and NICE are critical influences on the policy and clinical impact of a trial, and potentially for research funding. That’s especially key for one as big as PACE in an area where stronger trials that could confirm or challenge its results can’t be taken for granted.
Robert Courtney and Tom Kindlon, whose lives have both been severely affected by ME/CFS, critiqued the PACE trial intensively. They posted detailed analyses of the Cochrane review and its verdict on the PACE trial and other evidence in Cochrane’s commenting system. Their comments from 2015 to 2016 raised important issues, but the review’s authors rather brushed off their concerns.
There is only 1 Cochrane review journal, but the reviews are produced and edited in Cochrane review groups, not centrally. There are 53 of those groups. The group that deals with ME/CFS is the Common Mental Disorders group, reflecting the disproportionate power base that psychiatry and psychology have established around this illness and its treatment. Courtney complained to the editor in chief in February 2018, because the review group was so unresponsive.
In November 2018, Cochrane reported action. Sadly, Courtney didn’t get to see this: he died by suicide in March 2018, after a long struggle with health-related problems. Cochrane attached a note to the review, stating that Cochrane’s editor in chief, David Tovey, was not satisfied that the authors had revised their review enough in response to Courtney’s feedback, and that a full update was in the works:
The Editor in Chief and colleagues recognise that the author team has sought to address the criticisms made by Mr Courtney but judge that further work is needed to ensure that the review meets the quality standards required, and as a result have not approved publication of the re‐submission. The review is also substantially out of date and in need of updating.
That update hasn’t been published yet. It could take a while, especially if a new set of editors is dealing with it in a new review group. That’s not just because of different people. Cochrane review groups have some standardized methods they are supposed to adhere to, but there can be big differences, too.
Courtney’s complaint also raised concerns about another review in progress at Cochrane: a review based on individual patient data from the trials in the exercise review. Tovey and his colleagues also reviewed the version that had been submitted for publication in Cochrane, and it was rejected. The already-published protocol for that review was then retracted (withdrawn in Cochrane-speak, with no reason given). And finally, the review group and Tovey agreed to transfer the responsibility for ME/CFS Cochrane reviews to a different review group, in acknowledgment of the fact that it isn’t a mental health disorder.
That’s a lot to achieve from consumer pushback on research, so kudos to Courtney and Kindlon for their effort. But will the changes in the updated review be critical enough of the evidence base, and closer to the AHRQ position? I think they should be. And I think the reasons for that are included in Courtney’s and Kindlon’s comments. There are a lot of them, but I’ll focus on one major one: the risk of selective reporting bias. (Note: I’ve removed and described or linked the references in quotes below for simplicity.)
From Kindlon’s comment (page 72 of the PDF of the Cochrane review):
I don’t believe that White et al. (2011) (the PACE Trial) should be classed as having a low risk of bias under “Selective reporting (outcome bias)”. According to the Cochrane Collaboration’s tool for assessing risk of bias, the category of low risk of bias is for: “The study protocol is available and all of the study’s pre-specified (primary and secondary) outcomes that are of interest in the review have been reported in the pre-specified way”. This is not the case in the PACE Trial.
He’s right. The criteria for “low risk of bias” that he referenced come from the table in section 8.5 of the Cochrane Collaboration’s handbook.
The PACE trial began recruiting participants – and therefore collecting data – early in 2005, and the protocol was published in 2007. Data collection ended in January 2010. There was a data analysis plan, but it wasn’t published till 2013 (after the results of the trial). At some point between the protocol and the final data analysis, they dropped a primary outcome and elevated a secondary one to primary status, and lowered the thresholds for what would be classified as “recovery”.
The protocol and the final analysis used different ways of scoring the Chalder fatigue scale. The one in the protocol set a higher bar for benefit than the one they switched to: had they stuck with it, it would have been harder to conclude that one treatment option had better results than another. And that, according to their first results publication, was the intent:
Before outcome data were examined, we changed the original bimodal scoring of the Chalder fatigue questionnaire (range 0–11) to Likert scoring to more sensitively test our hypotheses of effectiveness.
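To make the difference between the two scoring methods concrete, here’s a minimal sketch in Python. It assumes the 11-item version of the Chalder fatigue questionnaire, with each item’s 4 response options coded 0 to 3 from least to most fatigued. Bimodal scoring counts an item only if one of the two worse options is chosen (range 0–11), while Likert scoring adds up the coded values (range 0–33). The function names and the example responses are hypothetical, purely for illustration; this is not the PACE trial’s analysis code.

```python
# Hypothetical illustration of the two scoring methods for the 11-item
# Chalder fatigue questionnaire. Responses are assumed to be coded 0-3,
# from "less than usual" (0) to "much more than usual" (3).
N_ITEMS = 11


def _validate(responses):
    """Check that there are 11 responses, each coded 0, 1, 2 or 3."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be coded 0, 1, 2 or 3")


def bimodal_score(responses):
    """Bimodal (0-0-1-1) scoring: an item scores 1 only if one of the two
    worse options was chosen. Total range 0-11."""
    _validate(responses)
    return sum(1 for r in responses if r >= 2)


def likert_score(responses):
    """Likert (0-1-2-3) scoring: each item contributes its coded value.
    Total range 0-33."""
    _validate(responses)
    return sum(responses)


# Hypothetical participant: every answer moves from "much more than usual"
# to "more than usual" between two assessments.
before = [3] * N_ITEMS
after = [2] * N_ITEMS

print(bimodal_score(before), bimodal_score(after))  # 11 11
print(likert_score(before), likert_score(after))    # 33 22
```

Under bimodal scoring, this made-up participant shows no change at all; under Likert scoring, they improve by 11 points. That extra sensitivity is the reason the trial authors gave for switching, and it also makes a difference between treatment groups easier to detect.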
Here, responding to criticisms about this, they said they made the change:
…because, after careful consideration and consultation, we concluded that they were simply too stringent to capture clinically meaningful recovery.
Here, they wrote that the original thresholds for the fatigue scale came from the small 1993 Chalder paper, and that a larger Chalder paper in 2010 led them to re-calibrate their approach. That 2010 paper didn’t measure fatigue in people with ME/CFS.
Courtney, in his comments on this point (page 78 of the review), wrote that something else happened early in 2010, too – PACE’s so-called “sister trial”, FINE, published its results:
The FINE trial investigators had found no significant effect for their primary endpoint when using the bimodal scoring system for Chalder fatigue but determined a significant effect using the Likert system in an informal post-hoc analysis.
The same thing, apparently, happened with the PACE data. In 2016, Carolyn Wilshire and colleagues (including Kindlon) published a re-analysis of the data made publicly available after a freedom of information request:
Publications from the PACE trial reported that 22% of chronic fatigue syndrome patients recovered following graded exercise therapy (GET), and 22% following a specialised form of CBT. Only 7% recovered in a control, no-therapy group. These figures were based on a definition of recovery that differed markedly from that specified in the trial protocol…
When recovery was defined according to the original protocol, recovery rates in the GET and CBT groups were low and not significantly higher than in the control group (4%, 7% and 3%, respectively).
The second threshold for fatigue had been lowered so much, they point out, that 13% of the participants were “recovered” by that measure before the trial treatments started.
The PACE trial authors responded, accepting the accuracy of the re-analysis. They pointed out, as they did in their original paper, that there is no gold standard way of measuring recovery from ME/CFS. They also wrote a paper on recovery in the PACE trial, which is about more than the primary outcomes.
The Cochrane review authors defended their assessment of low risk of reporting bias by arguing that the change was a reported protocol variation, and therefore not a problem. Their note on their risk of bias judgement quotes a sentence from the trial publication that points to a second source of selective reporting:
“These secondary outcomes were a subset of those specified in the protocol, selected in the statistical analysis plan as most relevant to this report.” Our primary interest is the primary outcome reported in accordance with the protocol, so we do not believe that selective reporting is a problem.
This is problematic for two reasons. Firstly, the primary outcomes differed from those in the protocol. The review authors argue that because the outcomes were still specified before data analysis, “it is hard to understand how the changes contributed to any potential bias”.
The Cochrane Handbook states clearly in the rationale about selective outcome reporting: “The particular concern is that statistically non-significant results might be selectively withheld from publication”. What if you change your mind about an outcome after data collection has ended, for a trial going on in your own clinic, because you are now concerned that your results won’t be significant? As Courtney pointed out, “Investigators of an open-label trial can potentially gain insights into a trial before formal analysis has been carried out”.
Secondly, the Cochrane reporting bias criteria apply to the study as a whole, not just to primary outcomes. What’s more, the review authors did rate one other study (Wearden 1998) at high risk of reporting bias, and that was based on concerns about secondary outcomes (here). That was a valid judgement, and the Handbook states this explicitly, too:
Selective omission of outcomes from reports: Only some of the analysed outcomes may be included in the published report. If that choice is made based on the results, in particular the statistical significance, the corresponding meta-analytic estimates are likely to be biased.
I think rating the PACE trial at high risk of selective outcome reporting bias would be consistent with Cochrane methods and the rating for Wearden 1998.
Where does all this leave us? We’re still only arguing about subjective measures: there still isn’t much evidence of benefit on objective outcomes at all. That leaves us with too much uncertainty: we don’t yet know how different approaches to exercise, “graded” or “paced”, really compare.
A similar message has been coming from multiple trials of GET – but only one particular form of pacing has been tested in a trial. The message could be fairly consistent on GET because it really does have a small amount of benefit for a minority of people (about 15% over and above regular care in PACE). Or the message could be fairly consistent because the trials have all relied on similar and deeply flawed methods. We won’t know until there are more rigorous methods for testing these questions, and strong, large trials are done that learn from the errors in the story so far.
This clash around the PACE trial raised other critical issues about the practice of science – especially around openness and response to consumer criticism. Activists pounded away to get access to data, using a variety of official channels and a lot of pressure. There’s a lot to learn from this. We obviously still haven’t got open data practice sorted out yet.
In this case, defensiveness won out over transparency, and that was like throwing gasoline on a fire. Although official processes have seen important data and other information released, the researchers have also removed a lot of material that used to be in the public domain. So much so that it’s now impossible to assess some of the concerns for yourself, because the materials have disappeared. That’s bad enough for any research, but it’s unacceptable for a publicly funded clinical trial.
Instead of responding to criticism, some – not all – researchers have put massive effort into discrediting the whole community and rallying other researchers to their defense. It’s been a collective ad hominem attack. Being responsive when consumers are making mistakes or unjustified criticisms isn’t always easy, especially when there’s a barrage, with extremists in the mix. But it’s not only consumers who do that, is it? And there are important and legitimate issues here – not just people who don’t agree with the results.
This kind of clash happened to me early in my involvement in research, and I’ve written about that before. I don’t think there’s any shortcut. People need to keep listening and responding, though, and as many bridges as possible need to be built. Carolyn Wilshire, a psychologist and co-author of the re-analysis of the PACE trial data, has written about concerns over researchers’ therapeutic allegiances, among other relevant points. People’s positions can make it very difficult for them to be fair when developing and doing research. Even if they weren’t entrenched beforehand, they can quickly become so when their research is under attack. People shouldn’t take it personally, but they do. And the time that it will soak up isn’t usually factored into the research process.
As Epstein pointed out from analyzing the battles over HIV clinical trials, getting these relationships right can advance science and social progress – even though that’s not always easy. As he said, you can’t grab onto a consumer group in a controversial area “solely in passive terms – as a resource available for use, or an ally available for enrollment”. But that’s too often how public involvement in science is approached. (A recent systematic review on patient and public involvement in clinical trials, for example, had as its primary outcomes effects on recruiting and retaining participants.) We have to get past that, and deal better with these kinds of clashes. They are inevitable.
As for the dispute over the validity of key methods here? I think the balance should, and ultimately will, tip towards the ME/CFS consumer movement on this particular contested evidence.