How to Use an Article Reporting a Multiple Treatment Comparison Meta-analysis
Transcription
How to Use an Article Reporting a Multiple Treatment Comparison Meta-analysis
USERS’ GUIDES TO THE MEDICAL LITERATURE How to Use an Article Reporting a Multiple Treatment Comparison Meta-analysis Edward J. Mills, PhD, MSc John P. A. Ioannidis, MD, DSc Kristian Thorlund, PhD, MSc Holger J. Schu¨nemann, MD, PhD, MSc Milo A. Puhan, MD, PhD Gordon H. Guyatt, MD, MSc CLINICAL SCENARIO You are seeing a 45-year-old patient for whom, 6 weeks previously, you prescribed paroxetine, a selective serotonin reuptake inhibitor (SSRI), for treatment of generalized anxiety disorder (GAD). The patient reports reduced anxiety, but also insomnia and a reduced interest in sex. You wonder if there is another drug the patient might tolerate better while still maintaining treatment response. You retrieve a multiple treatment comparison (MTC) metaanalysis that evaluates response and tolerability of all available GAD drugs.1 You are not familiar with this type of study, and you wonder if there are special issues to which you should attend in evaluating its methods and results. MTC META-ANALYSES Traditionally, a systematic review addresses the merits of one intervention vs another (eg, placebo, or another active intervention). Data are combined from all studies—often randomized clinical trials (RCTs)—that meet eligibility criteria in what we will term a pairwise meta-analysis. Compared with single RCTs, meta-analysis improves the power to detect differences and also facilitates the examination of the extent to which there are important differences in treatment effects across eligible RCTs—variability that is frequently called heterogeneity.2,3 Large, 1246 JAMA, September 26, 2012—Vol 308, No. 12 Multiple treatment comparison (MTC) meta-analysis uses both direct (headto-head) randomized clinical trial (RCT) evidence as well as indirect evidence from RCTs to compare the relative effectiveness of all included interventions. The methodological quality of MTCs may be difficult for clinicians to interpret because the number of interventions evaluated may be large and the methodological approaches may be complex. Clinicians and others evaluating an MTC should be aware of the potential biases that can affect the interpretation of these analyses. Readers should consider whether the primary studies are sufficiently homogeneous to combine; whether the different interventions are sufficiently similar in their populations, study designs, and outcomes; and whether the direct evidence is sufficiently similar to the indirect evidence to consider combining. This article uses the existing Users’ Guides format to address study validity, interpretation of results, and application to a patient scenario. www.jama.com JAMA. 2012;308(12):1246-1253 unexplained heterogeneity may reduce a reader’s confidence in estimates of treatment effects. A drawback of pairwise metaanalysis is that it evaluates the effects of only 1 intervention vs 1 comparator and does not permit inferences about the relative effectiveness of several interventions unless all have been compared directly in head-to-head trials. Yet, for many medical conditions, there are numerous available interventions that have—unfortunately—most frequently been compared with placebo and seldom with one another.4,5 For example, despite 91 completed and ongoing RCTs addressing the effectiveness of drugs for the treatment of rheumatoid arthritis, only 5 compare directly against each other.4 Recently, another form of metaanalysis, the MTC meta-analysis (also known as network meta-analysis because it involves creating a network of treatments), has emerged.6,7 The MTC approach provides estimates of effect sizes for all possible pairwise comparisons whether or not they have been compared head-to-head in RCTs. FIGURE 1 displays examples of common networks of treatments. The eAppendix (available at http://www.jama .com) provides a glossary of common nomenclature found in MTCs. When 2 interventions, A and B (eg, paroxetine and lorazepam in FIGURE 2A), Author Affiliations: Faculty of Health Sciences, University of Ottawa, Ottawa, Ontario, Canada (Dr Mills); Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada (Drs Mills, Thorlund, Schu¨ nemann, and Guyatt); Stanford Prevention Research Center, Departments of Medicine and Health Research and Policy, Stanford University School of Medicine, and Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California (Dr Ioannidis); Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland (Dr Puhan). Corresponding Author: Edward J. Mills, PhD, MSc, Faculty of Health Sciences, University of Ottawa, Room 031, Thompson Hall, 35 University Private, Ottawa, ON, K1N 7K4, Canada (edward.mills@uottawa.ca). Users’ Guides to the Medical Literature Section Editor: Drummond Rennie, MD, Deputy Editor, JAMA. ©2012 American Medical Association. All rights reserved. USERS’ GUIDES TO THE MEDICAL LITERATURE Figure 1. Examples of Possible Network Geometry Treatment or intervention node Direct comparison in RCT Star network Single closed loop B A G Complex network Connected network A B A B C G A C C F D B D D C E E F The figure shows 4 network graphs. In each graph, lines show where direct comparisons exist from one or more trials. The star shows a network for which all interventions have a single mutual comparator. A single closed loop involves 3 interventions and can provide data to calculate both direct comparisons and indirect comparisons. A well-connected network in which all interventions have been compared against each other in multiple randomized controlled trials (RCTs). The complex network has multiple loops and arms that may have sparse connections. have not been compared directly, one can still estimate their relative effect if each has been compared directly against another intervention, C (eg, placebo). This is called an adjusted indirect comparison of A and B.8 Multiple treatment comparisons simultaneously include both direct and indirect evidence. Indirect comparisons (and MTCs) make the assumption that the relevant trials are similar enough in essential features (eg, patient characteristics, definitions and measurements of outcomes, and risk of bias in the studies)9 to be combined. There are 3 chief questions concerning the conduct of an MTC. First, among trials available for pairwise comparisons, are the studies sufficiently homogeneous to combine for each intervention? Second, across the trials involved in all interventions, are the studies sufficiently similar, with the exception of the intervention (eg, in important features such as populations, design, or outcomes), that they can be compared? Third, where direct and indirect evidence exists, are the findings sufficiently consistent that both direct and indirect evidence can be relied upon? By including evidence from both direct and indirect comparisons, an MTC may increase precision in estimates of the relative effects of treatments and facilitate simultaneous comparisons, or even ranking, of these treatments.7 However, because MTCs are methodologically sophisticated, they are often challenging to interpret.10 Herein, we will demystify the MTC by using the 3-step validity-resultsapplicability approach of other Users’ Guides.11 BOX 1 includes all issues relevant to evaluating systematic reviews, with a discussion that highlights the issues most important in MTCs. ARE THE RESULTS OF THE STUDY VALID? Figure 2. A Simple Indirect Comparison and Simple Closed Loop Treatment or intervention node Direct comparison in RCT Indirect comparison A Indirect comparison Placebo Paroxetine Lorazepam Did the Review Explicitly Address a Sensible Clinical Question? One can formulate questions of optimal patient management in terms of patients, interventions, comparators, and outcomes. Broader eligibility criteria may enhance generalizability of the results, but may be misleading if participants are too dissimilar and, as a consequence, heterogeneity is large. Diversity of interventions may also be excessive if authors pool results from different doses, or even different agents in the same class (eg, all statins), based on the assumption that effects are similar. Readers should ask whether investigators have been too broad in their inclusion of different populations, of different doses or different agents in the same class, or of different outcomes, and ©2012 American Medical Association. All rights reserved. B Closed loop Nicotine replacement therapy Varenicline Bupropion A, In an indirect comparison, there is direct evidence from paroxetine compared with placebo and direct evidence of lorazepam compared with placebo. Therefore, the indirect comparison can be applied to determine the effect of paroxetine compared with lorazepam, even if no direct head-to-head comparison exists for these 2 agents. B, In the closed loop, there is direct evidence that compares nicotine replacement therapy with both varenicline and also bupropion. There is also direct evidence comparing bupropion with varenicline. Therefore, enough information exists to evaluate whether the results are coherent between direct and indirect evidence. JAMA, September 26, 2012—Vol 308, No. 12 1247 USERS’ GUIDES TO THE MEDICAL LITERATURE Box 1. Critical Appraisal Guide to a Multiple Treatment Comparison Meta-analysis A. Are the results of the study valid? Did the review explicitly address a sensible clinical question? Was the search for relevant studies exhaustive? Were there major biases in the primary studies? B. What are the results? What was the amount of evidence in the network? Were the results similar from study to study? Were the results consistent in direct and indirect comparisons? What were the overall treatment effects and their uncertainty, and how did the treatments rank? Were the results robust to sensitivity assumptions and potential biases? C. How can I apply the results to patient care? Were all patient-important outcomes considered? Were all potential treatment options considered? Are any postulated subgroup effects credible? What is the overall quality and what are limitations of the evidence? Box 2. Using the Guide Returning to our opening scenario, the multiple treatment comparison we identified compared the efficacy and tolerability of available general anxiety disorder (GAD) drugs, with a specific focus on drugslicensedintheUnitedKingdom(because it was a British study) using RCT evidence.1 (P, GAD patients; I and C, available drugs; O, efficacy and tolerability.) Patients in the included RCTs met similarly broad diagnostic criteria, including being older and receiving inpatient and outpatient care.29 The authors assumed all doses of each drug were equivalent (no good evidence exists for or against this assumption).Foroutcomes,theauthorsconsidered response (ⱖ50% reduction in Hamilton Anxiety Scale), remission (final score ⱕ7), and tolerability (not withdrawing because of adverse events). These outcomes are important to patients, and the should look carefully at the resulting heterogeneity. When substantial clinical variability or statistical heterogeneity is present, authors may conduct subgroup analyses or meta-regression to explain heterogeneity. If such analyses are successful in explaining heterogeneity, the 1248 JAMA, September 26, 2012—Vol 308, No. 12 definitions of the outcomes were consistent across trials. The search for published literature was comprehensive, but the authors did not search for unpublished data and did not include RCTs of reboxetine, buspirone, or alprazolam, interventions for which data were available. Because many mental health RCTs are unpublished,20 the risk of publication bias is substantial. Eligibility assessment was done by a single individual with a 10% random selection of articles additionally reviewed by another reviewer. These same reviewers independently extracted prespecified information on study characteristics and outcomes. The authors did not perform a risk-of-bias assessment of the included trials. P indicates patients; I, interventions; C, comparators; and O, outcomes. MTC may provide results that more optimally match the clinical settings and the characteristics of the patient.12 For example, in an MTC evaluating different statins for cardiovascular disease protection, the authors used metaregression to address whether it was appropriate to combine results across primary and secondary prevention populations, different statins, and different doses of statins by examining whether trials with these features exhibited different treatment effects.13 Including multiple control interventions (eg, placebo, no intervention, or older standard of care) may enhance the robustness and connectedness of the treatment network. However, it is important to gauge and account for potential differences between control groups. For example, due to potential placebo effects, patients receiving placebo in a blinded RCT may have differing responses than patients receiving no intervention in a nonblinded RCT. Thus, if 2 of 3 active treatments, A and B, have been compared with placebo, and 2 of 3 active treatments, B and C, have been compared with no intervention, the different choice of “control groups” may produce misleading results. By examining whether certain trials exhibit different treatment effects, meta-regression may address this problem. For example, in an MTC evaluating the effectiveness of smoking cessation therapies, the authors combined placebo-controlled groups with standardof-care control groups and then used meta-regression to examine whether the choice of control changed the effect size.14 The authors found that trials using placebo controls had smaller effect sizes than those using standard of care, and this explained the identified heterogeneity. Was the Search for Relevant Studies Exhaustive? Some published MTCs have used the search strategies from other systematic reviews as the basis for identifying potentially eligible trials. Readers can be confident in such approaches only if the authors have updated the search to include recently published trials.15 The eligible interventions can be unrestricted. Sometimes, however, the authors may choose to include only a specific set of interventions; eg, those available in their country. Some indus- ©2012 American Medical Association. All rights reserved. USERS’ GUIDES TO THE MEDICAL LITERATURE try-initiated MTCs may choose to consider only a sponsored agent and its direct competitors.16 This may omit the optimal agent for some situations and tends to give a fragmented picture of the evidence. It is typically best to include all interventions.17 Data on clearly suboptimal or abandoned interventions may still offer indirect evidence for other comparisons.17 In an MTC of 12 treatments for major depression, the authors chose to exclude placebo-controlled RCTs and included only head-to-head active treatment RCTs.18 However, publication bias in the antidepressant literature is well acknowledged,19,20 and by excluding placebo-controlled trials, the analysis loses the opportunity to benefit from additional available evidence.21 Exclusion of eligible interventions, in this case placebos, may not just decrease statistical power. Placebocontrolled trials may be different from head-to-head comparison trials in their conduct or in the degree of bias (eg, they may have more or less publication bias or selective outcome and analysis reporting). Thus, their exclusion may also affect the point estimates of the effects of pairwise comparisons and may affect the relative ranking of regimens. Placing too great an emphasis on the interpretation of rankings rather than the relative comparisons may be misleading to clinicians and patients. When a network meta-analysis of second-generation antidepressants was later conducted and did include placebo-controlled trials, relying only on the relative differences between treatments using the same depression scale, the authors reached a different interpretation from the earlier MTC.18,22,23 Finally, original trials often address multiple outcomes. Selection of MTC outcomes should not be data-driven but should be based on importance for patients and consider both benefit and harm outcomes. Were There Major Biases in the Primary Studies? Trial-level limitations (eg, lack of concealment or blinding, or loss to followup) may bias results of RCTs.24 The Box 3. Potential Reasons for Incoherence Between the Results of Direct and Indirect Comparisons Chance Genuine diversity Differences in enrolled participants (eg, entry criteria, clinical setting, disease spectrum, baseline risk, selection based on prior response) Differences in the interventions (eg, dose, duration of administration, prior administration [second-line treatment]) Differences in background treatment and management (eg, evolving treatment and management in more recent years) Differences in definition or measurement of outcomes Bias in head-to-head (direct) comparisons Optimism bias with unconcealed analysis Publication bias Selective reporting of outcomes and analyses Inflated effect size in early stopped trials and in early evidence Limitations in allocation concealment, blinding, loss to follow-up, analysis as randomized Bias in indirect comparisons Each of the biasing issues above can affect the results of the direct comparisons on which the indirect comparisons are based Cochrane Collaboration provides detailed advice on dealing with triallevel biases in meta-analysis.25,26 Publication bias and selective outcome reporting bias 27,28 are major threats to the results of pairwise metaanalyses and may affect the availability of information among MTCs. The effect of these biases depends also on whether specific interventions and comparisons are affected more than others (BOX 2). WHAT ARE THE RESULTS? What Was the Amount of Evidence in the Treatment Network? One can gauge the amount of evidence in the treatment network from the number of trials, total sample size, and number of events for each treatment and comparison. Furthermore, the extent to which the treatments are connected in the network is an important determinant in the quality of the evidence. Understanding the geometry of the network (nodes and links) will permit the reader to examine the larger picture and see what is com- ©2012 American Medical Association. All rights reserved. pared with what.30 The authors should present the structure of the network (as in Figure 1 examples). Multiple treatment comparison analyses are feasible when there is a consistent connection in the network. When different interventions have only been compared with a single common comparator (eg, placebo), this is a star network (Figure 1). A star network only allows for indirect comparison between active treatments, which reduces confidence in effects, particularly if there are a limited number of trials, patients, and events.31 When there are data available using both direct and indirect evidence of the same interventions, this is a closed loop. The presence of direct evidence increases confidence in the estimates of interest. Often, a treatment network will include a mixture of exclusively indirect links and closed loops. Most networks have unbalanced shapes with many trials of some comparisons but few or none of others.30 In this situation, evidence may be of high quality for some treatments and comparisons but of low JAMA, September 26, 2012—Vol 308, No. 12 1249 USERS’ GUIDES TO THE MEDICAL LITERATURE Box 4. Using the Guide Returningtoourclinicalscenario,Figure3 displays the network of considered treatments. The authors excluded 13 relevant RCTs because the interventions could not be connected to any of the treatments in the network. All treatments are informed by placebo comparisons, some by more trials than others, and some by direct comparison of active treatments . There are different degrees of evidence supporting the comparative effectiveness of each of the treatments.Forexample,thefluoxetineevidence comes only from a subgroup analysisofalargertrialthatcomparesitwithvenlafaxine and placebo. The evidence base for fluoxetine is thus a very weak mix of direct (1 trial) and indirect (1 fluoxetine vs venlafaxine trial and 8 venlafaxine vs placebo trials). In contrast, evidence for pregabaline is based on a larger amount of evidencefrombothdirectandindirectcomparisons. Five trials have compared pregabalinewithplacebo,2trialshavecompared pregabalinewithlorazepam,and1trialhas compared pregabaline with venlafaxine. Here, the availability of 3 head-to-head trials increases our confidence in effect estimates. Of 27 included RCTs, 23 reported response to treatment, 23 reported withdrawals due to adverse effects, and 14 reported on remission, raising concern about missing outcomes and selective quality for others. Unfortunately, MTC studies often fail to point out these differences in the evidence quality. Were the Results Similar From Study to Study? Marked, unexplained differences in treatment effects across trials in direct comparisons of alternative agents, or of single agents with no-treatment controls, lowers confidence in effect estimates. There are a number of statistical measures available to guide assessment of whether the variability (heterogeneity) of results is high; eg, I2 and other metrics.32 Authors should report the degree of heterogeneity in all comparisons—the greater the heterogeneity, the less certain the results. 1250 JAMA, September 26, 2012—Vol 308, No. 12 reporting,inparticularforremission.Studiespredominantlyincludedpatientsreceiving treatment for between 4 and 12 weeks. Because the authors did not report tests for pairwise heterogeneity, we cannot tell if the magnitude of effect was similar from study to study in each of the direct comparisons or in each drug-placebo comparison. The authors did, however, conduct severalsensitivityanalyses.Theauthorsalso checked the coherence between direct and indirectcomparisonsfromclosedloopsand did not find significant incoherence. They found that all drugs, except tigabine, exhibited significantly better response rates compared with placebo. Most treatments offered about a 50% improvement in response rates over placebo. Regarding our patient’s current drug (paroxetine), no drug appeared significantly better. When the authors calculated the probability of which drug was best, fluoxetine ranked the highest (63%), whereas paroxetine ranked third. Paroxetine had a higher rate of drug discontinuation than placebo (odds ratio, 2.51; 95% credible interval, 1.50-4.19), and ranked fourth for tolerability. By comparison, fluoxetine ranked first for response and third for tolerability. The results were not entirely robust in sensitivity analysis because the best-ranked treatment for effectiveness changed from fluoxetine to duloxetine. Possible explanations of differences in treatment effects can be examined using subgroup analysis and metaregression. However, these analyses are limited in the presence of small numbers of trials, and apparent subgroup effects often prove spurious, an issue to which we will return in our discussion of applicability.33-35 Were the Results Consistent in Direct and Indirect Comparisons? Head-to-head direct comparisons of treatments are generally more trustworthy than indirect comparisons. However, head-to-head trials can also yield misleading estimates; eg, when conflicts of interest influence the choice of comparators used or result in selec- tive reporting. Therefore, indirect comparisons may on occasion provide more trustworthy estimates.36 One can assess whether direct and indirect estimates yield similar effects whenever there is a closed loop in the network (as in Figure 2B). Statistical methods exist for checking this type of inconsistency, typically called a test for incoherence.37,38 An evaluation examining 112 examples of direct vs indirect evidence found that the results were statistically inconsistent 14% of the time.9 This same evaluation found that comparisons with smaller number of trials and measuring subjective outcomes had a greater risk of incoherence. When incoherence is present, there are many explanations for the authors—and for readers—to consider (BOX 3). For example, a meta-analysis examining the analgesic efficacy of paracetamol (acetaminophen) plus codeine in surgical pain displayed a direct comparison indicating the intervention was more efficacious than paracetamol alone (mean difference in pain intensity change, 6.97; 95% CI, 3.56 to 10.37). The adjusted indirect comparison did not show a significant difference between paracetamol plus codeine and paracetamol alone (−1.16, 95% CI; −6.95 to 4.64). 39 In this example, the direct and indirect evidence was statistically significantly incoherent (P=.02). This indirect analysis addresses a continuous subjective outcome (ie, pain intensity), and is thus more likely to statistically demonstrate incoherence. The explanation for incoherence may be that the direct trials included patients who had lower pain intensity at baseline, who had greater representation in the direct trials, and who may have been more responsive to the addition of codeine. Statistical testing is often not sufficient to document incoherence. Because MTCs are often imprecise, important differences may still exist in the absence of a statistically significant difference. Authors should report on whether they assessed incoherence and, if they did, what they found. ©2012 American Medical Association. All rights reserved. USERS’ GUIDES TO THE MEDICAL LITERATURE What Were the Overall Treatment Effects and Their Uncertainty, and How Did Treatments Rank? The treatment effects in an MTC are typically displayed with common effect estimates along with 95% credible intervals (CrIs). Credible intervals are the Bayesian equivalent to the more commonly understood, frequentist confidence intervals. When there are K interventions included in the treatment network, there are K × (K −1)/2 possible pairwise comparisons. For example, if there are 7 interventions, then there are 21 [7⫻(7 −1)/2] possible pairwise comparisons. Besides presenting treatment effects, authors may also present the probability that each treatment is superior to all other treatments, allowing ranking of treatments.40,41 Although this approach is appealing, it may also oversimplify an analysis because of fragility in the rankings, because differences between the ranks may be too small to be important, or because bias in the MTC may importantly affect the rank order. For example, an MTC examining directacting agents for hepatitis C found no statistical difference for sustained virological response between teleprevir and boceprevir (odds ratio, 1.42; 95% CrI, 0.89-2.25); based on these results, the probability of being the best favors teleprevir by far (93%) over boceprevir (7%).42 However, this 93% probability provides a misleadingly strong endorsement for teleprevir. One may wish to know the probability that teleprevir is better than boceprevir by a clinically important margin, eg, at least 50% better, and this probability is only about 40%. Therefore, probabilities of being the best should be interpreted with caution. Were the Results Robust to Sensitivity Assumptions and Potential Biases? Given the complexity of some MTC meta-analyses, authors may assess the robustness of their study findings by applying sensitivity analyses that show how the results change if some criteria or assumptions change. Sensitivity Figure 3. Treatment Network for the Drugs Considered in the Example Multiple Treatment Comparison on Generalized Anxiety Disorder Treatment or intervention node n Paroxetine No. of RCTs in direct comparison Lorazepam Pregabalin 2 1 2 Fluoxetine Sertraline 1 1 3 Duloxetine Tiagabine 5 3 1 2 2 1 Escitalopram Venlafaxine 5 4 8 Placebo The lines between treatment nodes indicate the comparisons made throughout randomized clinical trials (RCTs). The numbers on the lines indicate the number of RCTs informing a particular comparison. (The figure is based on Baldwin et al.1) analyses may include restricting the analyses to trials with a low risk of bias only or by examining different but related outcomes. The Cochrane Handbook for Systematic Reviews of Interventions 2 5 provides a discussion of sensitivity analyses. In an MTC on prevention of chronic obstructive pulmonary disease (COPD) exacerbations, the authors used the incidence rate as the primary outcome. However, there is some debate on whether incidence rates should be used in COPD trials,43 so the authors conducted sensitivity analyses using the binary outcome of ever having an exacerbation. The results were sufficiently similar to consider the analyses robust44 (BOX 4, FIGURE 3). HOW CAN I APPLY THE RESULTS TO PATIENT CARE? Were All Patient-Important Outcomes Considered? Many MTCs report only 1 or a few selected outcomes of interest. For example, a recent MTC comparing the efficacy of antihypertensive treatments only looked at heart failure and mortality,45 whereas an older MTC of an- ©2012 American Medical Association. All rights reserved. tihypertensive treatments also considered coronary heart disease and stroke.46 Adverse events are infrequently assessed in meta-analyses and in MTCs, reflecting poor reporting in primary trials.47,48 Multiple treatment comparisons conducted in the context of health technology assessment submissions and evidence-based practice reports are more likely to include multiple outcomes and assessments of harms than the less lengthy, journal-based MTCs. Were All Potential Treatment Options Considered? Multiple treatment comparisons may place restrictions on what treatments are examined. For example, for irritable bowel syndrome, an MTC may focus on pharmacological agents, neglecting RCTs of diet, peppermint oil, and counseling. 49 Decisions to focus on subclasses of drugs may also be problematic. For example, in rheumatoid arthritis, biologic diseasemodifying antirheumatic drugs (biologics) are used for patients failing conventional drugs. Five of the 9 available biologics are antitumor necrosis factor agents (anti-TNFs). JAMA, September 26, 2012—Vol 308, No. 12 1251 USERS’ GUIDES TO THE MEDICAL LITERATURE Box 5. Using the Guide The assessed outcomes (response, remission, and overall tolerability) are important, but some additional outcomes would be useful to know (eg, specific harms such as sexual dysfunction or insomnia). A concern is that although 9 treatments were evaluated, only one randomized controlled trial evaluated the drug that ranked the best (fluoxetine, comparing it to placebo and venlafaxine) and that evidence was from a subgroup analysis in the original trial where fluoxetine was not better than its active comparator (venlafaxine). Only 3 drugs had been evaluated in direct comparisons against at least 2 other drugs. In the subgroup analysis examining only licensed dose drugs available in the United Kingdom, the results and rankings were different than the primary analysis, thus lowering our confidence. In summary, confidence in estimates is limited by uncertainty regarding the risk of bias in individual studies, the possibility of publication bias, the small number of patients studied and the correspondingly imprecise estimates, the lack of information regarding heterogeneity in individual comparisons, and a paucity of direct comparisons. Are Any Postulated Subgroup Effects Credible? There are very few situations in which investigations have convincingly established important differences in the relative effect of treatment according to patient characteristics.51 Criteria exist for determining the credibility of subgroup analyses.51 Multiple treatment comparisons allow a greater number of RCTs to be evaluated and may offer more opportunities for subgroup analysis, but with due skepticism and respect for credibility criteria. In an MTC examining inhaled drugs for COPD, the authors examined whether a patient’s disease stage effected response according to severity of airflow obstruction, measured by forced expiratory volume in 1 second (FEV1).52 If FEV1 was 40% or less of predicted, then long-acting anticholinergics, inhaled corticosteroids, and combination treatment reduced exacerbations significantly compared with long-acting -agonists alone, but not if FEV1 was greater than 40% predicted. This difference was significant for inhaled corticosteroids (P=.02 for interaction) and combination treatment (P=.01), but not for long-acting anticholinergics (P=.46). What Is the Overall Quality of the Evidence, and What Are the Limitations? Box 6. Clinical Resolution You conclude that low-quality evidence allows only weak inferences regarding the effectiveness of the treatment your patient is currently receiving relative to other options. You recognize, however, that other drugs appear to have better tolerability profiles and will discuss with your patient the possibility of switching to one of these agents. One recent MTC only considered anti-TNFs and excluded other biologics.50 To the extent that the other biologic agents are equivalent or superior to the anti-TNFs, their exclusion risks giving clinicians the wrong impression of the best biologic agents. 1252 JAMA, September 26, 2012—Vol 308, No. 12 It is important to determine the quality of the overall MTC so that a reader can determine if the evidence provides strong inferences. The following aspects are hallmarks of high-quality evidence in an MTC: individual studies are at low risk of bias and publication bias is unlikely; results are consistent in individual direct comparisons and individual comparisons with no-treatment controls and are consistent between direct and indirect comparisons; sample size is large and confidence intervals are correspondingly narrow; and evidence includes direct comparisons. If all of these hallmarks are present and the differences in effect sizes are large, high confidence in estimates may be warranted. However, in most cases, some but not all of these hallmarks are present, and confidence in key estimates are likely to warrant only moderate or low confidence. As MTC methods become popular, clinicians will find more than one metaanalysis addressing the same question and—as in the case of the secondgeneration antidepressants that we have already mentioned—may sometimes arrive at different conclusions.18,54 Differences in eligibility criteria and outcomes may explain these differences. To illustrate, there are at least 15 published MTCs and health technology assessments on the comparative effectiveness of biologics for rheumatoid arthritis. Across these published reports, one can observe considerable discrepancies with respect to the numbers and sets of included trials, assessment of heterogeneity and risk of bias, and inclusion/ exclusion/modification of outcomes.53 Concerns with each of these issues, as well as discrepancies between MTCs, leaves uncertainty regarding which of the biologics has the greatest treatment effect (BOX 5 and BOX 6). Author Contributions: Dr Mills had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Financial Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Mills reported consulting for Merck & Co Inc, Pfizer Ltd, Novartis and Takeda on MTC issues; receiving grant funding from the Canadian Institutes of Health Research (CIHR) Drug Safety & Effectiveness Network (DSEN) NETMAN project to develop methods and educational materials on MTCs; and receiving salary support from the CIHR through a Canada Research Chair. Dr Ioannidis reported receiving grant funding from the CIHR DSEN NETMAN project to develop methods and educational materials on MTCs. Dr Thorlund reported consulting to Merck & Co Inc, Pfizer Ltd, Novartis, Takeda, or GlaxoSmithKline on MTC issues; receiving grant funding from the CIHR DSEN NETMAN project to develop methods and educational materials on MTCs; and receiving salary support from the CIHR DSEN NETMAN project. Dr Schu¨nemann is a coinvestigator on a grant from the CIHR DSEN NETMAN project to develop methods and educational materials on MTCs. Dr Guyatt reported consulting to BristolMyers Squibb and UpToDate Inc on MTC issues and is a coinvestigator on a grant from the CIHR DSEN NETMAN project to develop methods and educational materials on MTCs. Funding/Support: The CIHR, through the DSEN NETMAN project, provided support for this article. Role of the Sponsor: The funding agency had no role in the design or conduct of the study; in the collection, management, analysis, or interpretation of the data; or in the preparation, review, or approval of the manuscript. DSEN had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript. ©2012 American Medical Association. All rights reserved. USERS’ GUIDES TO THE MEDICAL LITERATURE Disclaimer: Several examples used to demonstrate concepts in this article originated from consulting projects. Online-Only Material: The eAppendix is available at www.jama.com. Additional Contributions: We thank David Baldwin, MBBS, DM, FRCPsych (University of Southampton), for clarifications on his study. REFERENCES 1. Baldwin D, Woods R, Lawson R, Taylor D. Efficacy of drug treatments for generalised anxiety disorder: systematic review and meta-analysis. BMJ. 2011; 342:d1199. 2. Lau J, Ioannidis JP, Schmid CH. Summing up evidence: one answer is not always enough. Lancet. 1998; 351(9096):123-127. 3. Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. N Engl J Med. 1987;316(8):450-455. 4. Estellat C, Ravaud P. Lack of head-to-head trials and fair control arms: randomized controlled trials of biologic treatment for rheumatoid arthritis. Arch Intern Med. 2012;172(3):237-244. 5. Lathyris DN, Patsopoulos NA, Salanti G, Ioannidis JP. Industry sponsorship and selection of comparators in randomized clinical trials. Eur J Clin Invest. 2010; 40(2):172-182. 6. Salanti G, Higgins JP, Ades AE, Ioannidis JP. Evaluation of networks of randomized trials. Stat Methods Med Res. 2008;17(3):279-301. 7. Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med. 2004;23(20):3105-3124. 8. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol. 1997;50(6):683-691. 9. Song F, Xiong T, Parekh-Bhurke S, et al. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ. 2011;343:d4909. 10. Mills EJ, Bansback N, Ghement I, et al. Multiple treatment comparison meta-analyses: a step forward into complexity. Clin Epidemiol. 2011;3:193-202. 11. Guyatt G, Straus S, Meade MO, et al. Therapy (randomized trials). In: Guyatt B, Rennie D, Meade MO, Cook DJ, eds. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 2nd ed. New York, NY: McGraw-Hill Co; 2008:67-86. 12. Nixon RM, Bansback N, Brennan A. Using mixed treatment comparisons and meta-regression to perform indirect comparisons to estimate the efficacy of biologic treatments in rheumatoid arthritis. Stat Med. 2007;26(6):1237-1254. 13. Mills EJ, Wu P, Chong G, et al. Efficacy and safety of statin treatment for cardiovascular disease: a network meta-analysis of 170,255 patients from 76 randomized trials. QJM. 2011;104(2):109-124. 14. Mills EJ, Wu P, Lockhart I, Thorlund K, Puhan M, Ebbert JO. Comparisons of high dose and combination nicotine replacement therapy, varenicline and bupropion for smoking cessation: a systematic review and multiple treatment meta-analysis [published online ahead of print August 6, 2012]. Ann Med. 2012;44 (6):588-597. 15. Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting systematic reviews and metaanalyses of studies that evaluate health care interventions: explanation and elaboration. Ann Intern Med. 2009;151(4):W65-94. 16. Sutton A, Ades AE, Cooper N, Abrams K. Use of indirect and mixed treatment comparisons for technology assessment. Pharmacoeconomics. 2008; 26(9):753-767. 17. Kyrgiou M, Salanti G, Pavlidis N, Paraskevaidis E, Ioannidis JP. Survival benefits with diverse chemotherapy regimens for ovarian cancer: meta-analysis of multiple treatments. J Natl Cancer Inst. 2006;98 (22):1655-1663. 18. Cipriani A, Furukawa TA, Salanti G, et al. Comparative efficacy and acceptability of 12 newgeneration antidepressants: a multiple-treatments meta-analysis. Lancet. 2009;373(9665):746-758. 19. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008;358(3):252-260. 20. Ioannidis JP. Effectiveness of antidepressants: an evidence myth constructed from a thousand randomized trials? Philos Ethics Humanit Med. 2008;3:14. 21. Higgins JP, Whitehead A. Borrowing strength from external trials in a meta-analysis. Stat Med. 1996; 15(24):2733-2749. 22. Ioannidis JP. Ranking antidepressants. Lancet. 2009;373(9677):1759-1760. 23. Gartlehner G, Hansen RA, Morgan LC, et al. Comparative benefits and harms of second-generation antidepressants for treating major depressive disorder: an updated meta-analysis. Ann Intern Med. 2011; 155(11):772-785. 24. Devereaux PJ, Choi PT, El-Dika S, et al. An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol. 2004;57(12): 1232-1236. 25. Sterne JAC, Egger M, Moher D. Addressing reporting biases. In: Higgins JPT, Green S, eds. Cochrane Handbook for Systematic Reviews of Intervention. Version 5.1.0 (updated March 2011). The Cochrane Collaboration; 2011. 26. Moher D, Pham B, Jones A, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352(9128):609-613. 27. Chan AW, Hro´bjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004; 291(20):2457-2465. 28. Young NS, Ioannidis JP, Al-Ubaydli O. Why current publication practices may distort science. PLoS Med. 2008;5(10):e201. 29. Wittchen HU, Kessler RC, Beesdo K, Krause P, Ho¨fler M, Hoyer J. Generalized anxiety and depression in primary care: prevalence, recognition, and management. J Clin Psychiatry. 2002;63(Suppl 8): 24-34. 30. Salanti G, Kavvoura FK, Ioannidis JP. Exploring the geometry of treatment networks. Ann Intern Med. 2008;148(7):544-553. 31. Mills EJ, Ghement I, O’Regan C, Thorlund K. Estimating the power of indirect comparisons: a simulation study. PLoS One. 2011;6(1):e16237. 32. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003; 327(7414):557-560. 33. Smith GD, Egger M. Going beyond the grand mean: subgroup analysis in meta-analysis of randomised trials. In: Smith GD, Egger M, Altman DG, eds. Systematic Reviews in Health Care: Metaanalysis in Context. 2nd ed. London, England: BMJ Publishing Group; 2001:143-156. 34. Thompson SG, Higgins JP. How should metaregression analyses be undertaken and interpreted? Stat Med. 2002;21(11):1559-1573. 35. Jansen J, Schmid C, Salanti G. When do indirect and mixed treatment comparisons result in invalid findings? a graphical explanation. Poster presented at: 19th Cochrane Colloquium; October 19-22, 2011; Madrid, Spain. P3B379. 36. Song F, Harvey I, Lilford R. Adjusted indirect comparison may be less biased than direct comparison for evaluating new pharmaceutical interventions. J Clin Epidemiol. 2008;61(5):455-463. ©2012 American Medical Association. All rights reserved. 37. Lu G, Ades A. Assessing evidence inconsistency in mixed treatment comparisons. J Am Stat Assoc. 2006;101:447-459. 38. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat Med. 2010;29(7-8):932-944. 39. Zhang WY, Li Wan Po A. Analgesic efficacy of paracetamol and its combination with codeine and caffeine in surgical pain—a meta-analysis. J Clin Pharm Ther. 1996;21(4):261-282. 40. Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol. 2011;64(2):163-171. 41. Golfinopoulos V, Salanti G, Pavlidis N, Ioannidis JP. Survival and disease-progression benefits with treatment regimens for advanced colorectal cancer: a meta-analysis. Lancet Oncol. 2007;8(10):898-911. 42. Diels J, Cure S, Gavart S. PIN7 the comparative efficacy of telaprevir versus boceprevir in treatmentnaive and treatment-experienced patients with genotype 1 chronic hepatitis. Value Health. 2011;14(7): A266-A266. 43. Aaron SD, Fergusson D, Marks GB, et al; Canadian Thoracic Society/Canadian Respiratory Clinical Research Consortium. Counting, analysing and reporting exacerbations of COPD in randomised controlled trials. Thorax. 2008;63(2):122-128. 44. Mills EJ, Druyts E, Ghement I, Puhan MA. Pharmacotherapies for chronic obstructive pulmonary disease: a multiple treatment comparison meta-analysis. Clin Epidemiol. 2011;3:107-129. 45. Sciarretta S, Palano F, Tocci G, Baldini R, Volpe M. Antihypertensive treatment and development of heart failure in hypertension: a Bayesian network metaanalysis of studies in patients with hypertension and high cardiovascular risk. Arch Intern Med. 2011;171(5): 384-394. 46. Psaty BM, Lumley T, Furberg CD, et al. Health outcomes associated with various antihypertensive therapies used as first-line agents: a network meta-analysis. JAMA. 2003;289(19):2534-2544. 47. Hernandez AV, Walker E, Ioannidis JP, Kattan MW. Challenges in meta-analysis of randomized clinical trials for rare harmful cardiovascular events: the case of rosiglitazone. Am Heart J. 2008;156(1):2330. 48. Ioannidis JP, Evans SJ, Gøtzsche PC, et al; CONSORT Group. Better reporting of harms in randomized trials: an extension of the CONSORT statement. Ann Intern Med. 2004;141(10):781-788. 49. Ford AC, Talley NJ, Spiegel BM, et al. Effect of fibre, antispasmodics, and peppermint oil in the treatment of irritable bowel syndrome: systematic review and meta-analysis. BMJ. 2008;337:a2313. 50. Schmitz S, Adams R, Walsh CD, Barry M, FitzGerald O. A mixed treatment comparison of the efficacy of antiTNF agents in rheumatoid arthritis for methotrexate nonresponders demonstrates differences between treatments: a Bayesian approach. Ann Rheum Dis. 2012; 71(2):225-230. 51. Sun X, Briel M, Walter SD, Guyatt GH. Is a subgroup effect believable? Updating criteria to evaluate the credibility of subgroup analyses. BMJ. 2010; 340:c117. 52. Puhan MA, Bachmann LM, Kleijnen J, Ter Riet G, Kessels AG. Inhaled drugs to reduce exacerbations in patients with chronic obstructive pulmonary disease: a network meta-analysis. BMC Med. 2009;7:2. 53. Mills EJ, Thorlund K, Bansback N. Multiple treatment meta-analysis in rheumatoid arthritis can provide weak inferences due to methodological approaches. In: 2012 CADTH Symposium; April 15-17, 2012; Ottawa. OB1. 54. Gartlehner G, Morgan LC, Thieda P, et al. Drug Class Review: Second Generation Antidepressants: Final Report Update 4. Portland, OR: Oregon Health & Science University; 2008. JAMA, September 26, 2012—Vol 308, No. 12 1253