
Introduction to Meta-Analysis
Ebook, 720 pages, 10 hours


About this ebook

This book provides a clear and thorough introduction to meta-analysis, the process of synthesizing data from a series of separate studies. Meta-analysis has become a critically important tool in fields as diverse as medicine, pharmacology, epidemiology, education, psychology, business, and ecology. Introduction to Meta-Analysis:
  • Outlines the role of meta-analysis in the research process
  • Shows how to compute effect sizes and treatment effects
  • Explains the fixed-effect and random-effects models for synthesizing data
  • Demonstrates how to assess and interpret variation in effect size across studies
  • Clarifies concepts using text and figures, followed by formulas and examples
  • Explains how to avoid common mistakes in meta-analysis
  • Discusses controversies in meta-analysis
  • Features a web site with additional material and exercises

A superb combination of lucid prose and informative graphics, written by four of the world’s leading experts on all aspects of meta-analysis. Borenstein, Hedges, Higgins, and Rothstein provide a refreshing departure from cookbook approaches with their clear explanations of the what and why of meta-analysis. The book is ideal as a course textbook or for self-study. My students, who used pre-publication versions of some of the chapters, raved about the clarity of the explanations and examples. David Rindskopf, Distinguished Professor of Educational Psychology, City University of New York, Graduate School and University Center, & Editor of the Journal of Educational and Behavioral Statistics.

The approach taken by Introduction to Meta-analysis is intended to be primarily conceptual, and it is amazingly successful at achieving that goal. The reader can comfortably skip the formulas and still understand their application and underlying motivation. For the more statistically sophisticated reader, the relevant formulas and worked examples provide a superb practical guide to performing a meta-analysis. The book provides an eclectic mix of examples from education, social science, biomedical studies, and even ecology. For anyone considering leading a course in meta-analysis, or pursuing self-directed study, Introduction to Meta-analysis would be a clear first choice. Jesse A. Berlin, ScD 

Introduction to Meta-Analysis is an excellent resource for novices and experts alike. The book provides a clear and comprehensive presentation of all basic and most advanced approaches to meta-analysis. This book will be referenced for decades. Michael A. McDaniel, Professor of Human Resources and Organizational Behavior, Virginia Commonwealth University

Language: English
Publisher: Wiley
Release date: Aug 24, 2011
ISBN: 9781119964377


    Book preview

    Introduction to Meta-Analysis - Michael Borenstein

    CHAPTER 1

    How a Meta-Analysis Works

    Introduction

    Individual studies

    The summary effect

    Heterogeneity of effect sizes

    INTRODUCTION

    Figure 1.1 illustrates a meta-analysis that shows the impact of high dose versus standard dose of statins in preventing death and myocardial infarction (MI). This analysis is adapted from one reported by Cannon et al. and published in the Journal of the American College of Cardiology (2006).

    Our goal in presenting this here is to introduce the various elements in a meta-analysis (the effect size for each study, the weight assigned to each effect size, the estimate of the summary effect, and so on) and show where each fits into the larger scheme. In the chapters that follow, each of these elements will be explored in detail.

    INDIVIDUAL STUDIES

    The first four rows on this plot represent the four studies. For each, the study name is shown at left, followed by the effect size, the relative weight assigned to the study for computing the summary effect, and the p-value. The effect size and weight are also shown schematically.

    Effect size

    The effect size, a value which reflects the magnitude of the treatment effect or (more generally) the strength of a relationship between two variables, is the unit of currency in a meta-analysis. We compute the effect size for each study, and then work with the effect sizes to assess the consistency of the effect across studies and to compute a summary effect.

    Figure 1.1 High-dose versus standard-dose of statins (adapted from Cannon et al., 2006).


    The effect size could represent the impact of an intervention, such as the impact of medical treatment on risk of infection, the impact of a teaching method on test scores, or the impact of a new protocol on the number of salmon successfully returning upstream. The effect size is not limited to the impact of interventions, but could represent any relationship between two variables, such as the difference in test scores for males versus females, the difference in cancer rates for persons exposed or not exposed to second-hand smoke, or the difference in cardiac events for persons with two distinct personality types. In fact, what we generally call an effect size could refer simply to the estimate of a single value, such as the prevalence of Lyme disease.

    In this example the effect size is the risk ratio. A risk ratio of 1.0 would mean that the risk of death or MI was the same in both groups, while a risk ratio less than 1.0 would mean that the risk was lower in the high-dose group, and a risk ratio greater than 1.0 would mean that the risk was lower in the standard-dose group.
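
    As a concrete illustration (the counts below are made up for this sketch, not the data from these trials), a risk ratio is simply the ratio of the two observed risks:

```python
# Hypothetical counts, for illustration only (not the actual trial data)
deaths_high, n_high = 120, 2000        # events and sample size, high-dose group
deaths_std, n_std = 150, 2000          # events and sample size, standard-dose group

risk_high = deaths_high / n_high       # risk of death or MI on the high dose
risk_std = deaths_std / n_std          # risk of death or MI on the standard dose

risk_ratio = risk_high / risk_std      # values below 1.0 favor the high-dose group
print(risk_ratio)                      # 0.8 in this made-up example
```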

    The effect size for each study is represented by a square, with the location of the square representing both the direction and magnitude of the effect. Here, the effect size for each study falls to the left of center (indicating a benefit for the high-dose group). The effect is strongest (most distant from the center) in the TNT study and weakest in the Ideal study.

    Note. For measures of effect size based on ratios (as in this example) a ratio of 1.0 represents no difference between groups. For measures of effect based on differences (such as mean difference), a difference of 0.0 represents no difference between groups.

    Precision

    In the schematic, the effect size for each study is bounded by a confidence interval, reflecting the precision with which the effect size has been estimated in that study. The confidence interval for the last study (Ideal) is noticeably narrower than that for the first study (Prove-it), reflecting the fact that the Ideal study has greater precision. The meaning of precision and the factors that affect precision are discussed in Chapter 8.

    Study weights

    The solid squares that are used to depict each of the studies vary in size, with the size of each square reflecting the weight that is assigned to the corresponding study when we compute the summary effect. The TNT and Ideal studies are assigned relatively high weights, while somewhat less weight is assigned to the A to Z study and still less to the Prove-it study.

    As one would expect, there is a relationship between a study’s precision and that study’s weight in the analysis. Studies with relatively good precision (TNT and Ideal) are assigned more weight while studies with relatively poor precision (Prove-it) are assigned less weight. Since precision is driven primarily by sample size, we can think of the studies as being weighted by sample size.

    However, while precision is one of the elements used to assign weights, there are often other elements as well. In Part 3 we discuss different assumptions that one can make about the distribution of effect sizes across studies, and how these affect the weight assigned to each study.
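
    To make precision-based weighting concrete, here is a minimal sketch of inverse-variance weights under the fixed-effect assumption; the variances are hypothetical, and the actual weighting schemes are developed in Part 3:

```python
# Hypothetical within-study variances of the effect size estimates (illustration only)
variances = {"Prove-it": 0.040, "A to Z": 0.030, "TNT": 0.012, "Ideal": 0.014}

weights = {study: 1.0 / v for study, v in variances.items()}   # more precise -> more weight
total = sum(weights.values())

relative_weights = {study: 100 * w / total for study, w in weights.items()}
for study, pct in relative_weights.items():
    print(f"{study}: {pct:.1f}% of the total weight")
```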

    p-values

    For each study we show the p-value for a test of the null. There is a necessary correspondence between the p-value and the confidence interval, such that the p-value will fall under 0.05 if and only if the 95% confidence interval does not include the null value. Therefore, by scanning the confidence intervals we can easily identify the statistically significant studies. The role of p-values in the analysis, as well as the relationship between p-values and effect size, is discussed in Chapter 32.

    In this example, for three of the four studies the confidence interval crosses the null, and the p-value is greater than 0.05. In one (the TNT study) the confidence interval does not cross the null, and the p-value falls under 0.05.
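
    The correspondence is easy to verify directly. For a Wald-type test of a ratio measure (working on the log scale), the two-sided p-value falls below 0.05 exactly when the 95% confidence interval excludes 1.0; a minimal sketch with an illustrative estimate and standard error:

```python
import math
from scipy.stats import norm

def p_and_ci(log_effect, se):
    """Two-sided p-value and 95% CI for an effect estimated on the log scale."""
    z = log_effect / se
    p = 2 * (1 - norm.cdf(abs(z)))
    lower, upper = log_effect - 1.96 * se, log_effect + 1.96 * se
    return p, (math.exp(lower), math.exp(upper))     # CI back on the ratio scale

p, ci = p_and_ci(math.log(0.80), 0.12)               # illustrative study, not real data
print(p < 0.05, ci[0] < 1.0 < ci[1])                 # significant exactly when the CI excludes 1.0
```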

    THE SUMMARY EFFECT

    One goal of the synthesis is usually to compute a summary effect. Typically we report the effect size itself, as well as a measure of precision and a p-value.

    Effect size

    On the plot the summary effect is shown on the bottom line. In this example the summary risk ratio is 0.85, indicating that the risk of death (or MI) was 15% lower for patients assigned to the high dose than for patients assigned to standard dose.

    The summary effect is nothing more than the weighted mean of the individual effects. However, the mechanism used to assign the weights (and therefore the meaning of the summary effect) depends on our assumptions about the distribution of effect sizes from which the studies were sampled. Under the fixed-effect model, we assume that all studies in the analysis share the same true effect size, and the summary effect is our estimate of this common effect size. Under the random-effects model, we assume that the true effect size varies from study to study, and the summary effect is our estimate of the mean of the distribution of effect sizes. This is discussed in Part 3.
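
    In its simplest (fixed-effect) form the weighted mean can be written as follows, where Y_i is the effect size observed in study i, V_i its variance, and W_i its weight; the random-effects version, which modifies the weights, is developed in Part 3:

```latex
M = \frac{\sum_{i=1}^{k} W_i Y_i}{\sum_{i=1}^{k} W_i},
\qquad W_i = \frac{1}{V_i} \quad \text{(fixed-effect model)} .
```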

    Precision

    The summary effect is represented by a diamond. The location of the diamond represents the effect size while its width reflects the precision of the estimate. In this example the diamond is centered at 0.85, and extends from 0.79 to 0.92, meaning that the actual impact of the high dose (as compared to the standard) likely falls somewhere in that range.

    The precision addresses the accuracy of the summary effect as an estimate of the true effect. However, as discussed in Part 3 the exact meaning of the precision depends on the statistical model.

    p-value

    The p-value for the summary effect is 0.00003. This p-value reflects both the magnitude of the summary effect size and also the volume of information on which the estimate is based. Note that the p-value for the summary effect is substantially more compelling than that of any single study. Indeed, only one of the four studies had a p-value under 0.05. The relationship between p-values and effect sizes is discussed in Chapter 32.
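
    As a rough check on these numbers (a sketch that assumes the interval is symmetric on the log scale and relies on a normal approximation), the z-value and p-value can be recovered from the reported summary risk ratio and its confidence interval:

```python
import math
from scipy.stats import norm

rr, lower, upper = 0.85, 0.79, 0.92                      # summary effect as reported

log_rr = math.log(rr)
se = (math.log(upper) - math.log(lower)) / (2 * 1.96)    # 95% CI width in log units

z = log_rr / se
p = 2 * (1 - norm.cdf(abs(z)))
print(round(z, 2), p)                                    # roughly z = -4.2, p = 0.00003
```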

    HETEROGENEITY OF EFFECT SIZES

    In this example the treatment effect is consistent across all studies (by a criterion explained in Chapter 16), but such is not always the case. A key theme in this volume is the importance of assessing the dispersion of effect sizes from study to study, and then taking this into account when interpreting the data. If the effect size is consistent, then we will usually focus on the summary effect, and note that this effect is robust across the domain of studies included in the analysis. If the effect size varies modestly, then we might still report the summary effect but note that the true effect in any given study could be somewhat lower or higher than this value. If the effect varies substantially from one study to the next, our attention will shift from the summary effect to the dispersion itself.

    Because the dispersion in observed effects is partly spurious (it includes both real difference in effects and also random error), before trying to interpret the variation in effects we need to determine what part (if any) of the observed variation is real. In Part 4 we show how to partition the observed variance into the part due to error and the part that represents variation in true effect sizes, and then how to use this information in various ways.
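
    One widely used way to carry out that partition (the DerSimonian-Laird approach, one of the methods developed in Part 4) starts from Q, the weighted sum of squared deviations of the study effects about the summary effect. A minimal sketch with hypothetical effect sizes and variances:

```python
# Hypothetical log risk ratios and within-study variances, for illustration only
y = [-0.45, -0.05, -0.30, -0.10]
v = [0.040, 0.030, 0.012, 0.014]
w = [1 / vi for vi in v]                                  # inverse-variance weights

m = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)         # summary effect
q = sum(wi * (yi - m) ** 2 for wi, yi in zip(w, y))       # total observed dispersion
df = len(y) - 1                                           # dispersion expected from error alone
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)

tau_sq = max(0.0, (q - df) / c)       # estimated variance of true effects across studies
i_sq = max(0.0, 100 * (q - df) / q)   # share of dispersion beyond what error explains
print(round(q, 2), round(tau_sq, 4), round(i_sq, 1))
```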

    In this example our goal was to estimate the summary effect in a single population. In some cases, however, we will want to compare the effect size for one subgroup of studies versus another (say, for studies that used an elderly population versus those that used a relatively young population). In other cases we may want to assess the impact of putative moderators (or covariates) on the effect size (say, comparing the effect size in studies that used doses of 10, 20, 40, 80, 160 mg.). These kinds of analyses are also discussed in Part 4.

    SUMMARY POINTS

    To perform a meta-analysis we compute an effect size and variance for each study, and then compute a weighted mean of these effect sizes.

    To compute the weighted mean we generally assign more weight to the more precise studies, but the rules for assigning weights depend on our assumptions about the distribution of true effects.

    CHAPTER 2

    Why Perform a Meta-Analysis

    Introduction

    The streptokinase meta-analysis

    Statistical significance

    Clinical importance of the effect

    Consistency of effects

    INTRODUCTION

    Why perform a meta-analysis? What are the advantages of using statistical methods to synthesize data rather than taking the results that had been reported for each study and then having these collated and synthesized by an expert?

    In this chapter we start at the point where we have already selected the studies to be included in the review, and are planning the synthesis itself. We do not address the differences between systematic reviews and narrative reviews in the process of locating and selecting studies. These differences can be critically important, but (as always) our focus is on the data analysis rather than the full process of the review.

    The goal of a synthesis is to understand the results of any study in the context of all the other studies. First, we need to know whether or not the effect size is consistent across the body of data. If it is consistent, then we want to estimate the effect size as accurately as possible and to report that it is robust across the kinds of studies included in the synthesis. On the other hand, if it varies substantially from study to study, we want to quantify the extent of the variance and consider the implications.

    Meta-analysis is able to address these issues whereas the narrative review is not. We start with an example to show how meta-analysis and narrative review would approach the same question, and then use this example to highlight the key differences between the two.

    THE STREPTOKINASE META-ANALYSIS

    During the time period beginning in 1959 and ending in 1988 (a span of nearly 30 years) there were a total of 33 randomized trials performed to assess the ability of streptokinase to prevent death following a heart attack. Streptokinase, a so-called clot buster which is administered intravenously, was hypothesized to dissolve the clot causing the heart attack, and thus increase the likelihood of survival. The trials all followed similar protocols, with patients assigned at random to either treatment or placebo. The outcome, whether or not the patient died, was the same in all the studies.

    The trials varied substantially in size. The median sample size was slightly over 100 but there was one trial with a sample size in the range of 20 patients, and two large scale trials which enrolled some 12,000 and 17,000 patients, respectively. Of the 33 studies, six were statistically significant while the other 27 were not, leading to the perception that the studies yielded conflicting results.

    In 1992 Lau et al. published a meta-analysis that synthesized the results from the 33 studies. The presentation that follows is based on the Lau paper (though we use a risk ratio where Lau used an odds ratio).

    The forest plot (Figure 2.1) provides context for the analysis. An effect size to the left of center indicates that treated patients were more likely to survive, while an effect size to the right of center indicates that control patients were more likely to survive.

    Figure 2.1 Impact of streptokinase on mortality (adapted from Lau et al., 1992).


    The plot serves to highlight the following points.

    The effect sizes are reasonably consistent from study to study. Most fall in the range of 0.50 to 0.90, which suggests that it would be appropriate to compute a summary effect size.

    The summary effect is a risk ratio of 0.79 with a 95% confidence interval of 0.72 to 0.87 (that is, a 21% decrease in risk of death, with 95% confidence interval of 13% to 28%). The p-value for the summary effect is 0.0000008.
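
    The translation from a risk ratio to a percentage decrease in risk is simply one minus the ratio:

```latex
(1 - 0.79) \times 100\% = 21\%, \qquad
(1 - 0.87) \times 100\% = 13\%, \qquad
(1 - 0.72) \times 100\% = 28\% .
```

    (The upper limit of the risk-ratio interval corresponds to the lower limit of the risk reduction, and vice versa.)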

    The confidence interval that bounds each effect size indicates the precision in that study. If the interval excludes 1.0, the p-value is less than 0.05 and the study is statistically significant. Six of the studies were statistically significant while 27 were not.

    In sum, the treatment reduces the risk of death by some 21%. And, this effect was reasonably consistent across all studies in the analysis.

    Over the course of this volume we explain the statistical procedures that led to these conclusions. Our goal in the present chapter is simply to explain that meta-analysis does offer these mechanisms, whereas the narrative review does not. The key differences are as follows.

    STATISTICAL SIGNIFICANCE

    One of the first questions asked of a study is the statistical significance of the results. The narrative review has no mechanism for synthesizing the p-values from the different studies, and must deal with them as discrete pieces of data. In this example six of the studies were statistically significant while the other 27 were not, which led some to conclude that there was evidence against an effect, or that the results were inconsistent (see vote counting in Chapter 28). By contrast, the meta-analysis allows us to combine the effects and evaluate the statistical significance of the summary effect. The p-value for the summary effect is p = 0.0000008.

    While one might assume that 27 studies failed to reach statistical significance because they reported small effects, it is clear from the forest plot that this is not the case. In fact, the treatment effect in many of these studies was actually larger than the treatment effect in the six studies that were statistically significant. Rather, the reason that 82% of the studies were not statistically significant is that these studies had small sample sizes and low statistical power. In fact, as discussed in Chapter 29, most had power of less than 20%. By contrast, power for the meta-analysis exceeded 99.9% (see Chapter 29).
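
    A rough sense of these power figures comes from the usual normal approximation (a sketch with hypothetical standard errors, not the calculation actually used in Chapter 29): power is approximately the probability that a study's z-value exceeds the critical value, given the assumed true effect.

```python
from scipy.stats import norm

def approx_power(true_log_rr, se, alpha=0.05):
    """Approximate two-sided power for a Wald test of a log risk ratio."""
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(abs(true_log_rr) / se - z_crit)   # ignores the negligible far tail

true_effect = -0.24                           # roughly a 21% risk reduction (log of 0.79)
print(approx_power(true_effect, se=0.35))     # a small trial: power around 10%
print(approx_power(true_effect, se=0.04))     # a pooled analysis: power near 1
```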

    As in this example, if the goal of a synthesis is to test the null hypothesis, then meta-analysis provides a mathematically rigorous mechanism for this purpose. However, meta-analysis also allows us to move beyond the question of statistical significance, and address questions that are more interesting and also more relevant.

    CLINICAL IMPORTANCE OF THE EFFECT

    Since the point of departure for a narrative review is usually the p-values reported by the various studies, the review will often focus on the question of whether or not the body of evidence allows us to reject the null hypothesis. There is no good mechanism for discussing the magnitude of the effect. By contrast, the meta-analytic approaches discussed in this volume allow us to compute an estimate of the effect size for each study, and these effect sizes fall at the core of the analysis.

    This is important because the effect size is what we care about. If a clinician or patient needs to make a decision about whether or not to employ a treatment, they want to know if the treatment reduces the risk of death by 5% or 10% or 20%, and this is the information carried by the effect size. Similarly, if we are thinking of implementing an intervention to increase the test scores of students, or to reduce the number of incarcerations among at-risk juveniles, or to increase the survival time for patients with pancreatic cancer, the question we ask is about the magnitude of the effect. The p-value can tell us only that the effect is not zero, and to report simply that the effect is not zero is to miss the point.

    CONSISTENCY OF EFFECTS

    When we are working with a collection of studies, it is critically important to ask whether or not the effect size is consistent across studies. The implications are quite different for a drug that consistently reduces the risk of death by 20%, as compared with a drug that reduces the risk of death by 20% on average, but that increases the risk by 20% in some populations while reducing it by 60% in others.

    The narrative review has no good mechanism for assessing the consistency of effects. The narrative review starts with p-values, and because the p-value is driven by the size of a study as well as the effect in that study, the fact that one study reported a p-value of 0.001 and another reported a p-value of 0.50 does not mean that the effect was larger in the former. The p-value of 0.001 could reflect a large effect size but it could also reflect a moderate or small effect in a large study (see the GISSI-1 study in Figure 2.1, for example). The p-value of 0.50 could reflect a small (or nil) effect size but could also reflect a large effect in a small study (see the Fletcher study, for example).

    This point is often missed in narrative reviews. Often, researchers interpret a nonsignificant result to mean that there is no effect. If some studies are statistically significant while others are not, the reviewers see the results as conflicting. This problem runs through many fields of research. To borrow a phrase from Cary Grant’s character in Arsenic and Old Lace, we might say that it practically gallops.

    Schmidt (1996) outlines the impact of this practice on research and policy. Suppose an idea is proposed that will improve test scores for African-American children. A number of studies are performed to test the intervention. The effect size is positive and consistent across studies but power is around 50%, and only around 50% of the studies yield statistically significant results. Researchers report that the evidence is ‘conflicting’ and launch a series of studies to determine why the intervention had a positive effect in some studies but not others (Is it the teacher’s attitude? Is it the students’ socioeconomic status?), entirely missing the point that the effect was actually consistent from one study to the next. No pattern can be found (since none exists). Eventually, researchers decide that the issue cannot be understood. A promising idea is lost, and a perception builds that research is not to be trusted. A similar point is made by Meehl (1978, 1990).

    Rossi (1997) gives an example from the field of memory research that shows what can happen to a field of research when reviewers work with discrete p-values. The issue of whether or not researchers could demonstrate the spontaneous recovery of previously extinguished associations had a bearing on a number of important learning theories, and some 40 studies on the topic were published between 1948 and 1969. Evidence of the effect (that is, statistically significant findings) was obtained in only about half the studies, which led most texts and reviews to conclude that the effect was ephemeral and ‘the issue was not so much resolved as it was abandoned’ (p. 179). Later, Rossi returned to these studies and found that the average effect size (d) was 0.39. If we assume that this is the population effect size, the mean power for these studies would have been slightly under 50%. On this basis we would expect about half the studies to yield a significant effect, which is exactly what happened.

    Even worse, when the significant study was performed in one type of sample and the nonsignificant study was performed in another type of sample, researchers would sometimes interpret this difference as meaning that the effect existed in one population but not the other. Abelson (1997) notes that if a treatment effect yields a p-value of 0.07 for wombats and 0.05 for dingbats we are likely to see a discussion explaining why the treatment is effective only in the latter group—completely missing the point that the treatment effect may have been virtually identical in the two. The treatment effect may have even been larger for the wombats if the sample size was smaller.

    By contrast, meta-analysis completely changes the landscape. First, we work with effect sizes (not p-values) to determine whether or not the effect size is consistent across studies. Additionally, we apply methods based on statistical theory to allow that some (or all) of the observed dispersion is due to random sampling variation rather than differences in the true effect sizes. Then, we apply formulas to partition the variance into random error versus real variance, to quantify the true differences among studies, and to consider the implications of this variance. In the Schmidt and the Rossi examples, a meta-analysis might have found that the effect size was consistent across studies, and that all of the observed variation in effects could be attributed to random sampling error.

    SUMMARY POINTS

    Since the narrative review is based on discrete reports from a series of studies, it provides no real mechanism for synthesizing the data. To borrow a phrase from Abelson, it involves doing arithmetic with words. And, when the words are based on p-values the words are the wrong words.

    By contrast, in a meta-analysis we introduce two fundamental changes. First, we work directly with the effect size from each study rather than the p-value. Second, we include all of the effects in a single statistical synthesis. This is critically important for the goal of computing (and testing) a summary effect. Meta-analysis also allows us to assess the dispersion of effects, and distinguish between real dispersion and spurious dispersion.

    PART 2

    Effect Size and Precision

    CHAPTER 3

    Overview

    Treatment effects and effect sizes

    Parameters and estimates

    Outline of effect size computations

    TREATMENT EFFECTS AND EFFECT SIZES

    The terms treatment effects and effect sizes are used in different ways by different people. Meta-analyses in medicine often refer to the effect size as a treatment effect, and this term is sometimes assumed to refer to odds ratios, risk ratios, or risk differences, which are common in meta-analyses that deal with medical interventions. Similarly, meta-analyses in the social sciences often refer to the effect size simply as an effect size and this term is sometimes assumed to refer to standardized mean differences or to correlations, which are common in social science meta-analyses.

    In fact, though, both the terms effect size and treatment effect can refer to any of these indices, and the distinction between these terms lies not in the index itself but rather in the nature of the study. The term effect size is appropriate when the index is used to quantify the relationship between two variables or a difference between two groups. By contrast, the term treatment effect is appropriate only for an index used to quantify the impact of a deliberate intervention. Thus, the difference between males and females could be called an effect size only, while the difference between treated and control groups could be called either an effect size or a treatment effect.

    While most meta-analyses focus on relationships between variables, some have the goal of estimating a mean or risk or rate in a single population. For example, a meta-analysis might be used to combine several estimates for the prevalence of Lyme disease in Wabash or the mean SAT score for students in Utah. In these cases the index is clearly not a treatment effect, and is also not an effect size, since effect implies a relationship. Rather, the parameter being estimated could be called simply a single group summary.

    Note, however, that the classification of an index as an effect size and/or a treatment effect (or simply a single group summary) has no bearing on the computations. In the meta-analysis itself we have simply a series of values and their variances, and the same mathematical formulas apply. In this volume we generally use the term effect size, but we use it in a generic sense, to include also treatment effects, single group summaries, or even a generic statistic.

    How to choose an effect size

    Three major considerations should drive the choice of an effect size index. The first is that the effect sizes from the different studies should be comparable to one another in the sense that they measure (at least approximately) the same thing. That is, the effect size should not depend on aspects of study design that may vary from study to study (such as sample size or whether covariates are used). The second is that estimates of the effect size should be computable from the information that is likely to be reported in published research reports. That is, it should not require the re-analysis of the raw data (unless these are known to be available). The third is that the effect size should have good technical properties. For example, its sampling distribution should be known so that variances and confidence intervals can be computed.

    Additionally, the effect size should be substantively interpretable. This means that researchers in the substantive area of the work represented in the synthesis should find the effect size meaningful. If the effect size is not inherently meaningful, it is usually possible to transform the effect size to another metric for presentation. For example, the analyses may be performed using the log risk ratio but then transformed to a risk ratio (or even to illustrative risks) for presentation.
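
    A sketch of that transformation step (the synthesis is carried out in log units and the result exponentiated for presentation; the numbers are hypothetical):

```python
import math

# Hypothetical study risk ratios and weights, for illustration only
risk_ratios = [0.80, 0.75, 0.90]
weights = [30.0, 50.0, 20.0]

log_rrs = [math.log(rr) for rr in risk_ratios]       # analysis performed in log units
pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)

pooled_rr = math.exp(pooled_log)                     # back-transformed for presentation
print(round(pooled_rr, 2))
```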

    In practice, the kind of data used in the primary studies will usually lead to a pool of two or three effect sizes that meet the criteria outlined above, which makes the process of selecting an effect size relatively straightforward. If the summary data reported by the primary study are based on means and standard deviations in two groups, the appropriate effect size will usually be either the raw difference in means, the standardized difference in means, or the response ratio. If the summary data are based on a binary outcome such as events and non-events in two groups the appropriate effect size will usually be the risk ratio, the odds ratio, or the risk difference. If the primary study reports a correlation between two variables, then the correlation coefficient itself may serve as the effect size.

    PARAMETERS AND ESTIMATES

    Throughout this volume we make the distinction between an underlying effect size parameter (denoted by the Greek letter θ) and the sample estimate of that parameter (denoted by Y).

    If a study had an infinitely large sample size then it would yield an effect size Y that was identical to the population parameter θ. In fact, though, sample sizes are finite and so the effect size estimate Y always differs from θ by some amount. The value of Y will vary from sample to sample, and the distribution of these values is the sampling distribution of Y. Statistical theory allows us to estimate the sampling distribution of effect size estimates, and hence their standard
