In this section, we will discuss a number of meta-analytic techniques. I will demonstrate how to perform each in R as we go along. We will primarily be using the metafor package to perform these exercises.

## How can we synthesize experimental research findings?

The traditional form of research synthesis is the literature review. However, there are now also a number of procedural and statistical techniques for synthesizing the results of studies. In experiments, this is most commonly done by:

1.) Replicating experiments (i.e. conduct the same experiment holding treatment and outcomes constant) with different subjects, ideally drawn from the same population.

2.) Pooling the findings from these experiments into a single across-study treatment effect. The resulting increase in sample size will also result in an increase in precision.

### Why conduct meta-analysis?

From Gerber and Green: “The attraction of meta-analysis is that a series of small experiments may each be unable to speak to a hypothesis with precision, but when pooled together, these experiments may suggest a clear conclusion.”

Meta-analysis emerged in the 1970s as a way to synthesize educational and psychological research (Gene Glass coined the term in 1976), and has since expanded, particularly in the medical and social sciences.

In medicine, the Cochrane Collaboration was started in 1993, and today contains thousands of systematic reviews of medical interventions. This kind of research synthesis is considered by many to be the gold standard for determining the effectiveness of different health care interventions.

Examples of meta-analysis in political science:

1. The Metaketa Initiative: “A collaborative research model aimed at improving the accumulation of knowledge from field experiments on topics where academic researchers and policy practitioners share substantive interests.” The Metaketa Initiative has completed or is in the process of conducting coordinated field experiments that are conducive to meta-analysis on the topics of: (1) information and accountability, (2) taxation, (3) natural resource governance, (4) community policing, and (5) women’s action committees and local services.
2. Costa, Mia. “How responsive are political elites? A meta-analysis of experiments on public officials.” Journal of Experimental Political Science 4.3 (2017): 241-254.
3. Doucouliagos, Hristos, and Mehmet Ali Ulubaşoğlu. “Democracy and economic growth: a meta‐analysis.” American Journal of Political Science 52.1 (2008): 61-83.
4. Dunning, Thad, et al. “Voter information campaigns and political accountability: Cumulative findings from a preregistered meta-analysis of coordinated trials.” Science Advances 5.7 (2019): eaaw2612.
5. Kalla, Joshua L., and David E. Broockman. “The minimal persuasive effects of campaign contact in general elections: Evidence from 49 field experiments.” American Political Science Review 112.1 (2018): 148-166.
6. Lau, Richard R., et al. “The effects of negative political advertisements: A meta-analytic assessment.” American Political Science Review 93.4 (1999): 851-875.
7. Many more that I’m missing due to my own selection bias.

### Dangers of meta-analysis

• Assumptions about subjects: Are subjects from different studies really participants in the same grand experiment? Meta-analysis is most convincing when (1) subjects are drawn randomly from the same population or (2) we believe there is very little treatment effect heterogeneity.

• Selection bias and publication bias: Are we confident that we have found all of the relevant studies? What if the likelihood of finding a study is correlated with the size of its estimated treatment effect? A common example is publication bias, which arises when null results are less likely to be published than large and significant treatment effects. Aggregating results from published studies only can therefore exaggerate our estimate of the “true” treatment effect.

## Fixed versus random effects meta-analysis

• Under fixed effects models, each study is assumed to differ from the others only because it observes a sample of observations rather than the total population. Observed studies therefore yield different effect sizes only because of sampling error. Fixed effects models do not assume that the true effects are homogeneous (this is sometimes erroneously stated). In other words, fixed effects models provide perfectly valid inferences under heterogeneity, as long as we restrict our inferences about the average effect size to the set of studies included in the meta-analysis.

• Under random effects models, we don’t assume that there is just one population effect size. Instead, a distribution of population effect sizes exists that is generated by a distribution of possible study realizations. In other words, observed outcomes in studies would differ from each other not just because of sampling error, but also because they reflect these true, underlying population differences. In contrast to the fixed-effects model, random/mixed-effects models provide an unconditional inference about a larger set of studies from which the $$k$$ studies included in the meta-analysis are assumed to be a random sample. We typically do not assume that this larger set consists only of studies that have actually been conducted, but instead envision a hypothetical population of studies that comprises studies that have been conducted, that could have been conducted, or that may be conducted in the future.

• Which should you use?
• It depends on the inferences and assumptions you want to make.
• The fixed effects model assumes studies are conducted identically except for using other research participants (that is, except for sampling error).
• Random effects allows inferences about what the results of the studies might have been had they been conducted with other participants and with changes in other study features such as in the treatment, setting, or outcome measures.
• Which of the above seems more plausible in the social sciences?
• Answer: Some argue that random effects models are preferred on conceptual grounds because they better reflect the inherent uncertainty in meta-analytic inference, and because they reduce to the fixed-effects model when the variance component is zero.

### Fixed effects meta-analysis

$y_i = \theta_i + e_i$ where $$y_i$$ denotes the observed effect in the $$i$$th study, $$\theta_i$$ is the corresponding (unknown) true effect, $$e_i$$ is the sampling error, and $$e_i \sim N(0, v_i)$$.

A fixed effect model tells us: how large is the average true effect in the set of $$k$$ studies included in the meta-analysis?

This is typically estimated using weighted least squares, where:

$\hat{\theta} = \frac{\sum_{i=1}^k w_i \, y_i}{\sum_{i=1}^k w_i}$

where $$w_i$$ represents the weight assigned to each study. These weights are typically $$w_i = \frac{1}{v_i}$$, the inverse of the within-study sampling variance (the square of the standard error) of the estimated effect. The sampling variance shrinks as the within-study sample size grows: for a sample mean, for example, $$v = \frac{s^2}{n}$$, where $$s^2 = \frac{\sum_j (x_j - \bar{x})^2}{n - 1}$$ is the sample variance of the outcome. Therefore, the larger the sample, the smaller the variance and the more precise the estimate of the effect size. Hence, larger weights are assigned to effect sizes from studies that have larger within-study sample sizes.

When all observed effects ($$y_i$$) estimate a single population parameter, as is hypothesized under a fixed effects model, then $$\hat{\theta}$$ is an unbiased estimate of that population parameter.
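To make the weighting concrete, here is a minimal base R sketch of the inverse-variance estimator. The effect estimates `yi` and sampling variances `vi` are made-up numbers for four hypothetical studies, not data from any real meta-analysis.

```r
# Hypothetical observed effects and sampling variances from k = 4 studies
yi <- c(0.10, 0.30, 0.35, 0.65)
vi <- c(0.04, 0.01, 0.02, 0.05)

wi        <- 1 / vi                     # inverse-variance weights
theta_hat <- sum(wi * yi) / sum(wi)     # weighted average of the observed effects
se_theta  <- sqrt(1 / sum(wi))          # standard error of the pooled estimate

# 95% confidence interval based on the normal approximation
ci <- theta_hat + c(-1, 1) * qnorm(0.975) * se_theta

round(c(estimate = theta_hat, se = se_theta,
        ci_lower = ci[1], ci_upper = ci[2]), 3)
```

Note that the pooled standard error is smaller than the standard error of any single study, which is the gain in precision from pooling. If metafor is installed, `rma(yi, vi, method = "FE")` should return the same point estimate.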

### Random effects meta-analysis

Same as above, but now $$\theta_i$$ is not fixed, but is itself random and has its own distribution:

$\theta_i = \mu + u_i$ where $$u_i \sim N(0,\tau^2)$$. Here $$u_i$$ is the deviation of study $$i$$'s true effect from the average true effect, and $$\tau^2$$ is the between-studies variance. Therefore, the true effects are assumed to be normally distributed with mean $$\mu$$ and variance $$\tau^2$$.

The average true effect $$\mu$$ is still estimated as a weighted average of the observed effects:

$\hat{\mu} = \frac{\sum_{i=1}^k w_i \, y_i}{\sum_{i=1}^k w_i}$

But now $$w_i = \frac{1}{v_i + \hat{\tau}^2}$$, where $$\hat{\tau}^2$$ is an estimate of $$\tau^2$$.

This implies that random effects models follow a two-stage process: (1) estimate the amount of heterogeneity $$\tau^2$$ using one of a number of proposed estimators, and (2) estimate $$\mu$$ using WLS.

The goal is then to estimate $$\mu$$, the average true effect, and $$\tau^2$$, the (total) amount of heterogeneity among the true effects. If $$\tau^2 = 0$$, then this implies homogeneity among the true effects (i.e., $$\theta_1 = \dots = \theta_k \equiv \theta$$), so that $$\mu = \theta$$ then denotes the true effect.

What does this last statement imply about the difference between fixed and random effects meta-analysis under treatment effect homogeneity?

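• Answer: The two models will provide the same estimate. When $$\hat{\tau}^2 = 0$$, the random effects weights $$\frac{1}{v_i + \hat{\tau}^2}$$ reduce to the fixed effects weights $$\frac{1}{v_i}$$, and $$\mu = \theta$$.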

Since the variation under random effects incorporates the same error as fixed effects plus an additional component, it cannot be less than the variation under the fixed effect model. As long as the between-studies variation is non-zero, the variance, standard error, and confidence interval will therefore always be larger under random effects.
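The two-stage process can be sketched in base R as follows. This is only an illustration: it uses the DerSimonian-Laird estimator for $$\tau^2$$, which is one of several proposed estimators and is chosen here because it has a closed form. The data are the same kind of made-up values for four hypothetical studies.

```r
# Hypothetical observed effects and sampling variances from k = 4 studies
yi <- c(0.10, 0.30, 0.35, 0.65)
vi <- c(0.04, 0.01, 0.02, 0.05)
k  <- length(yi)

# Fixed effects quantities, needed for the heterogeneity statistic Q
wi_fe    <- 1 / vi
theta_fe <- sum(wi_fe * yi) / sum(wi_fe)
se_fe    <- sqrt(1 / sum(wi_fe))

# Stage 1: DerSimonian-Laird estimate of tau^2 (truncated at zero)
Q    <- sum(wi_fe * (yi - theta_fe)^2)
C    <- sum(wi_fe) - sum(wi_fe^2) / sum(wi_fe)
tau2 <- max(0, (Q - (k - 1)) / C)

# Stage 2: weighted least squares with weights 1 / (v_i + tau^2)
wi_re  <- 1 / (vi + tau2)
mu_hat <- sum(wi_re * yi) / sum(wi_re)
se_re  <- sqrt(1 / sum(wi_re))

round(c(tau2 = tau2, fixed = theta_fe, se_fixed = se_fe,
        random = mu_hat, se_random = se_re), 4)
```

Because the estimated $$\tau^2$$ is positive for these values, the random effects standard error comes out larger than the fixed effects one. If metafor is installed, `rma(yi, vi, method = "DL")` should give comparable results.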

## Publication bias

A robust literature explores how to detect and correct for publication bias in meta-analysis. Which method is the best in each circumstance remains a subject of active debate.

The following tests have been proposed to detect publication bias:

• p-curve: The p-curve is based on the premise that only “significant” results are typically published, and depicts the distribution of statistically significant p-values for a set of published studies. The shape of the p-curve is indicative of whether or not the results of a set of studies are derived from true effects, or from publication bias. If p-values are clustered around 0.05 (i.e. the p-curve is left skewed), this may be evidence of p-hacking, indicating that studies with p-values just below 0.05 are selectively reported. If the p-curve is right skewed and there are more low p-values (0.01), this is evidence of true effects.

P-curve example
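As a rough illustration of what a p-curve summarizes, the base R sketch below bins the significant p-values from a set of studies and reports the share in each bin. The p-values are invented for the example, and the code only describes the shape of the curve; it does not implement the formal p-curve tests.

```r
# Hypothetical p-values from a set of published studies
p <- c(0.001, 0.002, 0.004, 0.006, 0.009, 0.013,
       0.018, 0.024, 0.031, 0.044, 0.070, 0.300)

# The p-curve only uses statistically significant results
p_sig <- p[p < 0.05]

# Share of significant p-values in bins of width 0.01
bins   <- cut(p_sig, breaks = seq(0, 0.05, by = 0.01))
pcurve <- prop.table(table(bins))
round(pcurve, 2)

# Crude summary of skew: share of significant p-values below 0.025
share_low <- mean(p_sig < 0.025)
share_low
```

A curve with most of its mass in the lowest bins, as in these made-up data, is right skewed. A curve with most of its mass just below 0.05 would be left skewed.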

• Examination of funnel plot asymmetry: A funnel plot depicts the outcomes from each study on the x-axis and their corresponding standard errors on the y-axis. The chart is overlaid with an inverted triangular confidence interval region (i.e. the funnel), which should contain 95% of the studies if there is no bias or between-study heterogeneity. If studies with non-significant results remain unpublished, the funnel plot may be asymmetric.

Asymmetric funnel plot

Symmetric funnel plot
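A funnel plot like the ones described above can be drawn in base R with a few lines. The sketch below uses invented data in which the less precise studies report larger effects, so the plot should look asymmetric. The correlation between effects and standard errors is included only as a crude descriptive check, not as a formal test.

```r
# Hypothetical effects and standard errors; imprecise studies report larger effects
yi  <- c(0.12, 0.15, 0.10, 0.25, 0.40, 0.50, 0.62)
sei <- c(0.04, 0.05, 0.06, 0.10, 0.15, 0.20, 0.25)

# Fixed effects pooled estimate marks the center of the funnel
wi        <- 1 / sei^2
theta_hat <- sum(wi * yi) / sum(wi)

# Pseudo 95% confidence limits at each study's standard error
z      <- qnorm(0.975)
inside <- abs(yi - theta_hat) <= z * sei

# Crude asymmetry check: are effects larger where standard errors are larger?
asym <- cor(yi, sei)

# Draw the funnel, with the y-axis reversed so precise studies sit at the top
se_grid <- seq(0, max(sei), length.out = 50)
plot(yi, sei, pch = 19,
     xlim = theta_hat + c(-1, 1) * z * max(sei),
     ylim = c(max(sei), 0),
     xlab = "Observed effect", ylab = "Standard error")
abline(v = theta_hat, lty = 2)
lines(theta_hat - z * se_grid, se_grid)
lines(theta_hat + z * se_grid, se_grid)
```

In these made-up data every imprecise study falls to the right of the pooled estimate, which is the pattern we would expect if small studies with null results were missing. If metafor is installed, its `funnel()` function produces a similar plot from a fitted model.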

The following estimates have been proposed to correct for publication bias:

• p-curve estimation: p-curve can also use the degree of right-skew to estimate the average effect size corrected for publication bias.

• The trim and fill method: Trim-and-fill iteratively removes (i.e., trims) observations from one side of the funnel plot until a criterion for symmetry is met and then fills observations back into the funnel plot along with imputed observations reflected about the mean. Standard meta-analytic methods can then be applied to a data set including both observed and imputed studies.

Trim-and-fill

• PET, PEESE, and PET-PEESE:
• PET is a weighted-least-squares regression where effect size is regressed on its standard error: $$d_i = b_0 + b_1 se_i + e_i$$, where $$b_0$$ and $$b_1$$ are the intercept and slope terms describing the linear relationship between the $$i$$th effect size estimate $$d_i$$ and its associated standard error $$se_i$$. The regression model is weighted by the inverse of the variance (i.e., the squared standard errors) of the effect size estimates.
• PEESE is the weighted-least-squares regression model where effect size is regressed on the square of the standard error: $$d_i = b_0 + b_1 se^2_i + e_i$$.
• Simulation studies suggest that PET outperforms PEESE when the true underlying effect is zero, whereas PEESE outperforms PET when the true underlying effect is non-zero. We therefore now typically use PET-PEESE. PET-PEESE considers the statistical significance of the PET estimate to decide whether PET or PEESE is taken as the final estimate. When the estimate from PET is statistically non-significant, the PET estimate is taken. When the estimate from PET is statistically significant, the PEESE estimate is used.
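• PET-PEESE as an R function.

    # PET-PEESE function
    # `dataset` needs columns ate (effect estimate), se (standard error), and var (se^2)
    petpeese <- function(dataset) {
      # PET regresses effects on standard errors; PEESE regresses them on variances
      pet <- lm(ate ~ se, weights = 1 / var, data = dataset)
      peese <- lm(ate ~ var, weights = 1 / var, data = dataset)
      # The intercepts are the bias-corrected effect estimates
      int_pet <- pet$coefficients[1]
      se_pet <- summary(pet)$coefficients[1, 2]
      int_peese <- peese$coefficients[1]
      se_peese <- summary(peese)$coefficients[1, 2]
      # The p-value of the PET intercept decides which estimate is reported
      p_pet <- summary(pet)$coefficients[1, 4]
      petpeese_int <- ifelse(p_pet > .05, int_pet, int_peese)
      petpeese_se <- ifelse(p_pet > .05, se_pet, se_peese)
      return(c(petpeese_int, petpeese_se))
    }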
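To see the two regressions at work, here is a small self-contained sketch on made-up data. The column names `ate`, `se`, and `var` follow the function above; the values themselves are hypothetical and were chosen so that less precise studies report larger effects.

```r
# Hypothetical study-level data: effect estimates and standard errors
dat <- data.frame(
  ate = c(0.12, 0.15, 0.10, 0.25, 0.40, 0.50, 0.62),
  se  = c(0.04, 0.05, 0.06, 0.10, 0.15, 0.20, 0.25)
)
dat$var <- dat$se^2

# Naive inverse-variance pooled estimate, for comparison
pooled <- sum(dat$ate / dat$var) / sum(1 / dat$var)

# PET and PEESE regressions, both weighted by the inverse variance
pet   <- lm(ate ~ se,  weights = 1 / var, data = dat)
peese <- lm(ate ~ var, weights = 1 / var, data = dat)

# Intercept rows: estimate, standard error, t value, p-value
pet_int   <- summary(pet)$coefficients["(Intercept)", ]
peese_int <- summary(peese)$coefficients["(Intercept)", ]

# PET-PEESE rule: keep PET if its intercept is non-significant, else use PEESE
use_pet <- pet_int["Pr(>|t|)"] > 0.05
final   <- if (use_pet) pet_int else peese_int

round(c(pooled = pooled, pet = pet_int[["Estimate"]],
        peese = peese_int[["Estimate"]], final = final[["Estimate"]]), 3)
```

With these values the slope on the standard error is positive, so the PET intercept is pulled well below the naive pooled estimate. That is the direction of correction we would expect when small studies overstate the effect.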

## Meta-regression (also referred to as moderator analysis)

Part of the heterogeneity across studies may be due to the influence of “moderators.” For example, in a medical trial, results might depend on important differences in subjects (e.g. gender, a pre-existing condition, etc.). If these “covariates” are known, we can account for them in our meta-analysis.

In practice, we will then get a coefficient on this variable reflecting how strongly it is associated with the variation in results across studies. We can also recover an estimate of how much of the “total heterogeneity” across studies we have “accounted for” by including this covariate.

Let’s look at an example in R.
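A minimal sketch of a meta-regression with a single binary moderator, written in base R with invented data. The helper `tau2_dl()` is a hypothetical name introduced here; it computes a DerSimonian-Laird-type estimate of $$\tau^2$$ for a given design matrix, which is an assumption of this sketch rather than the only way to estimate heterogeneity.

```r
# Hypothetical effects, sampling variances, and a binary moderator for k = 8 studies
yi  <- c(0.00, 0.20, 0.30, 0.10, 0.40, 0.60, 0.70, 0.45)
vi  <- rep(c(0.01, 0.02), times = 4)
mod <- c(0, 0, 0, 0, 1, 1, 1, 1)

# DerSimonian-Laird-type estimate of tau^2 for design matrix X (hypothetical helper)
tau2_dl <- function(yi, vi, X) {
  W <- diag(1 / vi)
  P <- W - W %*% X %*% solve(t(X) %*% W %*% X) %*% t(X) %*% W
  Q <- as.numeric(t(yi) %*% P %*% yi)   # (residual) heterogeneity statistic
  max(0, (Q - (nrow(X) - ncol(X))) / sum(diag(P)))
}

X0 <- cbind(intercept = rep(1, length(yi)))   # no moderators
X1 <- cbind(intercept = 1, mod = mod)         # with the moderator

tau2_total <- tau2_dl(yi, vi, X0)   # total heterogeneity
tau2_resid <- tau2_dl(yi, vi, X1)   # heterogeneity left after the moderator

# Mixed effects meta-regression: WLS with weights 1 / (v_i + residual tau^2)
W    <- diag(1 / (vi + tau2_resid))
V_b  <- solve(t(X1) %*% W %*% X1)
b    <- drop(V_b %*% t(X1) %*% W %*% yi)
se_b <- sqrt(diag(V_b))

# Share of the total heterogeneity accounted for by the moderator
R2 <- max(0, (tau2_total - tau2_resid) / tau2_total)

round(cbind(estimate = b, se = se_b), 3)
round(c(tau2_total = tau2_total, tau2_resid = tau2_resid, R2 = R2), 3)
```

The coefficient on `mod` is the estimated difference in average true effects between the two groups of studies, and `R2` is the proportional reduction in $$\tau^2$$ once the moderator is included. If metafor is installed, `rma(yi, vi, mods = ~ mod, method = "DL")` should give comparable estimates.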

## Meta-analysis as your replication project

• Upside: Higher likelihood of a publication or an impressive research synthesis in your dissertation.

• Downside: Much more work, and there may not have been enough studies conducted on a question you are interested in to perform a meta-analysis.