Practical Guides
On this page, we provide practical guidance for using the spsR package.
Step by Step Instruction: We summarize the implementation details and recommendations.
Common Questions and Concerns: We answer common questions (e.g., concerns about unmeasured variables).
Prepare Site-Level Variables: We offer several data sets and functions to ease data collection.
Useful Information for Multi-Country Survey Experiments: We offer useful information about implementing multi-country survey experiments.
1. Step by Step Instruction
Here, we summarize step-by-step instructions on how to use the spsR package to (1) select study sites and (2) estimate the average-site average treatment effect (the average-site ATE).
1.1 Site Selection via Synthetic Purposive Sampling
How can we select study sites for external validity?
Step 1: Define the target population of sites
Users first specify a target population of sites from which they sample sites. This defines the target against which external validity of a given substantive theory is evaluated. This step is equivalent to clarifying the eligibility criteria or the scope condition of your theory, e.g., if a theory is developed to explain advanced democracies in Europe, the target population of sites should include advanced democracies in Europe but should not include other countries, such as Japan or the U.S. In our example on the Get Started Page, we chose 15 European countries as the target population of sites.
Step 2: Choose site-level variables that you want to diversify
It is recommended to choose contextual moderators that predict differences in the ATEs across sites. This step is what researchers are already doing implicitly when diversifying one or two variables by hand in practice. With SPS, users can incorporate all the covariates that are expected to moderate the ATEs across sites (instead of just one or two variables). For example, in our example on the Get Started Page, we chose 7 variables discussed in the original paper: GDP, size of migrant population, unemployment rates, general support for immigration, the proportion of females, the mean age, and the mean education.
We recommend against including too many irrelevant variables because SPS might decrease the balance on key variables to improve the balance on such irrelevant variables. When there is a concern about unobserved site-level variables, it is recommended to include some proxies of such unobserved variables, if feasible. By diversifying such observed proxies that are associated with the unobserved variables, SPS can often mitigate the potential influence of unobserved variables. Please also see Common Questions and Concerns below to learn more about how to handle unmeasured variables.
More practically, to ease data collection, we offer the default country-level data set that contains most of the common country-level variables social scientists often diversify, e.g., geography, polity score, GDP, population, education level, and so on. Please see Prepare Country-Level Data.
Step 3: Run (Stratified) Synthetic Purposive Sampling
After collecting site-level variables (see some recommendations in Prepare Site-Level Variables), users can run function sps() to optimally diversify the chosen site-level variables. Please see the Get Started Page for implementation.
It is recommended to stratify key site-level variables using function stratify_sps(). This stratification allows users to impose certain conditions on site selection. For example, researchers can make sure to select at least one country from each region of the world. Researchers can also incorporate other domain knowledge and practical constraints into SPS. For example, users might not be able to run experiments in certain countries that are not covered by online survey firms. Users might want to include certain sites because they provide a hard test for a given theory. Please see Stratified SPS for different examples of stratification.
After site selection, users can run sps_plot() and check how the selected sites cover a wide range of values in each site-level variable.
1.2 Analyzing Multi-Site Causal Studies
Once we complete studies (e.g., randomized experiments) in each selected site, how can we aggregate evidence for external validity analysis?
Step 1: Estimate the site-specific ATEs (internal validity analysis)
Users can first report internal validity analyses, focusing on causal estimates within selected sites. For this purpose, researchers can use whatever estimator they would like to use to estimate the ATEs in each site, e.g., difference-in-means or linear regression estimators. Because SPS has selected diverse sites, researchers can investigate how causal estimates vary across diverse contexts.
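As an illustration of Step 1, the following is a minimal base R sketch of a difference-in-means estimator applied site by site. The data are simulated, and the site names, sample sizes, and the helper diff_in_means() are hypothetical; none of them come from the spsR package.

```r
# Simulated individual-level data from three hypothetical sites
set.seed(1)
n_per_site <- 200
sites <- c("A", "B", "C")
true_ate <- c(A = 0.5, B = 1.0, C = 0.0)  # assumed values for illustration

dat <- do.call(rbind, lapply(sites, function(s) {
  treat <- rbinom(n_per_site, 1, 0.5)              # randomized treatment
  y <- true_ate[[s]] * treat + rnorm(n_per_site)   # outcome
  data.frame(site = s, treat = treat, y = y)
}))

# Difference-in-means estimate and its standard error within one site
# (hypothetical helper, not an spsR function)
diff_in_means <- function(d) {
  y1 <- d$y[d$treat == 1]
  y0 <- d$y[d$treat == 0]
  c(estimate = mean(y1) - mean(y0),
    se = sqrt(var(y1) / length(y1) + var(y0) / length(y0)))
}

# One row per site: estimated ATE and standard error
site_ates <- t(sapply(split(dat, dat$site), diff_in_means))
print(round(site_ates, 3))
```

With a binary treatment, the coefficient on treat from lm(y ~ treat) within a site gives the same point estimate, so either approach can serve as the internal validity analysis.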
Step 2: Estimate the average-site ATE (external validity analysis)
Researchers can use function sps_estimator() to estimate the average-site ATE and test whether causal findings in selected sites generalize to a broader population of sites.
In practice, researchers can use plot(sps_est) to visualize site-specific ATEs and the average-site ATE together. Here sps_est is the output from sps_estimator(). Please see the Get Started Page for implementation.
Step 3: Cross-Validation to Assess the Potential Influence of Unmeasured Moderators
Finally, to empirically test the potential influence of unobserved site-level variables, researchers can conduct site-level cross-validation using function sps_cv(). Please see the Get Started Page for implementation.
In this function, we randomly choose half of the selected sites, treat them as if they were unobserved non-selected sites, and predict the average ATE of those held-out sites based on the remaining selected sites. By repeating the same procedure many times, researchers can empirically check how well the SPS estimator infers the ATEs in non-selected sites. When users pass this test, there is no evidence of significant bias from unobserved site-level variables, although, as with any statistical test, passing it can never confirm that such bias is absent.
When users fail this test, it implies large across-site heterogeneity, not explained by site-level variables. We view this as an opportunity for further research (rather than a failure of the given multi-site causal study) because it shows that there remains a large amount of across-site heterogeneity that existing theories cannot account for. In such scenarios, researchers can consider sequential learning: rather than viewing the current study as the final confirmation, researchers could suggest a new study by sequentially applying SPS.
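To make the logic of the half-split procedure concrete, here is a conceptual sketch in base R. It is not the implementation of sps_cv(): a simple linear regression stands in for the SPS estimator, the data are simulated, and the names sites, one_split(), and the number of repetitions are all assumptions made for illustration.

```r
set.seed(2)
K <- 10                                        # number of selected sites (assumed)
x <- rnorm(K)                                  # hypothetical site-level moderator
ate_hat <- 0.5 + 0.3 * x + rnorm(K, sd = 0.1)  # hypothetical site-specific ATE estimates
sites <- data.frame(x = x, ate_hat = ate_hat)

# One repetition: hold out half of the sites, predict their average ATE
# from the remaining half, and return the prediction error.
one_split <- function(sites) {
  held_out <- sample(nrow(sites), size = floor(nrow(sites) / 2))
  train <- sites[-held_out, ]
  test <- sites[held_out, ]
  fit <- lm(ate_hat ~ x, data = train)   # stand-in predictor, not the SPS estimator
  predicted <- mean(predict(fit, newdata = test))
  observed <- mean(test$ate_hat)
  predicted - observed
}

# Repeat many times and inspect the distribution of prediction errors
errors <- replicate(500, one_split(sites))
summary(errors)
```

Prediction errors centered near zero correspond to passing the check in this toy setting. Errors that are systematically far from zero would suggest heterogeneity that the site-level variable does not explain.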
2. Common Questions and Concerns
2.1 I am worried about unmeasured variables. What should I do?
Researchers might be worried about the potential influence of unobserved moderators. We clarify several points about how to reason about unobserved moderators.
Compared to the current practice of purposive sampling, where researchers often only focus on one or two variables, this concern is mitigated in SPS because users can include any number of site-level moderators based on their domain and theoretical knowledge.
Diversifying observed site-level variables can often help diversify even unobserved site-level variables when many key site-level variables are correlated. In addition, if unobserved variables are independent of observed site-level variables, they do not introduce bias because the distribution of unobserved variables will be the same in selected and non-selected sites, as long as we select sites only based on observed variables. Therefore, SPS will make the potential influence of unobserved moderators bigger only when diversifying observed site-level variables somehow reduces the diversity of unobserved site-level variables, which would require some complicated nonlinear relationship between observed and unobserved site-level variables.
While diversifying observed site-level variables can often diversify unobserved site-level variables, it is always recommended to empirically assess the influence of unobserved moderators via site-level cross-validation with function sps_cv() (see the third step in Section 1.2 Analyzing Multi-Site Causal Studies).
Finally, SPS focuses on the mean squared error and does not assume the absence of unobserved moderators, so its theoretical guarantees are valid even if there exist unobserved moderators. The SPS algorithm can reduce the mean squared error further if users can include more predictive moderators, but unobserved moderators do not invalidate the use of SPS. Please see Section 5.3 of Egami and Lee (2023+) for its theoretical foundation.
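The independence argument in this section can be illustrated with a small simulation in base R. This is only a sketch under stated assumptions: both variables are simulated as independent standard normals, and the selection rule (keeping sites with extreme observed values) is a hypothetical stand-in for any rule that uses observed variables only. It is not the SPS algorithm.

```r
set.seed(3)
n_sites <- 10000
observed <- rnorm(n_sites)     # observed site-level variable
unobserved <- rnorm(n_sites)   # unobserved variable, independent of observed

# Hypothetical selection rule that depends only on the observed variable
selected <- abs(observed) > 1

# Compare the unobserved variable between selected and non-selected sites
mean_selected <- mean(unobserved[selected])
mean_not_selected <- mean(unobserved[!selected])
sd_selected <- sd(unobserved[selected])
sd_not_selected <- sd(unobserved[!selected])

round(c(mean_selected = mean_selected, mean_not_selected = mean_not_selected,
        sd_selected = sd_selected, sd_not_selected = sd_not_selected), 3)
```

Because selection never looks at the unobserved variable, its mean and spread should be similar in the two groups, up to sampling noise.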
2.2 sps() did not diversify some variables well. What should I do?
Example 1: You wanted to select diverse countries with high, middle, and low GDP, but the SPS algorithm did not select any country from the top 20 percent of the GDP distribution.
Example 2: While you wanted to select diverse countries from each sub-region of Europe, the SPS algorithm did not select any country from Southern Europe and over-sampled Northern Europe.
Without any instruction, sps() might not perfectly diversify all variables as you wish because the function has to achieve diversity in all variables and therefore makes some trade-offs. If users want to achieve specific conditions, they can directly enforce such conditions on sps() using stratification via function stratify_sps().
Solution to Example 1: Researchers can use stratify_sps() to make sure that sps() will select at least one country from the top 20 percent of the GDP distribution.
Solution to Example 2: Researchers can use stratify_sps() to make sure that sps() will select at least one country each from Southern and Northern Europe.
Please see Stratified SPS to learn how to use stratification to improve diversity and include practical constraints and domain knowledge into the SPS algorithm.
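Before adding stratification, it can help to confirm which conditions a given selection fails to meet. The base R sketch below checks the two kinds of gaps described in Examples 1 and 2. The data frame X, its values, and the vector selected are hypothetical, and the code does not call any spsR function.

```r
# Hypothetical site-level data for eight countries
X <- data.frame(
  GDP = c(1.2, 3.4, 2.1, 5.6, 4.3, 0.9, 2.8, 3.9),
  region = c("North", "North", "South", "West", "West", "South", "North", "West"),
  row.names = paste0("country_", 1:8)
)

# Hypothetical set of selected sites
selected <- c("country_1", "country_2", "country_7")

# Example 1: does the selection include a site in the top 20 percent of GDP?
gdp_cutoff <- quantile(X$GDP, probs = 0.8)
covers_top_gdp <- any(X[selected, "GDP"] >= gdp_cutoff)

# Example 2: which regions have no selected site?
missing_regions <- setdiff(unique(X$region), unique(X[selected, "region"]))

covers_top_gdp
missing_regions
```

Any condition that fails here is a candidate for a stratification constraint in stratify_sps().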
2.3 Isn’t Random Sampling Better?
If random sampling is feasible and the number of sites to be sampled is relatively large, it is recommended to rely on random sampling. However, in many social science settings, random sampling has been infeasible (see the literature review in Egami and Lee (2023+)). The proposed SPS is complementary to random sampling and is most useful when random sampling is infeasible.
2.4 Why does sps() take too long to run?
When sps() is run on a large data set, it may take a long time to solve the optimization problem. To reduce the computational time, users may use Gurobi Optimizer, which solves optimization problems much faster, by specifying "GUROBI" under the solver argument in sps().
out <- sps(X = X_Imm, N_s = 6, solver = "GUROBI")
The use of Gurobi Optimizer requires a license registration and a software installation. For instructions on how to register and download Gurobi Optimizer, please see Improve Computational Efficiency.
3. Prepare Site-Level Variables
In practice, users need to prepare site-level variables for the population of sites before using functions in the spsR package. We offer some practical guides to ease this data collection.
3.1. Country-Level Variables
For users who are considering multi-site experiments where sites are countries, we offer the companion R package spsRdata, which contains (1) the default country-level data set (sps_country_data), covering a wide range of the most commonly used country-level data, and (2) easy-to-use functions to merge more variables from a variety of common data sets (e.g., V-Dem and World Bank).
3.2. State- or City-Level Variables in the US
If you are considering multi-site experiments where sites are states or cities in the US, the following data sources might be useful.
National Historical Geographic Information System (NHGIS): https://www.nhgis.org/
Census of Governments: https://www.census.gov/programs-surveys/cog.html
U.S. Bureau of Labor Statistics: https://www.bls.gov/
Federal Bureau of Investigation: https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/home
3.3. How to Handle Missing Data
Once you collect site-level variables, you might find some missing data. To handle missing data easily, we offer function impute_var() in R package spsRdata to impute missing data based on state-of-the-art missing data imputation methods. Please see Handle Missing Data. After imputing missing data, users can run sps().
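Before imputing, it is often useful to see where values are missing. The following base R sketch summarizes missingness by variable and by site. The data frame X and its values are hypothetical, and impute_var() is deliberately not called here because its arguments are described in Handle Missing Data.

```r
# Hypothetical site-level data with some missing values
X <- data.frame(
  GDP = c(1.2, NA, 2.1, 5.6),
  unemployment = c(4.1, 6.3, NA, NA),
  education = c(12, 11, 13, 10),
  row.names = paste0("country_", 1:4)
)

# Share of missing values in each variable
missing_share <- colMeans(is.na(X))

# Sites with at least one missing value
sites_with_missing <- rownames(X)[!complete.cases(X)]

missing_share
sites_with_missing
```

Variables with a very high share of missing values may be better replaced by a proxy than imputed, which is a judgment call for the researcher.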
4. Useful Information for Multi-Country Survey Experiments
In political science, one of the most popular types of multi-site experiments is a multi-country survey experiment. We offer some practical guides to ease the implementation of multi-country survey experiments.