Stratification for Synthetic Purposive Sampling

Usage

stratify_sps(X, num_site = NULL, condition = NULL)

Arguments

X

Site-level variables for the target population of sites. Row names should be names of sites. X cannot contain missing data.

num_site

A list of two elements, e.g., list("at least", 1), specifying how many selected sites should satisfy condition (described below). The first element should be either "at least" or "at most". The second element is an integer. For example, list("at least", 1) means that we stratify SPS such that *at least 1* selected site satisfies condition.

condition

A list of three elements, e.g., list("GDP", "larger than or equal to", 1), specifying the condition used for stratification. The first element should be the name of a site-level variable. The second element should be one of "larger than or equal to", "smaller than or equal to", or "between". The third element is a vector of length 1 or 2; when the second element is "between", it should be a vector of two values. For example, list("GDP", "larger than or equal to", 1) means that we stratify SPS such that the number of selected sites given by num_site have *GDP larger than or equal to 1*.
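As an illustration of how the three arguments fit together, here is a minimal sketch of a call. The data in X are made up, and the sketch assumes the package is installed under the name spsR, which this page does not state; the call is guarded so the snippet still runs if that assumption is wrong.

```r
# Made-up site-level data: row names are site names, no missing values.
X <- data.frame(
  GDP        = c(0.5, 1.2, 2.0, 0.8),
  Population = c(10, 25, 40, 15),
  row.names  = c("SiteA", "SiteB", "SiteC", "SiteD")
)
stopifnot(!anyNA(X))  # X cannot contain missing data

# Assumption: the package namespace is "spsR" (not stated on this page).
if (requireNamespace("spsR", quietly = TRUE)) {
  # Require at least 1 selected site with GDP larger than or equal to 1.
  st <- spsR::stratify_sps(
    X         = X,
    num_site  = list("at least", 1),
    condition = list("GDP", "larger than or equal to", 1)
  )
  # st is the object to supply to sps(); the name of the receiving
  # argument is not given on this page, so that call is omitted here.
}
```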

Value

stratify_sps returns an object of class stratify_sps, which we supply to sps(). It contains the following elements:

  • C: A matrix on the left-hand side of linear constraints. The number of columns is the number of sites in the target population (=nrow(X)) and the number of rows is the number of constraints.

  • c0: A vector on the right-hand side of linear constraints. The length is the number of constraints.
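To make the shape of C and c0 concrete, the following base-R sketch builds one constraint row from num_site and condition. It is not the package's implementation: sketch_constraint is a hypothetical helper, and it assumes that a 0/1 selection vector s is feasible when C %*% s >= c0, that "at most" is encoded by flipping signs, and that "between" is inclusive at both ends. The package may use different conventions.

```r
# Hypothetical helper for illustration only (not part of the package).
# Assumed convention: selection vector s (1 = selected) satisfies C %*% s >= c0.
sketch_constraint <- function(X, num_site, condition) {
  stopifnot(!anyNA(X), condition[[1]] %in% colnames(X))
  x <- X[, condition[[1]]]  # values of the site-level variable
  v <- condition[[3]]       # threshold(s)

  # Which sites satisfy the condition? ("between" assumed inclusive.)
  ind <- switch(condition[[2]],
    "larger than or equal to"  = x >= v[1],
    "smaller than or equal to" = x <= v[1],
    "between"                  = x >= v[1] & x <= v[2],
    stop("unknown comparison: ", condition[[2]])
  )

  # "at most k" is rewritten as -sum >= -k so both cases share one form.
  sgn <- switch(num_site[[1]],
    "at least" = 1,
    "at most"  = -1,
    stop("unknown direction: ", num_site[[1]])
  )

  # One row per constraint, one column per site in the target population.
  C <- matrix(sgn * as.numeric(ind), nrow = 1,
              dimnames = list(NULL, rownames(X)))
  list(C = C, c0 = sgn * num_site[[2]])
}

X <- data.frame(GDP = c(0.5, 1.2, 2.0, 0.8),
                row.names = c("SiteA", "SiteB", "SiteC", "SiteD"))
con <- sketch_constraint(X, list("at least", 1),
                         list("GDP", "larger than or equal to", 1))

# Check two candidate selections against the constraint.
ok_AB <- all(con$C %*% c(1, 1, 0, 0) >= con$c0)  # SiteB has GDP >= 1
ok_AD <- all(con$C %*% c(1, 0, 0, 1) >= con$c0)  # neither site has GDP >= 1
```

Under these assumptions, C has one row and nrow(X) columns, and selecting SiteA and SiteB satisfies the constraint while selecting SiteA and SiteD does not.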

References

Egami and Lee. (2023+). Designing Multi-Context Studies for External Validity: Site Selection via Synthetic Purposive Sampling. Available at https://naokiegami.com/paper/sps.pdf.