Power Analysis for DSL Regression
Usage
power_dsl(
  labeled_size = NULL,
  dsl_out = NULL,
  model = "lm",
  formula,
  predicted_var,
  prediction = NULL,
  data,
  cluster = NULL,
  labeled = NULL,
  sample_prob = NULL,
  index = NULL,
  fixed_effect = "oneway",
  sl_method = "grf",
  feature = NULL,
  family = "gaussian",
  cross_fit = 5,
  sample_split = 10,
  seed = 1234
)
Arguments
- labeled_size
A vector indicating the number of labeled documents for which the function predicts standard errors.
- dsl_out
An output from the function dsl. When this is supplied, the remaining arguments are overwritten by the arguments specified in the output of dsl. When this is NULL, the function uses the arguments specified below.
- model
A regression model. dsl currently supports lm (linear regression), logit (logistic regression), and felm (fixed-effects regression).
- formula
A formula used in the specified regression model.
- predicted_var
A vector of column names in the data that correspond to variables that need to be predicted.
- prediction
A vector of column names in the data that correspond to predictions of predicted_var.
- data
A data frame. The class should be data.frame.
- cluster
A column name in the data that indicates the level at which clustered standard errors are calculated. Default is NULL.
- labeled
(Optional) A column name in the data that indicates which observations are labeled. The column should contain 1 (labeled) and 0 (non-labeled). When NULL, the function assumes that observations with NA in predicted_var are non-labeled and all other observations are labeled.
- sample_prob
(Optional) A column name in the data that corresponds to the sampling probability for labeling a particular observation. When NULL, the function assumes random sampling with equal probabilities.
- index
(Used when model = "felm") A vector of column names specifying fixed effects. When fixed_effect = "oneway", it has one element. When fixed_effect = "twoways", it has two elements, e.g., index = c("state", "year").
- fixed_effect
(Used when model = "felm") The type of fixed-effects regression to run: oneway (one-way fixed effects) or twoways (two-way fixed effects).
- sl_method
The name of the supervised machine learning model used internally to predict predicted_var, either by fine-tuning prediction or, when prediction = NULL, by using the predictors specified in feature. Users can run available_method() to see the available supervised machine learning methods. Default is grf (generalized random forest).
- feature
A vector of column names in the data that correspond to the predictors used to fit the supervised machine learning model (specified in sl_method).
- family
(Used when making predictions) The variable type of predicted_var. Default is gaussian.
- cross_fit
The number of folds used for cross-fitting. Default is 5.
- sample_split
The number of sample-splitting repetitions. Default is 10.
- seed
Numeric seed used internally. Default is 1234.
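The conventions described for labeled and sample_prob can be illustrated with a small base R sketch. The data frame df, its column names (x, pred_y, y, is_labeled, prob), and the toy values are hypothetical and only show the expected data layout; this is not the package's internal code.

```r
# Hypothetical toy data: 10 documents, the first 4 of which are hand-labeled.
set.seed(1234)
df <- data.frame(
  x      = rnorm(10),
  pred_y = runif(10)            # stand-in for machine predictions
)
df$y <- NA_real_
df$y[1:4] <- c(1, 0, 1, 1)      # expert labels; NA marks non-labeled rows

# What labeled = NULL is documented to assume:
# rows with NA in predicted_var are non-labeled (0), all others are labeled (1).
df$is_labeled <- as.integer(!is.na(df$y))

# An explicit sampling-probability column for equal-probability sampling.
# Using n_labeled / n is an assumption about a sensible value here,
# not something the documentation specifies.
df$prob <- sum(df$is_labeled) / nrow(df)
```

With columns like these, labeled = "is_labeled" and sample_prob = "prob" would be passed as column names rather than as vectors.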
Value
dsl returns an object of dsl class.
- predicted_se: Predicted standard errors for coefficients. The first row shows the current standard errors for coefficients. The remaining rows show predicted standard errors.
- labeled_size: A vector indicating the number of labeled documents for which the function predicts standard errors.
- dsl_out: An output from function dsl.
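Examples
A minimal sketch of a call, assuming the dsl package is installed together with whatever the default sl_method = "grf" requires. The simulated data, the column names (x, pred_y, y), and the chosen labeled sizes are hypothetical; only the argument names come from the Usage section. Reading the result with $ assumes the returned object is list-like, which this page does not state explicitly.

```r
# Simulated data (hypothetical): 1000 documents, 200 of them randomly labeled.
set.seed(1234)
n <- 1000
df <- data.frame(x = rnorm(n))
true_y <- 1 + 0.5 * df$x + rnorm(n)
df$pred_y <- true_y + rnorm(n)        # noisy stand-in for machine predictions

labeled_rows <- sample(n, 200)        # equal-probability random sampling
df$y <- NA_real_                      # NA marks non-labeled documents
df$y[labeled_rows] <- true_y[labeled_rows]

# Labeled sizes for which standard errors should be predicted.
sizes <- c(300, 500, 1000)

# Run only when the dsl package is available.
if (requireNamespace("dsl", quietly = TRUE)) {
  out <- dsl::power_dsl(
    labeled_size  = sizes,
    model         = "lm",
    formula       = y ~ x,
    predicted_var = "y",
    prediction    = "pred_y",
    data          = df
  )
  # First row: current standard errors; remaining rows: predicted ones.
  print(out$predicted_se)
}
```

If a model has already been fitted with dsl, passing that output as dsl_out should avoid repeating the other arguments, since the stored arguments overwrite them.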