Power Analysis for DSL Regression
Usage
power_dsl(
  labeled_size = NULL,
  dsl_out = NULL,
  model = "lm",
  formula,
  predicted_var,
  prediction = NULL,
  data,
  cluster = NULL,
  labeled = NULL,
  sample_prob = NULL,
  index = NULL,
  fixed_effect = "oneway",
  sl_method = "grf",
  feature = NULL,
  family = "gaussian",
  cross_fit = 5,
  sample_split = 10,
  seed = 1234
)
Arguments
- labeled_size
- A vector indicating the number of labeled documents for which the function predicts standard errors. 
- dsl_out
- An output from the function dsl. When this is supplied, the remaining arguments are overwritten by the arguments stored in the dsl output. When this is NULL, the function uses the arguments specified below.
- model
- A regression model. dsl currently supports "lm" (linear regression), "logit" (logistic regression), and "felm" (fixed-effects regression).
- formula
- A formula used in the specified regression model. 
- predicted_var
- A vector of column names in the data that correspond to variables that need to be predicted. 
- prediction
- A vector of column names in the data that correspond to predictions of predicted_var.
- data
- A data frame. The class should be data.frame.
- cluster
- A column name in the data that indicates the level at which clustered standard errors are calculated. Default is NULL.
- labeled
- (Optional) A column name in the data that indicates which observations are labeled. The column should contain 1 (labeled) and 0 (non-labeled). When NULL, the function assumes that observations with NA in predicted_var are non-labeled and all other observations are labeled.
- sample_prob
- (Optional) A column name in the data that corresponds to the sampling probability for labeling a particular observation. When NULL, the function assumes random sampling with equal probabilities.
- index
- (Used when model = "felm") A vector of column names specifying fixed effects. When fixed_effect = "oneway", it has one element. When fixed_effect = "twoways", it has two elements, e.g., index = c("state", "year").
- fixed_effect
- (Used when model = "felm") The type of fixed-effects regression to run: "oneway" (one-way fixed effects) or "twoways" (two-way fixed effects).
- sl_method
- The name of the supervised machine learning model used internally to predict predicted_var, either by fine-tuning prediction or, when prediction = NULL, by using the predictors specified in feature. Users can run available_method() to see the available supervised machine learning methods. Default is "grf" (generalized random forest).
- feature
- A vector of column names in the data that correspond to the predictors used to fit the supervised machine learning model (specified in sl_method).
- family
- (Used when making predictions) The variable type of predicted_var. Default is "gaussian".
- cross_fit
- The number of folds used for cross-fitting. Default is 5.
- sample_split
- The number of sample splits. Default is 10.
- seed
- Numeric seed used internally. Default is 1234.
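The defaults described for labeled and sample_prob can be illustrated with a small base-R sketch. The data frame docs and its columns are hypothetical, and the equal-probability value used here (the share of labeled rows) is an assumption made for illustration, not a statement of what the function computes internally.

```r
# Hypothetical data: 8 documents, 3 of which have an expert label in `y`.
docs <- data.frame(
  y      = c(1, NA, 0, NA, NA, 1, NA, NA),               # predicted_var (NA = not labeled)
  pred_y = c(0.9, 0.2, 0.1, 0.7, 0.4, 0.8, 0.3, 0.6),    # prediction
  x      = c(0.5, 1.2, -0.3, 0.8, 1.5, -1.1, 0.0, 0.4)   # a covariate
)

# labeled = NULL: rows with NA in predicted_var are treated as non-labeled,
# which corresponds to this explicit 1/0 indicator.
docs$labeled <- as.integer(!is.na(docs$y))

# sample_prob = NULL: random sampling with equal probabilities.
# Assumption for illustration: every row gets the share of labeled rows.
docs$sample_prob <- mean(docs$labeled)

print(docs)
```

Passing labeled = "labeled" and sample_prob = "sample_prob" explicitly is only needed when labeling did not follow this equal-probability design.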
Value
power_dsl returns an object with the following elements:
- predicted_se: Predicted standard errors for coefficients. The first row shows the current standard errors for coefficients. The remaining rows show predicted standard errors.
- labeled_size: A vector indicating the number of labeled documents for which the function predicts standard errors.
- dsl_out: An output from the function dsl.
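Examples
A minimal sketch of a call, using only the arguments listed under Usage. The simulated data frame sim, its column names, and the chosen labeled_size values are hypothetical. The call is guarded so the snippet still runs when the dsl package is not installed.

```r
# Simulated data (hypothetical): 1000 documents, 200 with an expert label.
set.seed(1234)
n <- 1000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y_true <- rbinom(n, 1, plogis(0.5 * sim$x1 - 0.3 * sim$x2))

# A noisy automated prediction of the outcome (correct about 80% of the time).
sim$pred_y <- ifelse(runif(n) < 0.8, y_true, 1 - y_true)

# Keep the expert label for 200 randomly chosen documents; the rest are NA.
sim$y <- y_true
sim$y[sample(n, n - 200)] <- NA

if (requireNamespace("dsl", quietly = TRUE)) {
  power_out <- dsl::power_dsl(
    labeled_size  = c(300, 500, 800),  # candidate numbers of labeled documents
    model         = "logit",
    formula       = y ~ x1 + x2,
    predicted_var = "y",
    prediction    = "pred_y",
    data          = sim
  )
  # First row: current standard errors; remaining rows: predicted ones.
  print(power_out$predicted_se)
}
```

Comparing the rows of predicted_se shows how much precision additional expert labels are expected to buy before the labeling is actually done.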