Estimating Regression using the DSL framework
Usage
dsl(
  model = "lm",
  formula,
  predicted_var,
  prediction = NULL,
  data,
  cluster = NULL,
  labeled = NULL,
  sample_prob = NULL,
  index = NULL,
  fixed_effect = "oneway",
  sl_method = "grf",
  feature = NULL,
  family = "gaussian",
  cross_fit = 5,
  sample_split = 10,
  seed = 1234
)
Arguments
- model
A regression model. dsl currently supports lm (linear regression), logit (logistic regression), and felm (fixed-effects regression).
- formula
A formula used in the specified regression model.
- predicted_var
A vector of column names in the data that correspond to variables that need to be predicted.
- prediction
A vector of column names in the data that correspond to predictions of predicted_var.
- data
A data frame. The class should be data.frame.
- cluster
A column name in the data that indicates the level at which clustered standard errors are calculated. Default is NULL.
- labeled
(Optional) A column name in the data that indicates which observations are labeled. The column should contain 1 (labeled) and 0 (non-labeled). When NULL, the function assumes that observations with NA in predicted_var are non-labeled and that all other observations are labeled.
- sample_prob
(Optional) A column name in the data that corresponds to the sampling probability for labeling each observation. When NULL, the function assumes random sampling with equal probabilities.
- index
(Used when model = "felm") A vector of column names specifying fixed effects. When fixed_effect = "oneway", it has one element. When fixed_effect = "twoways", it has two elements, e.g., index = c("state", "year").
- fixed_effect
(Used when model = "felm") The type of fixed-effects regression to run: "oneway" (one-way fixed effects) or "twoways" (two-way fixed effects).
- sl_method
The name of the supervised machine learning method used internally to predict predicted_var, either by fine-tuning prediction or, when prediction = NULL, by using the predictors specified in feature. Users can run available_method() to see the available supervised machine learning methods. Default is "grf" (generalized random forest).
- feature
A vector of column names in the data that correspond to the predictors used to fit the supervised machine learning method (specified in sl_method).
- family
(Used when making predictions) The variable type of predicted_var. Default is "gaussian".
- cross_fit
The number of folds used in cross-fitting. Default is 5.
- sample_split
The number of sample splits. Default is 10.
- seed
Numeric seed used internally. Default is 1234.
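
The labeling convention and a basic call can be sketched as follows. This is a minimal illustration on simulated data, not an example from the package: the data frame dat, its columns y, x, pred_y, and labeled, and the simulated values are all assumptions made up for this sketch. The call to dsl() is guarded so that the block still runs when the dsl package is not installed, and it uses only arguments listed above.

```r
# Simulated data (hypothetical): y is an expensive hand-coded variable,
# pred_y is a cheap automated prediction of y available for every row.
set.seed(1234)
n <- 200
x <- rnorm(n)
y_true <- 1 + 0.5 * x + rnorm(n)
pred_y <- y_true + rnorm(n, sd = 0.5)

# Hand-label 50 rows chosen at random with equal probability;
# the remaining rows keep NA in y.
labeled_rows <- sample(n, 50)
dat <- data.frame(y = NA_real_, x = x, pred_y = pred_y)
dat$y[labeled_rows] <- y_true[labeled_rows]

# Explicit 1/0 indicator. It follows the same rule the documentation
# describes for labeled = NULL: NA in predicted_var means non-labeled.
dat$labeled <- as.integer(!is.na(dat$y))

# Guarded call: runs only if the dsl package is available.
if (requireNamespace("dsl", quietly = TRUE)) {
  out <- dsl::dsl(
    model = "lm",
    formula = y ~ x,
    predicted_var = "y",     # column that is only partly hand-labeled
    prediction = "pred_y",   # column holding predictions of y
    data = dat,
    labeled = "labeled",     # optional here, since NA in y already marks non-labeled rows
    seed = 1234
  )
  print(out)
}
```

Because the rows were labeled with equal probability, sample_prob is left at its default of NULL in this sketch. If labeling probabilities differed across rows, a column holding them would be passed by name through sample_prob.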