Design-based Supervised Learning • dsl

dsl: Design-based Supervised Learning

Overview

R package dsl implements design-based supervised learning (DSL) proposed in Egami, Hinck, Stewart, and Wei (2024), which generalizes and extends the first proposal of DSL in Egami, Hinck, Stewart, and Wei (2023).

DSL is a general estimation framework for using predicted variables in statistical analyses. The package is especially useful for researchers trying to use large language models (LLMs) to annotate a large number of documents they analyze subsequently. DSL allows users to obtain statistically valid estimates and standard errors, even when LLM annotations contain arbitrary non-random prediction errors and biases.

To learn how to use the package, please start with Get Started Page.

Installation Instructions

You can install the most recent development version using the devtools package. First you have to install devtools using the following code. Note that you only have to do this once:

if(!require(devtools)) install.packages("devtools")

Then, load devtools and use the function install_github() to install dsl:

library(devtools)
install_github("naoki-egami/dsl", dependencies = TRUE)

Information

Authors:

Reference:

Egami, Hinck, Stewart, and Wei. (2024). “Using Large Language Model Annotations for the Social Sciences: A General Framework of Using Predicted Variables in Downstream Analyses.”
Egami, Hinck, Stewart, and Wei. (2023). “Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models,” Advances in Neural Information Processing Systems (NeurIPS).
Rister Portinari Maranca, Chung, Hinck, Wolsky, Egami, and Stewart. (2025). “Correcting the Measurement Errors of AI-Assisted Labeling in Image Analysis Using Design-Based Supervised Learning,” Sociological Methods and Research.