xyang23/cross_validated_causal

Cross-Validated Causal Inference (CVCI)

Paper

ArXiv link

Project website

Example use

Install missing packages if needed. In the terminal, run

python example_use.py

Results will be printed to the terminal.

No-covariate setting

Code

mean_eps.py: Experiments varying bias.

mean_n_obs.py: Experiments varying the number of observational samples.

mean_n_exp.py: Experiments varying the number of experimental samples.

Usage

Run

python mean_eps.py

or

python mean_n_obs.py

or

python mean_n_exp.py

Results are saved as JSON files (for data) and PDF files (for figures). For detailed usage, see the Python scripts. Each script may take a few minutes to run.
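To inspect the saved JSON results without opening them by hand, a small helper along the following lines may be useful. It is only a sketch: the output file names and the JSON schema are not documented here, so the helper `load_json_results` (a hypothetical name, not part of this repository) simply loads every `*.json` file it finds in a directory and prints the top-level keys.

```python
import json
from pathlib import Path


def load_json_results(results_dir="."):
    """Return {file name: parsed JSON content} for every *.json file in results_dir."""
    results = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as f:
            results[path.name] = json.load(f)
    return results


if __name__ == "__main__":
    # Assumption: the scripts write their JSON output to the current directory.
    for name, content in load_json_results(".").items():
        # Show only the top-level shape, since the schema is not documented here.
        summary = sorted(content) if isinstance(content, dict) else type(content).__name__
        print(name, summary)
```

If the scripts write their output elsewhere, pass that directory as `results_dir`.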

Linear setting

Code

linear_eps.py: Experiments varying bias.

linear_n_obs.py: Experiments varying the number of observational samples.

Usage

Choice 1: Directly run

python linear_eps.py

or

python linear_n_obs.py

Choice 2: Use a bash script and specify --cpus-per-task for parallel computing.

Results are saved as JSON files (for data) and PDF files (for figures). For detailed usage, see the Python scripts.
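For context, --cpus-per-task is a SLURM option, and SLURM exposes the allocated value to the job through the SLURM_CPUS_PER_TASK environment variable. The sketch below shows one way a Python script could size a worker pool from that variable. It is an illustration under that assumption, not a description of how the scripts in this repository handle parallelism; `available_workers` and `run_one` are hypothetical names.

```python
import os
from multiprocessing import Pool


def available_workers(default=1):
    """Worker count from SLURM_CPUS_PER_TASK, falling back to `default` when unset or invalid."""
    value = os.environ.get("SLURM_CPUS_PER_TASK", "")
    return int(value) if value.isdigit() and int(value) > 0 else default


def run_one(seed):
    """Hypothetical stand-in for one independent experiment repetition."""
    return seed * seed


if __name__ == "__main__":
    # Outside SLURM the variable is unset, so this runs with a single worker.
    with Pool(processes=available_workers()) as pool:
        print(pool.map(run_one, range(8)))
```

Falling back to a single worker keeps the same script usable for a direct run on a laptop.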

Experiments on the LaLonde dataset

Data Preparation

Step 1. Create a data folder. Download the .txt files of NSW Data Files (Dehejia-Wahba Sample) and PSID and CPS Data Files from this link and put them into the data folder.

As a check, the folder should then contain these 8 .txt files:

  • NSW controls (260 observations): nswre74_control.txt

  • NSW treated (185 observations): nswre74_treated.txt

  • PSID controls (2490 observations): psid_controls.txt

  • PSID-2 controls (253 observations): psid2_controls.txt

  • PSID-3 controls (128 observations): psid3_controls.txt

  • CPS controls (15,992 observations): cps_controls.txt

  • CPS-2 controls (2,369 observations): cps2_controls.txt

  • CPS-3 controls (429 observations): cps3_controls.txt
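The check above can also be automated. The following is a minimal sketch, assuming each .txt file stores one observation per non-blank line and that the folder is named data. The file names psid2_controls.txt and psid3_controls.txt for the PSID-2 and PSID-3 samples are assumptions based on the naming of the CPS files; `check_data_folder` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Expected observation counts, copied from the list above.
# Assumption: the PSID-2 and PSID-3 files are named psid2_controls.txt and psid3_controls.txt.
EXPECTED_ROWS = {
    "nswre74_control.txt": 260,
    "nswre74_treated.txt": 185,
    "psid_controls.txt": 2490,
    "psid2_controls.txt": 253,
    "psid3_controls.txt": 128,
    "cps_controls.txt": 15992,
    "cps2_controls.txt": 2369,
    "cps3_controls.txt": 429,
}


def check_data_folder(data_dir="data", expected=EXPECTED_ROWS):
    """Return a list of problems: missing files or row-count mismatches."""
    problems = []
    for name, n_expected in expected.items():
        path = Path(data_dir) / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        # Assumption: one observation per non-blank line.
        with path.open() as f:
            n_rows = sum(1 for line in f if line.strip())
        if n_rows != n_expected:
            problems.append(f"{name}: expected {n_expected} rows, found {n_rows}")
    return problems


if __name__ == "__main__":
    issues = check_data_folder()
    print("All 8 files look complete." if not issues else "\n".join(issues))
```

An empty list means every expected file is present with the expected number of rows.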

Step 2. We use a data file lalonde.csv, which is generated from the downloaded .txt files by the first code block of lalonde_baseline.Rmd. Users who are not familiar with R can run python read_lalonde_data.py to generate the data file instead.

Code and Usage

Linear model baselines (excluding our method):

lalonde_baseline.Rmd: Estimation and bootstrap for linear model baselines. Full configurations.

read_lalonde_data.R: Script to read data, sourced in lalonde_baseline.Rmd.

The R Markdown (Rmd) script can be run in RStudio.

Our method:

lalonde_cv.py: Run our method on the LaLonde dataset (linear setting). Full configurations.

lalonde_cv_bootstrap.py: Bootstrap our method on the LaLonde dataset (linear setting). Full configurations.

lalonde_synthetic_linear.py: Experiments on synthetic data based on LaLonde dataset (linear setting). Single configuration.

read_lalonde_data.py: Python alternative to read_lalonde_data.R for generating lalonde.csv.

For the intro figures:

lalonde_intro_mean.py: For the intro figure, run our method on the LaLonde dataset (no-covariate setting). Single configuration.

lalonde_intro_linear.py: For the intro figure, run our method on the LaLonde dataset (linear setting). Single configuration.

To run the Python scripts:

Choice 1: Directly run

python lalonde_cv.py

or

python lalonde_cv_bootstrap.py

or

python lalonde_synthetic_linear.py

or

python lalonde_intro_mean.py

or

python lalonde_intro_linear.py

Results are saved as JSON files (for data) and TXT files (for tables or texts).

Choice 2: For scripts with full configurations, use a bash script and specify --cpus-per-task for parallel computing. For single-configuration scripts, use the arguments specified in the script.
