Install missing packages if needed. In the terminal, run
python example_use.py
Results will be printed to the terminal.
mean_eps.py: Experiments varying bias.
mean_n_obs.py: Experiments varying the number of observational data points.
mean_n_exp.py: Experiments varying the number of experimental data points.
Run
python mean_eps.py
or
python mean_n_obs.py
or
python mean_n_exp.py
Results are saved as JSON files (for data) and PDF files (for figures). See the Python scripts for detailed usage. As currently configured, each script may take a few minutes to run.
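To run the three mean_*.py experiments back to back instead of one at a time, a small driver along the following lines may help. This is a minimal sketch and not part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes the three scripts sit in the current working directory and need no command-line arguments.

```python
import subprocess
import sys

# Script names as listed above.
MEAN_SCRIPTS = ["mean_eps.py", "mean_n_obs.py", "mean_n_exp.py"]


def build_commands(scripts):
    """Return one command per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_all(scripts):
    """Run the scripts one after another; stop at the first failure."""
    for cmd in build_commands(scripts):
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run(cmd, check=True)


# Usage (hypothetical): call run_all(MEAN_SCRIPTS) from the repository folder.
```

Using `sys.executable` rather than a bare `python` keeps the child processes in the same environment as the driver, so the installed packages match.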
linear_eps.py: Experiments varying bias.
linear_n_obs.py: Experiments varying the number of observational data points.
Choice 1: Directly run
python linear_eps.py
or
python linear_n_obs.py
Choice 2: Use a bash script and specify --cpus-per-task for parallel computing.
Results are saved as JSON files (for data) and PDF files (for figures). See the Python scripts for detailed usage.
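For Choice 2, `--cpus-per-task` is a SLURM option, and SLURM exports the requested value as the `SLURM_CPUS_PER_TASK` environment variable. The sketch below shows one way a Python script could pick up that value to size a worker pool. It assumes the bash script is submitted through SLURM, which this README does not state explicitly. The function name `worker_count` is hypothetical, and the repository's scripts may determine their parallelism differently.

```python
import os


def worker_count(environ=None):
    """Number of workers to use: the SLURM allocation if set, else local CPUs."""
    if environ is None:
        environ = os.environ
    raw = environ.get("SLURM_CPUS_PER_TASK")
    # Accept only a positive integer; otherwise fall back to the local CPU count.
    if raw is not None and raw.isdigit() and int(raw) > 0:
        return int(raw)
    return os.cpu_count() or 1
```

The returned value could then be passed, for example, as `max_workers` to `concurrent.futures.ProcessPoolExecutor`.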
Step 1. Create a \data folder. Download the .txt files of NSW Data Files (Dehejia-Wahba Sample) and PSID and CPS Data Files from this link and put them into the \data folder.
To check, these should include 8 .txt files:
- NSW controls (260 observations): nswre74_control.txt
- NSW treated (185 observations): nswre74_treated.txt
- PSID controls (2,490 observations): psid_controls.txt
- PSID-2 controls (253 observations): psid2_controls.txt
- PSID-3 controls (128 observations): psid3_controls.txt
- CPS controls (15,992 observations): cps_controls.txt
- CPS-2 controls (2,369 observations): cps2_controls.txt
- CPS-3 controls (429 observations): cps3_controls.txt
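A quick way to confirm that Step 1 is complete is to check the folder for the eight expected files. The following is a minimal sketch, not part of the repository. It assumes the folder is named `data` relative to the working directory, and that the PSID-2 and PSID-3 files are named `psid2_controls.txt` and `psid3_controls.txt`, following the same pattern as the CPS files. The function name `missing_files` is hypothetical.

```python
from pathlib import Path

# The eight files expected after Step 1 (PSID-2/PSID-3 names are assumed).
EXPECTED_FILES = [
    "nswre74_control.txt",
    "nswre74_treated.txt",
    "psid_controls.txt",
    "psid2_controls.txt",
    "psid3_controls.txt",
    "cps_controls.txt",
    "cps2_controls.txt",
    "cps3_controls.txt",
]


def missing_files(data_dir="data"):
    """Return the expected file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


missing = missing_files("data")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All 8 data files found.")
```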
Step 2. We use a data file lalonde.csv, which is generated from the downloaded .txt files by the first code block of lalonde_baseline.Rmd. Users who are not familiar with R can instead run python read_lalonde_data.py to generate the data file.
Linear model baselines (excluding our method):
lalonde_baseline.Rmd: Estimation and bootstrap for linear model baselines. Full configurations.
read_lalonde_data.R: Script to read data, sourced in lalonde_baseline.Rmd.
The R Markdown (Rmd) script can be run in RStudio.
Our method:
lalonde_cv.py: Run our method on the LaLonde dataset (linear setting). Full configurations.
lalonde_cv_bootstrap.py: Bootstrap our method on the LaLonde dataset (linear setting). Full configurations.
lalonde_synthetic_linear.py: Experiments on synthetic data based on the LaLonde dataset (linear setting). Single configuration.
read_lalonde_data.py: Python alternative to read_lalonde_data.R to generate lalonde.csv.
For the intro figures:
lalonde_intro_mean.py: For the intro figure, run our method on the LaLonde dataset (no-covariate setting). Single configuration.
lalonde_intro_linear.py: For the intro figure, run our method on the LaLonde dataset (linear setting). Single configuration.
To run the python scripts,
Choice 1: Directly run
python lalonde_cv.py
or
python lalonde_cv_bootstrap.py
or
python lalonde_synthetic_linear.py
or
python lalonde_intro_mean.py
or
python lalonde_intro_linear.py
Results are saved as JSON files (for data) and TXT files (for tables or texts).
Choice 2: For scripts with full configurations, use a bash script and specify --cpus-per-task for parallel computing. For single-configuration scripts, use the arguments specified in the script.