Machine Learning Social Science
Instructor:
Christopher Hare
Assistant Professor
Department of Political Science
University of California, Davis
[email protected]

TA:
Sam Fuller
PhD Student
Department of Political Science
University of California, Davis
[email protected]
Course Description:
A growing number of social scientists are taking advantage of machine learning methods
to uncover hidden structure in their data, improve model predictive power, and gain a
better understanding of complex relationships between variables. This workshop covers
the mechanics underlying machine learning methods and discusses how these techniques
can be leveraged by social scientists to gain new insight from their data. Specifically, the
workshop will cover both supervised and unsupervised methods: decision trees, random
forests, boosting, support vector machines, neural networks, deep and adversarial learning,
ensemble learning, principal components analysis, factor analysis, and manifold learning/
multidimensional scaling. We will also discuss best practices in fitting and interpreting these
models, including cross-validation techniques, bootstrapping, and presenting output. The
workshop will demonstrate how these models can be estimated in R (and, time permitting,
Python).
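
To give a flavor of the R workflows the workshop demonstrates, below is a minimal sketch (not part of the official course materials) of fitting a random forest with 10-fold cross-validation. It assumes the caret and randomForest packages are installed and uses R's built-in iris data as a stand-in for a social science dataset.

library(caret)

data(iris)   # built-in data standing in for a social science dataset
set.seed(42)

## 10-fold cross-validation, one of the best practices covered in the workshop
ctrl <- trainControl(method = "cv", number = 10)

## Tune mtry, the number of predictors sampled at each split
fit <- train(Species ~ ., data = iris,
             method = "rf",        # random forest via the randomForest package
             trControl = ctrl,
             tuneGrid = expand.grid(mtry = 1:4))

print(fit)    # cross-validated accuracy for each candidate mtry
varImp(fit)   # variable importance, a Week Three topic

One design note: caret wraps many of the model families on the schedule (trees, boosting, SVMs, and others) behind this same train() interface, so the cross-validation setup above carries over largely unchanged across methods.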
Recommended Texts/Readings:
1. Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of
Statistical Learning: Data Mining, Inference, and Prediction, second edition. New
York: Springer.
2. Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Cambridge,
MA: MIT Press.
3. Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. Cambridge,
MA: MIT Press.
4. Kuhn, Max and Kjell Johnson. 2013. Applied Predictive Modeling. New York: Springer.
5. Berk, Richard A. 2016. Statistical Learning from a Regression Perspective, second
edition. New York: Springer.
6. Mullainathan, Sendhil and Jann Spiess. 2017. “Machine Learning: An Applied Econometric
Approach.” Journal of Economic Perspectives 31 (2): 87-106.
7. Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners
for Estimating Heterogeneous Treatment Effects using Machine Learning.” Proceedings
of the National Academy of Sciences 116 (10): 4156-4165.
8. Grimmer, Justin, Solomon Messing, and Sean J. Westwood. 2017. “Estimating Heterogeneous
Treatment Effects and the Effects of Heterogeneous Treatments with Ensemble Methods.”
Political Analysis 25 (4): 413-434.
9. Sechidis, Konstantinos and Gavin Brown. 2018. “Simple Strategies for Semi-supervised
Feature Selection.” Machine Learning 107 (2): 357-395.
10. Szegedy, Christian et al. 2013. “Intriguing Properties of Neural Networks.” https://arxiv.org/abs/1312.6199.
11. R
(a) James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An
Introduction to Statistical Learning with Applications in R. New York: Springer.
12. Python
(a) VanderPlas, Jake. 2017. Python Data Science Handbook: Essential Tools for
Working with Data. Sebastopol, CA: O’Reilly.
Course materials: All materials (including slides, code, and problem sets) will be
available in a private Dropbox folder.
Tentative Schedule (subject to change):
Graphical Models
Support Vector Machines and Relevance Vector Machines
k-nearest Neighbors
• Week Three: Tree-Based Methods and Learning Ensembles
Classification and Regression Trees
Ensemble Methods: Random Forests and Boosting
Assessing Variable Importance and Effects
Partial Dependence Plots and Model Visualization
Ensemble Modeling and Heterogeneous Treatment Effects
• Week Four: Unsupervised Learning (see the R sketch following this schedule)
k-means Clustering
Principal Components Analysis
Manifold Learning and Multidimensional Scaling
Self-Organizing Maps
Deep Learning
Mixture Models and Latent Class Analysis
Novelty/Outlier Detection
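
As a preview of the Week Four material, here is a minimal sketch (again, not part of the official course materials) combining two of the topics above, principal components analysis and k-means clustering, in base R using the built-in USArrests data.

data(USArrests)

## PCA on standardized variables
pca <- prcomp(USArrests, scale. = TRUE)
summary(pca)                      # variance explained by each component

## k-means clustering on the scaled data
set.seed(42)
km <- kmeans(scale(USArrests), centers = 4, nstart = 25)
table(km$cluster)                 # cluster sizes

## Plot the observations in the first two principal components,
## colored by k-means cluster assignment
plot(pca$x[, 1:2], col = km$cluster, pch = 19,
     main = "USArrests: k-means clusters in PCA space")

Projecting the cluster assignments onto the leading principal components, as in the final plot, is a common way to visually check whether the clusters correspond to interpretable structure in the data.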