Data science
2019-20
UNIT IV
Machine Learning: Modeling, Overfitting and Underfitting, Correctness, The Bias-Variance
Tradeoff, Feature Extraction and Selection, k-Nearest Neighbors, Naive Bayes, Simple Linear
Regression, Multiple Regression, Digression, Logistic Regression
UNIT V
Clustering: The Idea, The Model, Choosing k, Bottom-Up Hierarchical Clustering.
Recommender Systems: Manual Curation, Recommending What’s Popular, User-Based
Collaborative Filtering, Item-Based Collaborative Filtering, Matrix Factorization
Data Ethics, Building Bad Data Products, Trading Off Accuracy and Fairness, Collaboration,
Interpretability, Recommendations, Biased Data, Data Protection
IPython, Mathematics, NumPy, pandas, scikit-learn, Visualization, R
Textbooks:
1) Joel Grus, “Data Science From Scratch”, O’Reilly.
2) Allen B. Downey, “Think Stats”, O’Reilly.
Reference Books:
1) Doing Data Science: Straight Talk From The Frontline, 1st Edition, Cathy O’Neil and
Rachel Schutt, O’Reilly, 2013
2) Mining of Massive Datasets, 2nd Edition, Jure Leskovec, Anand Rajaraman and Jeffrey
Ullman, v2.1, Cambridge University Press, 2014
3) “The Art of Data Science”, 1st Edition, Roger D. Peng and Elizabeth Matsui,
Leanpub, 2015
4) “Algorithms for Data Science”, 1st Edition, Brian Steele, John Chandler and
Swarna Reddy, Springer, 2016
e-Resources:
1) https://github.com/joelgrus/data-science-from-scratch
2) https://github.com/donnemartin/data-science-ipython-notebooks
3) https://github.com/academic/awesome-datascience