Machine Learning Roadmap 2020
Machine Learning Problems (see machine learning problems)

Questions to ask: What privacy concerns are there?
Data collection

Types of data:

Numerical: any continuous value where the difference between values matters. For example, when selling houses, $107,850 is more than $56,400.

Structured data: data which appears in tabulated format (rows and columns style, like what you'd find in an Excel spreadsheet). Can contain different types of data, for example, numerical, categorical, time series.

Ordinal: data which has order but where the distance between values is unknown. For example, questions such as: how would you rate your health from 1-5, 1 being poor, 5 being healthy? You can answer 1, 2, 3, 4 or 5, but the distance between each value doesn't necessarily mean an answer of 5 is 5 times as good as an answer of 1.

Outliers: where are the outliers? How many of them are there? Are they out by much (3+ standard deviations)? Why are they there?
Concepts and process:

Elements of AI

Google's Machine Learning Crash Course

Google's AI education page

Facebook's field guide to machine learning

Made with ML topics: a comprehensive collection of the most up-to-date resources for learning about different machine learning topics.

Test your skills:

Workera.ai: a series of questions by the deeplearning.ai team on what kinds of things you should know as a machine learning engineer or data scientist.

Kaggle competitions: get data from the real world and test your model building skills while competing with others around the world. A really cool follow-on from these projects would be to deploy your models in a user-facing app using something like Streamlit.

Software 2.0

Feature engineering: transform data into (potentially) more meaningful representations by adding in domain knowledge.

Crossing and interaction features: combining two or more features, or taking the difference between two features, such as using house_last_sold and current_date to get time_on_market.

Indicator features: using other parts of the data to indicate something potentially significant. Create an X_is_missing feature for wherever a column X contains a missing value. If you're analyzing traffic sources from the web, you could add is_paid_traffic as a feature for advertisements which have been paid for. Or ask: does a particular sample satisfy more than one criterion? If you were analyzing car sales data, does the car have less than 100,000 km, is it automatic and is it under 10 years old? Perhaps you know from experience these cars are generally worth more, so you could make a special feature called under_100km_auto_under_10yo.
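A minimal sketch of these feature engineering ideas in pandas (the columns are hypothetical, borrowed from the examples above):

```python
import pandas as pd

# Hypothetical data using the column names from the examples above.
df = pd.DataFrame({
    "house_last_sold": pd.to_datetime(["2019-01-01", "2020-03-15"]),
    "num_bedrooms": [3, None],
    "odometer_km": [85_000, 150_000],
    "transmission": ["automatic", "manual"],
    "age_years": [5, 12],
})

# Crossing/interaction feature: difference between two features.
df["time_on_market"] = pd.Timestamp.now() - df["house_last_sold"]

# Indicator feature: flag wherever a column contains a missing value.
df["num_bedrooms_is_missing"] = df["num_bedrooms"].isna()

# Indicator feature built from several criteria at once.
df["under_100km_auto_under_10yo"] = (
    (df["odometer_km"] < 100_000)
    & (df["transmission"] == "automatic")
    & (df["age_years"] < 10)
)
```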
Feature selection: selecting the most valuable features of your dataset to model. Potentially reduces overfitting and training time (less overall data and less redundant data to train on) and improves accuracy.

Feature importance (post modelling): fit a model to a set of data, then inspect which features were most important to the results, and remove the least important ones.

Imbalanced data: a helpful (and technical) paper to look at is "Learning from Imbalanced Data". You could use this as a starting point to go on to the next thing.

Dimensionality reduction: a common dimensionality reduction method, PCA (Principal Component Analysis), takes a larger number of dimensions (features) and uses linear algebra to reduce them to fewer dimensions. For example, say you have 10 numerical features; you could run PCA to reduce them down to 3.
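A quick sketch of the 10-features-down-to-3 example with scikit-learn (random data stands in for a real dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

# 200 samples with 10 numerical features (random stand-in data).
X = np.random.rand(200, 10)

# Reduce the 10 dimensions down to 3, as in the example above.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (200, 3)
print(pca.explained_variance_ratio_)   # variance kept by each component
```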
Code

Learn Python in 1 video on YouTube (by freeCodeCamp)

Give me it all in one place:

Kaggle learning centre

DataCamp

Dataquest

deeplearning.ai curriculum (includes Andrew Ng's famous machine learning course and much more)

Example projects:

End-to-end regression machine learning project by Aurelien Geron (author of the Hands-on Machine Learning book)

Example Scikit-Learn data exploration and modelling by Daniel Formosso

Example binary classification project by Daniel Bourke

Data splitting

Training set (usually 70-80% of data): the model learns on this.

Validation set (typically 10-15% of data): model hyperparameters are tuned on this.

Test set (usually 10-15%): the model's final performance is evaluated on this. If you've done it right, hopefully the results on the test set give a good indication of how the model should perform in the real world. Do not use this dataset to tune the model.
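One way to get a 70/15/15 split with scikit-learn (dummy data for illustration); train_test_split only makes two splits, so it's called twice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)          # dummy features
y = np.random.randint(2, size=1000)  # dummy labels

# First carve off the 15% test set, then split the rest into
# train (~70% of the total) and validation (~15% of the total).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.85, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```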
Machine Learning Process (steps in a machine learning project)

Train model on data (3 steps: choose an algorithm, overfit the model, reduce overfitting with regularization).

Choosing an algorithm

Linear Regression: draw a line that best fits data scattered on a graph. Produces continuous variables (e.g. height in inches).

AdaBoost/Gradient Boosting Machines (also just known as boosting): can be used for classification or regression. Asks the question: can a series of weak learners be turned into a strong learner? For example, train a series of weak learners whose job it is to improve on each other (another example of an ensemble method, combining multiple weaker models to create a better one). Implementations include the XGBoost, CatBoost and LightGBM algorithms.

Neural networks (also called deep learning): can be used for classification or regression. Takes a series of inputs and manipulates them with linear functions (dot product between weights and inputs) and nonlinear functions (activation functions), doing this multiple times (at least once for each neuron in a model). The combination of linear and non-linear functions (straight and non-straight lines) means neural networks can estimate almost anything. Common architectures: convolutional neural networks (typically used for computer vision), recurrent neural networks (typically used for sequence modelling) and transformer networks (can be used for vision and text, starting to replace RNNs).
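A small sketch contrasting two of these with scikit-learn on generated data (scikit-learn's GradientBoostingRegressor stands in for XGBoost/CatBoost/LightGBM here):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Linear regression: fit a straight line (hyperplane) to the data.
lin = LinearRegression().fit(X_train, y_train)

# Boosting: a series of weak learners (shallow trees), each one
# improving on the errors of the ones before it.
gbm = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X_train, y_train)

print(lin.score(X_test, y_test), gbm.score(X_test, y_test))  # R^2 scores
```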
Unsupervised algorithms

Clustering

K-Means clustering: choose 'k' number of clusters; each cluster receives a centre node (called a centroid) at random, and with each iteration the centroids move to the middle (mean) of the samples assigned to them. Once the centroids stop moving, each sample is assigned the value of its closest centroid (its cluster label).

Visualization and dimensionality reduction

Principal Component Analysis (PCA): reduce data from more dimensions to lower dimensions whilst attempting to preserve the variance.

Autoencoders: learn a lower-dimensional encoding of data. For example, compress an image of 100 pixels into 50 pixels representing (roughly) the same information as the 100 pixels.

t-Distributed Stochastic Neighbor Embedding (t-SNE): good for visualizing high-dimensionality data in a 2D or 3D space.

Anomaly detection

Autoencoder: use an autoencoder to reduce the dimensionality of the inputs of a system and then try to recreate those inputs within some threshold. If the recreations aren't able to match the threshold, there could be some sort of outlier.

One-class classification: train a model on only one class (e.g. normal events of computer network traffic, which are usually in abundance); if anything lies outside of this class, it may be an anomaly. Algorithms for doing so include one-class K-Means, one-class SVM (support vector machine), isolation forest and local outlier factor.
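A sketch of K-Means plus one of the anomaly detection algorithms named above (isolation forest), using scikit-learn and random stand-in data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

X = np.random.rand(300, 4)  # dummy unlabeled data

# K-Means: pick k clusters; each sample gets the label of its
# closest centroid once the centroids settle.
kmeans = KMeans(n_clusters=3, random_state=42)
cluster_ids = kmeans.fit_predict(X)

# Isolation forest: flags samples that are easy to isolate from the rest.
iso = IsolationForest(random_state=42).fit(X)
labels = iso.predict(X)  # 1 = normal, -1 = possible anomaly
```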
Machine Learning Resources (how to learn, places to visit, etc)

Books:

Automate the Boring Stuff with Python (Python)

Python for Data Analysis by Wes McKinney, creator of the pandas library (data analysis and manipulation)

Data Science Handbook by Jake VanderPlas (data science)

Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurelien Geron, 2nd edition (machine learning)

Deep Learning for Coders (with fastai and PyTorch) by Jeremy Howard and Sylvain Gugger (released July 2020)

Neural Networks and Deep Learning by Michael Nielsen: a comprehensive overview of the intuitions and functions behind neural networks and deep learning. Reading this in its entirety will give you a great foundation for deep learning.

Introduction to Machine Learning with Python by Andreas C. Muller and Sarah Guido (machine learning practices & code)

Mechanics of Machine Learning by Terrence Parr and Jeremy Howard

Building Machine Learning Pipelines by Hannes Hapke & Catherine Nelson (released August 2020)

Interpretable Machine Learning Book by Christoph Molnar (explores how to make machine learning models interpretable for structured data)

Seeing Theory: a visual introduction to the most useful statistical and probability concepts (statistics and probability)

Mathematics (topics: matrix manipulation, neural networks, probability and statistics):

Math for Machine Learning Book by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong

The Matrix Calculus You Need For Deep Learning by Terrence Parr and Jeremy Howard
Cloud Services: A Cloud Guru, Google Cloud Training Resources, AWS Training Resources, Microsoft Azure Training Resources (some amazing paths here).

Creating a blog: advice for creating your own blog (yes, you should have one), fastpages (blog from Jupyter Notebooks), GitHub Pages.

Type of learning

Batch learning: all of your data exists in a big static warehouse and you train a model on it. You may train a new model once per month as you get new data. Learning may take a while and isn't done often ("don't stuff this one up"). Runs in production without learning (though it can be retrained later).
Online learning: your data is constantly being updated and you constantly train new models on it. Each learning step is usually fast and cheap (as opposed to batch learning). Runs in production and learns continuously.
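A sketch of the online pattern with scikit-learn's SGDClassifier, whose partial_fit updates the model batch by batch instead of retraining from scratch (random stand-in batches):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()

# Each incoming batch updates the existing model: fast and cheap,
# compared with retraining on the whole warehouse of data.
for _ in range(10):
    X_batch = np.random.rand(32, 5)          # new data arriving over time
    y_batch = np.random.randint(2, size=32)  # dummy labels
    model.partial_fit(X_batch, y_batch, classes=[0, 1])
```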
Rules and tidbits:

Andrej Karpathy's (head of AI at Tesla) recipe for training neural networks

Machine learning rules: 101 (actually 43) rules of machine learning (e.g. don't use machine learning if you don't have to).

Practical Advice for Building Deep Neural Networks (a handful of hands-on tips for deep learning)

Transfer learning: take the knowledge one model has learned and use it with your own. Gives you the ability to leverage SOTA (state of the art) models for your own problems. Helpful if you don't have much data or vast compute resources. Use resources such as TensorFlow Hub or PyTorch Hub for broad model options, and HuggingFace transformers (NLP models) and Detectron2 (computer vision models) for specific models. Links: TensorFlow Hub, PyTorch Hub, HuggingFace transformers, Detectron2.
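A minimal transfer learning sketch with torchvision (an image problem with, say, 5 classes is assumed; resnet18 is just one of the pretrained options):

```python
import torch
from torchvision import models

# Load a model pretrained on ImageNet (downloaded via torchvision/PyTorch Hub).
model = models.resnet18(pretrained=True)

# Freeze the pretrained layers so their learned weights stay intact.
for param in model.parameters():
    param.requires_grad = False

# Swap the final layer for one that suits your own problem (5 classes assumed).
model.fc = torch.nn.Linear(model.fc.in_features, 5)

# Now only model.fc's parameters get trained on your (smaller) dataset.
```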
Active learning: also referred to as "human in the loop" learning. A human expert interacts with a model and provides updated labels for the samples which the model is most uncertain about. See the example of how Nvidia uses active learning for their self-driving car models.
Ensembling: not really a form of learning, more combining algorithms which have already learned in some way to get better results. For example, leveraging the "wisdom of the crowd".

Bookmarks:

arXiv-sanity: a tool for monitoring the latest research from arXiv.

Made with ML: community driven resource for projects built with machine learning (you should put yours here!).

Papers with Code: the most recent machine learning research with code resources.
sotabench: state of the art benchmark tracking (what is the current state of the art for a series of benchmarks).

Datasets:

Google Dataset Search: use the power of Google search to find different datasets.

Kaggle Datasets: get open machine learning datasets as well as examples of how to work through them.

Curated list of free datasets by Dataquest

Open Images: massive repository of open source images and labels.

Awesome data labelling: a curated list of resources for labelling data, including image annotation, audio annotation and more.

The big bad NLP dataset: a collection of ~545 datasets for NLP.

Underfitting: happens when your model doesn't perform as well as you'd like on your data. Try training for longer, or try a more advanced model.

Overfitting: happens when your validation loss (how your model is performing on the validation dataset, lower is better) starts to increase. Or, if you don't have a validation set, it happens when the model performs far better on the training set than on the test set (e.g. 99% accuracy on the training set, 67% accuracy on the test set). Fix through various regularization techniques.
Regularization: a collection of techniques to prevent/reduce overfitting.

L1 (lasso) and L2 (ridge) regularization: L1 regularization sets unneeded feature coefficients to 0 (it performs feature selection on which features are most essential and which aren't, useful for model explainability). L2 constrains a model's feature coefficients (it won't set them to 0).

Dropout: randomly remove parts of your model so the rest of it has to become better.
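A sketch of the L1/L2 difference using scikit-learn's Lasso and Ridge on generated data; the alpha values are arbitrary:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       random_state=42)

# L1 (lasso) drives unneeded coefficients to exactly 0.
lasso = Lasso(alpha=1.0).fit(X, y)
print((lasso.coef_ == 0).sum(), "coefficients set to 0")

# L2 (ridge) shrinks coefficients towards 0 but won't zero them out.
ridge = Ridge(alpha=1.0).fit(X, y)
print((ridge.coef_ == 0).sum(), "coefficients set to 0")  # usually 0
```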
Early stopping: stop your model training before the validation loss starts to increase too much, or, more generally, when any other metric has stopped improving. Early stopping is usually implemented in the form of a model callback.
Data augmentation: manipulate your dataset in artificial ways to make it 'harder to learn'. For example, if you're dealing with images, randomly rotate, skew, flip and adjust the height of your images. This makes your model have to learn similar patterns across different styles of the same image (harder). Note: since this can be compute intensive, it's a good idea to do it in memory; see functions like ImageDataGenerator in Keras or transforms in torchvision.

Image transformation in PyTorch

Image augmentation in Keras/TensorFlow

EDA (easy data augmentation): text augmentation with Python (boost performance on text classification tasks)

TextAttack: a framework for adversarial attacks, data augmentation and model training in NLP.
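A sketch of in-memory image augmentation with torchvision transforms (the exact transforms and parameters are illustrative):

```python
from torchvision import transforms

# Random flips/rotations/skews applied each time an image is loaded,
# so the model sees a slightly different version every epoch.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=20),
    transforms.RandomAffine(degrees=0, shear=10),  # skew
    transforms.ToTensor(),
])

# Pass train_transforms as the `transform` argument of an image dataset,
# e.g. torchvision.datasets.ImageFolder("train/", transform=train_transforms).
```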
Cool resources, how do I actually learn all of this?

Learning how to learn on Coursera

6 Techniques Which Help Me Study Machine Learning Five Days Per Week by Daniel Bourke (the one who wrote this roadmap)

My Self-Created AI Masters Degree by Daniel Bourke (mostly ML)

Create your own curriculum

How I learned to code by Jason Benn (ML + software engineering + web development)

Machine Learning Tools (tools you can use to get the job done)
Hyperparameter Tuning (run a bunch of experiments with different model settings and see which works best). [Very useful paper] A disciplined approach to neural network hyper-parameters by Leslie Smith: covers learning rate, batch size, momentum, weight decay and more. Other hyperparameters you can tune: number of trees (decision tree algorithms).
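A sketch of this with scikit-learn's GridSearchCV (the model, grid and data are illustrative; number of trees is n_estimators here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=42)

# Run a bunch of experiments with different settings, keep the best.
param_grid = {"n_estimators": [10, 100, 200],  # number of trees
              "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```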
Analysis/Evaluation

Classification metrics: accuracy, precision, recall, F1, confusion matrix.

Regression metrics: R^2 (r-squared).
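All of the classification metrics above are one import away in scikit-learn (toy labels for illustration):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1]  # ground truth labels
y_pred = [1, 0, 0, 1, 0, 1]  # model predictions

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```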
Feature importance: which features contributed most to the model? Should some be removed? Useful for model explainability, for example, telling someone, "the number of bedrooms is most important when predicting a house price".
Training/inference time/cost: how long does inference take? Is it suitable for production?

Sagemaker: AWS's managed service for building, training and deploying machine learning models.