Machine Learning Roadmap 2020

Questions to ask

What kind of problem are we trying to solve? (see machine learning problems)

What data sources already exist?

What privacy concerns are there?

Is the data public?

Where should we store the data?

Data collection

Types of data

Structured data: data which appears in tabulated format (rows and columns style, like what you'd find in an Excel spreadsheet). Can contain different types of data, for example, numerical, categorical, time series.

Nominal/categorical: one thing or another (mutually exclusive). For example, for car sales, color is a category. A car may be blue but not white. Order does not matter.

Numerical: any continuous value where the difference between them matters. For example, when selling houses, $107,850 is more than $56,400.

Ordinal: data which has order but the distance between values is unknown. For example, questions such as, how would you rate your health from 1-5? 1 being poor, 5 being healthy. You can answer 1, 2, 3, 4 or 5, but the distance between each value doesn't necessarily mean an answer of 5 is 5 times as good as an answer of 1.

Time series: data across time. For example, the historical sale values of bulldozers from 2012-2018.

Unstructured data: data with no rigid structure (images, video, natural language text, speech).

Exploratory data analysis (EDA): learning about the data you're working with.

What are the feature variables (input) and the target variables (output)? For example, for predicting heart disease, the feature variables may be a person's age, weight, average heart rate and their level of physical activity. And the target variable will be whether or not they have heart disease (note: this example has been very simplified).

What kind of data do you have? Structured, unstructured, categorical, numerical.

Create a data dictionary for what each feature is. For example, if you've got a column of numbers called "hr", how would someone else know that actually means heart rate?

Are there missing values? Should you remove them or fill them with feature imputation (see below)?

Where are the outliers? How many of them are there? Are they out by much (3+ standard deviations)? Why are they there?

Are there questions you could ask a domain expert about the data? For example, would a heart disease physician be able to shed some light on your heart disease dataset?

Feature imputation: filling missing values (a machine learning model can't learn on data that isn't there).

Single imputation: fill with the mean or median of the column.

Multiple imputation: model the other missing values and fill with what your model finds.

KNN (k-nearest neighbours): fill data with a value from another example which is similar.

Many more, such as random imputation, last observation carried forward (for time series), moving window, most frequent.
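A minimal sketch of single and KNN imputation using scikit-learn (the toy DataFrame and column names are illustrative assumptions, not from the roadmap):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

# Hypothetical data with missing values.
df = pd.DataFrame({
    "age": [29, 41, np.nan, 55],
    "avg_heart_rate": [72, np.nan, 80, 66],
})

# Single imputation: fill each column with its median.
filled_median = SimpleImputer(strategy="median").fit_transform(df)

# KNN imputation: fill a missing value using the most similar rows.
filled_knn = KNNImputer(n_neighbors=2).fit_transform(df)

print(filled_median)
print(filled_knn)
```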

Feature encoding (turning values into numbers). A machine learning model requires all values to be numerical.

OneHotEncoding: turn all unique values into lists of 0's and 1's where the target value is 1 and the rest are 0's. For example, with car colors green, red, blue, a green car's color feature would be represented as [1, 0, 0] and a red one would be [0, 1, 0].

LabelEncoder: turn labels into distinct numerical values. For example, if your target variables are different animals, such as dog, cat, bird, these could become 0, 1, 2 respectively.

Embedding encoding: learn a representation amongst all the different data points. For example, a language model is a representation of how different words relate to each other. Embeddings are also becoming more widely available for structured (tabular) data.
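A minimal sketch of one-hot and label encoding with scikit-learn (the colour and animal values are illustrative assumptions):

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# One-hot encoding: each unique colour becomes its own 0/1 column.
colours = [["green"], ["red"], ["blue"], ["green"]]
one_hot = OneHotEncoder()
print(one_hot.fit_transform(colours).toarray())

# Label encoding: each unique label gets a distinct integer.
labels = ["dog", "cat", "bird", "dog"]
print(LabelEncoder().fit_transform(labels))
```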

Feature normalization (scaling) or standardization: when your numerical variables are on different scales (e.g. number_of_bathrooms is between 1 and 5 and size_of_land is between 5,000 and 20,000 sq feet), some machine learning algorithms don't perform very well. Scaling and standardization help to fix this.

Feature scaling (also called normalization) shifts your values so they always appear between 0 and 1. This is done by subtracting the min value and dividing by the max minus the min. For example, 1 and 4 bathrooms would become something like 0.2 and 0.8 respectively (between 0 and 1, but the numerical relationship is still there).

Feature standardization standardizes all values so they have a mean of 0 and unit variance. It happens by subtracting the mean and dividing by the standard deviation of that particular feature. Doing this means values do not end up between 0 and 1 (their range is uncapped). Standardization is more robust to outliers than feature scaling.
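A minimal sketch of both transforms with scikit-learn (the bathroom/land-size numbers are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales: [number_of_bathrooms, size_of_land].
X = np.array([[1.0, 5_000], [2.0, 12_000], [4.0, 20_000]])

# Scaling (normalization): (x - min) / (max - min), values end up between 0 and 1.
print(MinMaxScaler().fit_transform(X))

# Standardization: (x - mean) / std, values end up with mean 0 and unit variance.
print(StandardScaler().fit_transform(X))
```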

Data preparation

Data preprocessing: preparing your data to be modelled.

Decompose: for example, turn a date (such as 2020-06-18 20:16:24) into hour_of_day, day_of_week, day_of_month, is_holiday, etc.

Discretization (turning larger groups into smaller groups):

For numerical variables, let's say age, you may want to move it into buckets, such as over_50 or under_50, or 21-30, 31-40, etc. This process is also known as binning (putting data into different 'bins').

For categorical variables such as car color, this may mean combining colors such as 'light green', 'dark green' and 'lime green' into a single 'green' color.
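A minimal sketch of both kinds of discretization with pandas (the age values, bin edges and colour names are illustrative assumptions):

```python
import pandas as pd

# Numerical discretization (binning): group continuous ages into labelled buckets.
ages = pd.Series([18, 25, 34, 47, 52, 71])
age_buckets = pd.cut(ages, bins=[0, 30, 40, 50, 120],
                     labels=["under_30", "31-40", "41-50", "over_50"])
print(age_buckets)

# Categorical discretization: collapse similar categories into one.
colours = pd.Series(["light green", "dark green", "lime green", "red"])
print(colours.replace({"light green": "green", "dark green": "green", "lime green": "green"}))
```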

Feature engineering: transform data into (potentially) more meaningful representations by adding in domain knowledge.

Crossing and interaction features: in other words, combining two or more features. For example, the difference between two features, such as using house_last_sold and current_date to get time_on_market.

Indicator features: using other parts of the data to indicate something potentially significant.

Create an X_is_missing feature for wherever a column X contains a missing value.

If you're analyzing traffic sources from the web, you could add is_paid_traffic as a feature for advertisements which have been paid for.

Does a particular sample satisfy more than 1 criteria? Such as, if you were analyzing car sales data, does the car have less than 100,000 KM's, is automatic and under 10 years old? Perhaps you know from experience these cars are generally worth more. You could make a special feature called under_100km_auto_under_10yo.

Feature selection: selecting the most valuable features of your dataset to model. Potentially reduces overfitting and training time (less overall data and less redundant data to train on) and improves accuracy.

Dimensionality reduction: a common dimensionality reduction method, PCA (Principal Component Analysis), takes a larger number of dimensions (features) and uses linear algebra to reduce them to fewer dimensions. For example, say you have 10 numerical features, you could run PCA to reduce them down to 3.

Feature importance (post modelling): fit a model to a set of data, then inspect which features were most important to the results and remove the least important ones.

Wrapper methods, such as genetic algorithms and recursive feature elimination, involve creating large subsets of feature options and then removing the ones which don't matter. These have the potential to find a great set of features but can also require a large amount of computation time. TPOT does this.

Dealing with imbalances: does your data have 10,000 examples of one class but only 100 examples of another?

Collect more data (if you can).

Use the scikit-learn-contrib imbalanced-learn package (a scikit-learn compatible Python library for dealing with imbalanced datasets).

Use SMOTE (synthetic minority over-sampling technique), which creates synthetic samples of your minor class to try and level the playing field. You can access an implementation of SMOTE within the scikit-learn-contrib imbalanced-learn package.

A helpful (and technical) paper to look at is "Learning from Imbalanced Data". You could use this as a starting point to go onto the next thing.

Concepts and process

Elements of AI

Google's Machine Learning Crash Course

Google's AI education page

Facebook's field guide to machine learning

Made with ML topics: a comprehensive collection of the most up-to-date resources for learning about different machine learning topics.

Workera.ai: a series of questions by the deeplearning.ai team on what kinds of things you should know as a machine learning engineer or data scientist.

Test your skills

Kaggle competitions: get data from the real world and test your model building skills while competing with others around the world. A really cool follow-on from these projects would be to deploy your models in a user-facing app using something like Streamlit.

Software 2.0

Python

Learn Python in 1 video on YouTube (by freeCodeCamp)

Zero to Mastery Python Course (full-stack Python), video based course.

Python like you mean it: learn Python for STEM applications (data analysis, machine learning, numerical work, etc.)

Tooling (barebones)

Anaconda/Miniconda/conda: package management (take care of all your Python packages)

Python virtualenv

Jupyter Notebooks (write and explore machine learning and data science code)

Beginner (if you're completely new, start here).

Give me it all in one place

Zero to Mastery Machine Learning Course (learn Python, the foundations of machine learning, tools and concepts all in one course). Disclaimer: this is taught by the person (Daniel Bourke) who made this roadmap.

Kaggle learning centre
DataCamp

Dataquest

Example projects (code)

End to end regression machine learning project by Aurelien Geron (author of the Hands-on Machine Learning book)

Example Scikit-Learn data exploration and modelling by Daniel Formosso

Example binary classification project by Daniel Bourke

Data splitting

Training set (usually 70-80% of data): the model learns on this.

Validation set (typically 10-15% of data): model hyperparameters are tuned on this.

Test set (usually 10-15% of data): the model's final performance is evaluated on this. If you've done it right, hopefully the results on the test set give a good indication of how the model should perform in the real world. Do not use this dataset to tune the model.
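A minimal sketch of a train/validation/test split with scikit-learn (the synthetic data and exact split percentages are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix and target vector.
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# Hold out a test set first (15%), then split the rest into train/validation.
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.15 / 0.85, random_state=42)

# Roughly 70% train, 15% validation, 15% test.
print(len(X_train), len(X_val), len(X_test))
```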

Machine Learning Resources (how to learn, places to visit, etc.)

Advanced (after 3-6 months+ of beginner work, or you've had prior experience as a Python developer/have a mathematics background):

Overview: deeplearning.ai curriculum (includes Andrew Ng's famous machine learning course and much more)

fast.ai curriculum: fast.ai Part 1 (deep learning for coders) and fast.ai Part 2 (deep learning from the foundations)

CS50's Introduction to Artificial Intelligence with Python: phenomenal coverage and introduction to many of the most important topics in artificial intelligence.

Full-stack deep learning curriculum: don't let your models die in Jupyter Notebooks, get them live. This course will teach you how to build and deploy machine learning models.

Get proficient with at least 1 cloud service (see below)

Docker

Missing parts of your CS degree: what's missing (general software engineering practices left out of ML curriculums)... teachyourselfcs.com

Mathematics

Linear algebra: Rachel Thomas' computational linear algebra

Calculus: multivariate calculus (used in optimization)

Matrix manipulation

Neural networks

Probability and statistics

Books

Python: Automate the Boring Stuff with Python

Data analysis and manipulation: Python for Data Analysis by Wes McKinney (creator of the pandas library)

Data science: Data Science Handbook by Jake VanderPlas

Machine learning: Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurelien Geron (2nd edition)

Machine Learning Process (steps in a machine learning project)

Choosing an algorithm

Supervised algorithms

Linear Regression: draw a line that best fits data scattered on a graph. Produces continuous variables (e.g. height in inches).

Logistic Regression: predicts a binary outcome based on a series of independent variables. For example, if you're trying to predict whether someone has heart disease or not based on their health parameters.

k-Nearest Neighbours: find the 'k' examples which are most similar to each other. Then, given a new sample, which existing examples is it most closely aligned with?

Support Vector Machines (SVMs): can be used for classification or regression. Try to find the best way to separate data points using multiple planes (referred to as hyperplanes). Imagine a road running through a city with red cars on the left and green cars on the right, the SVM is the road. With more data, you might need more dimensions (more roads on different angles and cars at different heights).

Decision Trees and Random Forests: can be used for classification and regression (very valuable algorithms for structured data). Decision trees split data based on criteria such as "is over 50" and "has average heart rate lower than 65", eventually getting to a point where the data can't be split anymore (you can define this). Random Forests are a combination of many decision trees, effectively leveraging and combining the choices of many models (this technique of using a combination of models is known as ensembling). See a great way to visualize decision trees by explained.ai.

AdaBoost/Gradient Boosting Machines (also just known as boosting): can be used for classification or regression. Asks the question, can a series of weak learners be turned into a strong learner? For example, train a series of weak learners whose job it is to improve each other (another example of an ensemble method, combining multiple weaker models to create a better one). See the XGBoost, CatBoost and LightGBM algorithms.

Neural networks (also called deep learning): can be used for classification or regression. Takes a series of inputs, manipulates the inputs with linear functions (dot product between weights and inputs) and nonlinear functions (activation functions). Does this multiple times (at least once for each neuron in a model). The importance of linear and non-linear functions (straight and non-straight lines) means neural networks can use this combination to estimate almost anything. Convolutional neural networks (typically used for computer vision). Recurrent neural networks (typically used for sequence modelling). Transformer networks (can be used for vision and text, starting to replace RNNs).
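A minimal sketch of fitting one of the supervised algorithms above (a random forest classifier) with scikit-learn, using synthetic stand-in data rather than a real dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (illustrative only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

# Random forest: an ensemble of many decision trees.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```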

Books: machine learning practices & code

Deep Learning for Coders (with fastai and PyTorch) by Jeremy Howard and Sylvain Gugger (released July 2020)

Neural Networks and Deep Learning by Michael Nielsen: a comprehensive overview of the intuitions and functions behind neural networks and deep learning. Reading this in its entirety will give you a great foundation for deep learning.

Introduction to Machine Learning with Python by Andreas C. Muller and Sarah Guido

Mechanics of Machine Learning by Terence Parr and Jeremy Howard

Building Machine Learning Pipelines by Hannes Hapke & Catherine Nelson (released August 2020)

Interpretable Machine Learning Book by Christoph Molnar (explores how to make machine learning models interpretable for structured data)

Statistics and probability: Seeing Theory, a visual introduction to the most useful statistical and probability concepts.

Mathematics: Math for Machine Learning Book by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong; The Matrix Calculus You Need For Deep Learning by Terence Parr and Jeremy Howard

Unsupervised algorithms

Clustering

K-Means clustering: choose 'k' number of clusters, each cluster receives a centre node (called a centroid) at random, and with each iteration samples are assigned to their closest centroid and the centroids move to the average of their assigned samples. Once the centroids stop moving, each sample is assigned a value equivalent to the closest centroid.

Visualization and dimensionality reduction

Principal Component Analysis (PCA): reduce data from more dimensions to lower dimensions whilst attempting to preserve the variance.

Autoencoders: learn a lower dimensional encoding of data. For example, compress an image of 100 pixels into 50 pixels representing (roughly) the same information as the 100 pixels.

t-Distributed Stochastic Neighbor Embedding (t-SNE): good for visualizing high-dimensional data in a 2D or 3D space.

Anomaly detection

Autoencoder: use an autoencoder to reduce the dimensionality of the inputs of a system and then try to recreate those inputs within some threshold. If the recreations aren't able to match the threshold, there could be some sort of outlier.

One-class classification: train a model on only one class (e.g. normal events of computer network traffic, which are usually in abundance); if anything lies outside of this class, it may be an anomaly. Algorithms for doing so include one-class K-Means, one-class SVM (support vector machine), isolation forest and local outlier factor.
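A minimal sketch of two of the unsupervised techniques above (K-Means and PCA) with scikit-learn, on synthetic stand-in data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in data: 500 samples with 10 features in 3 blobs (illustrative only).
X, _ = make_blobs(n_samples=500, n_features=10, centers=3, random_state=42)

# K-Means: group samples into k clusters.
cluster_labels = KMeans(n_clusters=3, random_state=42).fit_predict(X)

# PCA: reduce 10 features down to 2 while trying to preserve the variance.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape, cluster_labels[:10])
```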

Cloud Services

A Cloud Guru

Google Cloud Training Resources

AWS Training Resources

Microsoft Azure Training Resources (some amazing paths here)

Creating a blog

fastpages (blog from Jupyter Notebooks)

Advice for creating your own blog (yes, you should have one)

GitHub Pages

Rules and tidbits

Andrej Karpathy's (head of AI at Tesla) recipe for training neural networks

Machine learning rules: machine learning 101 (actually 43) rules (e.g. don't use machine learning if you don't have to)...

Practical Advice for Building Deep Neural Networks (a handful of hands-on tips for deep learning)

Bookmarks

arXiv-sanity: a tool for monitoring the latest research from arXiv.

Made with ML: community driven resource for projects built with machine learning (you should put yours here!).

Papers with Code: most recent machine learning research with code resources.

Train model on data (3 steps: choose an algorithm, overfit the model, reduce overfitting with regularization)

Type of learning

Batch learning: all of your data exists in a big static warehouse and you train a model on it. You may train a new model once per month once you get new data. Learning may take a while and isn't done often ("don't stuff this one up"). Runs in production without learning (though can be retrained later).

Online learning: your data is constantly being updated and you constantly train new models on it. Each learning step is usually fast and cheap (as opposed to batch learning). Runs in production and learns continuously.

Transfer learning: take the knowledge one model has learned and use it with your own. Gives you the ability to leverage SOTA (state of the art) models for your own problems. Helpful if you don't have much data or vast compute resources. Use resources such as TensorFlow Hub or PyTorch Hub for broad model options, and HuggingFace transformers (NLP models) and Detectron2 (computer vision models) for specific models. See the sketch after this list.

Active learning: also referred to as "human in the loop" learning. A human expert interacts with a model and provides updates to labels for samples which the model is most uncertain about. See the example of how Nvidia uses active learning for their self-driving car models.

Ensembling: not really a form of learning, more combining algorithms which have already learned in some way to get better results. For example, leveraging the "wisdom of the crowd".
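A minimal transfer learning sketch with PyTorch and torchvision (assumes a recent torchvision with the weights API; the 3-class output size is an illustrative assumption):

```python
import torch
import torchvision.models as models

# Load a ResNet18 pretrained on ImageNet (weights are downloaded on first use).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained layers so their learned knowledge is kept as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer to suit a hypothetical 3-class problem.
model.fc = torch.nn.Linear(model.fc.in_features, 3)

# Only the new head's parameters get passed to the optimizer and trained.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```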

Datasets

sotabench: state of the art benchmark tracking (what is the current state of the art for a series of benchmarks).

Google Dataset Search: use the power of Google search to find different datasets.

Kaggle Datasets: get open machine learning datasets as well as examples of how to work through them.

Curated list of free datasets by Dataquest

Open Images: massive repository of open source images and labels.

Awesome data labelling: a curated list of resources for labelling data, including image annotation, audio annotation and more.

The big bad NLP dataset: a collection of ~545 datasets for NLP.

Cool resources, how do I actually learn all of this? Create your own curriculum.

Learning how to learn on Coursera

6 Techniques Which Help Me Study Machine Learning Five Days Per Week by Daniel Bourke (the one who wrote this roadmap)

My Self-Created AI Masters Degree by Daniel Bourke (mostly ML)

How I learned to code by Jason Benn (ML + software engineering + web development)

Machine Learning Tools (tools you can use to get the job done)

Machine Learning Mathematics (what's running under the hood when you write machine learning code)

Underfitting: happens when your model doesn't perform as well as you'd like on your data. Try training for longer or a more advanced model.

Overfitting: happens when your validation loss (how your model is performing on the validation dataset, lower is better) starts to increase. Or, if you don't have a validation set, happens when the model performs far better on the training set than on the test set (e.g. 99% accuracy on the training set, 67% accuracy on the test set). Fix through various regularization techniques.

Regularization: a collection of techniques to prevent/reduce overfitting.

L1 (lasso) and L2 (ridge) regularization: L1 regularization sets unneeded feature coefficients to 0 (performs feature selection on which features are most essential and which aren't, useful for model explainability). L2 constrains a model's features (won't set them to 0).

Dropout: randomly remove parts of your model so the rest of it has to become better.

Early stopping: stop your model from training before the validation loss starts to increase too much or, more generally, any other metric has stopped improving. Early stopping is usually implemented in the form of a model callback.

Data augmentation: manipulate your dataset in artificial ways to make it 'harder to learn'. For example, if you're dealing with images, randomly rotate, skew, flip and adjust the height of your images. This makes your model have to learn similar patterns across different styles of the same image (harder). Note: since this can be compute intensive, it's a good idea to do this in memory, using functions like ImageDataGenerator in Keras or transforms in torchvision (see the sketch below). Other resources: image transformation in PyTorch, image augmentation in Keras/TensorFlow, EDA (easy data augmentation) for text augmentation with Python (boost performance on text classification tasks), and TextAttack, a framework for adversarial attacks, data augmentation and model training in NLP.
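A minimal image augmentation sketch with torchvision transforms (the rotation angle, crop size and dataset path are illustrative assumptions):

```python
from torchvision import transforms

# A typical augmentation pipeline applied on the fly during training.
train_transforms = transforms.Compose([
    transforms.RandomRotation(degrees=15),     # randomly rotate
    transforms.RandomHorizontalFlip(p=0.5),    # randomly flip
    transforms.RandomResizedCrop(size=224),    # randomly crop and rescale
    transforms.ToTensor(),
])

# Usage (hypothetical folder of images):
# dataset = torchvision.datasets.ImageFolder("path/to/images", transform=train_transforms)
```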

Batch normalization: standardize inputs (zero mean and normalize) as well as adding two parameters (beta, how much to offset the parameters for each layer, and epsilon to avoid division by zero) before they go into the next layer. This often results in faster training speeds since the optimizer has fewer parameters to update. May be a replacement for dropout in some networks.

Setting a learning rate (often the most important hyperparameter): generally, a high learning rate means the algorithm rapidly adapts to new data, while a low learning rate means the algorithm adapts more slowly to new data (e.g. for transfer learning).

Finding the optimal learning rate: train the model for a few hundred iterations starting with a very low learning rate (e.g. 10e-6) and slowly increase it to a very large value (e.g. 1). Then plot the loss versus the learning rate (using a log scale for the learning rate); you should see a U-shaped curve, and the optimal learning rate is about 1-2 notches to the left of the bottom of the U-curve. See p. 326 of the Hands-on Machine Learning book, 2nd edition.

Learning rate scheduling (use the Adam optimizer) involves decreasing the learning rate slowly as the model learns more (gets closer to convergence).

Cyclic learning rate: dynamically change the learning rate up and down between some thresholds, potentially speeding up training.
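A minimal learning rate scheduling sketch with PyTorch (the stand-in model, step size and decay factor are illustrative assumptions):

```python
import torch

# Tiny stand-in model; the optimizer and scheduler are the point here.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Decrease the learning rate as training progresses: every 10 epochs, multiply it by 0.1.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    # ... forward pass, loss computation and loss.backward() would go here ...
    optimizer.step()     # update model parameters
    scheduler.step()     # update the learning rate according to the schedule

print(scheduler.get_last_lr())  # the learning rate after 30 epochs
```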

Hyperparameter tuning: run a bunch of experiments with different model settings and see which works best. [Very useful paper] A disciplined approach to neural network hyperparameters by Leslie Smith covers learning rate, batch size, momentum, weight decay and more.

Other hyperparameters you can tune:

Number of layers (deep learning networks)

Batch size (how many examples of data your model sees at once). Use the largest batch size you can fit in your GPU memory. If in doubt, use a batch size of 32 (see Yann LeCun's tweet, and the paper attached, of course).

Number of trees (decision tree algorithms)

Number of iterations (how many times the model goes through the data). Note: instead of tuning iterations, use early stopping instead.

Many more... it depends on the algorithm you're using. Search "[algorithm_name] hyperparameter tuning".

Evaluation metrics

Classification:

Accuracy

Precision

Recall

f1

Confusion matrix

Mean average precision (object detection)

Regression:

MSE (mean squared error)

MAE (mean absolute error)

R^2 (r-squared)

Task-based metric: create one based on your specific problem. For example, for self-driving cars, you might want to know the "number of disengagements".
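A minimal sketch of computing a few of the classification metrics above with scikit-learn (the label arrays are illustrative):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical true vs. predicted labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```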

Analysis/Evaluation

Feature importance: which features contributed most to the model? Should some be removed? Useful for model explainability, for example, telling someone, "the number of bedrooms is most important when predicting a house price" (a quick way to inspect this is sketched after this list).

Training/inference time/cost: how long does a model take to train? Is this feasible? How long does inference take? Is it suitable for production?

What-if tool: how does my model compare to other models? What if I changed something in the data? How does this affect the outcome?

Least confident examples: what does the model get wrong? (Usually the long tail, instances you don't have many examples of.)

Bias/variance trade-off: high bias results in underfitting and a lack of generalization to new samples; high variance results in overfitting due to the model finding patterns in the data which are actually random noise.
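A minimal feature importance sketch with scikit-learn (the synthetic data and housing-style feature names are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data with 4 features and hypothetical names.
X, y = make_regression(n_samples=500, n_features=4, n_informative=3, random_state=42)
feature_names = ["bedrooms", "bathrooms", "land_size", "year_built"]

model = RandomForestRegressor(random_state=42).fit(X, y)

# Inspect which features contributed most to the model's predictions.
for name, importance in sorted(zip(feature_names, model.feature_importances_),
                               key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {importance:.3f}")
```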

Serve model (deploying a model)

Put the model into production and see how it goes. Evaluation metrics in a notebook are great, but until the model is in production, you won't know how it performs for real.

Tools you can use:

TensorFlow Serving (part of TFX, TensorFlow Extended)

PyTorch Serving (TorchServe)

Google AI Platform: make your model available as a REST API

Sagemaker
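Before any of the serving tools above can host a model, the trained model has to be exported. A minimal sketch with scikit-learn and joblib (the file name and stand-in model are illustrative assumptions):

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a quick stand-in model on synthetic data.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

# Persist the trained model to disk, then reload it in the serving environment.
joblib.dump(model, "model.joblib")
loaded_model = joblib.load("model.joblib")
print(loaded_model.predict(X[:3]))
```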

MLOps: where software engineering meets machine learning, essentially all the technology required around a machine learning model to have it working in production.

Retrain model

See how the model performs after serving (or prior to serving) based on various evaluation metrics and revisit the above steps as required (remember, machine learning is very experimental, so this is where you'll want to track your data and experiments).

You'll also find your model's predictions start to 'age' (usually not in a fine-wine style) or 'drift', as in when data sources change or upgrade (new hardware, etc.). This is when you'll want to retrain it.

Made by Daniel Bourke (any questions, feel free to ask).


Copy/use whatever you need.
Daniel Bourke

Inspiration drawn from (you should also check this out):


GitHub - dformoso/machine-learning-mindmap: A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
