Six Steps To Master Machine Learning With Data Preparation

The document discusses six critical steps for preparing data for machine learning: 1) Data collection to determine relevant attributes and parse nested data structures 2) Data exploration to assess data quality issues and identify outliers or biases 3) Formatting data for consistency across different sources 4) Improving data quality by addressing errors, missing values, and outliers 5) Feature engineering to better represent patterns for learning algorithms 6) Splitting data into training and evaluation sets to ensure proper model testing

Uploaded by

Poorna28
Copyright
© All Rights Reserved

Six Steps to Master Machine Learning with Data Preparation
By following six critical steps to prepare data for both analytics and machine learning
initiatives, teams can accelerate machine learning and data science projects and deliver an
immersive business consumer experience that accelerates and automates the data-to-insight pipeline.

By David Levinger, VP Dev and Cloud Operations at Paxata

Organizations today continue to look for ways to prepare data quickly and more accurately to
solve their data challenges and enable machine learning (ML). But before bringing your data into
a machine learning model or any other analytics project, it’s important to ensure that it is clean,
consistent, and accurate. Because much of today’s analytics is dependent on the context of the
data, the task is best done by those closest to what the data actually represents: the business
domain experts who can apply hunches, theories, and business knowledge to the data.

Unfortunately, business users don’t usually come equipped with data science skills, so bridging
that gap can determine how quickly you gain value from your data. As a result, many
are applying data preparation (DP) to help data scientists and ML practitioners rapidly prepare
and annotate their enterprise data, extending the value of that data across the enterprise for
analytic workloads.

How data collection and preparation are the foundation for trusted ML models

To create a successful machine learning model, an organization must be able to train, test, and
validate it before deploying it into production. Data preparation technology is being used to
create the clean and annotated foundation needed for modern machine learning, yet good DP has
historically taken more time than any other part of the machine learning process.

Reducing the time necessary for data preparation has become increasingly important, as it leaves
more time to test, tune, and optimize models to create greater value. By following six critical
steps to prepare data for both analytics and machine learning initiatives, teams can accelerate
machine learning and data science projects and deliver an immersive business consumer experience
that accelerates and automates the data-to-insight pipeline:

Step 1: Data collection

This essential first step addresses common challenges, including:

• Automatically determining relevant attributes in a data string stored in a .csv
(comma-separated) file.
• Parsing highly nested data structures, such as those from XML or JSON files, into a
tabular form for easier scanning and pattern detection.
• Searching and identifying relevant data from external repositories.

However, when considering a DP solution, make sure it can combine multiple files into one
input, such as when you have a collection of files representing daily transactions, but your
machine learning model needs to ingest a year of data. Also, be sure to have a contingency plan
in place for overcoming problems associated with sampling and bias in your data set and your
machine learning model.
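
As a minimal sketch of two of the collection tasks above (combining many daily files into one
input, and flattening nested records into tabular form), the Python below uses only the standard
library. The file contents, the column names, and the helper names combine_csv_files and flatten
are hypothetical, chosen for illustration; a DP tool or a data library would normally do this work.

```python
import csv
import io
import json


def combine_csv_files(files):
    """Concatenate the rows of several CSV sources that share a header."""
    rows = []
    for f in files:
        rows.extend(csv.DictReader(f))
    return rows


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining key names with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Two hypothetical "daily transaction" files, held in memory for the demo.
day1 = io.StringIO("date,amount\n2017-06-19,5.50\n")
day2 = io.StringIO("date,amount\n2017-06-20,7.25\n")
combined = combine_csv_files([day1, day2])

# A hypothetical nested JSON record turned into a single flat row.
nested = json.loads(
    '{"customer": {"name": {"first": "Ada", "last": "Lovelace"}}, "amount": 5.5}'
)
flat = flatten(nested)
print(combined)
print(flat)
```

In practice the in-memory files would be replaced by opened files from disk, one per day.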

Step 2: Data Exploration and Profiling

Once the data is collected, it’s time to assess its condition, looking for trends, outliers,
exceptions, and incorrect, inconsistent, missing, or skewed information. This is important
because your source data will inform all of your model’s findings, so it is critical to be sure it
does not contain unseen biases. For example, if you are looking at customer behavior nationally,
but only pulling in data from a limited sample, you might miss important geographic regions.
This is the time to catch any issues that could incorrectly skew your model’s findings, and to do
so on the entire data set, not just on partial or sample data sets.
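
A rough sketch of what such a profile might report is shown below: missing values, the range and
center of a numeric column, and how records are spread across a categorical column (here, region,
to echo the geographic example). The records, column names, and the profile helper are all
hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical customer records; None marks a missing value.
records = [
    {"region": "West", "spend": 120.0},
    {"region": "West", "spend": 95.0},
    {"region": "West", "spend": None},
    {"region": "East", "spend": 110.0},
    {"region": "West", "spend": 2500.0},
]


def profile(rows, numeric_col, category_col):
    """Summarize one numeric and one categorical column of a list of dicts."""
    values = [r[numeric_col] for r in rows if r[numeric_col] is not None]
    return {
        "rows": len(rows),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "category_counts": Counter(r[category_col] for r in rows),
    }


report = profile(records, "spend", "region")
for name, value in report.items():
    print(name, value)
```

A mean far above the median hints at skew or an extreme value, and lopsided category counts hint
at the kind of sampling bias described above.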

Step 3: Formatting data to make it consistent

The next step in great data preparation is to ensure your data is formatted in a way that best fits
your machine learning model. If you are aggregating data from different sources, or if your data
set has been manually updated by more than one stakeholder, you’ll likely discover anomalies in
how the data is formatted (e.g. USD5.50 versus $5.50). In the same way, standardizing values in
a column (e.g. state names that could be spelled out or abbreviated) will ensure that your data
aggregates correctly. Consistent data formatting removes these errors so that the entire data
set uses the same input formatting protocols.
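
The two examples in this step can be sketched in a few lines of Python. The helper names
parse_usd and normalize_state and the three-entry lookup table are assumptions made for
illustration; a real lookup would cover every state and every currency format found in your sources.

```python
import re
from decimal import Decimal

# Small illustrative lookup; a real table would list every state.
STATE_ABBREVIATIONS = {"california": "CA", "new york": "NY", "texas": "TX"}


def parse_usd(text):
    """Turn 'USD5.50', '$5.50', or '5.50' into Decimal('5.50')."""
    cleaned = re.sub(r"usd|[$,\s]", "", text, flags=re.IGNORECASE)
    return Decimal(cleaned)


def normalize_state(text):
    """Map a spelled-out or abbreviated state name to a two-letter code."""
    key = text.strip().lower()
    if key in STATE_ABBREVIATIONS:
        return STATE_ABBREVIATIONS[key]
    if len(key) == 2:
        # Assume a two-letter value is already an abbreviation.
        return key.upper()
    raise ValueError(f"unrecognized state: {text!r}")


print(parse_usd("USD5.50"), parse_usd("$5.50"))
print(normalize_state("California"), normalize_state("ca"))
```

Raising an error on an unrecognized value, rather than guessing, keeps formatting problems
visible instead of silently passing them to the model.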

Step 4: Improving data quality

Here, start by having a strategy for dealing with erroneous data, missing values, extreme values,
and outliers in your data. Self-service data preparation tools can help if they have built-in
facilities that match data attributes from disparate datasets and combine them
intelligently. For instance, if you have columns for FIRST NAME and LAST NAME in one
dataset and another dataset has a column called CUSTOMER that seems to hold a FIRST and
LAST NAME combined, intelligent algorithms should be able to determine a way to match these
and join the datasets to get a single view of the customer.

For continuous variables, use histograms to review the distribution of your data and
reduce the skewness. Be sure to examine records outside an accepted range of values. Such an
“outlier” could be an input error, or it could be a real and meaningful result that could inform
future events. Duplicate or similar values could carry the same information and should be
eliminated. Similarly, take care before automatically deleting all records with a missing value, as
too many deletions could skew your data set so that it no longer reflects real-world situations.
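
One possible way to act on this advice is sketched below: flag values outside an accepted range
for review instead of deleting them, and fill missing values instead of dropping whole records.
The interquartile-range rule and median fill used here are common conventions chosen for the
example, not something the steps above prescribe, and the ages list is made up.

```python
from statistics import median, quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) limits based on the interquartile range."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Mark values outside the IQR limits; missing values are never flagged."""
    known = [v for v in values if v is not None]
    low, high = iqr_bounds(known, k)
    return [v is not None and (v < low or v > high) for v in values]


def impute_median(values):
    """Replace missing values with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


# Hypothetical ages: one missing value and one likely input error (410).
ages = [34, 29, None, 41, 38, 36, 410, 33]
print(flag_outliers(ages))
print(impute_median(ages))
```

The flags are a prompt for a human decision: the 410 here looks like a typing error, but a
flagged value in another column could be a real and meaningful result.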
 
 

Step 5: Feature engineering

This step involves the art and science of transforming raw data into features that better represent
a pattern to the learning algorithms. For example, data can be decomposed into multiple parts to
capture more specific relationships, such as analyzing sales performance by the day of the week,
not only the month or year. In this situation, segregating the day as a separate categorical value
from the date (e.g. “Mon; 06.19.2017”) may provide the algorithm with more relevant
information.
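
The day-of-week example can be sketched as follows, assuming dates arrive as month.day.year text
like the 06.19.2017 above. The date_features helper and the extra is_weekend feature are
illustrative additions.

```python
from datetime import datetime

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def date_features(date_text, fmt="%m.%d.%Y"):
    """Decompose a date string into separate features for a model."""
    d = datetime.strptime(date_text, fmt)
    return {
        "year": d.year,
        "month": d.month,
        "day_of_week": DAY_NAMES[d.weekday()],  # categorical value
        "is_weekend": d.weekday() >= 5,
    }


features = date_features("06.19.2017")
print(features)
```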

Step 6: Splitting data into training and evaluation sets

The final step is to split your data into two sets: one for training your algorithm, and another for
evaluation purposes. Be sure to select non-overlapping subsets of your data for the training and
evaluation sets in order to ensure proper testing. Invest in tools that provide versioning and
cataloging of your original source as well as your prepared data for input to machine learning
algorithms, and the lineage between them. This way, you can trace the outcome of your
predictions back to the input data to refine and optimize your models over time.
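
A minimal sketch of a non-overlapping split is shown below. The 80/20 ratio, the fixed seed, and
the train_eval_split name are assumptions for the example; the fixed seed simply makes the split
repeatable, which helps when tracing predictions back to their input data.

```python
import random


def train_eval_split(rows, eval_fraction=0.2, seed=42):
    """Shuffle row indices once, then assign each row to exactly one set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_eval = int(round(len(rows) * eval_fraction))
    eval_idx = set(indices[:n_eval])
    train = [row for i, row in enumerate(rows) if i not in eval_idx]
    evaluation = [row for i, row in enumerate(rows) if i in eval_idx]
    return train, evaluation


rows = list(range(100))  # stand-in for 100 prepared records
train, evaluation = train_eval_split(rows)
print(len(train), len(evaluation))
```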

Accelerating business performance – How DP enables ML and solves data challenges

Data preparation has long been recognized for helping business leaders and analysts to ready and
prepare the data needed for analytics, operations, and regulatory requirements. Self-service data
preparation that runs on Amazon Web Services (AWS) and Azure takes it to the next level by
leveraging many valuable attributes of a cloud-based environment.
As a result, business users who are closest to the data and most knowledgeable about its business
context, can prepare data sets quickly and accurately, with the help of built-in intelligence and
smart algorithms. They can work within an intuitive, visual application to access, explore, shape,
collaborate and publish data with clicks, not code, with complete governance and security. IT
professionals are able to maintain the scale of data volumes and variety across both enterprise
and cloud data sources to support business scenarios for immediate and repeatable data service
needs.

Solutions like DP solve many data challenges and enable ML and data science workflows that
enhance applications with machine intelligence. More importantly, they enable organizations to
transform data into information on demand, empowering every person, process, and system in the
organization to be more intelligent.

 
Bio: David Levinger is VP Dev and Cloud Operations at Paxata, the pioneer and leader in
enterprise-grade self-service data preparation for analytics. To learn more visit www.paxata.com
or engage with the company on Twitter, LinkedIn, Facebook, or YouTube.

Original. Reposted with permission.

MyStory: Step by Step process of How I Became a Machine Learning Expert in 10 Months
Guest Blog, July 19, 2018


Introduction

Not so long ago, using the pivot tables option in Excel was the upper limit of my skills with
numbers and the word python was more likely to make me think about a dense jungle or a nature
program on TV than a tool to generate business insights and create complex solutions.

It took me ten months to leave that life behind and start feeling like I belonged to the exclusive
world of people who can tell their medians from their means, their x-bars from the neighborhood
pub, and who know how to teach machines what they need to learn.

The transformation process was not easy. It demanded hard work, lots of time, and dedication, and
required plenty of help along the way. It also involved hundreds of hours of “studying”
in different forms and an equal amount of time practicing and applying all that was being learnt.
In short, it wasn’t easy to transform from being data dumb to a data nerd, but I managed
to do so while going through a terribly busy work schedule as well as being a dad to a
one-year-old.

The point of this article is to help you if you are looking to make a similar transformation but do
not know where to start and how to proceed from one step to the next. If you are interested in
finding out, read on to get an idea about the topics you need to cover and also develop an
understanding of the level of expertise you need to build at each stage of the learning process.

There are plenty of great online and offline resources to help you master each of these steps, but
very often, the trouble for the uninitiated can be in figuring out where to start and where to
finish. I hope spending the next ten to fifteen minutes going through this article will help solve
that problem for you.

And finally, before proceeding any further, I would like to point out that I had a lot of help in
making this transformation. Right at the end of the article, I will reveal how I managed to
squeeze in so much learning and work in a matter of ten months. But that’s for later.

For now, I want to give you more details about the nine steps that I had to go through in my
transformation process.

Step 1: Understand the basics

Spend a couple of weeks enhancing your “general knowledge” about the field of data science
and machine learning. You may already have ideas and some sort of understanding about what
the field is, but if you want to become an expert, you need to understand the finer details to a
point where you can explain it in simple terms to just about anyone.

Suggested topics:

• What is Analytics?
• What is Data Science?
• What is Big Data?
• What is Machine Learning?
• What is Artificial Intelligence?
• How are the above domains different from each other and related to each other?
• How are all of the above domains being applied in the real world?

Exercise to show that you know:

• Write a blog post telling readers how to answer these questions if asked in an interview

Step 2: Learn some Statistics

I have a confession to make. Even though I feel like a machine learning expert, I do not feel that
I have any level of expertise in statistics. Which should be good news for people who struggle
with concepts in statistics as much as I do, as it proves that you can be a data scientist without
being a statistician. Having said that, you cannot ignore statistical concepts – not in machine
learning and data science!

So what you need to do is to understand certain concepts and know when they may be applied or
used. If you can also completely understand the theory behind these concepts, give yourself a
few good pats on your back.

Suggested topics:

• Data structures, variables and summaries
• Sampling
• The basic principles of probability
• Distributions of random variables
• Inference for numerical and categorical data
• Linear, multiple and logistic regression

Suggested exercise to mark completion of this step:

• Create a list of references with the easiest-to-understand explanation that you found for each
topic and publish them in a blog. Add a list of statistics-related questions that one may be
expected to answer in a data science interview

Step 3: Learn Python or R (or both) for data analysis

Programming turned out to be easier to learn, more fun and more rewarding in terms of the
things it made possible, than I had ever imagined. While mastering a programming language
could be an eternal quest, at this stage, you need to get familiar with the process of learning a
language and that is not too difficult.

Both Python and R are very popular and mastering one can make it quite easy to learn the other.
I started with R and have slowly started using Python for doing similar tasks as well.

Suggested topics:

• Supported data structures
• Read, import or export data
• Data quality analysis
• Data cleaning and preparation
• Data manipulation – e.g. sorting, filtering, aggregating and other functions
• Data visualization

Know that you are set for the next step:

• Extract a table from a website, modify it to compute new variables, and create graphs
summarizing the data

Step 4: Complete an Exploratory Data Analysis Project

In the first cricket test match ever played (see scorecard), Australian Charles Bannerman scored
67.35% (165 out of 245) of his team’s total score, in the very first innings of cricket’s history.
This remains a record in cricket at the time of writing, for the highest share of the total score by a
batsman in an innings of a test match.

What makes the innings even more remarkable is that the other 43 innings in that test match had
an average of only 10.8 runs an innings, with only about 40% of all batsmen registering a score
of ten or more runs. In fact, the second highest score by an Australian in the match was 20 runs.
Given that Australia won the match by 45 runs, we can say with conviction that Bannerman’s
innings was the most important contributor to Australia’s win.

Just like we were able to build this story from the scorecard of the test match, exploratory data
analysis is about studying data to understand the story that is hidden beneath it, and then sharing
the story with everyone.
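
The headline figure in that story is simple arithmetic on two numbers from the scorecard. A quick
check in Python, using only the 165 and 245 quoted above:

```python
bannerman_runs = 165
team_total = 245

# Share of the team's innings total scored by one batsman, as a percentage.
share_percent = round(bannerman_runs / team_total * 100, 2)
print(f"{share_percent}% of the team's runs")
```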

Personally, I find this phase of a data project the most interesting, which is a good thing as quite
a lot of the time in a typical project could be expected to be taken up by exploratory data
analysis.

Topics to cover:

• Single variable explorations
• Pair-wise and multi-variable explorations
• Visualization, dashboard and storytelling in Tableau

Project output:

• Create a blog post summarizing the exercise and sharing the dashboard or story. Use a dataset
with at least ten columns and a few thousand records

Step 5: Create unsupervised learning models

Let’s say we had data for all the countries in the world across many parameters ranging from
population, to income, to health, to major industries and more. Now suppose we wanted to find
out which countries are similar to each other across all these parameters. How do we go about
doing this, when we have to compare each country with all the others, across over 50 different
parameters?

That is where unsupervised machine learning algorithms come in. This is not the time to bore
you with details about what these are all about, but the good news is that once you reach this
stage, you have moved on into the world of machine learning and are already in elite company.
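
One widely used unsupervised algorithm for this kind of grouping is K-means clustering. The sketch
below is a bare-bones version written with the standard library so the two alternating steps
(assign each point to its nearest centroid, then move each centroid to the mean of its points)
are visible. The six two-dimensional points stand in for hypothetical, already-scaled country
indicators; they are not real data, and a library implementation would normally be used instead.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, iterations=20, seed=0):
    """Return (centroids, clusters) after alternating assign/update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters


# Hypothetical scaled indicators for six countries (two per country).
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

With 50 parameters per country the points would simply have 50 coordinates; the two steps stay
the same.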

Topics to cover:

• K-means clustering
• Association rules

Milestone exercise:

• Practice K-means clustering on 3 different datasets from different industries or interest areas

Step 6: Create supervised learning models

If you had data about millions of loan applicants and their repayment history from the past, could
you identify an applicant who is likely to default on payments, even before the loan is approved?

Given enough prior data, could you predict which users are more likely to respond to a digital
advertising campaign? Could you identify if someone is more likely to develop a certain disease
later in their life based on their current lifestyle and habits?

Supervised learning algorithms help solve all these problems and a lot more. While there are a
plethora of algorithms to understand and master, just getting started with some of the most
popular ones will open up a world of new possibilities for you and the ways in which you can
make data useful for an organization.

Topics to cover:

• Logistic regression
• Classification trees
• Ensemble models like Bagging and Random Forest
• Support Vector Machines

You have not really started with creating models till you have done this:

• Take a dataset, create models using all the algorithms you have learnt. Train, test and tune each
model to improve performance. Compare them to identify which is the best model and
document why you think it is so
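
The shape of that exercise (train several models on the same training set, score each on the same
held-out test set, then compare) can be sketched as below. To keep the example self-contained it
uses two simple stand-ins, a majority-class baseline and a one-nearest-neighbor classifier, rather
than the algorithms listed above, and the loan records (debt ratio, number of late payments) are
invented for illustration.

```python
from collections import Counter


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_neighbor(train):
    """Predict the label of the closest training example."""
    def predict(x):
        _, label = min(
            train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
        )
        return label
    return predict


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical applicants: (debt ratio, late payments) -> outcome.
train = [
    ((0.10, 0), "repaid"), ((0.20, 1), "repaid"), ((0.15, 0), "repaid"),
    ((0.60, 4), "default"), ((0.70, 5), "default"),
]
test = [
    ((0.12, 0), "repaid"), ((0.65, 4), "default"),
    ((0.30, 1), "repaid"), ((0.55, 3), "default"),
]

models = {
    "majority_baseline": fit_majority(train),
    "nearest_neighbor": fit_nearest_neighbor(train),
}
results = {name: accuracy(model, test) for name, model in models.items()}
best = max(results, key=results.get)
print(results, "best:", best)
```

Including a trivial baseline in the comparison shows whether a more complex model is actually
learning anything from the features.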

Step 7: Understand Big Data Technologies

Many of the machine learning models in use today have been around for decades. The reason
these algorithms are only finding applications now is that we finally have access to
sufficiently large amounts of data to supply to them, so that they can
come up with useful outputs.

Data engineering and architecture is a field of specialization in itself, but every machine learning
expert must know how to deal with big data systems, irrespective of their specialization within
the industry.

Understanding how large amounts of data can be stored, accessed and processed efficiently is
important to being able to create solutions that can be implemented in practice and are not just
theoretical exercises.

I had approached this step with a real lack of conviction, but as I soon found out, it was driven
more by the fear of the unknown in the form of Linux interfaces than any real complexity in
finding my way around a Hadoop system.    

Topics to cover:

• Big data overview and eco-system
• Hadoop – HDFS, MapReduce, Pig and Hive
• Spark

Do this to know that you have understood the basics:

• Upload data, run processes and extract results after installing a local version of Hadoop or Spark
on your system

 
Step 8: Explore Deep Learning Models

Deep learning models are helping companies like Apple and Google create solutions like Siri or
the Google Assistant. They are helping global giants test driverless cars and suggesting best
courses of treatment to doctors.

Machines are able to see, listen, read, write and speak thanks to deep learning models that are
going to transform the world in many ways, including significantly changing the skills required
for people to be useful to organizations.

Getting started with creating a model that can tell the image of a flower from a fruit may not
immediately help you start building your own driverless car, but it will certainly help you start
seeing the path to getting there.

Topics to cover:

• Artificial Neural Networks
• Natural Language Processing
• Convolutional Neural Networks
• TensorFlow
• OpenCV

Milestone exercise:

• Create a model that can correctly identify pictures of two of your friends or family members

Step 9: Undertake and Complete a Data Project

By now you are almost ready to unleash yourself to the world as a machine learning pro, but you
need to showcase all that you have learnt before anyone else will be willing to agree with you.  

The internet presents glorious opportunities to find such projects. If you have been diligent about
the previous eight steps, chances are that you would already know how to find a project that will
excite you, be useful to someone, as well as help demonstrate your knowledge and skills.

Topics to cover:

• Data collection, quality check, cleaning and preparation
• Exploratory data analysis
• Model creation and selection
• Project report

Milestone exercise:

• Get in touch with a stakeholder who will be interested in your report, share your findings with
them, and get feedback

End Notes

Machine learning and artificial intelligence are a set of skills for the present and future. It is also a
field where learning will never cease and very often you may have to keep running to stay in the
same place, as far as being equipped with the most in-demand skills is concerned.

However, if you start the journey well, you will be able to understand how to go about taking the
next step in your learning path. As you must have gathered by now, starting the journey well is a
pretty challenging exercise in itself. If you choose to start upon it, I hope this article will have
been of some help to you and I wish you the very best.

Finally, I will confess that I got a lot of help with my ten-month transition. The reason I was able
to cover so much ground in this amount of time, along with a busy schedule at work and home,
was that I enrolled in the Post Graduate Program in Data Science and Machine Learning offered
by Jigsaw Academy and Graham School, University of Chicago.

Investing in the course helped in keeping my learning hours focused, created external pressure
that ensured that I was finding time for it irrespective of whatever else was going on in life, and
gave me access to experts in the form of faculty and a great peer group through other students.

Transforming from being non-technical to someone who is comfortable with the machine
learning world has already opened up many new doors for me. Whatever path you choose to
make this transformation, you can do so with the assurance that going through the rigor will reap
rewards for a long time and will banish any fears of becoming irrelevant in tomorrow’s
economy.

About the Author

Madhukar Jha, Founder – Blue Footed Ideas

Madhukar Jha believes that great digital experiences are created by concocting a perfect mix of
data driven insights, understanding of behavioural drivers, a design thinking approach, and
cutting edge technology. He applies this philosophy to help businesses make world class
products, run campaigns that rock and tell compelling stories.

5-Steps and 10-Steps, to Learn Machine Learning.

Maher
Feb 15, 2019 · 4 min read

There is a 5-step shortcut you can take to start solving machine learning problems right
away. As a beginner, you can take this path first if you want to get something done with
machine learning.

You can then take the 10-step path to become a data scientist or a more advanced machine
learning engineer.

Just keep this in mind:

• Improve your mindset and your way of thinking. This is much more important than learning
how to use the tools.

• There is no fixed track to achieving your goal and becoming a data scientist, so you’re free to
change any step or do what you think would be right for you.
Read The Catalog

Path 1: Read the Catalog path (5-Steps Plan)

This path is like reading the catalog of a new device you just bought: you’ll learn how to use it
effectively. It will turn you from a beginner into a beginner who can get a job done! You’ll be able
to solve limited problems with limited accuracy. I recommend taking this path at some point so
that you can solve less important problems quickly.

I use these tools a lot when I start solving any machine learning problem. They give me a
baseline accuracy to try to improve upon, help me get more familiar with the dataset I have,
and give me some insights.

1. Master Python or another language such as R. I recommend you start with Python; there are
a lot of free resources on the web. I’d suggest a trial-and-error approach along with
some reading.
2. Learn how to use NumPy and SciPy for math, along with Pandas, Matplotlib, and Seaborn.
NumPy and SciPy are mathematical libraries.
Pandas is a Python library for data manipulation and analysis.
Matplotlib and Seaborn are libraries that help you visualize the data.
3. Read the article Introduction to Machine learning: Top-down approach. It’ll give you
a smooth introduction to the machine learning world.
4. Read about Scikit-learn. This step is the actual catalog reading: scikit-learn is the toolset
you’ll use to solve the problems. You don’t have to learn everything in the library; just
learn to implement one or two models and read about the others.
Scikit-learn is a Python library with a lot of already implemented models that you can
treat as black boxes to train and make predictions with directly, and you can
even tune a model’s parameters to suit your problem and get more accuracy.
5. Read Chapter 2 of the book Hands-On Machine Learning with Scikit-Learn &
TensorFlow.

If you’re not interested in data science yet, there is no need to keep aiming for it.
But if you ARE interested, I recommend that you take the second path.

Now, Let’s be Professional

Path 2: Build a Career path (10-Steps Plan)

This path will actually turn you from a beginner into a data scientist. It will give you the toolset
and the knowledge to solve relatively complex problems.

The most important thing that will shape how good you are as a data scientist is that you should
always be up to date with the new discoveries out in the field.

I recommend that you read a lot of papers, follow a lot of publications and writers, and engage
with them. Reach out to me with any questions! We can benefit each other and build a caring
community.

The other really important thing is practicing.

1. Of course, choose a programming language to master and use in your journey.
2. Revise your linear algebra knowledge:
https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/
3. Revise your statistics and probability knowledge from cheat sheets, or learn it
from scratch on Khan Academy: https://www.khanacademy.org/math/statistics-probability
4. Revise your calculus knowledge, or learn calculus from this course:
https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr
5. Then, of course, take the most popular free course out there, Andrew Ng’s course on
Coursera, for theoretical knowledge: https://www.coursera.org/learn/machine-learning
6. I recommend reading the book Hands-On Machine Learning with Scikit-Learn &
TensorFlow by Aurélien Géron. It’s a really good book that’s full of information, and it
covers the technical knowledge.
7. Practice a lot on Kaggle. Now you can solve problems with the machine learning
techniques that you have understood so far.
8. Learn data visualization from this course:
https://www.edx.org/course/data-visualization-a-practical-approach-for-absolute-beginners-0
9. Learn how to work with databases (SQL and NoSQL).
10. Learn Hadoop and Spark. I recommend this course, https://www.udemy.com/share/1000lU,
and there are a lot of free courses and books out there; check them out.
11. I’m sure that if you have reached this point, you will be able to guide yourself forward.

Be dedicated and have faith that you can do it.

Have a good time Making your machine creative.


Need Help Getting Started with Applied Machine Learning?
These are the Step-by-Step Guides that You’ve Been Looking For!
What do you want help with?

Foundations

• How Do I Get Started?
• Step-by-Step Process
• Probability
• Statistical Methods
• Linear Algebra

Beginner

• Understand ML Algorithms
• ML + Weka (no code)
• ML + Python (scikit-learn)
• ML + R (caret)
• Time Series Forecasting

Intermediate

• Code ML Algorithms
• XGBoost Algorithm
• Imbalanced Classification
• Deep Learning (Keras)
• Better Deep Learning

Advanced

• Long Short-Term Memory
• Natural Language (Text)
• Computer Vision
• CNN/LSTM + Time Series
• GANs

How Do I Get Started?


The most common question I’m asked is: “how do I get started?”

My best advice for getting started in machine learning is broken down into a 5-step process:

• Step 1: Adjust Mindset. Believe you can practice and apply machine learning.
    o What is Holding you Back From Your Machine Learning Goals?
    o Why Machine Learning Does Not Have to Be So Hard
    o How to Think About Machine Learning
    o Find Your Machine Learning Tribe
• Step 2: Pick a Process. Use a systematic process to work through problems.
    o Applied Machine Learning Process
• Step 3: Pick a Tool. Select a tool for your level and map it onto your process.
    o Beginners: Weka Workbench.
    o Intermediate: Python Ecosystem.
    o Advanced: R Platform.
    o Best Programming Language for Machine Learning
• Step 4: Practice on Datasets. Select datasets to work on and practice the process.
    o Practice Machine Learning with Small In-Memory Datasets
    o Tour of Real-World Machine Learning Problems
    o Work on Machine Learning Problems That Matter To You
• Step 5: Build a Portfolio. Gather results and demonstrate your skills.
    o Build a Machine Learning Portfolio
    o Get Paid To Apply Machine Learning
    o Machine Learning For Money

For more on this top-down approach, see:

• The Machine Learning Mastery Method
• Machine Learning for Programmers

Many of my students have used this approach to go on and do well in Kaggle competitions and
get jobs as Machine Learning Engineers and Data Scientists.

Applied Machine Learning Process


The benefits of machine learning are the predictions and the models that make predictions.

To have skill at applied machine learning means knowing how to consistently and reliably
deliver high-quality predictions on problem after problem. You need to follow a systematic
process.

Below is a 5-step process that you can follow to consistently achieve above average results on
predictive modeling problems:
 Step 1: Define your problem.
o How to Define Your Machine Learning Problem
 Step 2: Prepare your data.
o How to Prepare Data For Machine Learning
o How to Identify Outliers in your Data
o Improve Model Accuracy with Data Pre-Processing
o Discover Feature Engineering
o An Introduction to Feature Selection
o Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset
o Data Leakage in Machine Learning
 Step 3: Spot-check algorithms.
o How to Evaluate Machine Learning Algorithms
o Why you should be Spot-Checking Algorithms on your Machine Learning
Problems
o How To Choose The Right Test Options When Evaluating Machine Learning
Algorithms
o A Data-Driven Approach to Choosing Machine Learning Algorithms
 Step 4: Improve results.
o How to Improve Machine Learning Results
o Machine Learning Performance Improvement Cheat Sheet
o How To Improve Deep Learning Performance
 Step 5: Present results.
o How to Use Machine Learning Results
o How to Train a Final Machine Learning Model
o How To Deploy Your Predictive Model To Production
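
As a rough illustration of Step 3, the sketch below spot-checks three standard algorithms on the
same cross-validation folds. It assumes scikit-learn is installed; the iris dataset, the three
models, and the 5-fold harness are illustrative choices and are not prescribed by the process above.

```python
# Spot-check a few standard algorithms with one shared test harness.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Steps 1 and 2 stand-in: a small, already-clean classification dataset (assumption).
X, y = load_iris(return_X_y=True)

# Step 3: evaluate every candidate on identical folds so the scores are comparable.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=1),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")

# The top scorer is only a candidate for Step 4 (improve results), not a final answer.
best_name = max(results, key=results.get)
print("Best on this harness:", best_name)
```

Sharing one set of folds across all candidates keeps the comparison fair; the winner would still
need tuning and a final check before results are presented.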

For a good summary of this process, see the posts:

 Applied Machine Learning Process


 How to Use a Machine Learning Checklist to Get Accurate Predictions

Probability for Machine Learning


Probability is the mathematics of quantifying and harnessing uncertainty. It is the bedrock of
many fields of mathematics (like statistics) and is critical for applied machine learning.

Below is the 3-step process that you can use to get up to speed with probability for machine
learning, fast.

 Step 1: Discover what Probability is.


o Basics of Mathematical Notation for Machine Learning
o What Is Probability?
 Step 2: Discover why Probability is so important for machine learning.
o 5 Reasons to Learn Probability for Machine Learning
o A Gentle Introduction to Uncertainty in Machine Learning
 Step 3: Dive into Probability topics.
o Probability for Machine Learning Mini-Course
o Probability for Machine Learning (my book)

You can see all of the tutorials on probability here. Below is a selection of some of the most
popular tutorials.

Probability Foundations

 Introduction to Joint, Marginal, and Conditional Probability


 Intuition for Joint, Marginal, and Conditional Probability
 Worked Examples of Different Types of Probability

Bayes Theorem

 A Gentle Introduction to Bayes Theorem for Machine Learning


 Develop a Naive Bayes Classifier from Scratch in Python
 Implement Bayesian Optimization from Scratch in Python
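
To make Bayes Theorem concrete, here is a minimal sketch that computes a posterior probability
from a prior, a likelihood, and a false positive rate. The function name and the diagnostic-test
numbers are hypothetical and chosen only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) given P(A), P(B|A), and P(B|not A)."""
    # Total probability of the evidence: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical scenario: 1% base rate, 90% sensitivity, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed numbers the posterior is only about 15%, because the low base rate dominates
even a fairly accurate test.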

Probability Distributions

 A Gentle Introduction to Probability Distributions


 Discrete Probability Distributions for Machine Learning
 Continuous Probability Distributions for Machine Learning

Information Theory

 A Gentle Introduction to Information Entropy


 Calculate the Divergence Between Probability Distributions
 A Gentle Introduction to Cross-Entropy for Machine Learning
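
The three quantities named in these tutorials are closely related. The sketch below computes
entropy, KL divergence, and cross-entropy for two discrete distributions, in bits. The
distributions p and q are made-up values, and the code assumes q is non-zero wherever p is
non-zero.

```python
from math import log2


def entropy(p):
    """Entropy H(P) in bits."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """KL(P || Q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in bits; same assumption on q."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)


# Two illustrative distributions over the same three events.
p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

print(f"H(P)       = {entropy(p):.3f} bits")
print(f"KL(P || Q) = {kl_divergence(p, q):.3f} bits")
print(f"H(P, Q)    = {cross_entropy(p, q):.3f} bits")
```

Cross-entropy equals entropy plus KL divergence, which is why minimizing cross-entropy against a
fixed target distribution also minimizes the divergence from it.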

Statistics for Machine Learning


Statistical methods are an important foundation area of mathematics required for achieving a
deeper understanding of the behavior of machine learning algorithms.

Below is the 3-step process that you can use to get up to speed with statistical methods for
machine learning, fast.

 Step 1: Discover what Statistical Methods are.


o What is Statistics (and why is it important in machine learning)?
 Step 2: Discover why Statistical Methods are important for machine learning.
o The Close Relationship Between Applied Statistics and Machine Learning
o 10 Examples of How to Use Statistical Methods in a Machine Learning Project
 Step 3: Dive into the topics of Statistical Methods.
o Statistics for Machine Learning (7-Day Mini-Course)
o Statistical Methods for Machine Learning (my book)

You can see all of the statistical methods posts here. Below is a selection of some of the most
popular tutorials.

Summary Statistics

 Introduction to the 5 Number Summary


 Introduction to Data Visualization
 Correlation to Understand the Relationship Between Variables
 Introduction to Calculating Normal Summary Statistics
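
As a small example of a summary statistic, the sketch below computes a five-number summary
(minimum, first quartile, median, third quartile, maximum) with Python's standard library. The
sample data is made up, and the "inclusive" quartile method is one of several conventions.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    ordered = sorted(data)
    # n=4 yields the three quartile cut points; "inclusive" treats the data
    # as the whole population, including its minimum and maximum.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


sample = [7, 1, 9, 3, 5, 2, 8, 4, 6]
summary = five_number_summary(sample)
print(summary)
```

Unlike the mean and standard deviation, this summary makes no assumption that the data follows a
Gaussian distribution.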

Statistical Hypothesis Tests

 15 Statistical Hypothesis Tests in Python (Cheat Sheet)


 Introduction to Statistical Hypothesis Tests
 Introduction to Nonparametric Statistical Significance Tests
 Introduction to Parametric Statistical Significance Tests
 Statistical Significance Tests for Comparing Algorithms

Resampling Methods

 Introduction to Statistical Sampling and Resampling


 Introduction to the Bootstrap
 Introduction to Cross-Validation

Estimation Statistics

 Introduction to Estimation Statistics


 Introduction to Confidence Intervals
 Introduction to Prediction Intervals
 Introduction to Tolerance Intervals

Linear Algebra for Machine Learning


Linear algebra is an important foundation area of mathematics required for achieving a deeper
understanding of machine learning algorithms.

Below is the 3-step process that you can use to get up to speed with linear algebra for machine
learning, fast.

 Step 1: Discover what Linear Algebra is.
o Basics of Mathematical Notation for Machine Learning
o A Gentle Introduction to Linear Algebra
 Step 2: Discover why Linear Algebra is important for machine learning.
o 5 Reasons to Learn Linear Algebra for Machine Learning
o 10 Examples of Linear Algebra in Machine Learning
o Linear Algebra for Machine Learning
 Step 3: Dive into Linear Algebra topics.
o Linear Algebra for Machine Learning Mini-Course
o Linear Algebra for Machine Learning (my book)

You can see all linear algebra posts here. Below is a selection of some of the most popular
tutorials.

Linear Algebra in Python

 Introduction to N-Dimensional Arrays in Python


 How to Index, Slice and Reshape NumPy Arrays

Matrices

 Introduction to Matrices and Matrix Arithmetic


 Introduction to Matrix Types in Linear Algebra
 Introduction to Matrix Operations for Machine Learning
 Introduction to Tensors for Machine Learning

Vectors

 Introduction to Vectors
 Introduction to Vector Norms
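
For a feel of what vector norms measure, here is a minimal pure-Python sketch of the L1, L2, and
max norms. The function names and the example vector are illustrative; in practice a library such
as NumPy would normally be used.

```python
from math import sqrt


def l1_norm(v):
    """Sum of absolute values (Manhattan length)."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Square root of the sum of squares (Euclidean length)."""
    return sqrt(sum(x * x for x in v))


def max_norm(v):
    """Largest absolute component."""
    return max(abs(x) for x in v)


v = [3.0, -4.0]
print("L1:", l1_norm(v), "L2:", l2_norm(v), "max:", max_norm(v))
```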

Matrix Factorization

 Introduction to Matrix Factorization


 Introduction to Eigendecomposition
 Introduction to Singular-Value Decomposition (SVD)
 Introduction to Principal Component Analysis (PCA)

Understand Machine Learning Algorithms


Machine learning is about machine learning algorithms.

You need to know what algorithms are available for a given problem, how they work, and how
to get the most out of them.

Here’s how to get started with machine learning algorithms:

 Step 1: Discover the different types of machine learning algorithms.


o A Tour of Machine Learning Algorithms
 Step 2: Discover the foundations of machine learning algorithms.
o How Machine Learning Algorithms Work
o Parametric and Nonparametric Algorithms
o Supervised and Unsupervised Algorithms
o The Bias-Variance Trade-Off
o Overfitting and Underfitting With Algorithms
 Step 3: Discover how top machine learning algorithms work.
o Machine Learning Algorithms Mini-Course
o Master Machine Learning Algorithms (my book)

You can see all machine learning algorithm posts here. Below is a selection of some of the most
popular tutorials.

Linear Algorithms

 Gradient Descent
 Linear Regression
 Logistic Regression
 Linear Discriminant Analysis

Nonlinear Algorithms

 Classification And Regression Trees


 Naive Bayes
 K-Nearest Neighbors
 Learning Vector Quantization
 Support Vector Machines

Ensemble Algorithms

 Bagging and Random Forest


 Boosting and AdaBoost

How to Study/Learn ML Algorithms

 5 Ways To Understand Machine Learning Algorithms


 How to Learn a Machine Learning Algorithm
 How to Study Machine Learning Algorithms
 How to Research a Machine Learning Algorithm
 How To Investigate Machine Learning Algorithm Behavior
 Take Control By Creating Lists of Machine Learning Algorithms
 6 Questions To Understand Any Machine Learning Algorithm

Weka Machine Learning (no code)


Weka is a platform that you can use to get started in applied machine learning.

It has a graphical user interface, meaning that no programming is required, and it offers a suite
of state-of-the-art algorithms.

Here’s how you can get started with Weka:

 Step 1: Discover the features of the Weka platform.


o What is the Weka Machine Learning Workbench
 Step 2: Discover how to get around the Weka platform.
o How to Download and Install the Weka Machine Learning Workbench
o A Tour of the Weka Machine Learning Workbench
 Step 3: Discover how to deliver results with Weka.
o How to Run Your First Classifier in Weka
o Applied Machine Learning With Weka Mini-Course
o Machine Learning Mastery With Weka (my book)

You can see all Weka machine learning posts here. Below is a selection of some of the most
popular tutorials.

Prepare Data in Weka

 How To Load CSV Machine Learning Data in Weka


 How to Better Understand Your Machine Learning Data in Weka
 How to Normalize and Standardize Your Machine Learning Data in Weka
 How To Handle Missing Values In Machine Learning Data With Weka
 How to Perform Feature Selection With Machine Learning Data in Weka

Weka Algorithm Tutorials

 How to Use Machine Learning Algorithms in Weka


 How To Estimate The Performance of Machine Learning Algorithms in Weka
 How To Use Regression Machine Learning Algorithms in Weka
 How To Use Classification Machine Learning Algorithms in Weka
 How to Tune Machine Learning Algorithms in Weka

Python Machine Learning (scikit-learn)


Python is one of the fastest growing platforms for applied machine learning.

You can use the same tools, such as pandas and scikit-learn, for both the development and the
operational deployment of your model.

Below are the steps that you can use to get started with Python machine learning:

 Step 1: Discover Python for machine learning


o A Gentle Introduction to Scikit-Learn: A Python Machine Learning Library
 Step 2: Discover the ecosystem for Python machine learning.
o Crash Course in Python for Machine Learning Developers
o Python Ecosystem for Machine Learning
o Python is the Growing Platform for Applied Machine Learning
 Step 3: Discover how to work through problems using machine learning in Python.
o Your First Machine Learning Project in Python Step-By-Step
o Python Machine Learning Mini-Course
o Machine Learning Mastery With Python (my book)

You can see all Python machine learning posts here. Below is a selection of some of the most
popular tutorials.

Prepare Data in Python

 How To Load Machine Learning Data in Python


 Understand Your Machine Learning Data With Descriptive Statistics in Python
 Visualize Machine Learning Data in Python With Pandas
 How To Prepare Your Data For Machine Learning in Python with Scikit-Learn
 Feature Selection For Machine Learning in Python

Machine Learning in Python

 Evaluate the Performance of Machine Learning Algorithms


 Metrics To Evaluate Machine Learning Algorithms in Python
 Spot-Check Classification Machine Learning Algorithms in Python with scikit-learn
 Spot-Check Regression Machine Learning Algorithms in Python with scikit-learn
 How To Compare Machine Learning Algorithms in Python with scikit-learn

R Machine Learning (caret)


R is a platform for statistical computing and is one of the most popular platforms among
professional data scientists.

It’s popular because of the large number of techniques available, and because of excellent
interfaces to these methods such as the powerful caret package.

Here’s how to get started with R machine learning:


 Step 1: Discover the R platform and why it is so popular.
o What is R
o Use R For Machine Learning
o Super Fast Crash Course in R
 Step 2: Discover machine learning algorithms in R.
o How To Get Started With Machine Learning Algorithms in R
 Step 3: Discover how to work through problems using machine learning in R.
o Your First Machine Learning Project in R Step-By-Step
o R Machine Learning Mini-Course
o Machine Learning Mastery With R (my book)

You can see all R machine learning posts here. Below is a selection of some of the most popular
tutorials.

Data Preparation in R

 How To Load Your Machine Learning Data Into R


 Better Understand Your Data in R Using Descriptive Statistics
 Better Understand Your Data in R Using Visualization
 Feature Selection with the Caret R Package
 Get Your Data Ready For Machine Learning in R with Pre-Processing

Applied Machine Learning in R

 How to Evaluate Machine Learning Algorithms with R


 Spot Check Machine Learning Algorithms in R
 Tune Machine Learning Algorithms in R
 How to Build an Ensemble Of Machine Learning Algorithms in R
 Compare The Performance of Machine Learning Algorithms in R

Code Algorithm from Scratch (Python)


You can learn a lot about machine learning algorithms by coding them from scratch.

Learning via coding is the preferred learning style for many developers and engineers.

Here’s how to get started with machine learning by coding everything from scratch.

 Step 1: Discover the benefits of coding algorithms from scratch.


o Benefits of Implementing Machine Learning Algorithms From Scratch
o Understand Machine Learning Algorithms By Implementing Them From Scratch
 Step 2: Discover that coding algorithms from scratch is a learning tool only.
o Stop Coding Machine Learning Algorithms From Scratch
o Don’t Start with Open-Source Code When Implementing Machine Learning
Algorithms
 Step 3: Discover how to code machine learning algorithms from scratch in Python.
o Machine Learning Algorithms From Scratch (my book)

You can see all of the Code Algorithms from Scratch posts here. Below is a selection of some of
the most popular tutorials.

Prepare Data

 How to Load Machine Learning Data From Scratch


 How to Scale Machine Learning Data From Scratch

Linear Algorithms

 How To Implement Simple Linear Regression From Scratch


 How To Implement The Perceptron Algorithm From Scratch
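
To give a flavor of coding an algorithm from scratch, the sketch below fits simple linear
regression with the closed-form least-squares estimates (slope = covariance / variance). The
function names and the toy data are assumptions made for illustration, not code taken from the
tutorials listed above.

```python
def mean(values):
    return sum(values) / len(values)


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing squared error for y = b0 + b1 * x."""
    x_mean, y_mean = mean(x), mean(y)
    covariance = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    variance = sum((xi - x_mean) ** 2 for xi in x)
    b1 = covariance / variance
    b0 = y_mean - b1 * x_mean
    return b0, b1


def predict(x, b0, b1):
    return [b0 + b1 * xi for xi in x]


# Toy data generated from y = 1 + 2x, so the fit should recover those coefficients.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept={b0:.2f}, slope={b1:.2f}, prediction for x=6: {predict([6], b0, b1)[0]:.2f}")
```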

Algorithm Evaluation

 How to Code Resampling Methods From Scratch


 How To Code Algorithm Performance Metrics From Scratch

Nonlinear Algorithms

 How to Code the Backpropagation Algorithm From Scratch


 How To Code The Decision Tree Algorithm From Scratch

Introduction to Time Series Forecasting (Python)


Time series forecasting is an important topic in business applications.

Many datasets contain a time component, but the topic of time series is rarely covered in much
depth from a machine learning perspective.

Here’s how to get started with Time Series Forecasting:

 Step 1: Discover Time Series Forecasting.


o What Is Time Series Forecasting?
 Step 2: Discover Time Series as Supervised Learning.
o Time Series Forecasting as Supervised Learning
 Step 3: Discover how to get good at delivering results with Time Series Forecasting.
o Time Series Forecasting With Python Mini-Course
o Time Series Forecasting With Python (my book)
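
Step 2 can be sketched in a few lines: a sliding window turns a sequence into input/output pairs
that any supervised learning algorithm can consume. The function name, the n_lags parameter, and
the toy series are hypothetical.

```python
def series_to_supervised(series, n_lags=1):
    """Turn a sequence into (lagged inputs, next value) pairs using a sliding window."""
    rows = []
    for i in range(n_lags, len(series)):
        inputs = series[i - n_lags:i]  # the previous n_lags observations
        target = series[i]             # the value to predict
        rows.append((inputs, target))
    return rows


series = [10, 20, 30, 40, 50]
rows = series_to_supervised(series, n_lags=2)
for inputs, target in rows:
    print(inputs, "->", target)
```

The order of the rows must be preserved when evaluating a model on them, which is why
time series models are backtested rather than shuffled into random folds.
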

You can see all Time Series Forecasting posts here. Below is a selection of some of the most
popular tutorials.

Data Preparation Tutorials

 7 Time Series Datasets for Machine Learning


 How to Load and Explore Time Series Data in Python
 How to Normalize and Standardize Time Series Data in Python
 Basic Feature Engineering With Time Series Data in Python
 How To Backtest Machine Learning Models for Time Series Forecasting

Forecasting Tutorials

 How to Make Baseline Predictions for Time Series Forecasting with Python
 How to Check if Time Series Data is Stationary with Python
 How to Create an ARIMA Model for Time Series Forecasting with Python
 How to Grid Search ARIMA Model Hyperparameters with Python
 How to Work Through a Time Series Forecast Project

XGBoost in Python (Stochastic Gradient Boosting)


XGBoost is a highly optimized implementation of gradient boosted decision trees.

It is popular because it is being used by some of the best data scientists in the world to win
machine learning competitions.

Here’s how to get started with XGBoost:

 Step 1: Discover the Gradient Boosting Algorithm.


o A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning
 Step 2: Discover XGBoost.
o A Gentle Introduction to XGBoost for Applied Machine Learning
 Step 3: Discover how to get good at delivering results with XGBoost.
o How to Develop Your First XGBoost Model in Python with scikit-learn
o XGBoost With Python Mini-Course
o XGBoost With Python (my book)

You can see all XGBoost posts here. Below is a selection of some of the most popular tutorials.

XGBoost Basics

 Data Preparation for Gradient Boosting with XGBoost in Python


 How to Evaluate Gradient Boosting Models with XGBoost in Python
 Avoid Overfitting By Early Stopping With XGBoost In Python
 Feature Importance and Feature Selection With XGBoost in Python

XGBoost Tuning

 How to Configure the Gradient Boosting Algorithm


 Tune Learning Rate for Gradient Boosting with XGBoost in Python
 Stochastic Gradient Boosting with XGBoost and scikit-learn in Python
 How to Tune the Number and Size of Decision Trees with XGBoost in Python
 How to Best Tune Multithreading Support for XGBoost in Python

Imbalanced Classification

Imbalanced classification refers to classification tasks where there are many more examples for
one class than another class.

These types of problems often require the use of specialized performance metrics and learning
algorithms as the standard metrics and methods are unreliable or fail completely.
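
A short sketch shows why a standard metric such as accuracy can mislead here. The data is made
up (95 negatives, 5 positives), the "model" simply predicts the majority class, and the function
names are hypothetical. Precision and recall are defined as 0.0 when their denominator is zero,
which is one common convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def imbalance_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 95 majority-class examples, 5 minority-class examples (illustrative data).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a naive model that always predicts the majority class
report = imbalance_metrics(y_true, y_pred)
print(report)
```

The naive model scores 95% accuracy while finding none of the minority-class examples, which the
recall and F-measure of zero make obvious.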

Here’s how you can get started with Imbalanced Classification:

 Step 1: Discover the challenge of imbalanced classification


o A Gentle Introduction to Imbalanced Classification
 Step 2: Discover the intuition for skewed class distributions.
o Develop an Intuition for Severely Skewed Class Distributions
 Step 3: Discover how to solve imbalanced classification problems.
o Step-By-Step Framework for Imbalanced Classification Projects
o Imbalanced Classification With Python (7-Day Mini-Course)
o Imbalanced Classification with Python (my book)

You can see all Imbalanced Classification posts here. Below is a selection of some of the most
popular tutorials.

Performance Measures

 Tour of Evaluation Metrics for Imbalanced Classification


 Failure of Classification Accuracy
 How to Calculate Precision, Recall, and F-Measure

Cost-Sensitive Algorithms

 Cost-Sensitive Logistic Regression


 Cost-Sensitive Decision Trees
 How to Configure XGBoost for Imbalanced Classification

Data Sampling

 Tour of Data Sampling Methods for Imbalanced Classification


 Random Oversampling and Undersampling
 SMOTE Oversampling for Imbalanced Classification

Advanced Methods

 Threshold Moving Methods


 One-Class Classification
 Customised Ensemble Algorithms

Deep Learning (Keras)


Deep learning is a fascinating and powerful field.

State-of-the-art results are coming from the field of deep learning and it is a sub-field of machine
learning that cannot be ignored.

Here’s how to get started with deep learning:

 Step 1: Discover what deep learning is all about.


o What is Deep Learning?
o 8 Inspirational Applications of Deep Learning
 Step 2: Discover the best tools and libraries.
o Introduction to the Python Deep Learning Library Theano
o Introduction to the Python Deep Learning Library TensorFlow
o Introduction to Python Deep Learning with Keras
 Step 3: Discover how to work through problems and deliver results.
o Develop Your First Neural Network in Python With Keras Step-By-Step
o Applied Deep Learning in Python Mini-Course
o Deep Learning With Python (my book)

You can see all deep learning posts here. Below is a selection of some of the most popular
tutorials.

Background

 Crash Course On Multi-Layer Perceptron Neural Networks


 Crash Course in Convolutional Neural Networks for Machine Learning
 Crash Course in Recurrent Neural Networks for Deep Learning

Multilayer Perceptrons

 5 Step Life-Cycle for Neural Network Models in Keras
 How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras
 Save and Load Your Keras Deep Learning Models
 Display Deep Learning Model Training History in Keras
 Dropout Regularization in Deep Learning Models With Keras

Convolutional Neural Networks

 Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras
 Object Recognition with Convolutional Neural Networks in the Keras Deep Learning
Library
 Predict Sentiment From Movie Reviews Using Deep Learning

Recurrent Neural Networks

 Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras
 Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras
 Text Generation With LSTM Recurrent Neural Networks in Python with Keras

Better Deep Learning Performance


Although it is easy to define and fit a deep learning neural network model, it can be challenging
to get good performance on a specific predictive modeling problem.

There are standard techniques that you can use to improve the learning, reduce overfitting, and
make better predictions with your deep learning model.

Here’s how to get started with getting better deep learning performance:

 Step 1: Discover the challenge of deep learning.


o Why Training a Neural Network Is Hard
o The Challenge of Training Deep Learning Neural Network Models
 Step 2: Discover frameworks for diagnosing and improving model performance.
o How To Improve Deep Learning Performance
o Framework for Better Deep Learning
o Introduction to Learning Curves for Diagnosing Model Performance
 Step 3: Discover techniques that you can use to improve performance.
o How to Get Better Deep Learning Results (7-Day Mini-Course)
o Better Deep Learning (my book)

You can see all better deep learning posts here. Below is a selection of some of the most popular
tutorials.

Better Learning (fix training)

 How to Control Model Capacity With Nodes and Layers


 How to Choose Loss Functions When Training Neural Networks
 Understand the Impact of Learning Rate on Model Performance
 How to Fix Vanishing Gradients Using the ReLU

Better Generalization (fix overfitting)

 Regularization to Reduce Overfitting of Neural Networks


 How to Use Weight Decay to Reduce Overfitting
 How to Reduce Overfitting With Dropout Regularization
 How to Stop Training At the Right Time Using Early Stopping

Better Predictions (ensembles)

 Ensemble Methods for Deep Learning Neural Networks


 How to Develop Model Averaging Ensembles
 How to Develop Cross-Validation and Bagging Ensembles
 How to Develop a Stacking Deep Learning Ensemble

Tips, Tricks, and Resources

 8 Tricks for Configuring Backpropagation


 Tricks of the Trade Review
 Three Must-Own Books for Deep Learning Practitioners
 Impact of Dataset Size on Deep Learning Model Skill

Long Short-Term Memory Networks (LSTMs)


Long Short-Term Memory (LSTM) Recurrent Neural Networks are designed for sequence
prediction problems and are a state-of-the-art deep learning technique for challenging prediction
problems.

Here’s how to get started with LSTMs in Python:

 Step 1: Discover the promise of LSTMs.


o The Promise of Recurrent Neural Networks for Time Series Forecasting
 Step 2: Discover where LSTMs are useful.
o Making Predictions with Sequences
o A Gentle Introduction to Long Short-Term Memory Networks by the Experts
o Introduction to Models for Sequence Prediction
 Step 3: Discover how to use LSTMs on your project.
o The 5 Step Life-Cycle for Long Short-Term Memory Models in Keras
o Long Short-Term Memory Networks (Mini-Course)
o Long Short-Term Memory Networks With Python (my book)

You can see all LSTM posts here. Below is a selection of some of the most popular tutorials
using LSTMs in Python with the Keras deep learning library.

Data Preparation for LSTMs

 How to Reshape Input Data for Long Short-Term Memory Networks


 How to One Hot Encode Sequence Data
 How to Remove Trends and Seasonality with a Difference Transform
 How to Scale Data for Long Short-Term Memory Networks
 How to Prepare Sequence Prediction for Truncated BPTT
 How to Handle Missing Timesteps in Sequence Prediction Problems

LSTM Behaviour

 A Gentle Introduction to Backpropagation Through Time


 Demonstration of Memory with a Long Short-Term Memory Network
 How to Use the TimeDistributed Layer for Long Short-Term Memory Networks
 How to use an Encoder-Decoder LSTM to Echo Sequences of Random Integers
 Attention in Long Short-Term Memory Recurrent Neural Networks

Modeling with LSTMs

 Generative Long Short-Term Memory Networks


 Stacked Long Short-Term Memory Networks
 Encoder-Decoder Long Short-Term Memory Networks
 CNN Long Short-Term Memory Networks
 Diagnose Overfitting and Underfitting of LSTM Models
 How to Make Predictions with Long Short-Term Memory Models

LSTM for Time Series

 On the Suitability of LSTMs for Time Series Forecasting


 Time Series Forecasting with the Long Short-Term Memory Network
 Multi-step Time Series Forecasting with Long Short-Term Memory Networks
 Multivariate Time Series Forecasting with LSTMs in Keras

Deep Learning for Natural Language Processing (NLP)


Working with text data is hard because of the messy nature of natural language.

Text is not “solved”, but to get state-of-the-art results on challenging NLP problems, you need
to adopt deep learning methods.

Here’s how to get started with deep learning for natural language processing:

 Step 1: Discover what deep learning for NLP is all about.


o What is Natural Language Processing?
o What is Deep Learning?
o Promise of Deep Learning for Natural Language Processing
 Step 2: Discover standard datasets for NLP.
o 7 Applications of Deep Learning for Natural Language Processing
o Datasets for Natural Language Processing
 Step 3: Discover how to work through problems and deliver results.
o Crash-Course in Deep Learning for Natural Language Processing
o Deep Learning for Natural Language Processing (my book)

You can see all deep learning for NLP posts here. Below is a selection of some of the most
popular tutorials.

Bag-of-Words Model

 What is the Bag-of-Words Model?


 How to Prepare Text Data for Machine Learning with scikit-learn
 How to Develop a Bag-of-Words Model for Predicting Sentiment
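
For intuition, a bag-of-words model can be sketched with the standard library alone: build a
vocabulary, then represent each document as a vector of word counts. The whitespace tokenizer,
the function names, and the two example documents are simplifying assumptions; real pipelines
usually also handle punctuation, stop words, and weighting.

```python
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(documents):
    """Sorted list of every distinct token seen across the documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text (word order is discarded)."""
    counts = Counter(tokenize(text))
    return [counts[token] for token in vocabulary]


docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
print(vocab)
for vector in vectors:
    print(vector)
```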

Language Modeling

 Gentle Introduction to Statistical Language Modeling and Neural Language Models


 How to Develop a Character-Based Neural Language Model in Keras
 How to Develop a Word-Level Neural Language Model and Use it to Generate Text

Text Summarization

 A Gentle Introduction to Text Summarization


 How to Prepare News Articles for Text Summarization
 Encoder-Decoder Models for Text Summarization in Keras

Text Classification

 Best Practices for Text Classification with Deep Learning


 How to Develop a Bag-of-Words Model for Sentiment Analysis
 How to Develop a CNN for Sentiment Analysis

Word Embeddings

 What are Word Embeddings?
 How to Develop Word Embeddings in Python with Gensim
 How to Use Word Embedding Layers for Deep Learning with Keras

Photo Captioning

 How to Automatically Generate Textual Descriptions for Photographs with Deep Learning
 A Gentle Introduction to Deep Learning Caption Generation Models
 How to Develop a Deep Learning Photo Caption Generator from Scratch

Text Translation

 A Gentle Introduction to Neural Machine Translation


 How to Configure an Encoder-Decoder Model for Neural Machine Translation
 How to Develop a Neural Machine Translation System from Scratch

Deep Learning for Computer Vision


Working with image data is hard because of the gulf between raw pixels and the meaning in the
images.

Computer vision is not solved, but to get state-of-the-art results on challenging computer vision
tasks like object detection and face recognition, you need deep learning methods.

Here’s how to get started with deep learning for computer vision:

 Step 1: Discover what deep learning for Computer Vision is all about.
o What is Computer Vision?
o What is the Promise of Deep Learning for Computer Vision?
 Step 2: Discover standard tasks and datasets for Computer Vision.
o 9 Applications of Deep Learning for Computer Vision
o How to Load and Visualize Standard Computer Vision Datasets With Keras
o How to Develop and Demonstrate Competence With Deep Learning for
Computer Vision
 Step 3: Discover how to work through problems and deliver results.
o How to Get Started With Deep Learning for Computer Vision (7-Day Mini-
Course)
o Deep Learning for Computer Vision (my book)

You can see all deep learning for Computer Vision posts here. Below is a selection of some of
the most popular tutorials.

Image Data Handling


 How to Load and Manipulate Images With PIL/Pillow
 How to Load, Convert, and Save Images With the Keras API
 Introduction to Channels First and Channels Last Image Formats

Image Data Augmentation

 How to Load Large Datasets From Directories


 How to Configure and Use Image Data Augmentation
 Introduction to Test-Time Data Augmentation

Image Classification

 How to Develop a CNN for CIFAR-10 Photo Classification


 How to Develop a CNN to Classify Photos of Dogs and Cats
 How to Develop a CNN to Classify Satellite Photos

Image Data Preparation

 How to Manually Scale Image Pixel Data for Deep Learning


 How to Evaluate Pixel Scaling Methods for Image Classification
 How to Normalize, Center, and Standardize Images in Keras

Basics of Convolutional Neural Networks

 Gentle Introduction to Convolutional Layers in CNNs


 Gentle Introduction to Padding and Stride in CNNs
 Gentle Introduction to Pooling Layers in CNNs

Object Recognition

 A Gentle Introduction to Object Recognition


 How to Perform Object Detection with Mask R-CNN
 How to Perform Object Detection With YOLOv3 in Keras

Deep Learning for Time Series Forecasting


Deep learning neural networks are able to automatically learn arbitrarily complex mappings from
inputs to outputs and support multiple inputs and outputs.

Methods such as MLPs, CNNs, and LSTMs offer a lot of promise for time series forecasting.

Here’s how to get started with deep learning for time series forecasting:

 Step 1: Discover the promise (and limitations) of deep learning for time series.
o The Promise of Recurrent Neural Networks for Time Series Forecasting
o On the Suitability of Long Short-Term Memory Networks for Time Series
Forecasting
o Results From Comparing Classical and Machine Learning Methods for Time
Series Forecasting
 Step 2: Discover how to develop robust baselines and defensible forecasting models.
o Taxonomy of Time Series Forecasting Problems
o How to Develop a Skillful Machine Learning Time Series Forecasting Model
 Step 3: Discover how to build deep learning models for time series forecasting.
o How to Get Started with Deep Learning for Time Series Forecasting (7-Day Mini-
Course)
o Deep Learning for Time Series Forecasting (my book)

You can see all deep learning for time series forecasting posts here. Below is a selection of some
of the most popular tutorials.

Forecast Trends and Seasonality (univariate)

 Grid Search SARIMA Models for Time Series Forecasting


 Grid Search Exponential Smoothing for Time Series Forecasting
 Develop Deep Learning Models for Univariate Forecasting

Human Activity Recognition (multivariate classification)

 How to Model Human Activity From Smartphone Data


 How to Develop CNN Models for Human Activity Recognition
 How to Develop RNN Models for Human Activity Recognition

Forecast Electricity Usage (multivariate, multi-step)

 How to Load and Explore Household Electricity Usage Data


 Multi-step Time Series Forecasting with Machine Learning
 How to Develop CNNs for Multi-Step Time Series Forecasting

Model Types

 How to Develop MLPs for Time Series Forecasting


 How to Develop CNNs for Time Series Forecasting
 How to Develop LSTMs for Time Series Forecasting

Time Series Case Studies

 Indoor Movement Time Series Classification


 Probabilistic Forecasting Model to Predict Air Pollution Days
 Predict Room Occupancy Based on Environmental Factors
 Predict Whether Eyes are Open or Closed Using Brain Waves

Forecast Air Pollution (multivariate, multi-step)

• Load, Visualize, and Explore an Air Pollution Forecasting Dataset
• Develop Baseline Forecasts for Air Pollution Forecasting
• Develop Autoregressive Models for Air Pollution Forecasting
• Develop Machine Learning Models for Air Pollution Forecasting

Generative Adversarial Networks (GANs)


Generative Adversarial Networks, or GANs for short, are an approach to generative modeling
using deep learning methods, such as convolutional neural networks.

GANs are an exciting and rapidly changing field, delivering on the promise of generative models
in their ability to generate realistic examples across a range of problem domains, most notably in
image-to-image translation tasks.
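
To make the generative modeling idea slightly more concrete, here is a minimal sketch of the two loss terms used in the standard GAN setup, where a discriminator scores samples as real or fake and a generator tries to fool it. The function names are hypothetical, the scores are plain probabilities rather than network outputs, and the generator term is the commonly used non-saturating variant. It is an illustration under those assumptions, not the code from any tutorial listed here.

```python
from math import log


def discriminator_loss(d_real, d_fake):
    """Cross-entropy for the discriminator: score real samples near 1, fakes near 0.

    d_real and d_fake are probabilities strictly between 0 and 1.
    """
    return -(log(d_real) + log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: small when fakes are scored as real."""
    return -log(d_fake)


# A confident, correct discriminator has low loss; the generator's loss is then high.
print(discriminator_loss(d_real=0.9, d_fake=0.1))
print(generator_loss(d_fake=0.1))
```

In practice the two losses are minimized in alternation, one model at a time, which is where much of the training instability in GANs comes from.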

Here’s how to get started with deep learning for Generative Adversarial Networks:

• Step 1: Discover the promise of GANs for generative modeling.
o 18 Impressive Applications of Generative Adversarial Networks
• Step 2: Discover the GAN architecture and different GAN models.
o A Gentle Introduction to Generative Adversarial Networks
o A Tour of Generative Adversarial Network Models
• Step 3: Discover how to develop GAN models in Python with Keras.
o How to Get Started With Generative Adversarial Networks (7-Day Mini-Course)
o Generative Adversarial Networks with Python (my book)

You can see all Generative Adversarial Network tutorials listed here. Below is a selection of
some of the most popular tutorials.

GAN Fundamentals

• How to Code the GAN Training Algorithm and Loss Functions
• How to Use the UpSampling2D and Conv2DTranspose Layers
• How to Implement GAN Hacks in Keras to Train Stable Models

GAN Loss Functions

• How to Implement Wasserstein Loss (WGAN)
• How to Develop a Least Squares GAN (LSGAN)

Develop Simple GAN Models

• How to Develop a 1D GAN From Scratch
• How to Develop a GAN for Generating MNIST Digits
• How to Develop a GAN to Generate CIFAR10 Photos

GANs for Image Translation

• How to Implement Pix2Pix GAN Models From Scratch
• How to Implement CycleGAN Models From Scratch

Need More Help?


I’m here to help you become awesome at applied machine learning.

If you still have questions and need help, you have some options:

• Ebooks: I sell a catalog of Ebooks that show you how to get results with machine learning, fast.
o Machine Learning Mastery EBook Catalog
• Blog: I write a lot about applied machine learning on the blog; try the search feature.
o Machine Learning Mastery Blog
• Frequently Asked Questions: The most common questions I get and their answers.
o Machine Learning Mastery FAQ
• Contact: You can contact me with your question, but one question at a time, please.
o Machine Learning Mastery Contact
