Introduction To Machine Learning Top-Down Approach - Towards Data Science
Top-Down Approach
A super smooth introduction to machine learning
Maher
Feb 22 · 5 min read
There are 4,156,513,325 users on the internet. If you tried to count them, it would take
more than 128 years.
I could go on like this forever, and these numbers grow incredibly fast. So much data
exists that we can have computers draw insights from it: we can program a computer to
develop a kind of experience from going through this data and form a small, simple
Brain (dumb in most cases) dedicated to solving a specific problem, or even more than
one if you’re a ninja-level programmer/researcher.
We usually call this Brain a model, and it’s most likely a function. If you have an
enormous amount of data and want the brain/model to be complex and sophisticated,
you can make it a far more complex function (a neural network, for example). And by
complex I don’t mean the cubic or quadratic functions we studied in school; I mean
something with far more parameters than that.
The good news is that an introduction to machine learning is not the place to take you
deep into this complex mathematical stuff.
Normally I would first try to convince you that machine learning is a really interesting
field and tell you about robots, face detection, spam mail detection, and so on, but I
want this to be short and to the point. So read the following, try to build a map in your
mind, and stay focused, starting now!
Define an outline
The machine learning system reads the dataset and optimizes the Brain/Model to
solve the problem.
In a supervised ML system, the training data (dataset) you feed to the algorithm
includes the desired solutions, called labels.
In a reinforcement ML system, things are a little different. The model here is called an
agent, and its job is to perform actions for which it receives rewards. The agent learns
as it tries to maximize those rewards; it’s like training your pet with treats.
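The reward loop described above can be sketched in a few lines. This is only a minimal illustration, assuming a very simple "epsilon-greedy" strategy and an environment where every action always pays the same reward; real reinforcement learning problems are much richer. The names TreatAgent and TREATS, and the reward values, are made up for this example.

```python
import random


class TreatAgent:
    """Epsilon-greedy agent: usually repeats its best-known action, sometimes explores."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.values = [0.0] * n_actions  # running average reward per action
        self.counts = [0] * n_actions    # how often each action was tried
        self.epsilon = epsilon           # probability of trying a random action
        self.rng = random.Random(seed)

    def best_action(self):
        # Index of the action with the highest estimated reward so far.
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore
        return self.best_action()                        # exploit

    def learn(self, action, reward):
        # Incremental update of the average reward observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


# Hypothetical environment: three tricks, each always earning a fixed amount of treats.
TREATS = [0.2, 0.5, 1.0]

agent = TreatAgent(n_actions=len(TREATS))
for _ in range(1000):
    action = agent.choose()
    agent.learn(action, TREATS[action])

print("Trick the agent prefers after training:", agent.best_action())
```

The occasional random action is what lets the agent discover that the third trick pays best; without it, the agent would settle on the first trick it ever got a treat for.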
2. Whether or not the model can learn incrementally on the fly (online learning vs.
batch learning). In simpler words:
For example, let’s use a housing prices dataset. It’s basically a table where each row
represents a house, with columns = [‘house_id’, ‘house_size’, ‘no_of_rooms’, ‘price’], and
the problem is to predict the price of a house.
Deciding which features to use is a really important step, and you have to consider a lot
of things. I’ll walk you through this step in another article, but since this is a simple
example we can start with features = [house_size, no_of_rooms].
The model we’re using will try to get a sense of how the labels (outputs) can be
estimated from the given features; in other words, how the price of a house is
affected by its size and the number of rooms in it. This process is called training.
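To make the training step concrete, here is a minimal sketch, assuming a plain linear model fitted by ordinary least squares. The article does not say which model is used, so this choice is only an assumption. The rows in houses are invented for illustration (only the column names come from the example above), and the helpers solve, train and predict are hypothetical names.

```python
# Made-up toy table; only the column names come from the housing example.
houses = [
    {"house_id": 1, "house_size": 50.0, "no_of_rooms": 2, "price": 115000.0},
    {"house_id": 2, "house_size": 80.0, "no_of_rooms": 3, "price": 170000.0},
    {"house_id": 3, "house_size": 120.0, "no_of_rooms": 4, "price": 240000.0},
    {"house_id": 4, "house_size": 65.0, "no_of_rooms": 3, "price": 147500.0},
    {"house_id": 5, "house_size": 150.0, "no_of_rooms": 5, "price": 295000.0},
    {"house_id": 6, "house_size": 95.0, "no_of_rooms": 2, "price": 182500.0},
]
features = ["house_size", "no_of_rooms"]


def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def train(rows, feature_names, label):
    """Fit price = w0 + w1 * house_size + w2 * no_of_rooms via the normal equations."""
    X = [[1.0] + [float(r[f]) for f in feature_names] for r in rows]  # 1.0 = intercept
    y = [float(r[label]) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row, feature_names):
    """Apply the learned function to one house."""
    return weights[0] + sum(w * float(row[f]) for w, f in zip(weights[1:], feature_names))


weights = train(houses, features, "price")
new_house = {"house_size": 100.0, "no_of_rooms": 3}
print("Learned weights:", weights)
print("Predicted price:", predict(weights, new_house, features))
```

Notice that house_id is left out of the features on purpose: it identifies a row but says nothing about the price.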
3. Poor-quality data.
This means the data is full of unintended errors, noise, and outliers. That makes it
harder for the model to detect patterns, so it most probably will not perform well.
Take some time to clean the data or discard the noisy parts.
4. Irrelevant features.
Feature selection is a very important step. The model can pick up a relation between
the desired label (solution) and an irrelevant feature, which makes the predictions
more random.
5. Overfitting.
This means the model performs well on the training data but does not
generalize well. In other words, the model has studied the training data so well
that it memorized it. This most probably happens because the model is too complex
relative to the amount of data and its noisiness. (I’ll tell you what to do if you
encounter this problem in another article.)
6. Underfitting.
This is the opposite of overfitting: it happens when the model is too
simple to understand the data.
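Overfitting and underfitting are easiest to see by comparing errors on the training data with errors on data the model has never seen. The sketch below is a toy illustration under stated assumptions: the numbers are hand-made (roughly y = 2x plus noise), a nearest-neighbour lookup stands in for a model that memorizes its training data, and a constant prediction stands in for a model that is too simple. The function names are hypothetical.

```python
# Hand-made toy data (an assumption for illustration): y is roughly 2 * x plus noise.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [3.0, 3.0, 7.0, 7.0, 11.0, 11.0]
test_x = [1.2, 2.2, 3.2, 4.2, 5.2]
test_y = [1.4, 5.4, 5.4, 9.4, 9.4]


def mse(model, xs, ys):
    """Mean squared error of a model (a plain function) on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_mean(xs, ys):
    """Too simple: always predicts the average label and ignores x."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Straight line fitted by least squares (closed form for one feature)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Memorizes the training set: returns the label of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


fitters = {"mean": fit_mean, "line": fit_line, "memorizer": fit_memorizer}
errors = {}
for name, fit in fitters.items():
    model = fit(train_x, train_y)
    errors[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:10s} train MSE = {errors[name][0]:.2f}   test MSE = {errors[name][1]:.2f}")
```

On this toy data the memorizer is perfect on the training set but clearly worse than the line on the unseen points, which is the overfitting pattern. The constant prediction is poor on both sets, which is the underfitting pattern.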
. . .
To know which model to choose for your project, read the article “Which machine
learning model to use?”
Here are some models you can take a look at; I’ll write articles to explain how they
work.
Linear Regression
Logistic Regression
You can also read the 5-Steps and 10-Steps to Learn Machine Learning articles,
published in the Towards Data Science publication.