IML unit 1 notes
Machine learning is a branch of artificial intelligence and computer science which focuses on the
use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.
In 1997, Tom Mitchell gave a “well-posed” mathematical and relational definition of Machine
Learning: “A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves with
experience E.”
• Algorithm: A Machine Learning algorithm is a set of rules and statistical techniques used
to learn patterns from data and draw significant information from it. It is the logic behind
a Machine Learning model.
An example of a Machine Learning algorithm is the Linear Regression algorithm.
• Model: A model is the main component of Machine Learning. A model is trained by using
a Machine Learning Algorithm. An algorithm maps all the decisions that a model is
supposed to take based on the given input, in order to get the correct output.
• Predictor Variable: One or more features of the data that can be used to predict the output.
• Response Variable: It is the feature or the output variable that needs to be predicted by
using the predictor variable(s).
• Training Data: The Machine Learning model is built using the training data. The training
data helps the model to identify key trends and patterns essential to predict the output.
• Testing Data: After the model is trained, it must be tested to evaluate how accurately it
can predict an outcome. This is done using the testing data set.
Step 1: Define the objective of the problem
At this step, we must understand what exactly needs to be predicted. In our case, the objective is
to predict the possibility of rain by studying weather conditions. At this stage, it is also essential
to take mental notes on what kind of data can be used to solve this problem or the type of approach
you must follow to get to the solution.
Step 2: Data gathering
Once you know the types of data that are required, you must understand how you can obtain this
data. Data collection can be done manually or by web scraping. However, if you are a beginner and
you are just looking to learn Machine Learning, you don't have to worry about getting the data.
There are thousands of data resources on the web; you can just download a data set and get going.
Coming back to the problem at hand, the data needed for weather forecasting includes measures
such as humidity level, temperature, pressure, locality, whether or not you live in a hill station, etc.
Such data must be collected and stored for analysis.
RKR21 IML II/I ALL BRANCHES
Step 3: Preparing the data
The data you collect is almost never in the right format. You will encounter a lot of
inconsistencies in the data set, such as missing values, redundant variables, and duplicate values.
Removing such inconsistencies is essential because they might lead to wrong computations
and predictions. Therefore, at this stage, you scan the data set for any inconsistencies and fix
them then and there.
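The cleaning step described above can be sketched in a few lines of plain Python. The records and field names below are made up for illustration, and "fixing" is assumed here to mean simply dropping rows with missing values and exact duplicates; real projects often impute missing values instead.

```python
# Hypothetical weather records; None marks a missing reading.
records = [
    {"temperature": 21.0, "humidity": 80.0},
    {"temperature": 21.0, "humidity": 80.0},   # exact duplicate
    {"temperature": None, "humidity": 75.0},   # missing value
    {"temperature": 30.0, "humidity": 40.0},
]

def clean(rows):
    """Drop rows with missing values, then drop exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue                      # skip incomplete rows
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            continue                      # skip rows we have already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned

cleaned_records = clean(records)
print(cleaned_records)
```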
Step 4: Exploratory Data Analysis
Grab your detective glasses because this stage is all about diving deep into data and finding all the
hidden data mysteries. EDA or Exploratory Data Analysis is the brainstorming stage of Machine
Learning. Data Exploration involves understanding the patterns and trends in the data. At this
stage, all the useful insights are drawn and correlations between the variables are understood.
For example, in the case of predicting rainfall, we know that there is a strong possibility of rain if
the temperature has fallen low. Such correlations must be understood and mapped at this stage.
Step 5: Building a Machine Learning model
All the insights and patterns derived during Data Exploration are used to build the Machine
Learning Model. This stage always begins by splitting the data set into two parts, training data,
and testing data. The training data will be used to build and analyze the model. The logic of the
model is based on the Machine Learning Algorithm that is being implemented.
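As a rough illustration of the split described above, the following sketch shuffles a data set and cuts it into training and testing parts. The function name, the 25% test fraction, and the fixed seed are assumptions chosen for the example, not values prescribed by these notes.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))                         # stand-in for 20 weather records
train, test = train_test_split(data)
print(len(train), len(test))                   # 15 5
```

Shuffling before cutting avoids a split where, for example, all the rainy days end up in the testing part.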
In the case of predicting rainfall, since the output will be in the form of True (if it will rain
tomorrow) or False (no rain tomorrow), we can use a Classification Algorithm such as Logistic
Regression.
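A minimal sketch of Logistic Regression for the rain example is given below, written in plain Python so that every step is visible. The single feature (humidity as a fraction), the tiny data set, the learning rate, and the number of epochs are all illustrative assumptions; in practice a library implementation would normally be used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=1.0, epochs=10000):
    """Fit p(rain) = sigmoid(w * x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y     # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def will_rain(humidity, w, b):
    """Classify as True (rain) when the predicted probability is at least 0.5."""
    return sigmoid(w * humidity + b) >= 0.5

# Made-up training data: humidity fraction -> did it rain the next day?
humidity = [0.2, 0.3, 0.4, 0.5, 0.7, 0.8, 0.9, 0.95]
rained = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(humidity, rained)
print(will_rain(0.9, w, b), will_rain(0.3, w, b))
```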
Step 6: Model evaluation and optimization
After building a model by using the training data set, it is finally time to put the model to the test.
The testing data set is used to check the efficiency of the model and how accurately it can predict
the outcome. Once the accuracy is calculated, any further improvements in the model can be
implemented at this stage. Methods like parameter tuning and cross-validation can be used to
improve the performance of the model.
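The two ideas above, measuring accuracy on held-out data and cross-validation, can be sketched as follows. The helper names are hypothetical, and the k-fold splitter is a simplified version that does not shuffle the data first.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds take one extra sample each.
        stop = start + fold_size + (1 if fold < remainder else 0)
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop

print(accuracy([True, False, True, True], [True, False, False, True]))  # 0.75
folds = list(k_fold_indices(10, 3))
print([len(test_idx) for _, test_idx in folds])                          # [4, 3, 3]
```

In cross-validation the model is trained and scored once per fold, and the scores are averaged to get a more stable estimate than a single split gives.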
Step 7: Predictions
Once the model is evaluated and improved, it is finally used to make predictions. The final output
can be a categorical variable (e.g. True or False) or it can be a continuous quantity (e.g. the
predicted value of a stock).
In our case, for predicting the occurrence of rainfall, the output will be a categorical variable.
So that was the entire Machine Learning process. Now it’s time to learn about the different ways
in which Machines can learn.
We can train machine learning algorithms by providing them with a huge amount of data and letting
them explore the data, construct the models, and predict the required output automatically.
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, etc. in digital images. A popular use case of image recognition
and face detection is automatic friend tagging suggestion:
Facebook provides a feature of automatic friend tagging suggestion. Whenever we upload a photo
with our Facebook friends, we automatically get a tagging suggestion with their names, and the
technology behind this is machine learning's face detection and recognition algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face recognition
and person identification in the picture.
2. Speech Recognition:
While using Google, we get an option of "Search by voice." It comes under speech recognition,
which is a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also known as
"Speech to text" or "Computer speech recognition." At present, machine learning algorithms
are widely used by various applications of speech recognition. Google Assistant, Siri, Cortana,
and Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the
shortest route to the destination and predicts the traffic conditions.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendation to the user. Whenever we search for some
product on Amazon, we start getting advertisements for the same product while surfing the
internet on the same browser, and this is because of machine learning. Google understands the
user's interests using various machine learning algorithms and suggests products according to
those interests.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine learning
plays a significant role in self-driving cars. Tesla, a popular car manufacturing company, is
working on self-driving cars. It is using an unsupervised learning method to train the car models to
detect people and objects while driving.
6. Email spam filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We
receive important mail in our inbox with the important symbol and spam emails in our
spam box, and the technology behind this is machine learning.
7. Virtual personal assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As
the name suggests, they help us find information using our voice instructions. These
assistants can help us in various ways just by our voice instructions, such as playing music, calling
someone, opening an email, scheduling an appointment, etc.
8. Online fraud detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, there are various ways in which a
fraudulent transaction can take place, such as fake accounts, fake IDs, and money being stolen in
the middle of a transaction. To detect this, a Feed Forward Neural Network helps us by checking
whether it is a genuine transaction or a fraudulent one.
1. Supervised learning
2. Unsupervised learning
3. Semi supervised learning
4. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample labeled data
to the machine learning system in order to train it, and on that basis, it predicts the output. The
system creates a model using labeled data to understand the datasets and learn about each data
point. Once the training and processing are done, we test the model by providing sample data to
check whether it predicts the correct output.
For Example:
• Let us consider images that are labeled as a spoon or a knife. This known data is fed to the
machine, which analyzes and learns the association of these images based on their features such
as shape, size, sharpness, etc.
• Now when a new image is fed to the machine without any label, the machine is able to predict
accurately that it is a spoon with the help of the past data.
The goal of supervised learning is to map input data to the output data. Supervised learning
is based on supervision, in the same way that a student learns things under the supervision of a
teacher. An example of supervised learning is spam filtering.
• Classification is used when the output variable is categorical i.e. with 2 or more classes.
• For example, yes or no, male or female, true or false, etc.
• In order to predict whether a mail is spam or not, we need to first teach the machine what
a spam mail is.
• This is done based on a lot of spam filters: reviewing the content of the mail, reviewing
the mail header, and then checking whether it contains any false information.
• Certain keywords and blacklist filters built from already blacklisted spammers are also
used.
• All of these features are used to score the mail and give it a spam score. The lower the total
spam score of the email, the more likely it is that the email is not spam.
• Based on the content, label, and the spam score of the new incoming mail, the algorithm
decides whether it should land in the inbox or spam folder.
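The scoring idea in the bullets above can be sketched as a toy rule-based filter. The keywords, their weights, the blacklist entry, and the threshold are all hypothetical and hand-set; a trained classifier would learn such weights from labeled mail rather than have them written in by hand.

```python
SPAM_KEYWORDS = {"lottery": 3, "winner": 2, "free": 1}   # hypothetical weights
BLACKLISTED_SENDERS = {"promo@spam.example"}             # hypothetical blacklist
THRESHOLD = 4                                            # hypothetical cut-off

def spam_score(sender, body):
    """Add up keyword weights found in the body, plus a penalty for blacklisted senders."""
    text = body.lower()
    score = sum(weight for word, weight in SPAM_KEYWORDS.items() if word in text)
    if sender.lower() in BLACKLISTED_SENDERS:
        score += 5
    return score

def route(sender, body):
    """Send the mail to the spam folder when its score reaches the threshold."""
    return "spam" if spam_score(sender, body) >= THRESHOLD else "inbox"

print(route("friend@mail.example", "Lunch at noon?"))
print(route("x@mail.example", "You are a lottery WINNER, claim your free prize"))
```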
• Regression is used when the output variable is a continuous value.
• For example, salary based on work experience or weight based on height, etc.
Let's consider two variables: humidity and temperature. Here, 'temperature' is the independent
variable and 'humidity' is the dependent variable. In this example, if the temperature increases,
then the humidity decreases.
These two variables are fed to the model and the machine learns the relationship between them.
After the machine is trained, it can easily predict the humidity based on the given temperature.
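As a small worked example of learning such a relationship, the sketch below fits a straight line by ordinary least squares. The temperature and humidity values are made up and perfectly linear so that the result is easy to check by hand; they are not real measurements.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

temperature = [20.0, 25.0, 30.0, 35.0]   # independent variable (made-up values)
humidity = [80.0, 70.0, 60.0, 50.0]      # dependent variable (made-up values)

slope, intercept = fit_line(temperature, humidity)
predicted = slope * 28.0 + intercept     # predicted humidity at 28 degrees
print(slope, intercept, predicted)
```

The negative slope matches the statement above that humidity decreases as temperature increases in this example.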
• Prediction of future cases: Use the rule to predict the output for future inputs
• Knowledge extraction: The rule is easy to understand
● Slow: it requires human experts to manually label training examples one by one.
● Costly: a model should be trained on large volumes of hand-labeled data to provide
accurate predictions.
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any supervision.
The training is provided to the machine with the set of data that has not been labeled, classified,
or categorized, and the algorithm needs to act on that data without any supervision. The goal of
unsupervised learning is to restructure the input data into new features or a group of objects with
similar patterns.
Example:
• Let's take a similar example as before, but this time we do not tell the machine whether it's
a spoon or a knife.
• The machine identifies patterns from the given set and groups them based on their patterns,
similarities, etc.
In unsupervised learning, we don't have a predetermined result. The machine tries to find useful
insights from the huge amount of data. It can be further classified into two categories of
algorithms:
o Clustering
o Association
• Clustering is the method of dividing objects into clusters such that objects in the same cluster
are similar to each other and dissimilar to the objects belonging to other clusters.
• For example, finding out which customers made similar product purchases.
Example:
• Suppose a telecom company wants to reduce its customer churn rate by providing
personalized call and data plans.
• The behavior of the customers is studied and the model segments the customers with
similar traits.
• Group A customers use more data and also have high call durations. Group B customers
are heavy Internet users, while Group C customers have high call duration.
• So, Group B will be given more data benefit plans, while Group C will be given cheaper
call rate plans, and Group A will be given the benefit of both.
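The customer segmentation above could be carried out with a clustering algorithm such as K-Means. The sketch below is a bare-bones version run on made-up customers described by (data usage, call duration), both on a hypothetical 0 to 10 scale; the starting centroids are picked by hand for the example, whereas real implementations usually choose them randomly.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-Means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # New centroid is the mean of the members, dimension by dimension.
                new_centroids.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster's centroid
        centroids = new_centroids
    return labels, centroids

# (data usage, call duration) for six hypothetical customers
customers = [(9.0, 9.0), (10.0, 10.0), (9.0, 1.0), (10.0, 1.2), (1.0, 9.5), (2.0, 10.0)]
start = [customers[0], customers[2], customers[4]]   # hand-picked initial centroids

labels, centroids = kmeans(customers, start)
print(labels)   # [0, 0, 1, 1, 2, 2]
```

Here cluster 0 corresponds to Group A (high data and high call duration), cluster 1 to Group B (heavy data users), and cluster 2 to Group C (high call duration).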
Association - Unsupervised Learning
• Let’s say that a customer goes to a supermarket and buys bread, milk, fruits, and wheat.
Another customer comes and buys bread, milk, rice, and butter.
• Now, when another customer comes, it is highly likely that if he buys bread, he will buy
milk too.
• Hence, a relationship is established based on customer behavior and recommendations are
made.
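The supermarket example can be quantified with the two standard association-rule measures, support and confidence. The sketch below uses only the two baskets from the example, so the numbers are illustrative rather than statistically meaningful.

```python
transactions = [
    {"bread", "milk", "fruits", "wheat"},   # first customer's basket
    {"bread", "milk", "rice", "butter"},    # second customer's basket
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(confidence({"bread"}, {"milk"}))   # 1.0
print(confidence({"bread"}, {"rice"}))   # 0.5
```

A confidence of 1.0 for bread -> milk reflects the observation above that every customer who bought bread also bought milk.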
• Unsupervised learning is used for more complex tasks as compared to supervised learning
because, in unsupervised learning, we don't have labeled input data.
• Unsupervised learning is preferable as it is easy to get unlabeled data in comparison to
labeled data.
1. Similarity detection
2. Automatic labeling
3. Object segmentation (such as Person, Animal, Films)
The goal in such unsupervised learning problems may be to discover groups of similar examples
within the data, where it is called clustering, or to determine the distribution of data within the
input space, known as density estimation, or to project the data from a high-dimensional space
down to two or three dimensions for the purpose of visualization.
3) Semi-Supervised Learning
Semi-supervised learning is supervised learning where the training data contains very few labeled
examples and a large number of unlabeled examples.
The goal of a semi-supervised learning model is to make effective use of all of the available data,
not just the labeled data like in supervised learning.
• Semi-supervised learning is an important category that lies between the Supervised and
Unsupervised machine learning.
• To overcome the drawbacks of supervised learning and unsupervised learning
algorithms, the concept of Semi-supervised learning is introduced.
• The training data consists of a very small amount of labeled data and a huge amount of
unlabeled data.
• Initially, similar data is clustered using an unsupervised learning algorithm, and
the clusters are then used to turn the unlabeled data into labeled data.
• This is done because labeled data is comparatively more expensive to acquire than unlabeled data.
4) Reinforcement Learning
In many complex domains, reinforcement learning is the only feasible way to train a program to
perform at high levels. For example, in game playing, it is very hard for a human to provide
accurate and consistent evaluations of large numbers of positions, which would be needed to train
an evaluation function directly from examples. Instead, the program can be told when it has won
or lost, and it can use this information to learn an evaluation function that gives reasonably accurate
estimates of the probability of winning from any given position.
• Suppose there is an AI agent present within a maze environment, and its goal is to find the
diamond.
• The agent interacts with the environment by performing some actions, and based on those
actions, the state of the agent gets changed, and it also receives a reward or penalty as
feedback.
• The agent continues doing these three things (take action, change state/remain in the
same state, and get feedback), and by doing these actions, it learns and explores the
environment.
• The agent learns which actions lead to positive feedback (rewards) and which actions
lead to negative feedback (penalties).
• As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative
point.
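The loop described above (take an action, change state, get feedback) can be sketched with Q-Learning on a deliberately tiny "maze". Everything about the environment is an assumption made for illustration: a corridor of five cells with the diamond in the last one, a reward of +1 for reaching it, and a penalty of -1 for bumping into the wall at the start.

```python
import random

N_STATES = 5                  # cells 0..4; the diamond sits in cell 4
ACTIONS = ["left", "right"]

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    if nxt == N_STATES - 1:
        return nxt, 1.0, True      # found the diamond: positive reward
    if nxt == state:
        return nxt, -1.0, False    # bumped into the wall: penalty
    return nxt, 0.0, False

def q_learning(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)          # explore with random actions
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted best future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)   # ['right', 'right', 'right', 'right']
```

The agent is never told that "right" is correct; it works this out from the rewards and penalties alone.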
| Criteria | Supervised Learning | Unsupervised Learning | Semi-supervised Learning | Reinforcement Learning |
|---|---|---|---|---|
| Definition | Learns by using labeled data | Trained using unlabeled data without any guidance | Trained using both labeled & unlabeled data | Works on interacting with the environment |
| Type of data | Labeled data | Unlabeled data | Both | No predefined data |
| Type of problems | Regression and classification | Association and Clustering | Classification and Regression | Exploitation or Exploration |
| Algorithms | Linear Regression, Logistic Regression, SVM, KNN etc. | K-Means | Text document classifier | Q-Learning, SARSA |
| Aim | Calculate outcomes | Discover underlying patterns | Classify the data and also discover underlying patterns | Learn a series of actions |
| Application | Risk Evaluation, Forecast Sales | Recommendation System, Anomaly Detection | Text Classification | Self Driving Cars, Gaming, Healthcare |
Batch learning is also called offline learning. The models trained using batch or offline learning
are moved into production only at regular intervals based on the performance of models trained
with new data.
Building offline models or models trained in a batch manner requires training the models with the
entire training data set. Improving the model performance requires re-training all over again
with the entire training data set. These models are static in nature, which means that once they are
trained, their performance will not improve until a new model is trained. Offline models or
models trained using batch learning are deployed in the production environment by replacing the
old model with the newly trained model.
There can be various reasons why we can choose to adopt batch learning for training the models.
Some of these reasons are the following:
If models trained using batch learning need to learn about new data, they need to be
retrained using the new data set and then swapped in for the model already in production,
based on criteria such as model performance. The whole process of batch learning can be
automated as well. The disadvantage of batch learning is that it takes a lot of time and resources to
retrain the model.
Whether a machine learning model should be retrained in a batch manner is decided based on
its performance. Red-amber-green statuses can be used to determine the health
of the model based on the prediction accuracy or error rates. Accordingly, the models can be
chosen to be retrained or otherwise.
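One possible way to turn prediction accuracy into a red-amber-green status is sketched below. The thresholds of 0.90 and 0.80, and the rule that only a red status triggers retraining, are hypothetical; in practice they would be agreed with the stakeholders for each model.

```python
def model_health(accuracy, green_at=0.90, amber_at=0.80):
    """Map prediction accuracy to a red-amber-green status (thresholds are hypothetical)."""
    if accuracy >= green_at:
        return "green"
    if accuracy >= amber_at:
        return "amber"
    return "red"

def should_retrain(accuracy):
    """In this sketch, only a red status triggers a batch retraining run."""
    return model_health(accuracy) == "red"

print(model_health(0.95), model_health(0.85), model_health(0.50))
```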
The following stakeholders can be involved in reviewing the model performance and leveraging
batch learning:
● Business/product owners
● Product managers
● Data scientists
● ML engineers
In online learning, the training happens in an incremental manner by continuously feeding data as
it arrives or in a small group. Each learning step is fast and cheap, so the system can learn about
new data on the fly, as it arrives.
Online learning is great for machine learning systems that receive data as a continuous flow (e.g.,
stock prices) and need to adapt to change rapidly or autonomously. It is also a good option if you
have limited computing resources: once an online learning system has learned about new data
instances, it does not need them anymore, so you can discard them (unless you want to be able to
roll back to a previous state and “replay” the data) or move the data to another form of storage
(warm or cold storage) if you are using a data lake. This can save a huge amount of space and
cost.
Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one
machine's main memory (this is also called out-of-core learning). The algorithm loads part of the
data, runs a training step on that data, and repeats the process until it has run on all of the data.
One of the key aspects of online learning is the learning rate. The rate at which you want your
machine learning system to adapt to new data is called the learning rate. A system with a high
learning rate will adapt to new data quickly, but it will also tend to forget what it learned from
old data quickly. A system with a low learning rate will learn more slowly and behave more like
batch learning.
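The effect of the learning rate can be seen in a very small online learner that keeps a running estimate of a value and nudges it towards each new observation. The data stream and the two learning rates are made up for the illustration; the stream changes level near the end to mimic a change in the incoming data.

```python
def online_estimate(stream, learning_rate):
    """Update the estimate one observation at a time; no past data is stored."""
    estimate = 0.0
    for x in stream:
        estimate += learning_rate * (x - estimate)   # step towards the new observation
    return estimate

# Fifty old observations at 10, followed by five new observations at 20.
stream = [10.0] * 50 + [20.0] * 5

fast = online_estimate(stream, learning_rate=0.5)    # adapts quickly, forgets quickly
slow = online_estimate(stream, learning_rate=0.05)   # adapts slowly, remembers longer
print(fast, slow)
```

With the high learning rate the estimate ends close to 20, having almost forgotten the older data, while with the low learning rate it is still much closer to the older level of 10.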
One of the big disadvantages of an online learning system is that if it is fed with bad data, the
system will have bad performance and the user will see the impact instantly. Thus, it is very
important to come up with an appropriate data governance strategy to ensure that the data fed is of
high quality. In addition, it is very important to monitor the performance of the machine learning
system closely.
The following are some of the challenges for adopting an online learning method:
● Data governance
● Model governance, including appropriate algorithm and model selection on the fly
Online models require only a single deployment in the production setting and they evolve over a
period of time. The disadvantage that the online models have is that they don’t have the entire
dataset available for the training. The models are trained in an incremental manner based on the
assumptions made using the available data and the assumptions at times can be sub-optimal.
Complexity:
Online learning: More complex because the model keeps evolving over time as more data becomes available.
Batch learning: Less complex because the model is fed with more consistent data sets periodically.
Noisy data: It is responsible for inaccurate predictions, which affects decisions as well
as accuracy in classification tasks.
Incorrect data: It is also responsible for faulty programming and faulty results in
machine learning models. Hence, incorrect data may also affect the accuracy of the
results.
Generalizing of output data: Sometimes generalizing the output data
becomes complex, which results in comparatively poor future actions.
Regular monitoring and maintenance become compulsory for the same. Different results for
different actions require data change; hence editing of codes as well as resources for monitoring
them also become necessary.
Although Machine Learning and Artificial Intelligence are continuously growing in the market,
these industries are still young in comparison to others.
Machine learning includes analyzing the data, removing data bias, training data, applying complex
mathematical calculations, etc., making the procedure more complicated and quite tedious.
Machine learning models are highly efficient in producing accurate results but are
time-consuming. Slow programs, excessive requirements, and overloaded data mean that it takes
more time than expected to produce accurate results.
Overfitting:
• Overfitting occurs when a machine learning model captures the noise and inaccurate
entries in its training data set instead of the underlying pattern. It then performs well on
the training data but poorly on new data.
• A related problem is biased training data. Suppose the training data set contains
1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. Then there is a
considerable probability of an apple being identified as a papaya, because the training
data set is heavily biased towards papayas.
Underfitting:
• Underfitting occurs when a machine learning model is too simple, or is trained with too
little data, to capture the underlying pattern. As a result, it gives incomplete and inaccurate
predictions, which destroys the accuracy of the machine learning model.
Probability
Probability is the foundation stone of ML; it tells us how likely an event is to occur. The value
of a probability always lies between 0 and 1. A value of 1 indicates that the event is certain to
occur, and 0 indicates that the event will not occur.
Formula (for equally likely outcomes):
Probability of an event = (Number of ways the event can occur) / (Total number of outcomes)
When a coin is tossed, there are two possible outcomes: Heads (H) or Tails (T)
Number of ways Head can happen: 1 (there is only 1 face with an "H" on the coin)
Total number of outcomes: 2 (there are 2 faces altogether)
So the probability of Head (H) = 1/2, i.e. 50%, and similarly for Tail (T)
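When a single die is thrown, there are six possible outcomes: 1, 2, 3, 4, 5, 6. Consider rolling a 4:
Number of ways it can happen: 1 (there is only 1 face with a "4" on it)
Total number of outcomes: 6 (there are 6 faces altogether)
So the probability = 1/6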
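The coin and die calculations above can be reproduced with exact fractions. The helper name is hypothetical, and the formula assumes that all outcomes are equally likely.

```python
from fractions import Fraction

def probability(favourable, total):
    """Classical probability: favourable outcomes over equally likely total outcomes."""
    return Fraction(favourable, total)

p_head = probability(1, 2)   # one face shows "H" out of 2 faces
p_four = probability(1, 6)   # one face shows "4" out of 6 faces

print(p_head, float(p_head))   # 1/2 0.5
print(p_four)                  # 1/6
```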