Machine Learning 3

Uploaded by

suvajit2021
Copyright
© © All Rights Reserved
Machine Learning
Dr. Soumi Dutta
Machine Learning Model
“A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.”

The definition above centres on three parameters, which are also the main
components of any learning algorithm: Task (T), Performance (P) and
Experience (E). In this context, the definition can be simplified as follows.

ML is a field of AI consisting of learning algorithms that:

• Improve their performance (P)
• At executing some task (T)
• Over time with experience (E)


Task (T)
From the perspective of the problem, the task T is the real-world problem to be
solved, such as finding the best house price in a specific location or the best
marketing strategy. In machine learning, however, the task is defined
differently, because ML-based tasks are difficult to solve with a conventional
programming approach.

A task T is an ML-based task when it is defined by the process the system must
follow to operate on data points. Examples of ML-based tasks are classification,
regression, structured annotation, clustering and transcription.
Experience (E)
As the name suggests, experience is the knowledge gained from the data points
provided to the algorithm or model. Once provided with the dataset, the model
runs iteratively and learns some inherent pattern. The learning thus acquired is
called experience (E). By analogy with human learning, this is like a person
gaining experience from attributes such as situations and relationships.
Supervised, unsupervised and reinforcement learning are some of the ways to
learn or gain experience. The experience gained by our ML model or algorithm is
then used to solve the task T.
Performance (P)
An ML algorithm is supposed to perform a task and gain experience over time.
The measure that tells whether the ML algorithm is performing as expected is
its performance (P).

P is a quantitative metric that tells how well a model performs the task T
using its experience E. Many metrics help to assess ML performance, such as
accuracy, precision, recall (also called sensitivity), F1 score and the
confusion matrix.
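The metrics named above can be computed by hand for a binary classifier. The sketch below is only an illustration: the label lists `y_true` and `y_pred` are made-up data, and the function names are hypothetical rather than taken from any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 is the positive class, 0 the negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice a library such as scikit-learn would normally be used for these metrics; the hand-written version is meant only to show what each number measures.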
Machine Learning - Python Libraries
Some popular Python machine learning libraries are as follows:

• NumPy
• Pandas
• SciPy
• Scikit-learn
• PyTorch
• TensorFlow
• Keras
• Matplotlib
• Seaborn
• OpenCV
• NLTK
• SpaCy
Machine Learning Life Cycle
The machine learning life cycle is an iterative process that moves from a business problem to a machine learning solution.
It serves as a guide for developing a machine learning project, providing instructions and best practices for each phase
of building an ML solution.

The life cycle involves several phases, from problem identification to model deployment and monitoring. While developing
an ML project, each step in the life cycle may be revisited many times.
The phases of the end-to-end machine learning life cycle are as follows:

• Problem Definition
• Data Preparation
• Model Development
• Model Deployment
• Monitoring and Maintenance


Problem Definition
The first step in the machine learning life cycle is to identify the
problem you want to solve. This crucial step establishes an
understanding of what the output might be, the scope of the task and
its objective.

As this step lays the foundation for building a machine learning model,
the problem definition has to be clear and concise.

This stage involves understanding the business problem, defining the
problem statement, and identifying the success criteria for the machine
learning model.
Data Preparation
The data preparation process includes collecting data, preprocessing data, and feature engineering and selection. This
stage generally also includes exploratory data analysis.

1. Data Collection:
After the problem statement is analyzed, the next step is collecting data. This involves gathering data from various
sources, which serves as the raw material for the machine learning model. A few factors to consider while collecting
data are:
Relevance and usefulness: The data collected has to be relevant to the problem statement, and useful enough to train
the machine learning model efficiently.

Quality and quantity: The quality and quantity of the data collected directly affect the performance of the
machine learning model.

Variety: Make sure that the data collected is diverse, so that the model is trained on multiple scenarios and can
recognize the patterns.
2. Data Preprocessing:
The collected data is often unstructured and messy, which negatively affects the outcomes. Preprocessing is therefore
important to improve the accuracy and performance of the machine learning model. Issues that have to be addressed
include missing values, duplicate data, invalid data and noise.
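As a minimal sketch of these preprocessing issues, the example below handles invalid rows, duplicates and missing values on a tiny made-up table. The field names (`area`, `price`), the validity rule (area must be positive) and the choice of mean imputation are all assumptions made for illustration.

```python
from statistics import mean

# Made-up raw records; None marks a missing value.
raw_rows = [
    {"area": 1200.0, "price": 250.0},
    {"area": 1500.0, "price": None},   # missing price
    {"area": 1200.0, "price": 250.0},  # exact duplicate of the first row
    {"area": -50.0, "price": 90.0},    # invalid: area cannot be negative
    {"area": 1800.0, "price": 350.0},
]


def preprocess(rows):
    """Drop invalid rows, remove duplicates, then fill missing prices."""
    # 1. Invalid data: keep only rows with a positive area.
    valid = [r for r in rows if r["area"] is not None and r["area"] > 0]

    # 2. Duplicate data: keep the first occurrence of each (area, price) pair.
    seen, unique = set(), []
    for r in valid:
        key = (r["area"], r["price"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Missing values: replace a missing price with the mean observed price.
    fill = mean(r["price"] for r in unique if r["price"] is not None)
    return [{**r, "price": fill if r["price"] is None else r["price"]}
            for r in unique]


clean_rows = preprocess(raw_rows)
for row in clean_rows:
    print(row)
```

Duplicates are removed before imputation so that repeated rows do not skew the mean used to fill the gaps.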

3. Analyzing Data:
Once the data is cleaned, it is time to understand it. The data is visualized and statistically summarized to gain
insights. Tools such as Power BI and Tableau are used to visualize the data, which helps in understanding its
patterns and trends.

4. Feature Engineering and Selection:
A feature is an individual measurable property of the data that the machine learning model observes during
training. Feature engineering is the process of creating new features or enhancing existing ones so that the patterns
and trends in the data are captured more accurately. Feature selection is the process of keeping only the features
that are most relevant to the problem.
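A small sketch of feature engineering follows. The house records, the derived features (`area_per_room`, `age_years`) and the reference year are hypothetical and chosen only to show how new features can be created from existing ones and then rescaled.

```python
# Made-up house records with three raw features.
houses = [
    {"area_sqft": 1000.0, "rooms": 2, "year_built": 2000},
    {"area_sqft": 1500.0, "rooms": 3, "year_built": 2010},
    {"area_sqft": 2400.0, "rooms": 4, "year_built": 2020},
]


def add_features(row, current_year=2024):
    """Create two new features from the existing ones."""
    new_row = dict(row)
    new_row["area_per_room"] = row["area_sqft"] / row["rooms"]
    new_row["age_years"] = current_year - row["year_built"]
    return new_row


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


engineered = [add_features(h) for h in houses]
scaled_age = min_max_scale([h["age_years"] for h in engineered])
print(engineered)
print(scaled_age)
```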
Model Development
In the model development phase, the machine learning model is built using the prepared data. The model building process involves
selecting an appropriate machine learning algorithm, training it, tuning its hyperparameters, and evaluating
the performance of the model using cross-validation techniques. This phase consists of three main steps:

1. Model Selection:
Model selection is a crucial step in the machine learning workflow. The choice of model depends on factors such as the
characteristics of the data, the complexity of the problem, the desired outcomes and how well the model aligns with the defined problem.

2. Model Training:
In this step, the algorithm is fed a preprocessed dataset so that it can identify the patterns and relationships in the specified
features.

3. Model Evaluation:
In model evaluation, the performance of the machine learning model is assessed using a set of evaluation metrics, such as
accuracy, precision, recall and F1 score. If the model has not achieved the desired performance, its hyperparameters are
tuned to improve the predictive accuracy. This continuous iteration is essential to make the model more
accurate and reliable.
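The idea of tuning a hyperparameter with cross-validation can be sketched as below. This is an illustration under stated assumptions, not a prescribed method: the model is a hand-written k-nearest-neighbours classifier on one made-up feature, the hyperparameter being tuned is the number of neighbours, and the folds are formed by taking every n-th sample.

```python
def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in neighbours]
    return max(sorted(set(votes)), key=votes.count)


def k_fold_accuracy(data, k_neighbours, n_folds):
    """Average accuracy over n_folds train/test splits of the data."""
    fold_scores = []
    for fold in range(n_folds):
        # Every n_folds-th sample, starting at `fold`, is held out for testing.
        test = data[fold::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        correct = sum(1 for x, y in test
                      if knn_predict(train, x, k_neighbours) == y)
        fold_scores.append(correct / len(test))
    return sum(fold_scores) / n_folds


# Made-up data: (feature value, class label).
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1), (8.0, 1)]

candidates = [1, 3, 5]  # hyperparameter values to compare
cv_scores = {k: k_fold_accuracy(data, k, n_folds=5) for k in candidates}
best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, best_k)
```

Because every sample is used for testing exactly once, the averaged score is less dependent on one lucky or unlucky split than a single train/test evaluation.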
Model Deployment
In the model deployment phase, the machine learning model is deployed into production. This involves
integrating the tested model with existing systems to make it available to users, management or other consumers,
and testing the model in a real-world scenario.

Two important properties to check before deploying are whether the model is portable, i.e., it can be
transferred from one machine to another, and scalable, i.e., it does not need to be redesigned to
maintain performance.
Monitoring and Maintenance
Monitoring in machine learning involves techniques to measure the model's performance metrics and to detect
issues in the model. After an issue is detected, the model has to be retrained with new data or its architecture
has to be modified.

Sometimes an issue detected in the model cannot be solved by training it with new data. In that case the
issue itself becomes the problem statement, and the machine learning life cycle restarts from analyzing the problem
in order to develop an improved model.

The machine learning life cycle is an iterative process, and it may be necessary to revisit previous stages to
improve the model's performance or address new requirements. By following the machine learning life cycle,
data scientists can ensure that their machine learning models are effective, accurate, and meet the business
requirements.
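One simple way to picture monitoring is a check that flags the model for retraining when its accuracy stays below an agreed baseline. The sketch below is only an assumption about how such a rule could look: the function name, the `tolerance` and `patience` parameters and the accuracy values are all hypothetical.

```python
def needs_retraining(window_accuracies, baseline, tolerance=0.05, patience=2):
    """Flag the model when the last `patience` monitoring windows all fall
    more than `tolerance` below the baseline accuracy."""
    recent = window_accuracies[-patience:]
    if len(recent) < patience:
        return False  # not enough monitoring data yet
    return all(acc < baseline - tolerance for acc in recent)


# Made-up accuracies measured on successive batches of production data.
stable_model = [0.91, 0.84, 0.90]
drifting_model = [0.91, 0.90, 0.82, 0.80]

print(needs_retraining(stable_model, baseline=0.90))
print(needs_retraining(drifting_model, baseline=0.90))
```

Requiring several consecutive poor windows avoids retraining on a single noisy measurement.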
Supervised Machine Learning
Supervised learning is a type of machine learning in which machines are trained using well "labelled" training
data and, on the basis of that data, predict the output. Labelled data means that the input data is already
tagged with the correct output. In supervised learning, the training data provided to the machine works as the
supervisor that teaches it to predict the output correctly. It applies the same concept as a student
learning under the supervision of a teacher.

Supervised learning is a process of providing input data as well as correct output data to the machine learning
model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to
the output variable (y).

In the real world, supervised learning can be used for risk assessment, image classification, fraud detection,
spam filtering, etc.
How Supervised Learning Works?
In supervised learning, models are trained using a labelled dataset, from which the model learns about each type of
data. Once the training process is completed, the model is tested on test data (a held-out subset of the
labelled dataset that was not used for training), for which it predicts the output.

The working of Supervised learning can be easily understood by the below example and diagram:
Steps Involved in Supervised Learning:
• First, determine the type of training dataset.
• Collect the labelled training data.
• Split the dataset into a training set, a validation set and a test set.
• Determine the input features of the training dataset, which should carry enough information for the model to
predict the output accurately.
• Determine a suitable algorithm for the model, such as a support vector machine or a decision tree.
• Execute the algorithm on the training set. Sometimes a validation set, held out from the training data, is needed
to tune the control parameters.
• Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model
is accurate.
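The steps above can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: the data is synthetic, the 60/20/20 split is one common choice rather than a rule, and the model is a simple nearest-centroid classifier standing in for algorithms such as a support vector machine or decision tree.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the labelled rows and split them into train/validation/test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def fit_centroids(rows):
    """Training: compute the mean feature value for each label."""
    sums = {}
    for x, label in rows:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def accuracy(centroids, rows):
    return sum(1 for x, y in rows if predict(centroids, x) == y) / len(rows)


# Synthetic labelled data: one input feature and one output label per row.
dataset = ([(float(i), "low") for i in range(10)]
           + [(float(i), "high") for i in range(20, 30)])

train, val, test = split_dataset(dataset)
model = fit_centroids(train)          # learn from the training set
val_acc = accuracy(model, val)        # used when comparing model settings
test_acc = accuracy(model, test)      # final check on unseen data
print(len(train), len(val), len(test), val_acc, test_acc)
```

The test set is touched only once, at the end, so that the reported accuracy reflects data the model has never seen.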


Types of supervised Machine learning Algorithms:
Supervised learning can be further divided into two types of problems:
Regression
Regression algorithms are used when there is a relationship between the input variable and a continuous output
variable. They are used to predict continuous quantities, as in weather forecasting or market trend analysis.
Below are some popular regression algorithms that come under supervised learning:

• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
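As an illustration of the first algorithm in the list, the sketch below fits a straight line by ordinary least squares using the closed-form formulas for slope and intercept. The data points are made up and lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept  # predict a continuous value for x = 6
print(slope, intercept, prediction)
```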
Classification
Classification algorithms are used when the output variable is categorical, meaning it takes one of a fixed set of
classes, such as Yes-No, Male-Female or True-False. Spam filtering is a typical example. Popular classification
algorithms include:

• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
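To illustrate one of these algorithms, the sketch below trains a logistic regression classifier on a single feature with plain gradient descent. The data, learning rate and number of epochs are assumptions chosen for a small example; real projects would normally rely on a library implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict_class(w, b, x, threshold=0.5):
    """Return 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Made-up data: small feature values belong to class 0, large ones to class 1.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print(predict_class(w, b, 1.0), predict_class(w, b, 4.0))
```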


Advantages of Supervised learning:
• With supervised learning, the model can predict the output on the basis of prior experience.
• In supervised learning, we can have an exact idea about the classes of objects.
• Supervised learning models help us to solve various real-world problems, such as fraud detection and spam
filtering.
Disadvantages of supervised learning:
• Supervised learning models can struggle with very complex tasks.
• Supervised learning may not predict the correct output if the test data differs substantially from the training data.
• Training can require a lot of computation time.
• In supervised learning, we need enough knowledge about the classes of objects.


Unsupervised Machine Learning
As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a labelled training
dataset. Instead, the models themselves find the hidden patterns and insights in the given data. It can be compared to the learning that takes
place in the human brain while learning new things. It can be defined as:
Unsupervised learning is a type of machine learning in which models are trained using an unlabelled dataset and are allowed to act on
that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have
the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset,
group the data according to similarities, and represent the dataset in a compressed format.

Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and
dogs. The algorithm is never given labels for the dataset, which means it has no prior idea about the features of the dataset.
The task of the unsupervised learning algorithm is to identify the image features on its own. It performs this task by
clustering the image dataset into groups according to the similarities between images.
Why Unsupervised Learning?
Below are some of the main reasons that describe the importance of unsupervised learning:

• Unsupervised learning is helpful for finding useful insights from the data.
• Unsupervised learning is similar to how a human learns to think from their own experiences, which makes it
closer to real AI.
• Unsupervised learning works on unlabelled and uncategorized data, which makes it more
important.
• In the real world, we do not always have input data with the corresponding output, so to solve such cases we need
unsupervised learning.
Working of Unsupervised Learning
The working of unsupervised learning can be understood from the diagram below:

Here, we have taken unlabelled input data, which means it is not categorized and the corresponding outputs are
not given. This unlabelled input data is fed to the machine learning model in order to train it. First, the model
interprets the raw data to find the hidden patterns in it, and then it applies a suitable algorithm, such as
k-means clustering or hierarchical clustering.
Types of Unsupervised Learning Algorithm:
The unsupervised learning algorithm can be further categorized into two types of problems:
Clustering:
Clustering is a method of grouping objects into clusters such that the objects with the most similarities remain
in one group and have few or no similarities with the objects of another group. Cluster analysis finds the
commonalities between the data objects and categorizes them according to the presence or absence of those
commonalities.
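Clustering can be sketched with a minimal one-dimensional k-means. The points, the starting centroids and the fixed number of iterations are assumptions made to keep the example short; practical implementations choose starting centroids more carefully and stop when the assignments no longer change.

```python
def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Made-up unlabelled data with two visible groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = k_means(points, centroids=[0.0, 6.0])
print(centroids, clusters)
```

No labels are supplied at any point: the two groups emerge only from the distances between the points.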

Association:
Association rule learning is an unsupervised learning method used for finding relationships between
variables in a large database. It determines the sets of items that occur together in the dataset.
Association rules make marketing strategy more effective. For example, people who buy item X (say,
bread) also tend to purchase item Y (butter or jam). A typical example of association rule learning is Market Basket
Analysis.
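The bread and butter example can be made concrete with the two standard measures of an association rule, support and confidence. The transactions below are made up for illustration.

```python
# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in transactions that contain
    the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


bread_support = support({"bread"}, transactions)
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(bread_support, rule_support, rule_confidence)
```

Algorithms such as Apriori search for all itemsets whose support exceeds a chosen threshold instead of checking one rule at a time as done here.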
Unsupervised Learning algorithms:
Below is the list of some popular unsupervised learning algorithms:

• K-means clustering
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition


Advantages of Unsupervised Learning:

• Unsupervised learning can be applied to more complex tasks than supervised learning because it does not
need labelled input data.
• Unsupervised learning is often preferable because unlabelled data is easier to obtain than labelled data.

Disadvantages of Unsupervised Learning:

• Unsupervised learning is intrinsically more difficult than supervised learning because it has no
corresponding output to learn from.
• The result of an unsupervised learning algorithm might be less accurate, because the input data is not labelled
and the algorithm does not know the exact output in advance.


Difference between Supervised and Unsupervised Learning
