Machine Learning 3
Learning
DR. SOUMI DUTTA
Machine Learning Model
“A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.”
This definition centres on three parameters, which are also the main components of any
learning algorithm: task (T), performance (P) and experience (E). Each can be described as follows:
Task (T)
A task T is an ML-based task when it is defined by the process the system must follow to
operate on data points. Examples of ML-based tasks are classification, regression,
structured annotation, clustering and transcription.
Experience (E)
As the name suggests, experience is the knowledge gained from the data points provided to
the algorithm or model. Once provided with the dataset, the model runs iteratively
and learns some inherent pattern. The learning thus acquired is called
experience (E). By analogy with human learning, this is like a person gaining
experience from attributes such as situations and relationships. Supervised,
unsupervised and reinforcement learning are some of the ways to learn or gain
experience. The experience gained by our ML model or algorithm is then used to solve the task T.
Performance (P)
An ML algorithm is supposed to perform a task and gain experience over time.
The measure that tells whether the ML algorithm is performing as expected
is its performance (P).
P is a quantitative metric that tells how well a model performs the task T
using its experience E. Many metrics help to assess ML performance, such as
accuracy, precision, recall (also called sensitivity), F1 score and the
confusion matrix.
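As a minimal sketch of how these metrics relate to each other, the following computes them from the confusion counts of a binary task. The function name `classification_metrics` and the labels are made up for illustration; libraries such as scikit-learn provide equivalent, more complete routines.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute confusion counts, accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up labels: 1 = spam, 0 = not spam
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here P is whichever of these numbers the project decides to track; the task T is the spam classification and the experience E is whatever data produced `y_pred`.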
Machine Learning - Python Libraries
Some popular Python machine learning libraries are as follows:
NumPy
Pandas
SciPy
Scikit-learn
PyTorch
TensorFlow
Keras
Matplotlib
Seaborn
OpenCV
NLTK
spaCy
Machine Learning Life Cycle
The machine learning life cycle is an iterative process that moves from a business problem to a machine learning solution.
It serves as a guide for developing a machine learning project and provides instructions and best practices for each
phase of developing an ML solution.
The life cycle involves several phases, from problem identification to model deployment and monitoring. While developing
an ML project, each step in the life cycle may be revisited many times.
The stages involved in the end-to-end machine learning life cycle are as follows:
Problem Definition
Data Preparation
Model Development
Model Deployment
Problem Definition
Because this step lays the foundation for building a machine learning model,
the problem definition has to be clear and concise.
Data Preparation
1. Data Collection:
After the problem statement is analyzed, the next step is collecting data. This involves gathering data from various
sources, which serves as the raw material for the machine learning model. A few factors to consider while collecting
data are:
Relevance and usefulness: The data collected has to be relevant to the problem statement and useful
enough to train the machine learning model efficiently.
Quality and quantity: The quality and quantity of the data collected directly impact the performance of the
machine learning model.
Variety: Make sure that the data collected is diverse so that the model can be trained on multiple scenarios to
recognize the patterns.
2. Data Preprocessing:
The data collected is often unstructured and messy, which negatively affects the outcomes. Preprocessing the data
is therefore important to improve the accuracy and performance of the machine learning model. Issues that have to
be addressed include missing values, duplicate data, invalid data and noise.
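The following sketch shows one possible way to handle three of these issues (invalid rows, duplicates and missing values) using only the standard library. The records, the `preprocess` helper and the rule that a negative age is invalid are assumptions made for illustration; in practice a library such as Pandas would usually be used.

```python
from statistics import mean

# Made-up raw records; None marks a missing value
raw_rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},   # missing age
    {"age": 25, "income": 50000},     # exact duplicate of the first row
    {"age": -4, "income": 58000},     # invalid: age cannot be negative
    {"age": 41, "income": 71000},
]

def preprocess(rows):
    # 1. Drop invalid rows (here: an age that is present but negative)
    valid = [r for r in rows if r["age"] is None or r["age"] >= 0]

    # 2. Drop exact duplicates while keeping the original order
    seen, unique = set(), []
    for r in valid:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Fill missing ages with the mean of the known ages
    known_ages = [r["age"] for r in unique if r["age"] is not None]
    fill_value = mean(known_ages)
    return [{**r, "age": fill_value if r["age"] is None else r["age"]}
            for r in unique]

clean_rows = preprocess(raw_rows)
print(clean_rows)
```

Filling with the mean is only one option; dropping incomplete rows or using the median are common alternatives, and the right choice depends on the data.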
3. Analyzing Data:
Once the data is cleaned, it is time to understand it. The data is visualized and statistically
summarized to gain insights. Tools such as Power BI and Tableau are used to visualize the data, which helps in understanding
its patterns and trends.
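A statistical summary can also be produced directly in code. This is a small sketch with a made-up feature column, using Python's built-in `statistics` module rather than a BI tool.

```python
from statistics import mean, median, stdev

# Made-up feature column (for example, house prices in thousands)
house_prices = [210, 250, 195, 320, 275, 240, 1050]

summary = {
    "count": len(house_prices),
    "mean": mean(house_prices),
    "median": median(house_prices),
    "stdev": stdev(house_prices),
    "min": min(house_prices),
    "max": max(house_prices),
}
print(summary)

# A mean far above the median hints at skew or an outlier worth inspecting
skewed = summary["mean"] > summary["median"]
```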
Model Development
1. Model Selection:
Model selection is a crucial step in the machine learning workflow. The choice of model depends on factors such as the
characteristics of the data, the complexity of the problem, the desired outcomes and how well the model aligns with the defined problem.
2. Model Training:
In this process, the algorithm is fed a preprocessed dataset so that it can identify and learn the patterns and relationships in the
specified features.
3. Model Evaluation:
In model evaluation, the performance of the machine learning model is measured using a set of evaluation metrics, such as
accuracy, precision, recall and F1 score. If the model has not achieved the desired performance, its hyperparameters are
tuned to improve predictive accuracy. This continuous iteration is essential to make the model more
accurate and reliable.
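The evaluate-and-tune loop can be sketched with a tiny k-nearest-neighbours classifier, where k is the hyperparameter being tuned. The data, the helper names and the candidate values of k are all made up for illustration; this is one simple way to tune, not a prescribed procedure.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of points in data whose predicted label matches the true label."""
    correct = sum(1 for x, y in data if knn_predict(train, x, k) == y)
    return correct / len(data)

# Made-up 1-D data as (feature, label); the point at 2.6 carries a noisy label
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (2.6, "b"),
         (6.0, "b"), (6.5, "b"), (7.0, "b"), (7.5, "b")]
validation = [(1.2, "a"), (2.5, "a"), (3.0, "a"), (6.2, "b"), (7.2, "b")]

# Evaluate each candidate hyperparameter on the validation set and keep the best
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With k = 1 the model follows the noisy point and misclassifies its neighbours; a larger k smooths that out, which is the kind of improvement hyperparameter tuning looks for.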
Model Deployment
In the model deployment phase, we deploy the machine learning model into production. This process involves
integrating the tested model with existing systems to make it available to users, management or for other purposes.
This also involves testing the model in a real-world scenario.
Two important factors have to be checked before deploying: whether the model is portable, i.e. it can be
transferred from one machine to another, and whether it is scalable, i.e. it need not be redesigned to
maintain performance as the load grows.
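One simple way to illustrate portability is to store a model as plain parameters that can be serialized on one machine and restored on another. The linear model, its parameter values and the `predict` helper below are hypothetical; real projects typically rely on the serialization format of their ML library.

```python
import json

# Hypothetical trained model, described only by its parameters
model = {"kind": "linear", "weights": [0.5, -1.25], "bias": 2.0}

def predict(params, features):
    """Weighted sum of the features plus the bias."""
    return sum(w * x for w, x in zip(params["weights"], features)) + params["bias"]

# Serialize on the training machine ...
payload = json.dumps(model)

# ... and restore on the serving machine
restored = json.loads(payload)

sample = [4.0, 2.0]
original_output = predict(model, sample)
restored_output = predict(restored, sample)
print(original_output, restored_output)
```

Checking that the restored model gives the same output as the original is a basic sanity test before the model is put in front of users.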
Monitoring and Maintenance
Monitoring in machine learning involves techniques to measure the model's performance metrics and to detect
issues in the model. After an issue is detected, the model has to be retrained with new data or its architecture
has to be modified.
Sometimes an issue detected in the model cannot be solved by retraining it with new data. The
issue then becomes the new problem statement, and the machine learning life cycle restarts from analyzing the
problem to develop an improved model.
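A minimal monitoring sketch, assuming that the true outcome of each prediction eventually becomes known: it tracks accuracy over a sliding window and flags the model for retraining when accuracy drops below a threshold. The class name `AccuracyMonitor`, the window size and the threshold are illustrative choices.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, window=5, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True where the prediction was right
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only judge once the window is full, to avoid noisy early alarms
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold

monitor = AccuracyMonitor(window=5, threshold=0.8)

# Made-up stream of (predicted, actual) pairs arriving from production
stream = [(1, 1), (0, 0), (1, 1), (1, 0), (0, 1), (1, 0)]
alerts = []
for predicted, actual in stream:
    monitor.record(predicted, actual)
    alerts.append(monitor.needs_retraining())
print(alerts)
```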
The machine learning life cycle is an iterative process, and it may be necessary to revisit previous stages to
improve the model's performance or address new requirements. By following the machine learning life cycle,
data scientists can ensure that their machine learning models are effective, accurate, and meet the business
requirements.
Supervised Machine Learning
Supervised learning is a type of machine learning in which machines are trained using well-labelled training
data and, on the basis of that data, predict the output. Labelled data means that the input data is already
tagged with the correct output. In supervised learning, the training data provided to the machine works as the
supervisor that teaches the machine to predict the output correctly. It applies the same concept as a student
learning under the supervision of a teacher.
Supervised learning is a process of providing input data as well as the correct output data to the machine learning
model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to
the output variable (y).
In the real world, supervised learning can be used for risk assessment, image classification, fraud detection,
spam filtering, etc.
How Does Supervised Learning Work?
In supervised learning, models are trained using a labelled dataset, from which the model learns about each type of
data. Once the training process is completed, the model is tested on test data (a held-out subset of the
labelled dataset that was not used for training), and it predicts the output for that data.
The working of supervised learning can be understood from the example and diagram below:
Steps Involved in Supervised Learning:
First, determine the type of training dataset.
Split the dataset into a training set, a test set and a validation set.
Determine the input features of the training dataset, which should carry enough information for the model
to predict the output accurately.
Determine a suitable algorithm for the model, such as a support vector machine or a decision tree.
Execute the algorithm on the training dataset. Sometimes validation sets are needed as control parameters.
Evaluate the accuracy of the model by providing the test set. If the model predicts the correct output, the
model is accurate.
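The splitting step above can be sketched as follows. The function name `train_val_test_split`, the 60/20/20 proportions and the fixed seed are assumptions chosen for illustration.

```python
import random

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Made-up labelled dataset as (feature, label)
dataset = [(x, int(x >= 10)) for x in range(20)]
train, val, test = train_val_test_split(dataset)
print(len(train), len(val), len(test))
```

The three sets do not overlap, so the test set stays unseen until the final evaluation.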
Regression
Regression algorithms are used for the prediction of continuous variables, such as weather forecasting and market trends. Below are some
regression algorithms:
Linear Regression
Regression Trees
Non-Linear Regression
Polynomial Regression
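As an illustration of the first algorithm in this list, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The helper `fit_line` and the data (generated from y = 2x + 1) are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data generated from y = 2x + 1
hours = [1, 2, 3, 4, 5]
score = [3, 5, 7, 9, 11]

slope, intercept = fit_line(hours, score)
prediction = slope * 6 + intercept  # predict a continuous value for an unseen input
print(slope, intercept, prediction)
```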
Classification
Classification algorithms are used when the output variable is categorical, which means the output falls into one of
two or more classes. Some classification algorithms are:
Random Forest
Decision Trees
Logistic Regression
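A minimal sketch of logistic regression for a two-class problem, trained by batch gradient descent on a single feature. The data, learning rate and epoch count are made up, and the feature is centred on its mean before training to keep the optimisation well behaved.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data: hours studied -> passed (1) or failed (0)
hours = [1, 2, 3, 4, 6, 7, 8, 9]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

centre = sum(hours) / len(hours)  # centre the feature on its mean
w, b = train_logistic([h - centre for h in hours], passed)

def predict_pass(h):
    """Return class 1 when the predicted probability is at least 0.5."""
    return int(sigmoid(w * (h - centre) + b) >= 0.5)

predictions = [predict_pass(h) for h in hours]
print(predictions)
```

The output is a class label rather than a continuous value, which is what separates classification from regression.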
Advantages of supervised learning:
In supervised learning, we can have an exact idea about the classes of objects.
Supervised learning models help us solve various real-world problems such as fraud detection and spam
filtering.
Disadvantages of supervised learning:
Supervised learning models are not always suitable for handling very complex tasks.
Supervised learning may not predict the correct output if the test data differs substantially from the training data.
Unsupervised Machine Learning
Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have
the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset,
group the data according to similarities, and represent the dataset in a compressed format.
Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and
dogs. The algorithm is never trained on labelled examples, which means it has no prior idea about the features of the dataset.
The task of the unsupervised learning algorithm is to identify the image features on its own. It performs this task by
clustering the image dataset into groups according to the similarities between images.
Why Unsupervised Learning?
Below are some of the main reasons why unsupervised learning is important:
Unsupervised learning is helpful for finding useful insights from the data.
Unsupervised learning is similar to how a human learns to think from their own experiences, which makes it
closer to real AI.
Unsupervised learning works on unlabelled and uncategorized data, which makes unsupervised learning more
important.
In the real world, we do not always have input data with the corresponding output, so we need
unsupervised learning to solve such cases.
Working of Unsupervised Learning
The working of unsupervised learning can be understood from the diagram below:
Here, we have taken unlabelled input data, which means it is not categorized and the corresponding outputs are
not given. This unlabelled input data is fed to the machine learning model in order to train it. First, the model
interprets the raw data to find hidden patterns, and then suitable algorithms such as k-means clustering or
hierarchical clustering are applied.
Types of Unsupervised Learning Algorithm:
The unsupervised learning algorithm can be further categorized into two types of problems:
Clustering:
Clustering is a method of grouping objects into clusters such that objects with the most similarities remain
in one group and have few or no similarities with the objects of another group. Cluster analysis finds the
commonalities between the data objects and categorizes them according to the presence or absence of those
commonalities.
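A minimal k-means sketch shows how such groups can emerge without labels. The 2-D points and the hand-picked starting centroids are made up for illustration; practical implementations usually choose starting centroids randomly or with a dedicated initialisation scheme.

```python
import math

def kmeans(points, centroids, max_iterations=20):
    """Alternate assignment and update steps until the centroids stop moving."""
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(old)  # keep the centroid of an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

# Made-up unlabelled points forming two visible groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```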
Association:
Association rule learning is an unsupervised learning method used for finding relationships between
variables in a large database. It determines the sets of items that occur together in the dataset.
Association rules make marketing strategy more effective. For example, people who buy item X (say,
bread) also tend to purchase item Y (butter or jam). A typical example of association rule learning is market
basket analysis.
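Two standard measures behind association rules are support and confidence. The sketch below computes both for the rule bread -> butter on a handful of made-up shopping baskets; the helper names are illustrative.

```python
# Made-up shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
print(rule_support, rule_confidence)
```

Algorithms such as Apriori search for all itemsets whose support exceeds a chosen minimum and then keep the rules with high confidence.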
Unsupervised Learning algorithms:
K-means clustering
Hierarchical clustering
Anomaly detection
Neural Networks
Apriori algorithm
Advantages of unsupervised learning:
Unsupervised learning is used for more complex tasks than supervised learning because it does not
require labelled input data.
Unsupervised learning is preferable when labels are scarce, as it is easier to get unlabelled data than labelled data.
Disadvantages of unsupervised learning:
Unsupervised learning is intrinsically more difficult than supervised learning as it does not have a
corresponding output to learn from.
The result of an unsupervised learning algorithm might be less accurate because the input data is not labelled.