ML 2
their past purchases, predicting stock market fluctuations, and translating text from one
language to another.
In common usage, the terms “machine learning” and “artificial intelligence” are often used
interchangeably due to the prevalence of machine learning for AI purposes in the world
today.
Examples and use cases
Machine learning is typically the most mainstream type of AI technology in use around the
world today. Some of the most common examples of machine learning that you may have
interacted with in your day-to-day life include:
* Recommendation engines that suggest products, songs, or television shows to you, such as
those found on Amazon, Spotify, or Netflix.
* Speech recognition software that allows you to convert voice memos into text.
* A bank’s fraud detection services that automatically flag suspicious transactions.
* Self-driving cars and driver assistance features, such as blind-spot detection and automatic
stopping, that improve overall vehicle safety.
WHY IS MACHINE LEARNING IMPORTANT?
Data is the lifeblood of all business. Data-driven decisions increasingly make the difference
between keeping up with the competition and falling further behind. Machine learning can
be the key to unlocking the value of corporate and customer data and enacting decisions that
keep a company ahead of the competition.
Q3.
(ii) What are the Common Applications of Machine Learning?
1. Image Recognition: One of the most notable machine learning applications is image
recognition, which is a method for cataloging and detecting an object or feature in a digital
image. This technique is also used for further analysis, such as pattern recognition, face
detection, and face recognition.
2. Speech Recognition: ML software can measure the words spoken using a collection of
numbers that represent the speech signal. Popular applications that employ speech
recognition include Amazon’s Alexa, Apple’s Siri, and Google Maps.
3. Predicting Traffic Patterns: Consider the example of Google Maps. When we enter our
location on the map, the application collects massive amounts of data about the present
traffic to generate predictions about the upcoming traffic and identify the fastest route to
our destination.
4. E-commerce Product Recommendations: One of the prominent elements of typically any
e-commerce website is product recommendation, which involves the sophisticated use of
machine learning algorithms. Websites track customer behavior based on past purchases,
browsing habits, and cart history, and then recommend products using machine learning and
AI.
5. Self-Driving Cars: Self-driving cars rely heavily on machine learning techniques. These
algorithms enable the vehicle to collect information from cameras and sensors about its
surroundings, understand it, and choose what actions to perform.
6. Catching Email Spam: One of the most popular applications of machine learning that
everyone is familiar with is detecting email spam. Email service providers build
applications with spam filters that use an ML algorithm to classify an incoming email as
spam and direct it to the spam folder.
7. Catching Malware: The process of using machine learning (ML) to detect malware
consists of two basic stages: first, analyzing suspicious activities in an Android environment
to generate a suitable collection of features; second, training the system with machine
learning and deep learning (DL) techniques on the generated features to detect future
cyberattacks in such environments.
Q4.
Difference between Feature Selection and Feature Extraction Methods
Feature selection chooses a subset of the original features and discards the rest, whereas
feature extraction transforms the original features into a new, usually smaller, set of derived
features. Both kinds of methods have their advantages and disadvantages, depending on the
nature of the data and the task at hand.
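The distinction can be sketched in plain Python. The tiny dataset, the variance-based selection rule, and the sum/difference extraction below are illustrative assumptions (in practice one would use library tools such as univariate selection or PCA):

```python
# Toy dataset: 4 samples x 3 features.
X = [
    [1.0, 10.0, 0.1],
    [2.0, 20.0, 0.1],
    [3.0, 30.0, 0.1],
    [4.0, 40.0, 0.1],
]

def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

# Feature SELECTION: keep the k ORIGINAL columns with the highest variance.
def select_k_best(X, k):
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)), key=lambda j: variance(cols[j]), reverse=True)
    keep = sorted(ranked[:k])
    return [[row[j] for j in keep] for row in X], keep

# Feature EXTRACTION: build NEW columns as combinations of the originals
# (here a simple sum and difference; PCA is the classic real-world choice).
def extract(X):
    return [[row[0] + row[1], row[0] - row[2]] for row in X]

X_sel, kept = select_k_best(X, 2)   # original columns survive unchanged
X_ext = extract(X)                  # derived columns replace the originals
```

Note that after selection the surviving columns are still interpretable as the original features, while after extraction each new column mixes several originals.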
Q6.
Support Vector Machine Algorithm
Support Vector Machine (SVM) is one of the most popular Supervised Learning
algorithms, used for Classification as well as Regression problems. However, it is
primarily used for Classification problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put a new data point in the
correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector
Machine. Consider the below diagram, in which two different categories are classified
using a decision boundary or hyperplane:
Since this is a 2-D space, we can easily separate these two classes using a straight line.
But there can be multiple lines that separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the closest points of
the lines from both classes. These points are called support vectors. The distance between
the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize
this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
Q7.
(A)
What is a Naive Bayes Classifier?
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’
Theorem. It is not a single algorithm but a family of algorithms, all of which share a
common principle: every pair of features being classified is independent of each other.
One of the simplest and most effective classification algorithms, the Naïve Bayes classifier
aids in the rapid development of machine learning models with fast prediction capabilities.
The Naïve Bayes algorithm is used for classification problems and is especially common in
text classification, where the data is high-dimensional (each word represents one feature).
It is used in spam filtering, sentiment detection, rating classification, etc. The advantage of
using naïve Bayes is its speed: it is fast, and making predictions is easy even with
high-dimensional data.
Why is it Called Naive Bayes?
The “Naive” part of the name indicates the simplifying assumption made by the Naïve Bayes
classifier. The classifier assumes that the features used to describe an observation are
conditionally independent, given the class label. The “Bayes” part of the name refers to
Reverend Thomas Bayes, an 18th-century statistician and theologian who formulated Bayes’
theorem.
To start with, let us consider a dataset. Consider a fictional dataset that describes the
weather conditions for playing a game of golf. Given the weather conditions, each tuple
classifies the conditions as fit (“Yes”) or unfit (“No”) for playing golf. Here is a tabular
representation of our dataset.
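The Naive Bayes idea can be sketched in plain Python on a cut-down, invented version of such a weather dataset, using only the Outlook feature (the rows below are illustrative, not the full table, and the sketch omits the Laplace smoothing a real implementation would use):

```python
from collections import Counter

# Illustrative (Outlook, Play golf) rows in the spirit of the golf dataset.
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

def predict(outlook):
    """Pick the class maximising P(class) * P(outlook | class)."""
    classes = Counter(label for _, label in data)
    scores = {}
    for c, n_c in classes.items():
        prior = n_c / len(data)                                   # P(class)
        likelihood = sum(1 for o, l in data
                         if l == c and o == outlook) / n_c        # P(outlook | class)
        scores[c] = prior * likelihood
    return max(scores, key=scores.get)
```

With several features, the conditional-independence assumption lets the classifier simply multiply one such likelihood term per feature.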
The dataset is divided into two parts, namely the feature matrix and the response vector.
* The feature matrix contains all the vectors (rows) of the dataset, in which each vector
consists of the values of the dependent features. In the above dataset, the features are
‘Outlook’, ‘Temperature’, ‘Humidity’, and ‘Windy’.
* The response vector contains the value of the class variable (prediction or output) for
each row of the feature matrix. In the above dataset, the class variable name is ‘Play golf’.
B) Linear Regression
A regression model provides a function that describes the relationship between one or more
independent variables and a response, dependent, or target variable.
For example, the relationship between height and weight may be described by a linear
regression model. A regression analysis is the basis for many types of prediction and for
determining the effects on target variables. When you hear about studies on the news that
talk about fuel efficiency, or the causes of pollution, or the effects of screen time on
learning, there is often a regression model being used to support their claims.
Linear
A linear regression is a model where the relationship between inputs and outputs is a
straight line. This is the easiest to conceptualize and even observe in the real world. Even
when a relationship isn’t very linear, our brains try to see the pattern and attach a
rudimentary linear model to that relationship.
One example may be the number of responses to a marketing campaign. If we send 1,000
emails, we may get five responses. If this relationship can be modeled using a linear
regression, we would expect to get ten responses when we send 2,000 emails. Your chart
may vary, but the general idea is that we associate a predictor and a target, and we assume
a relationship between the two.
Using a linear regression model, we want to estimate the correlation between the number of
emails sent and response rates. In other words, if the linear model fits our observations well
enough, then we can estimate that the more emails we send, the more responses we will get.
When making a claim like this, whether it is related to exercise, happiness, health, or any
number of claims, there is usually a regression model behind the scenes to support the
claim.
In addition, the model fit can be described using the mean squared error, which gives us a
number showing exactly how well the linear model fits.
Q8.
What is the K-Means Algorithm?
K-Means Clustering is an Unsupervised Learning algorithm which groups an unlabeled
dataset into different clusters. Here K defines the number of pre-defined clusters that need
to be created in the process: if K=2, there will be two clusters; for K=3, there will be three
clusters; and so on.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such
a way that each data point belongs to only one group of points with similar properties.
It allows us to cluster the data into different groups and is a convenient way to discover the
categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main
aim of this algorithm is to minimize the sum of distances between each data point and its
corresponding cluster centroid.
The k-means clustering algorithm mainly performs two tasks:
* Determines the best positions for the K center points, or centroids, by an iterative process.
* Assigns each data point to its closest k-center. The data points that are near a particular
k-center form a cluster.
Hence, each cluster contains data points with some commonalities and is far away from the
other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
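The two tasks above, assigning points to the nearest centroid and moving each centroid to the mean of its cluster, can be sketched in plain Python. The 1-D points and starting centroids below are illustrative assumptions:

```python
# Six 1-D points forming two obvious groups, and K = 2 starting centroids.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = [0.0, 5.0]

for _ in range(10):                       # iterate until assignments stabilise
    # Task 2: assign each point to its closest centroid.
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[j].append(p)
    # Task 1: move each centroid to the mean of its cluster.
    new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    if new == centroids:                  # converged: centroids stopped moving
        break
    centroids = new
```

On this data the centroids settle at the means of the two groups; real datasets use multi-dimensional distances, and the result can depend on the initial centroid positions.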
(B)
Random Forest Algorithm
Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in ML.
It is based on the concept of ensemble learning, which is the process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average to improve the predictive
accuracy of that dataset." Instead of relying on one decision tree, the random forest takes
the prediction from each tree and, based on the majority vote of predictions, predicts the
final output.
A greater number of trees in the forest leads to higher accuracy and prevents the problem
of overfitting.
The below diagram explains the working of the Random Forest algorithm:
Q9)
(i)
ROC Curve
An ROC curve (receiver operating characteristic curve) is a graph showing the performance
of a classification model at all classification thresholds. This curve plots two parameters:
* True Positive Rate
* False Positive Rate
True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows:
TPR = TP / (TP + FN)
False Positive Rate (FPR) is defined as follows:
FPR = FP / (FP + TN)
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the
classification threshold classifies more items as positive, thus increasing both False
Positives and True Positives. The following figure shows a typical ROC curve.
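Tracing an ROC curve from these definitions can be sketched in plain Python; the model scores, labels, and thresholds below are made-up illustrative values:

```python
# Model scores and true labels (1 = positive), illustrative values.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]

def tpr_fpr(threshold):
    """TPR = TP/(TP+FN) and FPR = FP/(FP+TN) at one threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    return tp / (tp + fn), fp / (fp + tn)

# Lowering the threshold raises both TPR and FPR, tracing out the ROC curve
# from (0, 0) up to (1, 1).
roc = [tpr_fpr(t) for t in (0.95, 0.75, 0.35, 0.0)]
```

Each (TPR, FPR) pair is one point on the curve; plotting many thresholds produces the full ROC curve described above.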
(ii)
What is a Confusion Matrix?
A confusion matrix is a matrix that summarizes the performance of a machine learning
model on a set of test data. It is a means of displaying the number of accurate and inaccurate
instances based on the model's predictions. It is often used to measure the performance of
classification models, which aim to predict a categorical label for each input instance.
The matrix displays the number of instances produced by the model on the test data.
* True positives (TP): occur when the model correctly predicts a positive data point.
* True negatives (TN): occur when the model correctly predicts a negative data point.
* False positives (FP): occur when the model incorrectly predicts a positive outcome for a
data point that is actually negative.
* False negatives (FN): occur when the model incorrectly predicts a negative outcome for a
data point that is actually positive.
Why do we need a Confusion Matrix?
When assessing a classification model's performance, a confusion matrix is essential. It
offers a thorough analysis of true positive, true negative, false positive, and false negative
predictions, facilitating a more profound comprehension of a model's recall, accuracy,
precision, and overall effectiveness in class distinction. When there is an uneven class
distribution in a dataset, this matrix is especially helpful in evaluating a model's performance
beyond basic accuracy metrics.
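The four counts, and the metrics derived from them, can be computed with a short plain-Python sketch; the label vectors here are invented for illustration:

```python
# Actual vs. predicted labels for a toy binary classifier (1 = positive).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]

# The four cells of the confusion matrix.
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

# Metrics derived from the matrix.
accuracy  = (tp + tn) / len(actual)
precision = tp / (tp + fp)   # of predicted positives, how many were right
recall    = tp / (tp + fn)   # of actual positives, how many were found
```

With an imbalanced dataset, accuracy alone can look high even when recall is poor, which is exactly why the full matrix is more informative than a single accuracy number.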