Machine Learning Suggestion

1. Definition of Artificial Neural Network (ANN). Applications of ANN in Machine Learning.

An Artificial Neural Network (ANN) is a computational model inspired by the biological neural networks of the human brain. It consists of
interconnected nodes, or artificial neurons, organized in layers. Each neuron receives input signals, processes them using an activation
function, and passes the output to the neurons in the next layer.
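
As a minimal illustration of this computation (a sketch, not from the original notes; the input values, weights, and sigmoid activation are arbitrary choices):

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation function

# Example with arbitrary values: a neuron with three inputs
print(neuron_output(inputs=[0.5, -1.2, 3.0], weights=[0.4, 0.7, -0.2], bias=0.1))
```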

Applications of Artificial Neural Networks in Machine Learning include:

Image Recognition and Computer Vision: ANNs are widely used in tasks such as object detection, image classification, and facial
recognition. Convolutional Neural Networks (CNNs), a type of ANN specialized for processing grid-like data, have shown remarkable
performance in these areas.

Natural Language Processing (NLP): ANNs are employed in various NLP tasks like sentiment analysis, machine translation, named entity
recognition, and text generation. Recurrent Neural Networks (RNNs) and Transformer models are popular architectures for NLP tasks.

Speech Recognition: ANNs, particularly deep learning models like Recurrent Neural Networks (RNNs) and Convolutional Neural Networks
(CNNs), are used for speech recognition tasks, enabling systems to transcribe spoken language into text with high accuracy.

Recommendation Systems: ANNs are utilized in recommendation systems to predict user preferences and provide personalized
recommendations for items such as movies, music, products, and articles.

Financial Forecasting: ANNs are applied in financial markets for tasks like stock price prediction, risk assessment, algorithmic trading, and
fraud detection.

Healthcare: ANNs are used for medical image analysis, disease diagnosis, drug discovery, and patient outcome prediction. They can
analyze medical images like X-rays, MRIs, and CT scans to assist in diagnosis.

Autonomous Vehicles: ANNs play a crucial role in autonomous vehicles for tasks like object detection, lane tracking, decision-making, and
path planning.

Gaming: ANNs are used in gaming for tasks such as character behavior modeling, opponent AI, and procedural content generation.

2. Difference between Biological Neural Network and Artificial Neural Network with diagram.

Basis for comparison | Artificial Neural Network | Biological Neural Network
Processing | Sequential and centralised | Parallel and distributed
Rate | Processes information at a faster pace | Biological neurons are slower at processing information
Size | Small | Large
Complexity | Limited ability to perform complex pattern recognition on its own | The enormous size and complexity of the connections give the brain the capability to perform complex tasks
Fault tolerance | Intolerant of failure | Implicitly fault tolerant
Control mechanism | A control unit monitors all computing-related activities | There is no central control unit; processing is distributed across the network
Feedback | Not provided | Provided

3. Definition of Machine Learning.

Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that
enable computers to learn from and make predictions or decisions based on data, without being explicitly programmed for each task.

4. Different perspectives/viewpoints and issues/challenges in Machine Learning.

Machine Learning (ML) is a vast and rapidly evolving field, encompassing various perspectives, viewpoints, and associated challenges. Here
are some different perspectives and the corresponding issues and challenges in ML:

Algorithmic Perspective:

Issue: Designing efficient and accurate algorithms that can handle large-scale datasets and complex patterns.

Challenge: Balancing model complexity and interpretability, addressing overfitting, and optimizing computational resources.

Data Perspective:

Issue: Data quality, quantity, and diversity significantly impact the performance and generalization ability of ML models.

Challenge: Data preprocessing, cleaning, and augmentation, dealing with imbalanced datasets, ensuring privacy and security of sensitive
data.

Ethical Perspective:

Issue: ML algorithms can perpetuate biases present in training data, leading to unfair or discriminatory outcomes.

Challenge: Developing fair and transparent ML models, addressing issues of algorithmic bias, ensuring accountability and ethical use of AI
systems.

Interpretability Perspective:

Issue: Many ML models, especially deep learning models, are often treated as "black boxes" with limited interpretability.

Challenge: Enhancing the interpretability of ML models, enabling users to understand and trust model predictions, exploring techniques
for model explanation and visualization.

Deployment Perspective:

Issue: Transitioning ML models from research prototypes to real-world applications involves various technical and logistical challenges.

Challenge: Model deployment and integration into existing systems, managing scalability and performance, ensuring robustness and
reliability in production environments.

Regulatory and Legal Perspective:

Issue: ML applications raise concerns regarding privacy, data protection, intellectual property rights, and regulatory compliance.

Challenge: Navigating legal and regulatory frameworks, ensuring compliance with data protection regulations (e.g., GDPR, CCPA),
addressing liability issues and ethical guidelines.

Human-Centric Perspective:

Issue: ML systems interact with humans, impacting user experience, behavior, and decision-making processes.

Challenge: Designing ML systems that are user-friendly, accessible, and inclusive, understanding the societal implications of AI
technologies, fostering human-AI collaboration and trust.

Continual Learning Perspective:

Issue: ML models need to adapt and learn continuously in dynamic environments where data distributions and tasks may change over
time.

Challenge: Developing algorithms for lifelong learning, handling concept drift and catastrophic forgetting, enabling efficient and scalable
continual learning.

5. Definition of Concept Learning. Tasks of concept learning.

Concept learning refers to the process of acquiring knowledge or understanding about categories, classes, or concepts from examples or
data. In other words, it involves identifying patterns, relationships, or rules that define a particular concept based on observed instances.
Tasks of concept learning typically include:

Classification: Classifying instances into predefined categories or classes based on their features or attributes.

Regression: Predicting a continuous value or quantity based on input variables. Regression tasks involve learning a function that maps
input features to a target output. For instance, predicting house prices based on features like size, location, and amenities.

Clustering: Grouping similar instances together into clusters or segments based on their inherent similarities. Clustering tasks aim to
discover underlying structures or patterns in data without predefined labels.

Association Rule Learning: Discovering relationships, dependencies, or associations between variables or items in a dataset.

Anomaly Detection: Identifying outliers or unusual instances in a dataset that deviate significantly from the norm. Anomaly detection tasks
involve distinguishing between normal and abnormal behavior, which is useful for fraud detection, network security, and fault diagnosis.

Feature Selection and Dimensionality Reduction: Selecting relevant features or reducing the dimensionality of the input space to improve
the efficiency and effectiveness of learning algorithms.

Rule-Based Learning: Discovering explicit rules or decision criteria that accurately classify instances into different categories.

6. Explain how concept learning can be viewed as a task of searching.

Concept learning can be viewed as a task of searching for a hypothesis or a concept description that best fits the observed data. In this
context, the concept to be learned is typically represented as a hypothesis space, which contains all possible candidate concepts that could
potentially explain the data.

Here's how concept learning can be conceptualized as a search task:

Hypothesis Space: The hypothesis space represents the set of all possible concepts or hypotheses that the learner considers during the
learning process. Each hypothesis corresponds to a particular way of categorizing or representing the data.

Search Strategy: The search strategy determines how the learner explores the hypothesis space to find the most suitable concept. This
strategy involves selecting and evaluating candidate hypotheses based on their ability to explain the observed data.

Evaluation Criterion: During the search process, hypotheses are evaluated based on their goodness of fit to the observed data and their
ability to generalize to unseen instances.

Optimization: The goal of the search process is to find the hypothesis that optimally balances the trade-off between model complexity and
accuracy. This often involves optimizing an objective function that quantifies the quality of the hypotheses with respect to the training
data.

Generalization: Once a suitable concept is learned from the training data, the ultimate goal is to generalize the learned knowledge to
unseen instances or new data. Generalization ensures that the learned concept can accurately classify or predict the outcomes of
previously unseen instances, reflecting the underlying patterns or relationships in the data beyond the training set.

7. Difference between Classification and Regression

Classification | Regression
The target variable is discrete. | The target variable is continuous.
Problems like spam email classification and disease prediction are solved using classification algorithms. | Problems like house price prediction and rainfall prediction are solved using regression algorithms.
The algorithm tries to find the best possible decision boundary that separates the classes with the maximum possible margin. | The algorithm tries to find the best-fit line that represents the overall trend in the data.
Evaluation metrics such as Precision, Recall, and F1-score are used to evaluate classification algorithms. | Evaluation metrics such as Mean Squared Error, R2-score, and MAPE are used to evaluate regression algorithms.
Problems include binary classification and multi-class classification. | Models include linear regression as well as non-linear regression.
Input data are independent variables with a categorical dependent variable. | Input data are independent variables with a continuous dependent variable.
The task is to map the input value x to a discrete output variable y. | The task is to map the input value x to a continuous output variable y.
Output consists of categorical labels. | Output consists of continuous numerical values.

8. Short note: Logistic Regression

Logistic Regression is a widely used statistical method for binary classification tasks, where the goal is to predict the probability that an
instance belongs to a particular class. Despite its name, logistic regression is a classification algorithm rather than a regression algorithm. It
models the relationship between one or more independent variables (features) and a binary dependent variable (target) using the logistic
function, also known as the sigmoid function.

In logistic regression, the logistic function transforms the output of a linear combination of the input features into a value between 0 and
1, representing the probability of the positive class. The model parameters (coefficients) are estimated using maximum likelihood
estimation or other optimization techniques to fit the training data.
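
As a hedged sketch of this idea (the feature values and coefficients below are made up for illustration, not an estimated model):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, coefficients, intercept):
    """Probability of the positive class under a logistic regression model."""
    z = intercept + sum(w * x for w, x in zip(coefficients, features))
    return sigmoid(z)

# Hypothetical spam-detection model with two features
# (e.g. number of links, number of capitalised words)
p = predict_proba(features=[3, 10], coefficients=[0.8, 0.05], intercept=-2.0)
print(f"P(spam) = {p:.3f}")  # classify as spam if p >= 0.5
```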

Key features of logistic regression include:

Simplicity: Logistic regression is relatively simple and interpretable compared to more complex models like neural networks.

Efficient: It can handle large datasets efficiently and is computationally inexpensive.

Regularization: Techniques like L1 (Lasso) and L2 (Ridge) regularization can be applied to prevent overfitting.

Probability Estimation: Logistic regression provides probabilistic outputs, making it useful for ranking and thresholding predictions.
Applications of logistic regression include:

Binary classification tasks such as spam detection, credit risk assessment, and medical diagnosis.

Click-through rate prediction in online advertising.

Risk prediction in healthcare, such as identifying patients at high risk of developing a disease.

Social science research, for predicting outcomes like voting behavior or customer churn.

9. Difference between Supervised and Unsupervised Learning.

Basis | Supervised Learning | Unsupervised Learning
Input data | Uses known and labeled data as input | Uses unknown (unlabeled) data as input
Computational complexity | Less computationally complex | More computationally complex
Real-time | Uses offline analysis | Uses real-time analysis of data
Number of classes | The number of classes is known | The number of classes is not known
Accuracy of results | Accurate and reliable results | Moderately accurate and less reliable results
Output data | The desired output is given | The desired output is not given
Model | It is not possible to learn larger and more complex models than with unsupervised learning | It is possible to learn larger and more complex models than with supervised learning
Training data | Training data is used to infer the model | Training data is not used
Another name | Also called classification | Also called clustering
Test of model | We can test our model | We cannot test our model
Example | Optical Character Recognition | Finding a face in an image

10. Explanation of well-defined learning problem with example.

A well-defined learning problem refers to a scenario where the task, the data, and the evaluation criteria are clearly
specified, allowing for the application of machine learning techniques to solve the problem effectively. Here's an
explanation of a well-defined learning problem with an example:
Problem Statement: Predicting House Prices

Task: The task is to develop a machine learning model that can predict the selling price of houses based on their features.

Data: The dataset consists of historical records of houses along with their corresponding features and selling prices. The
features may include attributes such as the size of the house, number of bedrooms and bathrooms, location, proximity to
amenities, age of the house, etc. Each record in the dataset represents a single house, and the target variable is the selling
price.

Evaluation Criteria: The performance of the machine learning model will be evaluated based on its ability to accurately
predict house prices on unseen data. Common evaluation metrics for regression tasks like this include Mean Absolute Error
(MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), or R-squared (R2) score.

Example:

Let's consider a scenario where a real estate company wants to develop a pricing model to assist home sellers and buyers
in estimating the value of properties. They collect a dataset containing information about recently sold houses in a
particular city:

Features:

Size of the house (in square feet)

Number of bedrooms

Number of bathrooms

Location (e.g., neighborhood, zip code)

Age of the house (in years)

Distance to the nearest school

Distance to the city center

Target Variable:

Selling price of the house

The real estate company decides to use this dataset to train a machine learning model to predict house prices for new
listings. They split the dataset into training and testing sets, where the training set is used to train the model, and the
testing set is used to evaluate its performance.

They choose to use Mean Absolute Error (MAE) as the evaluation metric to measure the average absolute difference
between the predicted prices and the actual prices on the testing set.

After training the model and evaluating its performance on the testing set, they deploy the model to their website, where
users can input the features of a house and get an estimated selling price based on the trained model.
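
A minimal scikit-learn sketch of this workflow (the file name, column names, and choice of linear regression are hypothetical placeholders, not details from the company's actual data):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("house_sales.csv")                           # historical records of sold houses (placeholder file)
X = df[["size_sqft", "bedrooms", "bathrooms", "age_years"]]   # example feature columns
y = df["selling_price"]                                       # target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)   # train on the training split
predictions = model.predict(X_test)                # predict prices for unseen houses

print("MAE:", mean_absolute_error(y_test, predictions))  # average absolute error in price units
```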

11. Short Notes: i) Find-S algorithm, ii) Candidate Elimination algorithm

i) Find-S Algorithm:

The Find-S algorithm is a simple, incremental, and greedy algorithm used for learning a hypothesis within the
version space framework in the context of concept learning from examples. It is specifically designed for
learning concepts that can be represented as conjunctions of attribute-value pairs.

Key Steps:
Initialize the hypothesis to the most specific hypothesis in the hypothesis space.

Iterate through each positive training example:

For each attribute-value pair in the example, if the value matches the hypothesis, continue to the next pair. If
not, generalize the hypothesis to include the attribute-value pair.

Repeat step 2 for each positive training example, gradually generalizing the hypothesis just enough to cover every positive example.

The final hypothesis obtained after processing all positive examples is the most specific hypothesis consistent
with the training data.

Example: Consider learning a concept of "sunny day" based on weather attributes (e.g., outlook, temperature,
humidity). The Find-S algorithm iteratively updates the hypothesis to include attribute-value pairs that match
positive examples of sunny days until it becomes the most specific hypothesis consistent with the training
data.
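
A minimal Python sketch of the Find-S procedure, assuming examples are tuples of attribute values and using the usual "?" (any value) notation; the tiny weather dataset is made up for illustration:

```python
def find_s(positive_examples):
    """Return the most specific hypothesis consistent with the positive examples."""
    hypothesis = list(positive_examples[0])          # start from the first positive example
    for example in positive_examples[1:]:
        for i, value in enumerate(example):
            if hypothesis[i] != value:               # attribute disagrees with the hypothesis
                hypothesis[i] = "?"                  # generalise that attribute to 'any value'
    return hypothesis

# Hypothetical positive examples of a "sunny day" concept:
# (outlook, temperature, humidity)
positives = [("Sunny", "Warm", "Normal"),
             ("Sunny", "Warm", "High")]
print(find_s(positives))   # -> ['Sunny', 'Warm', '?']
```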

Advantages:

Simple and easy to implement.

Guarantees convergence to the most specific hypothesis consistent with the training data.

Limitations:

Assumes that the training data is noise-free and that the hypothesis space contains the true concept.

May not perform optimally with noisy or inconsistent data.

ii) Candidate Elimination Algorithm:

The Candidate Elimination algorithm is a method for learning a hypothesis within the version space
framework, similar to the Find-S algorithm. It maintains a set of consistent hypotheses that collectively define
the version space—a space containing all hypotheses consistent with the observed training examples.

Key Steps:

Initialize the version space with the most general and most specific hypotheses.

Iterate through each training example:

For each positive example, generalize the most specific hypothesis as necessary to include the example.

For each negative example, specialize the most general hypothesis as necessary to exclude the example.

Remove any hypotheses from the version space that are no longer consistent with the observed examples.

Repeat step 2 for each training example, gradually refining the version space.

The final version space contains all hypotheses consistent with the training data, representing the learned
concept.
Example: Using the same "sunny day" concept learning task, the Candidate Elimination algorithm maintains a
version space of consistent hypotheses that collectively define the boundaries of the concept based on
observed positive and negative examples.

Advantages:

More flexible than the Find-S algorithm, as it can handle both positive and negative examples.

Maintains a version space that represents the set of all possible hypotheses consistent with the training data.

Limitations:

Requires storing and updating a potentially large version space, which can be computationally expensive.

May converge to an overly general or overly specific hypothesis if the training data is noisy or incomplete.

12. Definition of Perceptron model.

A Perceptron is a single-layer feedforward neural network consisting of a single layer of artificial neurons, also known as
perceptrons. Each perceptron takes multiple input values, applies weights to these inputs, sums them up, and passes the
result through an activation function to produce an output.
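
As a minimal sketch (the step activation and the weights realising a logical AND are illustrative choices, not part of the original notes):

```python
def perceptron_output(inputs, weights, bias):
    """Weighted sum followed by a step activation: outputs 1 if the sum is >= 0, else 0."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z >= 0 else 0

# Illustrative weights that make the perceptron compute a logical AND of two binary inputs
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron_output([a, b], weights=[1, 1], bias=-1.5))
```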

13. Definition of instance-based learning

In instance-based learning, the training data is memorized or stored, and predictions for new instances are made based on
the similarity between the new instance and the stored training instances. Instead of generalizing from the training data to
construct a model, instance-based learning methods rely on local approximation and similarity measures to make
predictions.

14. Role of Mean Squared Error (MSE) in Machine Learning

The Mean Squared Error (MSE) is a widely used evaluation metric in machine learning, particularly in regression tasks. It
measures the average squared difference between the predicted values and the actual values in a dataset. MSE plays
several important roles in the context of machine learning:

Model Evaluation: MSE serves as a measure of the goodness-of-fit of a regression model to the training data. A lower MSE
indicates that the model's predictions are closer to the true values, implying better performance. Therefore, MSE is
commonly used to compare the performance of different regression models and select the one that yields the lowest MSE
on the training data.

Loss Function: In the context of training machine learning models, MSE often serves as the loss function that the
optimization algorithm seeks to minimize. During the training process, the model's parameters are adjusted iteratively to
minimize the MSE between the predicted and actual values on the training data. Optimization techniques like gradient
descent are commonly used to minimize the MSE and optimize the model parameters.

Regularization: MSE can be augmented with regularization terms, such as L1 (Lasso) or L2 (Ridge) regularization, to prevent
overfitting and improve the generalization ability of the model. Regularization penalizes large parameter values,
encouraging simpler models that generalize better to unseen data.

Gradient Estimation: MSE is differentiable with respect to the model parameters, making it suitable for gradient-based
optimization algorithms. The gradient of the MSE with respect to the model parameters provides information about the
direction and magnitude of parameter updates needed to minimize the MSE, facilitating efficient model training.
Interpretability: MSE has a clear interpretation—it quantifies the average squared error between predicted and actual
values. This makes it easy to understand and interpret the performance of a regression model in terms of the magnitude of
prediction errors.
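
For concreteness, a small sketch of the metric itself (the numbers are arbitrary example values):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse([3.0, 5.0, 2.5], [2.5, 5.0, 3.0]))  # (0.25 + 0 + 0.25) / 3 ≈ 0.167
```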

15. Role of gradient descent in linear regression.

Gradient descent is a fundamental optimization algorithm used in linear regression to iteratively update the
model parameters (coefficients) in order to minimize the cost function, typically the Mean Squared Error
(MSE), between the predicted and actual values.

Role of Gradient Descent:

Optimization: Gradient descent optimizes the model parameters to minimize the cost function, effectively
fitting the linear regression model to the training data.

Efficiency: Gradient descent efficiently updates the model parameters in the direction that reduces the cost
function, converging to the optimal solution over multiple iterations.

Scalability: Gradient descent is scalable to large datasets and high-dimensional feature spaces, making it
suitable for linear regression tasks with large amounts of data.
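
A minimal batch gradient-descent sketch for simple linear regression (the learning rate, iteration count, and tiny synthetic dataset are illustrative assumptions):

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.05, n_iterations=5000):
    """Fit y ≈ X @ w + b by repeatedly stepping against the gradient of the MSE."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iterations):
        error = X @ w + b - y                       # prediction error on all samples
        grad_w = (2 / n_samples) * (X.T @ error)    # dMSE/dw
        grad_b = (2 / n_samples) * error.sum()      # dMSE/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Tiny synthetic example generated from y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
w, b = gradient_descent(X, y)
print(w, b)   # should approach [2.0] and 1.0
```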

16. Numerical problems from confusion matrix: given actual and predicted values, calculate
accuracy, precision, recall, F1-score, and specificity (see notes). A worked example with hypothetical counts is sketched below.
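
The counts in this sketch are hypothetical (not taken from the notes); only the formulas are standard:

```python
# Hypothetical confusion matrix for a binary classifier
TP, FP, FN, TN = 50, 10, 5, 35

accuracy    = (TP + TN) / (TP + TN + FP + FN)                 # (50 + 35) / 100 = 0.85
precision   = TP / (TP + FP)                                  # 50 / 60 ≈ 0.833
recall      = TP / (TP + FN)                                  # 50 / 55 ≈ 0.909
f1_score    = 2 * precision * recall / (precision + recall)   # ≈ 0.870
specificity = TN / (TN + FP)                                  # 35 / 45 ≈ 0.778

print(accuracy, precision, recall, f1_score, specificity)
```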
17. Concepts of entropy and information gain with example.

Entropy and information gain are concepts commonly used in decision tree algorithms, particularly in the
context of feature selection and splitting criteria.

Entropy: Entropy measures the impurity or randomness in a dataset. In the context of decision trees, entropy
is used to quantify the uncertainty in the distribution of class labels within a dataset.

Information Gain: Information gain measures the effectiveness of a feature in classifying instances by reducing
uncertainty (entropy) in the dataset when the dataset is split based on that feature.
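
For reference, the standard definitions can be written as:

$$H(D) = -\sum_{i=1}^{k} p_i \log_2 p_i$$

where the $p_i$ are the proportions of the $k$ classes in dataset $D$, and

$$IG(D, A) = H(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|}\, H(D_v)$$

where $D_v$ is the subset of $D$ for which attribute $A$ takes value $v$.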

Example:

Suppose we have a dataset of emails classified as spam or not spam, with two features: "Contains word 'free'"
and "Contains word 'buy'". We want to determine which feature is the best to split the dataset on to create a
decision tree.

Calculate the entropy of the original dataset, H(D).

For each feature (e.g., "Contains word 'free'"), calculate the weighted average entropy after splitting the
dataset based on that feature using information gain.

Select the feature with the highest information gain as the splitting criterion.

For example, if splitting the dataset based on "Contains word 'free'" results in the highest information gain, the
decision tree algorithm would use this feature to split the dataset into subsets with lower entropy, making it
easier to classify instances into the appropriate class labels.
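
A small Python sketch of this calculation, using a hypothetical split (the label counts below are made up):

```python
import math

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(parent_labels, subsets):
    """Entropy of the parent minus the weighted entropy of the child subsets."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent_labels) - weighted

# Hypothetical emails: 1 = spam, 0 = not spam
emails = [1, 1, 1, 0, 0, 0, 0, 0]
# Split on "contains the word 'free'": yes -> [1, 1, 1, 0], no -> [0, 0, 0, 0]
print(information_gain(emails, [[1, 1, 1, 0], [0, 0, 0, 0]]))
```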
18. Short Note: Random Forest.

Random Forest is an ensemble learning method that combines the predictions of multiple decision trees to improve the
accuracy and robustness of the model. It builds a "forest" of decision trees, each trained on a random subset of the training
data and using a random subset of features for each split.

Key Characteristics:

Ensemble Learning: Random Forest is an ensemble learning technique that aggregates the predictions of multiple
individual decision trees to produce a more accurate and stable prediction.

Decision Trees: Each tree in the Random Forest is trained independently on a bootstrapped subset of the training data
using a random subset of features at each split.

Randomness: Random Forest introduces randomness in the training process by using bootstrapping (sampling with
replacement) and feature sampling, which helps to decorrelate the individual trees and reduce overfitting.

Voting (Classification) or Averaging (Regression): For classification tasks, the mode of the predictions of individual trees is
taken as the final prediction. For regression tasks, the average of the predictions is computed.
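
As a hedged usage sketch with scikit-learn (the synthetic dataset and the parameter values are chosen only for demonstration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)   # synthetic data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)   # 100 bootstrapped trees
forest.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, forest.predict(X_test)))
print("Feature importances:", forest.feature_importances_)   # estimate of each feature's predictive power
```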

Advantages:

Random Forest is highly robust and less prone to overfitting compared to individual decision trees.

It can handle high-dimensional data and large datasets efficiently.

Random Forest provides estimates of feature importance, allowing for interpretability and insights into the predictive
power of features.

Applications:

Random Forest is widely used in various domains, including finance, healthcare, marketing, and bioinformatics.

It is suitable for tasks such as classification, regression, and feature selection.

Applications include credit scoring, customer churn prediction, disease diagnosis, and image classification.

Limitations:

Random Forest may not perform as well as more complex models like gradient boosting machines (GBMs) on certain tasks.

Training a Random Forest can be computationally expensive, especially with a large number of trees and features.

19. Short Note: Hyperplane of SVM algorithm with diagram


Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is
used for Classification as well as Regression problems. However, primarily, it is used for Classification
problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category in
the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called
support vectors, and hence the algorithm is termed Support Vector Machine. In the usual diagram, two different
categories are shown separated by such a decision boundary or hyperplane.
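
Since the diagram itself is not reproduced here, the hyperplane can be summarised with the standard equations (w is the weight vector, b the bias):

$$w^\top x + b = 0 \quad \text{(separating hyperplane)}$$

$$w^\top x + b = +1 \quad \text{and} \quad w^\top x + b = -1 \quad \text{(margin boundaries through the support vectors)}$$

$$\text{margin} = \frac{2}{\lVert w \rVert}$$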
21. Difference between clustering and classification with example

Factor | Classification | Clustering
Type of learning | Supervised learning: the model is trained on labeled data | Unsupervised learning: the data has no labels
Goal | Assign each instance to one of a set of predefined classes | Group similar instances into clusters discovered from the data itself
Prior knowledge | Class labels are known in advance | The number and nature of the groups are not known beforehand
Evaluation | Accuracy, precision, recall, F1-score | Internal measures such as silhouette score or within-cluster distance
Typical algorithms | Logistic regression, decision trees, SVM, KNN | K-means, hierarchical clustering, DBSCAN
Example | Classifying emails as spam or not spam | Segmenting customers into groups based on purchasing behavior

23. Short note: i) Euclidean distance, ii) Manhattan distance, iii) Minkowski distance, iv)
Hamming distance.

i) Euclidean Distance:

Euclidean distance is a measure of the straight-line distance between two points in a Euclidean space. In two-dimensional space, the Euclidean distance between points $(x_1, y_1)$ and $(x_2, y_2)$ is calculated using the formula:

$$\text{Euclidean distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

In general $n$-dimensional space, the Euclidean distance between two points $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ is given by:

$$\text{Euclidean distance} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Euclidean distance is commonly used in clustering algorithms, nearest neighbor search, and as a similarity
measure in various machine learning tasks.

ii) Manhattan Distance:

Manhattan distance, also known as taxicab or city block distance, is a measure of the distance between two
points in a grid-like space. It is calculated as the sum of the absolute differences between the coordinates of
the points.

In two-dimensional space, the Manhattan distance between points $(x_1, y_1)$ and $(x_2, y_2)$ is given by:

$$\text{Manhattan distance} = |x_2 - x_1| + |y_2 - y_1|$$

In $n$-dimensional space, the Manhattan distance between two points $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ is calculated as:

$$\text{Manhattan distance} = \sum_{i=1}^{n} |x_i - y_i|$$

Manhattan distance is commonly used in distance-based clustering algorithms, such as k-means clustering, and
in problems involving navigation or route planning.
iii) Minkowski Distance:

Minkowski distance is a generalization of both Euclidean and Manhattan distances. It is a parameterized distance measure that can be adjusted based on the value of a parameter $p$.

In $n$-dimensional space, the Minkowski distance between two points $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ is given by:

$$\text{Minkowski distance} = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

When $p = 1$, the Minkowski distance reduces to the Manhattan distance, and when $p = 2$, it reduces to
the Euclidean distance. Minkowski distance is a flexible distance measure that can be adjusted to suit different
applications.

iv) Hamming Distance:

Hamming distance is a measure of the number of positions at which corresponding symbols differ between
two strings of equal length. It is commonly used in information theory, coding theory, and error detection.

For example, the Hamming distance between the strings "karolin" and "kathrin" is 3, as there are three
positions where the characters differ ('a' vs 't', 'o' vs 'h', 'l' vs 'r').

Hamming distance is particularly useful in applications involving error detection and correction, DNA sequence
analysis, and similarity search in binary data.
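
A minimal Python sketch of all four distance measures (the example points and strings are arbitrary, echoing the Hamming example above):

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def minkowski(p, q, r=3):
    """Generalised distance: r = 1 gives Manhattan, r = 2 gives Euclidean."""
    return sum(abs(pi - qi) ** r for pi, qi in zip(p, q)) ** (1 / r)

def hamming(s, t):
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))

print(euclidean((1, 2), (4, 6)))       # 5.0
print(manhattan((1, 2), (4, 6)))       # 7
print(minkowski((1, 2), (4, 6), r=2))  # 5.0 (same as Euclidean)
print(hamming("karolin", "kathrin"))   # 3
```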

27. Short note: Linear and Non-Linear SVM

Linear Support Vector Machine (SVM):

Linear Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification tasks. It
works by finding the optimal hyperplane that separates the data points into different classes with the maximum
margin. The hyperplane is defined as the decision boundary that maximizes the margin between the nearest data
points of different classes, known as support vectors.

Key Characteristics:

1. Linear Separability: Linear SVM is suitable for datasets where the classes can be separated by a straight line or
hyperplane in the feature space.
2. Optimal Margin: Linear SVM aims to find the hyperplane that maximizes the margin between the support
vectors of different classes, which helps improve the generalization performance of the model.
3. Kernel Formulation: Linear SVM corresponds to using a linear kernel (the ordinary dot product of the input
features); the same kernel machinery allows SVMs to be extended to non-linear decision boundaries by implicitly
mapping features into a higher-dimensional space.

Applications:

 Text classification
 Image classification
 Bioinformatics
 Handwriting recognition

Advantages:

 Effective in high-dimensional feature spaces


 Robust to overfitting
 Memory efficient

Disadvantages:

 Limited to linearly separable datasets


 May not perform well with complex data distributions

Non-Linear Support Vector Machine (SVM):

Non-Linear Support Vector Machine (SVM) extends the linear SVM to handle datasets that are not linearly
separable in the original feature space. It achieves this by mapping the input features into a higher-dimensional
space using kernel functions, where the data becomes linearly separable, allowing for the construction of non-
linear decision boundaries.

Key Characteristics:

1. Kernel Functions: Non-linear SVM uses kernel functions such as polynomial, radial basis function (RBF), or
sigmoid to map the input features into a higher-dimensional space, where the data becomes linearly separable.
2. Flexibility: Non-linear SVM can capture complex relationships and decision boundaries in the data by
transforming the feature space using kernel functions.
3. Tuning Parameters: Non-linear SVM may require tuning parameters such as the choice of kernel function and
the regularization parameter C to achieve optimal performance.

Applications:

 Image recognition
 Bioinformatics
 Sentiment analysis
 Financial forecasting

Advantages:

 Ability to handle non-linear relationships in the data


 Flexibility in choice of kernel functions
 Effective in high-dimensional feature spaces

Disadvantages:

 Computationally intensive, especially with large datasets


 Sensitive to the choice of kernel function and parameters
 Interpretability may be reduced compared to linear models
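
As a hedged illustration of the practical difference (the two-moons dataset and parameter values below are chosen only for demonstration; exact scores will vary):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable in the original feature space
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print("Linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))  # typically higher on this data
```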

28. Short note: KNN algorithm

The k-nearest neighbors (KNN) algorithm is a non-parametric, supervised learning classifier, which uses
proximity to make classifications or predictions about the grouping of an individual data point. It is one of the
simplest and most popular classification and regression methods used in machine learning today.

How KNN works

• Step-1: Select the number K of the neighbors


• Step-2: Calculate the Euclidean distance of K number of neighbors

• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance

• Step-4: Among these k neighbors, count the number of the data points in each category

• Step-5: Assign the new data points to that category for which the number of the neighbor is maximum

• Step-6: Our model is ready

• For example, suppose we choose the number of neighbors k = 5.

• Next, we calculate the Euclidean distance between the new data point and the existing data points, as defined earlier.

• Suppose the five nearest neighbors turn out to be three points in category A and two points in category B; the new
data point is then assigned to category A.
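
A minimal from-scratch sketch of these steps (the tiny 2-D dataset is made up, mirroring the k = 5 example above):

```python
import math
from collections import Counter

def knn_predict(training_data, new_point, k=5):
    """Classify new_point by majority vote among its k nearest training points."""
    nearest = sorted(
        training_data,
        key=lambda item: math.dist(item[0], new_point)   # Euclidean distance to each training point
    )[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]          # most frequent label among the neighbours

# Hypothetical 2-D points labelled 'A' or 'B'
training = [((1, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
            ((6, 6), "B"), ((7, 7), "B")]
print(knn_predict(training, (2, 1), k=5))   # -> 'A' (three of the five neighbours are in category A)
```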

(except 20,22,24,25,26)
