Artificial Intelligence and Machine Learning (Theory Exam)
Module 1
Q) Applications of AI?
Gaming: AI plays a crucial role in strategic games by enabling machines to analyze large numbers of
possible positions based on heuristic knowledge. AI algorithms can make intelligent decisions, plan
strategies, and compete against human players in games like chess, poker, tic-tac-toe, and more.
Natural Language Processing: AI enables computers to understand and interact with humans using
natural language. Through techniques like speech recognition, AI systems can comprehend spoken
language, handle accents and slang, and respond appropriately. This technology finds applications in
virtual assistants, chatbots, voice-controlled systems, and more.
Expert Systems: AI can be used to create expert systems that integrate machine learning, software,
and specialized information to provide reasoning and advice. These systems can analyze complex
data, provide explanations, and offer expert-level guidance in various domains such as medicine,
finance, and law.
Vision Systems: AI-powered vision systems can understand and interpret visual input. They can
analyze images or videos to recognize objects, detect patterns, and extract meaningful information.
Vision systems find applications in surveillance, image recognition, medical diagnostics, and more.
Speech Recognition: AI enables intelligent systems to comprehend spoken language and extract
meaning from it. Speech recognition technology allows computers to understand and transcribe
spoken words, enabling applications like voice assistants, transcription services, and interactive voice
response systems.
Handwriting Recognition: AI-based handwriting recognition software can interpret and convert
handwritten text into editable digital format. This technology finds applications in digitizing
handwritten documents, signature verification, and electronic forms processing.
Intelligent Robots: AI empowers robots with the ability to perform tasks and interact with the
physical world. Through sensor integration, advanced processors, and machine learning algorithms,
intelligent robots can sense and understand their environment, learn from experiences, and adapt to
new situations. They find applications in industries like manufacturing, healthcare, exploration, and
more.
Q) Intelligent Agents, Types of Agents?
An intelligent agent is a software or system that perceives its environment, processes information,
and takes actions to achieve specific goals or objectives. Intelligent agents are designed to exhibit
intelligent behavior, which includes the ability to learn, reason, adapt, and make decisions based on
available information.
Agent Terminology
• Performance Measure of Agent − It is the criteria, which determines how successful an agent is.
• Behavior of Agent − It is the action that the agent performs after any given sequence of percepts.
• Percept Sequence − It is the history of all that an agent has perceived till date.
Simple Reflex Agent: A simple reflex agent selects actions based solely on the current percept (input)
without considering the agent's internal state or history. It operates using a set of condition-action
rules or "if-then" statements. This type of agent lacks memory or the ability to consider the
consequences of its actions beyond the current percept.
Model-Based Reflex Agent: Unlike a simple reflex agent, a model-based reflex agent maintains an
internal model or representation of the world. It uses this model along with the current percept to
decide on the appropriate action. By considering the current state and the expected consequences of
different actions, the agent can make more informed decisions.
Goal-Based Agents: Goal-based agents have a predefined set of goals or objectives. They evaluate
the current state and compare it to the desired goal state. The agent selects actions that bring it
closer to achieving its goals. These agents often employ search algorithms or planning techniques to
determine the optimal sequence of actions.
Utility-Based Agents: Utility-based agents make decisions based on the expected utility or value of
different actions. They consider not only the goal but also the potential outcome and associated
utilities. The agent evaluates various actions and selects the one with the highest expected utility.
This approach is particularly useful when there are trade-offs between different goals or objectives.
Learning Agents: Learning agents have the ability to improve their performance over time through
learning from experience. They can acquire knowledge, adapt to changing environments, and adjust
their behavior based on feedback
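To make the condition-action idea of a simple reflex agent concrete, the sketch below is a minimal, hypothetical Python example for a two-location vacuum world; the percept format and rule set are illustrative assumptions, not part of the original notes:

# Minimal sketch of a simple reflex agent (hypothetical two-location vacuum world).
# The percept is assumed to be a (location, status) pair; the rules are illustrative only.
def simple_reflex_vacuum_agent(percept):
    """Map the current percept directly to an action via condition-action rules."""
    location, status = percept
    if status == "Dirty":        # rule: dirty square -> clean it
        return "Suck"
    if location == "A":          # rule: clean square A -> move right
        return "Right"
    return "Left"                # rule: clean square B -> move left

# Example percept sequence and the resulting actions
for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty")]:
    print(percept, "->", simple_reflex_vacuum_agent(percept))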
Q) PEAS Representation?
Performance measure: The performance measure quantifies the success or effectiveness of the
agent in achieving its objectives. It defines the criteria for evaluating the agent's performance. The
measure can be based on various factors such as accuracy, speed, efficiency, quality, cost, customer
satisfaction, or any other relevant metric specific to the agent's task or domain.
Environment: The environment represents the external context or the world in which the agent
operates. It includes all the entities, objects, and conditions that the agent interacts with and that
can influence its behavior. The environment can be physical, virtual, or a combination of both. It
encompasses factors such as other agents, objects, events, rules, regulations, resources, constraints,
and any other relevant elements that impact the agent's actions and decision-making.
Actuators: Actuators are the physical or virtual components through which the agent interacts with
the environment. They enable the agent to perform actions or behaviors based on its decision-
making process. Actuators can include motors, robotic limbs, speech synthesizers, displays,
keyboards, touchscreens, software interfaces, or any other devices or mechanisms that allow the
agent to affect the environment and execute chosen actions.
Sensors: Sensors are the components that enable the agent to perceive or gather information from
the environment. They capture sensory inputs, signals, or data that provide the agent with a
representation of the current state of the environment. Sensors can be cameras, microphones,
temperature sensors, pressure sensors, proximity sensors, motion detectors, GPS receivers, or any
other input devices or systems that allow the agent to receive relevant information about the
environment.
To illustrate the PEAS representation, let's consider an example of a vacuum cleaning agent:
Performance measure: The performance measure for the vacuum cleaning agent could be the
cleanliness level of the room, measured by the percentage of dirt and debris removed or the time
taken to clean the room.
Environment: The environment for the vacuum cleaning agent consists of the room or space to be
cleaned. It includes furniture, walls, carpets, dirt, and any other objects or elements that the agent
encounters while cleaning.
Actuators: The actuators for the vacuum cleaning agent are the vacuum cleaner itself, which moves
across the room, activates suction, and performs cleaning actions.
Sensors: The sensors for the vacuum cleaning agent could include bump sensors to detect collisions
with furniture or walls, dirt sensors to detect the presence of dirt, and distance sensors to perceive
the proximity of objects.
Q) Architecture of Intelligent agents?
Logic-based architecture: In a logic-based architecture, the agent uses logical inference based on
symbolic representations from the environment. The agent represents the knowledge and rules
about the environment using logical statements or predicates. By applying logical reasoning, the
agent can derive new information, make deductions, and infer conclusions. This architecture is well-
suited for problems that can be represented and reasoned about using logical formalisms, such as
first-order logic or propositional logic.
Knowledge-level architecture: This architecture focuses on the knowledge and beliefs of the agent.
The agent extracts knowledge from the environment, which forms the agent's beliefs (its conviction
or faith about the world). The agent has specific goals or requirements (desires) that it aims to
achieve. The agent then commits to options or actions that align with its intended goal, which is
known as its intention. This architecture emphasizes the agent's knowledge acquisition, belief
formation, and goal-directed behavior.
Layered architecture: The layered architecture is based on the idea of organizing the agent's
components into different layers, each responsible for a specific aspect of the agent's functionality.
Typically, the layers include perception, cognition, and action. The perception layer processes sensory
inputs and generates a representation of the environment. The cognition layer performs higher-level
processing, such as decision-making, planning, and reasoning. The action layer executes the selected
actions based on the decisions made in the cognition layer. The layered architecture provides a
modular and hierarchical approach to designing intelligent agents.
Q) Reasoning in AI?
Reasoning refers to the cognitive process of using logical and cognitive abilities to make sense of
information, draw conclusions, and make decisions. It is a fundamental aspect of human intelligence
and plays a crucial role in problem-solving, decision-making, and understanding the world.
1. Deductive Reasoning
• Deductive reasoning starts from general premises and derives a specific, logically certain conclusion
(top-down reasoning); the argument is valid or invalid depending on the truth of its premises.
Example:
• Premise 1: People who are aged 20 or above are active users of the internet.
• Premise 2: All of the students present in the class are aged 20 or above.
• Conclusion: Therefore, all of the students present in the class are active users of the internet.
2. Inductive Reasoning
• Inductive reasoning starts by collecting specific facts or observations and then generalizes from
them, rather than drawing any particular conclusion from the facts at the beginning of the process
(bottom-up reasoning).
• Even if the premises are true, the conclusion is not guaranteed to be true, because it depends on
the inductive argument, which can be either strong or weak.
Example:
• Premise: All of the pigeons we have seen in the zoo are white.
• Conclusion: Therefore, we can expect all pigeons to be white.
3. Abductive Reasoning
• Abductive reasoning is a form of logical reasoning which starts with single or multiple observations
and then seeks the most likely explanation or conclusion for the observation.
• It begins with an incomplete set of facts, information, and knowledge and draws conclusions based
on the facts known at present rather than collecting further evidence.
Example:
• Implication: The ground is wet if it is raining.
• Observation: The ground is wet.
• Conclusion: It is raining.
4. Common Sense Reasoning
• Common sense reasoning is an informal form of reasoning that is gained through everyday
experiences.
• When the agent faces a similar type of situation at the next point of time, it uses its past
experience to solve the problem.
Example:
• One person can be at only one place at a time; if I put my hand into a fire, it will burn.
5. Monotonic Reasoning
• In monotonic reasoning, once a conclusion is drawn, it remains the same even if we add new
information or facts to the existing knowledge; it is used in conventional logic-based systems.
• Monotonic reasoning is not useful for real-time systems, as in real time, facts keep changing.
Example:
• A fact in the knowledge base such as "The moon revolves around the earth" remains true
regardless of any other facts that are added.
6. Non-monotonic Reasoning
• In non-monotonic reasoning, a conclusion may be invalidated when new information is added to
the knowledge base.
• "Human perceptions for various things in daily life" is a general example of non-monotonic
reasoning.
Example:
• Birds can fly.
• Penguins cannot fly.
• Pitty is a bird.
• So from the above sentences, we can conclude that Pitty can fly; but if we later add "Pitty is a
penguin", the conclusion "Pitty can fly" no longer holds.
Q) Logic: Zero-Order (Propositional) and First-Order Logic?
Logic: Logic is the study of reasoning and inference. It provides a systematic way to analyze and
evaluate arguments, determine their validity, and draw logical conclusions. Logic establishes rules
and principles for correct reasoning and helps in understanding the structure of arguments. It is
concerned with the relationships between propositions, the truth or falsity of statements, and the
validity of logical deductions.
Propositional Logic: Propositional logic, also known as sentential logic, is a branch of logic that deals
with the relationships and operations between propositions or statements. Propositions in
propositional logic are atomic units of meaning that can be either true or false. It uses logical
connectives, such as AND, OR, NOT, and IF-THEN, to build complex statements from simpler
propositions. Propositional logic is valuable for analyzing the truth values and logical relationships of
statements without considering the internal structure of the propositions.
Example:
P: "The road is closed."
Q: "The traffic is blocked."
P → Q: "If the road is closed, then the traffic is blocked."
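As a quick check of these connectives, the short Python snippet below enumerates the truth table of P → Q (equivalent to ¬P ∨ Q); it is an illustrative sketch, not part of the original notes:

# Truth table for "If the road is closed (P), then the traffic is blocked (Q)".
from itertools import product

for P, Q in product([True, False], repeat=2):
    implies = (not P) or Q          # P -> Q is logically equivalent to (not P) or Q
    print(f"P={P!s:5}  Q={Q!s:5}  P->Q={implies}")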
First-Order Logic: First-order logic, also called predicate logic or first-order predicate calculus, is an
extension of propositional logic that allows for more expressive and detailed reasoning. It
incorporates quantifiers, variables, and predicates to represent relationships between objects and to
quantify over sets of objects. First-order logic enables the representation of complex statements
involving properties, relations, and quantification, making it suitable for capturing a wide range of
logical and mathematical concepts.
Example: "All birds fly" is represented as:
∀x bird(x) → fly(x)
A statement such as "Not all students like Mathematics" uses ∀ together with negation:
¬∀x [student(x) → likes(x, Mathematics)]
Universal Generalization
• Universal generalization is a valid inference rule which states that if premise P(c) is true
for any arbitrary element c in the universe of discourse, then we can have a conclusion
as ∀ x P(x).
• This rule can be used if we want to show that every element has a similar property.
Example:
• Let's represent P(c): "A byte contains 8 bits"; since this holds for any arbitrary byte c, we can
conclude ∀ x P(x): "All bytes contain 8 bits."
Universal Instantiation
• As per UI, we can infer any sentence obtained by substituting a ground term for the
variable.
• It is a rule of inference which says that, given a formula of the form ∀ x P(x), one may infer P(c) for
any constant symbol c in the domain.
• The UI rule state that we can infer any sentence P(c) by substituting a ground term c
(a constant within domain x) from ∀ x P(x) for any object in the universe of discourse.
• For example, from ∀ x Likes(x,IceCream), we can use the substitution {x/Ben} and infer
Likes(Ben, IceCream).
Existential Instantiation
• Existential instantiation is also called as Existential Elimination, which is a valid inference rule
in first-order logic.
• This rule states that one can infer P(c) from the formula given in the form of ∃x P(x) for a new
constant symbol c.
• The new KB is not logically equivalent to old KB, but it will be satisfiable if old KB was
satisfiable.
• The restriction with this rule is that c used in the rule must be a new term for which P(c ) is
true.
• It can be represented as: from ∃x P(x), infer P(c).
• For example, from ∃x Kill(x, Victim), we can infer Kill(Murderer, Victim), as long as Murderer is a
new constant symbol that does not appear elsewhere in the knowledge base.
Existential introduction
• This rule states that if there is some element c in the universe of discourse which has a
property P, then we can infer that there exists something in the universe which has
the property P.
• Example: From Likes(Ben, IceCream) we can infer ∃x Likes(x, IceCream), i.e., "someone likes ice
cream."
Q) Problem Reduction?
• Problem reduction is a problem-solving technique that decomposes a complex problem into
smaller, more manageable subproblems that are easier to solve.
• It is widely used in artificial intelligence and computer science.
• Problem-reduction operators take a problem state as input and produce a new problem state (or a
set of simpler subproblems) as output.
Q) Gradient Descent?
The Gradient Descent algorithm is an iterative optimization algorithm commonly used in machine
learning and optimization problems to find the minimum of a function. It is particularly useful in
training machine learning models by adjusting the model parameters to minimize the error or cost
function.
Steps:
Choose initial parameters: Start by initializing the model parameters with some initial values. These
values can be randomly assigned or set to predetermined values.
Calculate the cost function: Evaluate the cost function, which measures the error or discrepancy
between the model's predictions and the actual values. The cost function should be differentiable,
allowing us to compute gradients.
Compute the gradients: Calculate the partial derivatives of the cost function with respect to each
model parameter. These gradients represent the direction and magnitude of the steepest ascent of
the function at a given point.
Update the parameters: Adjust the model parameters in the opposite direction of the gradients to
move towards the minimum of the cost function. The update equation for each parameter is
typically: θ_new = θ_old - learning_rate * gradient, where θ_old is the old parameter value,
learning_rate is the step size or learning rate, and gradient represents the partial derivatives.
Repeat steps 2-4: Iterate the process by recalculating the cost function, gradients, and updating the
parameters. The number of iterations can be predetermined or stopped when a convergence
criterion is met.
Convergence check: Check for convergence by evaluating a convergence criterion, such as the
change in the cost function or the magnitude of the gradients. If the convergence criterion is
satisfied, stop the iterations; otherwise, go back to step 2.
Output: After convergence, the final parameter values are obtained, representing the optimized
solution. These parameter values can be used for making predictions or further analysis.
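A minimal sketch of these steps, assuming a simple least-squares cost for one-dimensional linear regression; the toy data, learning rate, and tolerance are illustrative assumptions:

import numpy as np

# Toy data for y ≈ 2x + 1 (illustrative assumption)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

theta = np.zeros(2)          # Step 1: initialize parameters [intercept, slope]
learning_rate = 0.05
for _ in range(2000):        # Steps 2-4 repeated until convergence
    pred = theta[0] + theta[1] * X
    error = pred - y
    cost = np.mean(error ** 2)                 # Step 2: cost function (MSE)
    grad = np.array([2 * error.mean(),         # Step 3: partial derivatives
                     2 * (error * X).mean()])
    new_theta = theta - learning_rate * grad   # Step 4: update rule
    if np.linalg.norm(new_theta - theta) < 1e-8:   # convergence check
        theta = new_theta
        break
    theta = new_theta

print("fitted parameters:", theta, "final cost:", cost)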
Q) Perceptron?
Training Algorithm (backpropagation training for a multilayer perceptron):
Step 4: Each input unit xi (i = 1 to n) receives an input signal and transmits this signal to all the
hidden units.
Step 5: Each hidden unit zj (j = 1 to a) sums its weighted input signals to calculate its net input:
zinj = v0j + Σ xi vij
Applying the activation function, zj = f(zinj), it sends this signal to all units in the layer above, i.e. the
output units.
Each output unit yk (k = 1 to m) sums its weighted input signals:
yink = w0k + Σ zj wjk
and applies the activation function: yk = f(yink)
Backpropagation Error :
Step 6: Each output unit yk (k = 1 to m) receives a target pattern tk corresponding to the input
pattern, then the error term is calculated as:
δk = (tk – yk) f'(yink)
Step 7: Each hidden unit zj (j = 1 to a) sums its delta inputs from the units in the layer above:
δinj = Σ δk wjk
δj = δinj f'(zinj)
Step 8: Each output unit yk (k = 1 to m) updates its bias and weights (j = 1 to a), and each hidden unit
zj updates its bias and weights (i = 1 to n). The weight correction terms are:
Δ wjk = α δk zj
Δ w0k = α δk
Δ vij = α δj xi
Δ v0j = α δj
Step 9: Test the stopping condition. The stopping condition can be the minimization of error, number
of epochs.
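As an illustrative numerical sketch of these update rules (not part of the original notes), the snippet below trains one hidden layer with a sigmoid activation on XOR-style toy data; the data, layer sizes, and learning rate are assumptions, and convergence depends on the random initialization:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs x_i
t = np.array([[0], [1], [1], [0]], dtype=float)               # targets t_k (XOR, illustrative)

def f(x):            # sigmoid activation, so f'(x) = f(x) * (1 - f(x))
    return 1.0 / (1.0 + np.exp(-x))

V = rng.normal(scale=0.5, size=(2, 4)); v0 = np.zeros(4)      # input->hidden weights/bias
W = rng.normal(scale=0.5, size=(4, 1)); w0 = np.zeros(1)      # hidden->output weights/bias
alpha = 0.5

for epoch in range(10000):
    z_in = X @ V + v0;  z = f(z_in)          # Step 5: hidden activations
    y_in = z @ W + w0;  y = f(y_in)          # output activations
    delta_k = (t - y) * y * (1 - y)          # Step 6: delta_k = (t_k - y_k) f'(y_in_k)
    delta_in_j = delta_k @ W.T               # Step 7: sum of deltas from the layer above
    delta_j = delta_in_j * z * (1 - z)       #         delta_j = delta_in_j f'(z_in_j)
    W += alpha * z.T @ delta_k; w0 += alpha * delta_k.sum(0)   # Step 8: weight/bias updates
    V += alpha * X.T @ delta_j; v0 += alpha * delta_j.sum(0)

print(np.round(y, 3))   # predictions after training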
MODULE 4
Q) Applications of ML?
Machine Learning has a wide range of applications across various industries. Here are some common
applications:
Image and speech recognition: ML algorithms can analyze and classify images, identify objects, and
recognize speech. This technology is used in facial recognition systems, self-driving cars, voice
assistants, and medical imaging analysis.
Natural language processing (NLP): ML algorithms enable machines to understand and generate
human language. NLP is used in chatbots, language translation, sentiment analysis, and text
summarization.
Recommender systems: ML algorithms can analyze user preferences and behavior to provide
personalized recommendations. This is commonly seen in streaming platforms, e-commerce
websites, and content recommendation systems.
Fraud detection: ML techniques can identify patterns of fraudulent activities by analyzing large
volumes of data, such as credit card transactions, insurance claims, or cybersecurity threats.
Healthcare and medicine: ML is used for diagnosing diseases, predicting patient outcomes, drug
discovery, personalized medicine, and analyzing medical images and records.
Financial analysis: ML algorithms are employed in stock market prediction, credit risk assessment,
algorithmic trading, and fraud detection in financial transactions.
Autonomous vehicles: ML plays a crucial role in developing self-driving cars by analyzing sensor data,
making real-time decisions, and ensuring safe navigation.
Q) Data Mining vs. Machine Learning vs. Big Data Analytics?
Data Mining: Data Mining is the process of discovering patterns, relationships, and insights from
large datasets. It involves techniques like statistical analysis, pattern recognition, and data
visualization. Data Mining aims to extract valuable information and knowledge from data, which can
be used for decision-making, marketing strategies, fraud detection, and customer segmentation.
Machine Learning: Machine Learning focuses on developing algorithms and models that enable
computers to learn and make predictions or decisions without being explicitly programmed. ML
algorithms analyze data to automatically learn patterns and relationships, enabling tasks like
classification, regression, clustering, and reinforcement learning. ML is applied in various domains
such as image recognition, natural language processing, recommender systems, and predictive
analytics.
Big Data Analytics: Big Data Analytics involves the processing and analysis of large and complex
datasets (Big Data). It encompasses techniques from both Data Mining and Machine Learning to
extract meaningful insights, patterns, and trends from massive datasets. Big Data Analytics leverages
distributed computing and parallel processing to handle the challenges associated with the volume,
velocity, and variety of data. It is used in industries like finance, healthcare, marketing, and social
media to gain insights, optimize processes, and make data-driven decisions.
Q) Naïve Bayes Classifier, k-Nearest Neighbour Classifier, Decision Tree Classifier?
Naive Bayes Classifier:
Algorithm Steps:
Step 1: Preprocess the training data by converting it into a suitable format and handling missing
values or outliers if needed.
Step 2: Calculate the prior probabilities of each class based on the frequency of occurrence in the
training data.
Step 3: For each feature, calculate the likelihood probabilities of each class based on the feature
values and the class labels.
Step 4: Calculate the posterior probabilities of each class for a given instance using Bayes' theorem.
Step 5: Predict the class label for a new instance by selecting the class with the highest posterior
probability.
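As an illustrative sketch of these steps (not from the original notes), the snippet below uses scikit-learn's GaussianNB; the iris dataset, the train/test split, and the library choice are assumptions for demonstration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = GaussianNB()                 # Steps 2-3: priors and likelihoods estimated from the data
model.fit(X_train, y_train)
pred = model.predict(X_test)         # Steps 4-5: posterior-based prediction
print("accuracy:", accuracy_score(y_test, pred))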
k-Nearest Neighbour (kNN) Classifier:
Algorithm Steps:
Step 1: Preprocess the training data by normalizing or scaling the features to ensure they are on a
similar scale.
Step 2: Choose the value of k, the number of nearest neighbours to consider.
Step 3: For a new instance, calculate the distances to all instances in the training data using a
distance metric such as Euclidean distance.
Step 4: Select the k training instances with the smallest distances (the k nearest neighbours).
Step 5: Determine the majority class label among the k nearest neighbours.
Step 6: Predict the class label for the new instance as the majority class label.
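A minimal kNN sketch using scikit-learn (an illustrative assumption, as above); the scaling step and k = 5 are chosen only for demonstration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: scaling; Step 2: k = 5; Steps 3-6: distance-based majority vote inside predict/score
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))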
Decision Tree Classifier:
Algorithm Steps:
Step 1: Preprocess the training data and handle missing values or outliers if required.
Step 2: Select an attribute or feature to split the data based on a criterion such as information gain or
Gini impurity.
Step 3: Split the data into subsets based on the selected attribute, creating branches in the decision
tree.
Step 4: Recursively repeat steps 2 and 3 for each subset of data until a stopping criterion is met (e.g.,
reaching a maximum depth or a minimum number of instances per leaf).
Step 5: Assign a class label or a predicted value to the leaf nodes based on the majority class or the
average value of instances in the leaf.
Step 6: Predict the class label or value for a new instance by traversing the decision tree based on the
attribute values of the instance.
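A short decision-tree sketch with scikit-learn (illustrative; the dataset, depth limit, and Gini criterion are assumptions):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 2-5: split on Gini impurity up to a maximum depth, then label the leaves
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))             # Step 6: the learned if-then structure used for prediction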
MODULE 5
Q) Non-Linear Regression?
Non-linear regression is a type of regression analysis where the relationship between the dependent
variable and the independent variables is modeled using a non-linear function. The steps involved in
non-linear regression are as follows:
Step 1: Preprocess the data by handling missing values, outliers, and scaling if necessary.
Step 2: Select an appropriate non-linear model that represents the relationship between the
variables. This can be a polynomial function, exponential function, logarithmic function, or any other
non-linear function.
Step 3: Estimate the parameters of the chosen non-linear model using optimization techniques like
least squares or maximum likelihood estimation.
Step 4: Assess the goodness of fit by evaluating metrics such as R-squared, adjusted R-squared, or
root mean squared error (RMSE).
Step 5: Interpret the estimated parameters and make predictions using the non-linear model.
The formula for a general non-linear regression model can be represented as:
y = f(x, β) + ε
Where:
y is the dependent variable, x represents the independent variable(s), f is the chosen non-linear
function with parameters β, and ε is the error term.
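An illustrative sketch of steps 2-4 using SciPy's curve_fit with an assumed exponential model; the model form, the synthetic data, and the starting values are assumptions for demonstration:

import numpy as np
from scipy.optimize import curve_fit

# Assumed non-linear model: y = a * exp(b * x)
def model(x, a, b):
    return a * np.exp(b * x)

rng = np.random.default_rng(0)
x = np.linspace(0, 2, 50)
y = model(x, 2.0, 1.3) + rng.normal(scale=0.2, size=x.size)   # synthetic noisy data

params, _ = curve_fit(model, x, y, p0=[1.0, 1.0])   # Step 3: least-squares estimation of β
residuals = y - model(x, *params)
rmse = np.sqrt(np.mean(residuals ** 2))             # Step 4: goodness of fit (RMSE)
print("estimated a, b:", params, "RMSE:", rmse)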
Q) Logistic Regression?
Logistic regression is a type of regression analysis used for predicting binary or categorical outcomes.
It models the relationship between the independent variables and the probability of the outcome
using the logistic function. The steps involved in logistic regression are as follows:
Step 1: Preprocess the data by handling missing values, outliers, and scaling if necessary.
Step 2: Set up the logistic regression model by specifying the dependent variable and independent
variables.
Step 3: Estimate the parameters of the logistic regression model using maximum likelihood
estimation or other optimization techniques.
Step 4: Assess the model's performance using metrics like accuracy, precision, recall, or area under
the receiver operating characteristic (ROC) curve.
Step 5: Interpret the estimated coefficients to understand the relationship between the independent
variables and the log-odds of the outcome.
Step 6: Make predictions by applying the logistic function to the independent variables.
The logistic function, also known as the sigmoid function, is represented as:
p = 1 / (1 + e^(-z))
Where:
p is the predicted probability of the positive class, and
z is the linear combination of the independent variables and their corresponding coefficients, i.e.
z = β0 + β1x1 + ... + βnxn.
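A minimal sketch of these steps with scikit-learn (the dataset, scaling, and evaluation metric are illustrative assumptions); the last two lines confirm that applying the sigmoid to z reproduces the predicted probabilities:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)        # Step 1: scale the features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000)      # Step 3: maximum likelihood estimation
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]      # Step 6: predicted probabilities
print("ROC AUC:", roc_auc_score(y_test, proba))   # Step 4: performance metric

# The logistic (sigmoid) function applied to z = Xβ + β0 reproduces predict_proba:
z = X_test @ clf.coef_.ravel() + clf.intercept_
print(np.allclose(1 / (1 + np.exp(-z)), proba))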
Q) Random Forest?
Random Forest is an ensemble learning algorithm that combines multiple decision trees to make
predictions. It uses bootstrap sampling and random feature selection to create a diverse set of trees.
The steps involved in Random Forest are as follows:
Step 1: Preprocess the data by handling missing values, outliers, and scaling if necessary.
Step 2: Create a random forest by specifying the number of trees and other hyperparameters.
Step 3: For each tree, draw a bootstrap sample of the training data and select a random subset of
features at each split. Build the decision tree using the selected data and features.
Step 4: Make predictions by aggregating the predictions of all the trees. For classification, this can be
done through majority voting, while for regression, it can be done by averaging the predicted values.
Step 5: Assess the performance of the Random Forest using appropriate evaluation metrics.
Step 6: Interpret the importance of features based on their contribution to the overall performance
of the Random Forest.
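An illustrative Random Forest sketch with scikit-learn; the dataset, number of trees, and feature-subset setting are assumptions for demonstration:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 2: number of trees and other hyperparameters; Steps 3-4 happen inside fit/predict
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))          # Step 5
print("feature importances:", forest.feature_importances_)     # Step 6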
Q) Bayesian Belief Networks?
Bayesian Belief Networks (BBNs), also known as Bayesian Networks, are graphical models that
represent probabilistic relationships between variables. They are based on Bayesian inference and
use conditional probability distributions. The steps involved in using BBNs are as follows:
Step 1: Define the variables and their relationships by constructing a directed acyclic graph (DAG).
Step 2: Assign prior probabilities to the variables based on expert knowledge or available data.
Step 3: Specify the conditional probability distributions (CPDs) for each variable given its parents in
the graph.
Step 4: Update the probabilities using Bayes' theorem and available evidence to obtain the posterior
probabilities.
Step 5: Use the BBN to perform inference, make predictions, or analyze the relationships between
variables.
Step 6: Assess the performance and validity of the BBN by comparing the predictions with observed
data or using appropriate evaluation metrics.
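A hand-rolled sketch of steps 2-4 for a tiny, hypothetical two-node network (Rain → WetGrass); the probability numbers are illustrative assumptions, and inference is done by direct enumeration rather than with a dedicated library:

# Tiny Bayesian network: Rain -> WetGrass, with assumed CPDs (illustrative numbers).
P_rain = {True: 0.2, False: 0.8}                       # prior P(Rain)
P_wet_given_rain = {True: 0.9, False: 0.1}             # P(WetGrass=True | Rain)

# Posterior P(Rain=True | WetGrass=True) via Bayes' theorem / enumeration (Step 4)
joint = {r: P_rain[r] * P_wet_given_rain[r] for r in (True, False)}
evidence = sum(joint.values())                         # P(WetGrass=True)
posterior = joint[True] / evidence
print("P(Rain | WetGrass) =", round(posterior, 3))     # ≈ 0.692 with these numbers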
Q) Bias Variance TradeOff?
The bias-variance tradeoff is a fundamental concept in machine learning that deals with the
relationship between model complexity, bias, and variance. It refers to the tradeoff between a
model's ability to capture the true underlying pattern (bias) and its sensitivity to variations in the
training data (variance).
High Bias: A model with high bias oversimplifies the underlying pattern and tends to underfit the
data. It may have a high training error and a high test error.
High Variance: A model with high variance captures the noise or random variations in the training
data and tends to overfit. It may have a low training error but a high test error.
The goal is to find the right balance between bias and variance to achieve good generalization and
avoid overfitting or underfitting. Regularization techniques, such as L1 and L2 regularization, can help
in managing the bias-variance tradeoff by controlling model complexity.
Tuning model complexity involves adjusting the hyperparameters or the structure of a model to find
the optimal balance between underfitting and overfitting. The steps involved in tuning model
complexity are as follows:
Step 1: Define a range of possible values for the hyperparameters that control model complexity,
such as the number of hidden layers in a neural network or the maximum depth of a decision tree.
Step 2: Split the data into training and validation sets. Use the training set to train the models with
different hyperparameter settings.
Step 3: Evaluate the performance of each model on the validation set using appropriate evaluation
metrics.
Step 4: Select the hyperparameter setting that yields the best performance on the validation set.
Step 5: Assess the final model's performance on a separate test set to ensure its generalization
capability.
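A minimal sketch of these steps using scikit-learn's GridSearchCV, which performs the train/validation evaluation via cross-validation rather than a single split; the dataset and the depth grid are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: candidate values for the complexity hyperparameter (tree depth)
param_grid = {"max_depth": [1, 2, 3, 4, 5, None]}

# Steps 2-4: cross-validated search over the grid on the training data
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)
print("best depth:", search.best_params_, "CV score:", round(search.best_score_, 3))

# Step 5: check generalization on the held-out test set
print("test accuracy:", search.score(X_test, y_test))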
Q) Model Selection Dilemma?
The model selection dilemma refers to the challenge of choosing the best model or algorithm for a
given problem. It involves considering factors such as model complexity, interpretability,
computational requirements, and the available data.
Step 1: Identify the problem and define the goals and requirements, such as accuracy,
interpretability, or computational efficiency.
Step 2: Explore and evaluate different models or algorithms suitable for the problem, considering
their strengths, limitations, and assumptions.
Step 3: Select evaluation metrics that align with the problem and compare the performance of the
models using techniques like cross-validation or hold-out validation.
Step 4: Consider other factors like computational requirements, interpretability, and any domain-
specific constraints in the model selection process.
Step 5: Make an informed decision based on a comprehensive evaluation of the models and their
suitability for the problem at hand.
Q) Expectation Maximization?
The Expectation-Maximization (EM) algorithm is an iterative method commonly used for estimating
parameters in statistical models when there are missing or incomplete data. It is particularly useful in
situations where there is an unobserved or latent variable. Here's a step-by-step explanation of the
EM algorithm:
Step 1: Initialization:
Start by initializing the parameters of the statistical model, either randomly or with some
predetermined values.
Step 2: Expectation (E-step):
In this step, compute the expected values or probabilities of the missing or latent variables given the
observed data and the current parameter estimates.
Step 3: Maximization (M-step):
In this step, update the parameter estimates based on the expected values obtained from the E-step.
Maximize the likelihood or, equivalently, the expected complete-data log-likelihood (which includes
the missing or latent variables) with respect to the parameters.
Step 4: Iterate Steps 2 and 3:
Repeat the E-step and M-step iteratively until convergence criteria are met.
Convergence criteria can be defined based on a maximum number of iterations or when the change
in the parameter estimates becomes smaller than a threshold.
Step 5: Output:
After convergence, the final parameter estimates are obtained, representing the maximum likelihood
estimates or maximum posterior estimates.
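As an illustrative from-scratch sketch (not from the original notes), the snippet below runs EM for a two-component one-dimensional Gaussian mixture; the synthetic data, initial values, and stopping tolerance are assumptions:

import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: a mixture of two 1-D Gaussians with unknown means
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Step 1: initialize parameters (means, standard deviations, mixing weights)
mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

for _ in range(100):
    # Step 2 (E-step): responsibilities = posterior probability of each component
    dens = np.stack([pi[k] * normal_pdf(data, mu[k], sigma[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)

    # Step 3 (M-step): re-estimate parameters from the responsibility-weighted data
    Nk = resp.sum(axis=1)
    new_mu = (resp * data).sum(axis=1) / Nk
    sigma = np.sqrt((resp * (data - new_mu[:, None]) ** 2).sum(axis=1) / Nk)
    pi = Nk / data.size

    # Step 4: stop when the change in the means is small
    if np.max(np.abs(new_mu - mu)) < 1e-6:
        mu = new_mu
        break
    mu = new_mu

print("estimated means:", mu, "weights:", pi)   # Step 5: final estimates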
Q) Hierarchical Clustering?
Data Preparation:
Preprocess the data by handling missing values, outliers, and normalizing or standardizing the
features, if necessary.
Distance Calculation:
Compute the pairwise distances or dissimilarities between all pairs of data points.
Choose an appropriate distance metric, such as Euclidean or Manhattan distance for continuous
variables, or Hamming distance for categorical variables.
Cluster Initialization and Similarity Calculation:
Start by treating each data point as its own cluster.
Calculate the similarity or dissimilarity between clusters based on the distance metric and the chosen
linkage criteria.
The linkage criteria define how the similarity between clusters is measured, such as single linkage,
complete linkage, or average linkage.
Merge Clusters:
Identify the two most similar clusters based on the chosen linkage criteria and their distances.
Merge the two clusters into a single cluster, creating a new level in the hierarchy.
Update Distance Matrix:
Update the distance matrix to reflect the new distances or dissimilarities between the merged cluster
and the remaining clusters.
This step ensures that the distances between the merged cluster and other clusters are recalculated.
Repeat:
Repeat the similarity calculation, merge, and update steps iteratively until all data points are merged
into a single cluster or until a predefined number of clusters is reached.
In each iteration, identify the most similar clusters, merge them, and update the distance matrix.
Dendrogram Construction:
Construct a dendrogram, a tree-like diagram that records the merge hierarchy.
The dendrogram shows the sequence of merges and the distances at which they occur.
Cluster Selection:
Cut the dendrogram at a chosen distance threshold to obtain the desired number of clusters.
Alternatively, use other methods like the elbow method or silhouette analysis to determine the
optimal number of clusters.
Cluster Assignment:
Based on the selected number of clusters, assign each data point to its corresponding cluster.
This step finalizes the clustering process and assigns data points to the identified clusters.
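An illustrative sketch of agglomerative clustering with SciPy; the synthetic data, average linkage, and the choice of two final clusters are assumptions for demonstration:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Illustrative 2-D data with two loose groups
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])

distances = pdist(X, metric="euclidean")        # pairwise distance calculation
Z = linkage(distances, method="average")        # average-linkage agglomerative merging
labels = fcluster(Z, t=2, criterion="maxclust") # cut the hierarchy into 2 clusters
print(labels)
# scipy.cluster.hierarchy.dendrogram(Z) would draw the merge hierarchy (with matplotlib)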
MODULE 6
Q) SVM?
Data Preparation:
Start with a labeled dataset consisting of input features and corresponding class labels.
Preprocess the data by handling missing values, outliers, and scaling or normalizing the features, if
required.
Selecting a Kernel:
Choose a suitable kernel function that transforms the input data into a higher-dimensional feature
space.
Common kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid.
Training the Model:
Determine the optimal hyperplane that separates the data points with the largest margin using a
process called "training" or "fitting" the SVM model.
The SVM algorithm aims to find the decision boundary that maximizes the margin while minimizing
the classification error.
Formulating the Optimization Problem:
Define the optimization problem as maximizing the margin subject to certain constraints.
The constraints ensure that data points are classified correctly or lie within a certain margin around
the decision boundary.
Solving the Optimization Problem:
Use mathematical optimization techniques, such as quadratic programming, to solve the formulated
optimization problem.
The solution provides the coefficients (weights) for the hyperplane and the support vectors.
Support Vectors:
Identify the support vectors, which are the data points that lie on the margin or violate the margin
constraints.
Support vectors play a crucial role in defining the decision boundary and are used to make
predictions.
Predictions:
After training, the SVM model can be used to predict the class labels of new, unseen data points.
For classification, the predicted class label is determined by evaluating the position of the data point
relative to the decision boundary.
For regression, the predicted value is obtained from the distance of the data point to the hyperplane.
Handling Non-linear Data:
If the data is not linearly separable, SVM can still classify the data by using kernel tricks to transform
the input space into a higher-dimensional feature space.
Kernel functions allow SVM to effectively handle non-linear data by implicitly mapping it into a
higher-dimensional space where linear separation is possible.
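A minimal SVM sketch with scikit-learn using an RBF kernel; the dataset, scaling, and hyperparameter values are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scaling + RBF kernel: the kernel trick handles a non-linear decision boundary
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)                       # training solves the margin-maximization problem
print("test accuracy:", svm.score(X_test, y_test))
print("support vectors per class:", svm.named_steps["svc"].n_support_)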
Q) Bagging?
Bagging is an ensemble learning technique that aims to reduce variance and improve the
performance of classifiers by combining multiple models trained on different subsets of the dataset.
Here's how it works:
Step 1: Bootstrap Sampling
Create multiple bootstrap samples by randomly sampling the dataset with replacement.
Each bootstrap sample has the same size as the original dataset but may contain duplicate instances
and miss some original instances.
Step 2: Training
Train a base classifier (e.g., decision trees) on each bootstrap sample independently.
Each base classifier is trained on a different subset of the data, resulting in diverse classifiers.
Step 3: Aggregation
Aggregate the predictions of individual base classifiers by majority voting (for classification) or
averaging (for regression).
Bagging helps improve the generalization and stability of the model by reducing overfitting and
capturing different aspects of the data.
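An illustrative bagging sketch with scikit-learn (the dataset and number of estimators are assumptions); BaggingClassifier uses a decision tree as its default base learner:

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Steps 1-3: bootstrap samples, one tree per sample, majority-vote aggregation
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
print("CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())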
Q) Stacking?
Stacking, also known as stacked generalization, is an ensemble learning technique that combines
multiple models by training a meta-model to make predictions based on the outputs of individual
models. Here's how it works:
Train multiple base models (e.g., decision trees, SVMs, neural networks) on the training data.
Generate predictions from the base models (typically on held-out folds of the training data) and use
these predictions as features to build a new dataset.
The meta-model is trained on this dataset, where the target variable is the actual class label.
Use the trained base models to generate predictions for unseen data.
Use these predictions as input features for the trained meta-model to make the final prediction.
Stacking allows for more complex relationships between features and can potentially improve
prediction accuracy compared to individual models.
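A minimal stacking sketch with scikit-learn; the choice of base models, meta-model, and dataset are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

base_models = [("tree", DecisionTreeClassifier(random_state=0)),
               ("svm", SVC(probability=True, random_state=0))]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)   # base-model predictions come from internal cross-validation
print("CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())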
Q) Boosting?
Boosting is an ensemble learning technique that combines multiple weak learners to create a strong
learner. The key idea is to give more importance to misclassified instances, allowing subsequent
learners to focus on these instances. Here's an overview of the boosting process:
Train a series of weak learners sequentially, where each learner is trained on a modified version of
the training set.
The modification involves giving higher weight to misclassified instances from the previous learners.
Assign weights to each weak learner based on their performance on the training data.
In the final prediction, the weak learners' predictions are combined by weighted voting, with higher
weight given to more accurate learners.
Boosting iteratively improves the overall model's performance by focusing on difficult instances and
adjusting the learners' weights accordingly.
Q) Implementing ADA Boosting?
AdaBoost (Adaptive Boosting) is a popular boosting algorithm that adjusts the weights of training
instances based on their classification errors. Here's how AdaBoost works:
Step 1: Initialization
Assign equal weights to all training instances.
Step 2: Sequential Training
Train weak learners one after another; each weak learner focuses on the misclassified instances from
the previous learners by adjusting the instance weights.
For each learner, make a prediction and increase the weights of misclassified instances so that the
next learner focuses more on them.
Step 3: Learner Weighting
Assign a weight to each weak learner based on its accuracy on the (weighted) training data.
Weighted voting gives more weight to accurate learners and less weight to weak learners.
Step 4: Final Prediction
The final prediction is obtained by combining the predictions of all weak learners using weighted
voting.
AdaBoost creates a strong learner by iteratively improving its ability to classify difficult instances.
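An illustrative AdaBoost sketch with scikit-learn (the dataset and hyperparameters are assumptions); decision stumps are the default weak learners and instance weights are adapted internally:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X_train, y_train)
print("test accuracy:", ada.score(X_test, y_test))
print("first few learner weights:", ada.estimator_weights_[:5])   # Step 3: learner weights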
Q) Bootstrap?
Bootstrapping:
Bootstrapping involves randomly sampling the dataset with replacement to create multiple subsets
of the data.
Each subset is called a bootstrap sample and has the same size as the original dataset.
Bootstrapping is often used in bagging and random forest techniques to create diverse subsets for
training.
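A small sketch of bootstrapping with NumPy, here used to estimate the sampling distribution of the mean; the data values and number of resamples are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
data = np.array([5.1, 4.9, 6.3, 5.8, 6.1, 5.5, 4.7, 6.0])   # illustrative sample

# Draw 1000 bootstrap samples (same size as the data, sampled with replacement)
boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                       for _ in range(1000)])
print("bootstrap estimate of the mean:", boot_means.mean())
print("95% interval:", np.percentile(boot_means, [2.5, 97.5]))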
Q) Cross Validation?
Cross-validation is a resampling technique that evaluates a model by splitting the dataset into
several folds (e.g., k-fold cross-validation).
The model is trained on a subset of the data and evaluated on the remaining portion, repeating the
process so that each fold serves as the validation set once.
It helps in estimating the model's performance on unseen data and assessing its generalization
ability.
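A minimal 5-fold cross-validation sketch with scikit-learn; the dataset and classifier are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # 5-fold split
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("fold accuracies:", scores)
print("mean CV accuracy:", scores.mean())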
Q) Dimensionality Reduction?
Data Preprocessing:
Preprocess the dataset by handling missing values, outliers, and scaling the features if necessary.
Choose an Approach:
Choose between feature selection or feature extraction based on the goals and characteristics of the
dataset.
Feature Selection:
Select a feature selection method, such as filter methods, wrapper methods, or embedded methods.
Define the evaluation criterion (e.g., correlation, information gain, or model performance) to rank
the features.
Feature Extraction:
Select a feature extraction method, such as Principal Components Analysis (PCA) or Linear
Discriminant Analysis (LDA).
Apply the chosen method to the dataset to obtain the transformed features.
Determine the number of components or dimensions to retain based on explained variance or other
criteria.
Evaluation:
Evaluate the performance of the reduced feature set on a separate test dataset using appropriate
evaluation metrics.
Compare the results with the original feature set to assess the effectiveness of dimensionality
reduction.
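As an illustrative feature-selection counterpart to the extraction methods below, this sketch uses scikit-learn's SelectKBest with an ANOVA F-test filter; the dataset and k = 2 are assumptions:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=2)   # keep the 2 highest-scoring features
X_reduced = selector.fit_transform(X, y)
print("original shape:", X.shape, "reduced shape:", X_reduced.shape)
print("selected feature indices:", selector.get_support(indices=True))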
Q) PCA?
Principal Component Analysis (PCA) is a dimensionality reduction algorithm that aims to transform a
high-dimensional dataset into a lower-dimensional space while retaining the most important
information or patterns in the data. PCA achieves this by finding a set of orthogonal axes, called
principal components, that capture the maximum variance in the data.
Data Standardization:
PCA typically requires standardizing the dataset to have zero mean and unit variance.
Covariance Matrix Computation:
Compute the covariance matrix of the standardized data.
The covariance matrix measures the relationships between different features in the dataset.
Eigenvalue Decomposition:
Perform eigenvalue decomposition on the covariance matrix.
This decomposition breaks down the covariance matrix into its eigenvectors and eigenvalues.
Sorting Eigenvalues:
Sort the eigenvalues (and their corresponding eigenvectors) in descending order.
The eigenvalues represent the amount of variance explained by each eigenvector or principal
component.
Selecting Principal Components:
Select the top k eigenvectors with the highest eigenvalues to retain as principal components.
The number of principal components chosen depends on the desired dimensionality reduction.
Projection:
Project the original standardized dataset onto the selected principal components.
This projection transforms the data from the original high-dimensional space to the lower-
dimensional space spanned by the principal components.
Reconstruction:
If desired, the data can be reconstructed back to the original space using the retained principal
components.
Reconstruction involves multiplying the projected data by the transpose of the retained principal
components and adding the mean of the original data.
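A minimal PCA sketch with scikit-learn (the dataset and the choice of two components are illustrative assumptions); the covariance and eigen-decomposition steps happen inside fit:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)       # standardization: zero mean, unit variance

pca = PCA(n_components=2)                       # keep the top 2 principal components
X_proj = pca.fit_transform(X_std)               # projection onto the principal components
print("explained variance ratio:", pca.explained_variance_ratio_)

X_back = pca.inverse_transform(X_proj)          # approximate reconstruction in the original space
print("reconstruction error:", np.mean((X_std - X_back) ** 2))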
Q) Multi Dimensional Scaling?
Multidimensional Scaling (MDS) is a dimensionality reduction algorithm that aims to visualize high-
dimensional data in a lower-dimensional space while preserving the pairwise distances or
dissimilarities between the data points. MDS seeks to represent the data points in a way that
maintains their relative similarities or dissimilarities.
Dissimilarity Matrix:
Compute the dissimilarity matrix, which measures the pairwise dissimilarities between data points.
Dissimilarity measures can vary depending on the nature of the data (e.g., Euclidean distance,
correlation distance, or other custom measures).
Define Dimensionality:
Determine the desired dimensionality of the lower-dimensional space in which to visualize the data.
Typically, MDS aims to reduce the dimensionality to two or three for visualization purposes.
Stress Function:
Define a stress function that quantifies the discrepancy between the pairwise distances in the
original high-dimensional space and the lower-dimensional space.
Common stress functions include Kruskal's stress formula or the Sammon mapping stress formula.
Optimization:
Optimize the stress function to find the optimal configuration of data points in the lower-dimensional
space.
The optimization process adjusts the positions of the data points iteratively to minimize the stress
function.
Position Initialization:
Initialize the positions of the data points in the lower-dimensional space, typically at random or from
a classical (eigendecomposition-based) MDS solution.
Iterative Update:
Iteratively update the positions of the data points in the lower-dimensional space to minimize the
stress function.
The updates are typically performed using gradient descent or other optimization techniques.
Convergence:
The algorithm stops when the stress function reaches a satisfactory level or when a maximum
number of iterations is reached.
Visualization:
Once the optimization process is complete, visualize the data points in the lower-dimensional space.
The positions of the data points represent their representations in the lower-dimensional space
while preserving the pairwise dissimilarities.
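An illustrative MDS sketch with scikit-learn on a precomputed Euclidean dissimilarity matrix; the dataset and the 2-D target dimensionality are assumptions:

from sklearn.datasets import load_iris
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X, _ = load_iris(return_X_y=True)
D = pairwise_distances(X, metric="euclidean")               # dissimilarity matrix

mds = MDS(n_components=2, dissimilarity="precomputed",      # embed in 2-D by minimizing stress
          random_state=0)
X_2d = mds.fit_transform(D)
print("embedded shape:", X_2d.shape, "final stress:", round(mds.stress_, 2))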
Q) Linear Discriminant Analysis (LDA)?
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique used for feature extraction
and classification. It aims to find a linear combination of features that maximizes the separation
between different classes while minimizing the within-class variance. LDA is particularly useful for
supervised classification tasks.
Data Preparation:
Start with a labeled dataset where the input features are represented as X and the corresponding
class labels as y.
Compute Mean Vectors:
Calculate the mean vectors of each class by averaging the feature vectors of instances belonging to
the same class.
Compute the overall mean vector, which is the average of all feature vectors in the dataset.
Compute Between-Class Scatter Matrix (SB):
Compute the between-class scatter matrix (SB) as the sum of the outer products of the difference
between the class mean vectors and the overall mean vector.
SB = ∑(mi - m)(mi - m)T, where mi represents the mean vector of class i and m represents the overall
mean vector.
Compute Within-Class Scatter Matrix (SW):
Calculate the within-class scatter matrix (SW) as the sum of the scatter matrices of individual classes.
The scatter matrix of each class is the sum of the outer products of the difference between each
instance's feature vector and its class mean vector.
SW = ∑(xi - mi)(xi - mi)T, where xi represents a feature vector of an instance from class i and mi
represents the mean vector of class i.
Eigenvalue Decomposition:
Perform eigenvalue decomposition on the matrix SW^-1 * SB to obtain the eigenvalues and
corresponding eigenvectors.
Select Discriminant Components:
Select the k eigenvectors corresponding to the k largest eigenvalues to retain as the discriminant
components.
Typically, the number of discriminant components selected is less than the number of classes.
Projection:
Project the original feature vectors onto the selected discriminant components.
Multiply the feature vectors by the eigenvectors corresponding to the selected discriminant
components.
Classification:
Use the reduced feature vectors as input to a classifier of choice (e.g., logistic regression, support
vector machine) for classification tasks.
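A minimal LDA sketch with scikit-learn (the iris dataset and split are illustrative assumptions); the scatter matrices and eigen-decomposition are handled inside fit:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# At most (number of classes - 1) = 2 discriminant components for the 3-class iris data
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)   # projection onto discriminant components
print("reduced training shape:", X_train_lda.shape)
print("classification accuracy:", lda.score(X_test, y_test))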