ML Viva and Oral Question and Answers
Q.3: Why did you choose Linear Regression and Random Forest models for
this prediction?
Answer: Linear Regression is a simple and interpretable model, ideal for understanding relationships between variables, while Random Forest handles non-linearity better and can capture complex interactions in the data.
Q.5: How did you handle feature selection and engineering to improve model
accuracy?
Answer: I engineered features from date-time data, like hour of day, day
of week, and month, which could influence pricing due to demand
patterns. Additionally, I explored the distance between pickup and
drop-off as a key feature, using Haversine distance for more precise
measurements, which helped improve model relevance.
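The date-time features and Haversine distance described above can be sketched as follows. This is a minimal illustration, not the assignment's actual notebook; the column names (pickup_datetime, pickup_latitude, etc.) follow the common Uber-fares dataset layout and are assumptions here.

```python
import numpy as np
import pandas as pd

def haversine(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# One illustrative ride (made-up values)
df = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(["2015-05-07 19:52:06"]),
    "pickup_latitude": [40.7614], "pickup_longitude": [-73.9798],
    "dropoff_latitude": [40.6513], "dropoff_longitude": [-73.9496],
})

# Date-time features that capture demand patterns
df["hour"] = df["pickup_datetime"].dt.hour
df["day_of_week"] = df["pickup_datetime"].dt.dayofweek
df["month"] = df["pickup_datetime"].dt.month

# Trip distance as a key predictor of fare
df["distance_km"] = haversine(df["pickup_latitude"], df["pickup_longitude"],
                              df["dropoff_latitude"], df["dropoff_longitude"])
```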
Q.6: Why is Random Forest often preferred over Linear Regression in price
prediction tasks?
Answer: Random Forest handles non-linear relationships and
interactions between variables, which are common in real-world pricing
data. It’s also less sensitive to outliers and can better handle missing or
sparse data, making it more robust in varied conditions compared to
Linear Regression.
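This difference can be demonstrated with a small sketch on synthetic data (not the assignment's dataset): the target has a non-linear term, which Random Forest captures and Linear Regression cannot.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
# Linear in X[:, 0] but non-linear (sinusoidal) in X[:, 1]
y = 3 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 0.5, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(r2_score(y_te, model.predict(X_te)), 3))
```

On data like this, Random Forest's test R² is noticeably higher because the trees can follow the sinusoidal component.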
ASSIGNMENT 2: Email Spam Detection
Q.1: Why did you choose K-Nearest Neighbors (KNN) and Support Vector
Machine (SVM) for this classification?
Answer: KNN is a straightforward algorithm that performs well on smaller datasets, while SVM works well with high-dimensional data and can draw a clear boundary between classes, making it suitable for spam detection.
Q.2: How did you preprocess the email data for spam classification?
Answer: I tokenized the email text, removed stop words, converted the text to lowercase, and used techniques like TF-IDF or Bag of Words to convert the text data into numerical features suitable for the algorithms.
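A toy sketch of this pipeline: lowercasing and stop-word removal happen inside TfidfVectorizer, and the resulting vectors feed both KNN and SVM. The example messages and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

texts = ["Win a FREE prize now", "Meeting at 10am tomorrow",
         "Claim your FREE reward today", "Lunch with the project team"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

# lowercase=True and stop_words='english' cover two of the preprocessing steps
vec = TfidfVectorizer(lowercase=True, stop_words="english")
X = vec.fit_transform(texts)

knn = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
svm = LinearSVC().fit(X, labels)
print(svm.predict(vec.transform(["FREE prize waiting"])))
```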
Q.3: What metrics did you use to evaluate the model performance?
Answer: I used accuracy, precision, recall, and F1-score to evaluate performance. Precision and recall are particularly important for spam detection to balance false positives and false negatives.
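The four metrics can be illustrated on a small made-up set of labels and predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = ham
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # of predicted spam, how many really were
print("recall   :", recall_score(y_true, y_pred))     # of actual spam, how many were caught
print("f1       :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```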
Q.4: What challenges do you face with text data in machine learning, and
how did you address them?
Answer: Text data is high-dimensional and unstructured, so
preprocessing is essential. I used techniques like tokenization, stop
word removal, and TF-IDF transformation to convert text into numerical
vectors while retaining important information. This reduced data
complexity and improved classification accuracy.
Q.2: How did you determine the optimal value for K in KNN?
Answer: I used cross-validation, plotting mean accuracy against a range of K values (an elbow-style plot) and selecting the K with the best performance.
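A sketch of that selection loop on a synthetic dataset (in practice the assignment's own data would be used):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

scores = {}
for k in range(1, 16, 2):  # odd K values avoid ties in binary voting
    scores[k] = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("best K:", best_k)
```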
Q.3: What does the confusion matrix tell you about model performance?
Answer: The confusion matrix provides insights into the number of true positives, false positives, true negatives, and false negatives, helping calculate precision, recall, and error rates for a deeper understanding of model accuracy.
Q.5: What insights do you gain from the confusion matrix beyond overall
accuracy?
Answer: The confusion matrix allows us to see the distribution of true
positives, false positives, true negatives, and false negatives, giving a
clear picture of errors. From this, I calculated precision, recall, and the
error rate, which are critical for understanding the model’s reliability,
especially in a healthcare context where misclassifications can have
serious implications.
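Deriving precision, recall, and the error rate from the confusion matrix can be sketched like this (the labels and predictions are illustrative, not the assignment's results):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# ravel() flattens the 2x2 matrix into (tn, fp, fn, tp) for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)          # how many predicted positives were correct
recall = tp / (tp + fn)             # how many actual positives were found
error_rate = (fp + fn) / (tp + tn + fp + fn)
print(tn, fp, fn, tp, precision, recall, error_rate)
```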
ASSIGNMENT 4: Sales Data Clustering
Q.3: Can you explain the difference between K-Means and hierarchical
clustering?
Answer: K-Means is a partition-based method that divides the dataset into clusters by minimizing the variance within clusters, while hierarchical clustering builds a tree of clusters by either agglomerating or dividing them in a nested manner.
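A minimal side-by-side of the two approaches on toy blob data (AgglomerativeClustering is scikit-learn's bottom-up hierarchical method):

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Partition-based: iteratively minimizes within-cluster variance
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative): merges the closest clusters bottom-up
hc_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
```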
Q.4: How does the elbow method assist in determining the optimal number of
clusters?
Answer: The elbow method involves plotting the Within-Cluster-Sum of
Squares (WCSS) against different cluster counts. The “elbow” point,
where WCSS reduction starts to plateau, suggests the optimal number
of clusters as it balances between model simplicity and cluster
separation effectiveness.
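The elbow curve described above can be sketched as follows; KMeans exposes WCSS as its `inertia_` attribute. The blob data stands in for the sales dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=4, random_state=1)

# WCSS (inertia) for k = 1..8
wcss = [KMeans(n_clusters=k, n_init=10, random_state=1).fit(X).inertia_
        for k in range(1, 9)]

# Plot k against wcss (e.g. with matplotlib) and look for the bend:
# WCSS always decreases as k grows, but the drop flattens past the true k.
print(wcss)
```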
Q.2: What features did you use to build the survival prediction model?
Answer: Key features included passenger age, gender, class, fare, and embarked location. These features were chosen because they likely influenced survival chances based on historical accounts of the disaster.
Q.3: Which machine learning algorithms did you use, and why?
Answer: I used models like Logistic Regression for its interpretability and Random Forest for its robustness in handling mixed data types and complex interactions, allowing for a more nuanced survival prediction.
Q.4: How did you handle categorical features like gender and socio-
economic class in your Titanic dataset?
Answer: I used label encoding for gender, as it’s a binary variable, and
one-hot encoding for socio-economic class to ensure the model treats
each class independently without any ordinal assumption. This
representation enabled the model to understand categorical influences
on survival.
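A sketch of the two encodings; the column names (Sex, Pclass) follow the Kaggle Titanic dataset and the rows are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Sex": ["male", "female", "female", "male"],
                   "Pclass": [3, 1, 2, 3]})

# Label encoding: binary variable maps to 0/1
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# One-hot encoding: one indicator column per class, no ordinal assumption
df = pd.get_dummies(df, columns=["Pclass"], prefix="Pclass")
print(df.columns.tolist())
```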
Q.5: Which performance metric is most relevant for this classification task,
and why?
Answer: Accuracy is useful, but due to the importance of minimizing
false negatives (misclassifying survivors as non-survivors), I focused on
recall for the survival class. High recall ensures that most actual
survivors are identified, which is crucial in scenarios where it’s
essential to avoid missing positive cases.