Unit 1- Capstone Project-Answer Key

Cppp system

Uploaded by Rashmi Kaith
Copyright © All Rights Reserved

UNIT 1: CAPSTONE PROJECT

1. What is the main purpose of a Capstone Project?


o A) To demonstrate theoretical knowledge.
o B) To complete a thesis paper.
o C) To integrate all knowledge gained through a comprehensive project.
o D) To learn about different industries.
o Answer: C
2. Which of the following is not an objective of a Capstone Project?
o A) Solving real-world problems.
o B) Expressing solutions in technical terms.
o C) Selecting appropriate algorithms for a problem.
o D) Learning teamwork.
o Answer: B
3. Which AI project involves predicting stock prices?
o A) Movie Ticket Price Predictor
o B) Stock Prices Predictor
o C) Sentiment Analyzer
o D) Student Results Predictor
o Answer: B
4. Which AI model is typically used for classification?
o A) Regression
o B) Clustering
o C) Classification
o D) Anomaly Detection
o Answer: C
5. What is the first step in the AI project cycle?
o A) Model construction
o B) Data gathering
o C) Problem definition
o D) Evaluation & refinements
o Answer: C
6. Which step is critical in determining whether AI techniques are applicable to a problem?
o A) Gathering data
o B) Identifying a pattern in data
o C) Deploying the model
o D) Selecting the right algorithm
o Answer: B
Design Thinking and Problem Decomposition
7. Which of the following is not a stage of Design Thinking?
o A) Empathize
o B) Define
o C) Deploy
o D) Prototype
o Answer: C
8. Which is an example of problem decomposition?
o A) Gathering data from sensors
o B) Breaking down app development into multiple tasks
o C) Running machine learning models
o D) Collecting user feedback
o Answer: B
9. Which concept involves the breakdown of time series data into trend, seasonality, and noise
components?
• A) Time Series Forecasting
• B) Design Thinking
• C) Problem Decomposition
• D) Time Series Decomposition
• Answer: D
AI Model Construction and Analytics
10. Which is the first step in any AI or machine learning project?
• A) Data modeling
• B) Data collection
• C) Business understanding
• D) Cross-validation
• Answer: C
11. Which is the foundational methodology for data science?
• A) Data mining
• B) CRISP-DM
• C) Agile
• D) SDLC
• Answer: B
12. Which of these approaches would you use for showing relationships between variables?
• A) Predictive approach
• B) Descriptive approach
• C) Classification approach
• D) Regression
• Answer: B
13. When might a predictive model be used?
• A) To explain historical data
• B) To show relationships between data
• C) To predict future outcomes
• D) To cluster similar data points
• Answer: C
Data Requirements and Modeling
14. What question should be asked first in a data project?
• A) What is the business outcome?
• B) How will the data be collected?
• C) What data is needed?
• D) What algorithm will be used?
• Answer: A
15. Which dataset is commonly used for predicting house prices?
• A) Airline Passenger Dataset
• B) Forestfires Dataset
• C) Housing Dataset
• D) MNIST Dataset
• Answer: C
16. What is a descriptive model used for?
• A) Prediction of new data
• B) Describing relationships in historical data
• C) Anomaly detection
• D) Identifying missing data
• Answer: B
17. Which concept refers to adjusting models using new data to improve their accuracy?
• A) Refinement
• B) Validation
• C) Cross-validation
• D) Feature selection
• Answer: A
Model Validation
18. What does the train-test split method achieve?
• A) Collecting the data
• B) Evaluating model performance
• C) Data pre-processing
• D) Model deployment
• Answer: B
19. What percentage is commonly used for training data in a train-test split?
• A) 80%
• B) 20%
• C) 67%
• D) 50%
• Answer: A
20. In a cross-validation process, how many subsets are generally created in a 5-fold cross-validation?
• A) 2
• B) 5
• C) 10
• D) 3
• Answer: B
21. When is cross-validation more beneficial than train-test split?
• A) For large datasets
• B) For datasets with limited rows
• C) When doing unsupervised learning
• D) For high computational costs
• Answer: B
Metrics of Model Quality
22. Which of the following is a commonly used metric for regression models?
• A) Accuracy
• B) Precision
• C) Recall
• D) Root Mean Squared Error (RMSE)
• Answer: D
23. Which metric is most suitable for classification tasks?
• A) MSE
• B) Accuracy
• C) RMSE
• D) Noise ratio
• Answer: B
24. Which of the following is used to calculate RMSE?
• A) Mean of residuals
• B) Sum of absolute errors
• C) Square root of the mean of squared errors
• D) Mean of absolute differences
• Answer: C
25. What does a low RMSE indicate?
• A) Poor model performance
• B) High variance in predictions
• C) Accurate predictions
• D) Overfitting
• Answer: C
Advanced Topics and Applications
26. Which algorithm is used in the example of the Airline Passenger Dataset?
• A) Decision Tree
• B) Random Forest
• C) Seasonal Decomposition
• D) Support Vector Machine
• Answer: C
27. Which MSE value indicates the best predictions?
• A) The highest value
• B) The lowest value
• C) The mean of predictions
• D) The median of predictions
• Answer: B
28. What type of learning involves algorithms like regression or classification?
• A) Supervised learning
• B) Unsupervised learning
• C) Reinforcement learning
• D) Semi-supervised learning
• Answer: A
29. Which algorithm is most suitable for a regression problem?
• A) Decision tree
• B) Linear regression
• C) K-nearest neighbors
• D) Naive Bayes
• Answer: B
30. In a recommendation system, which method is typically used to suggest new items?
• A) Clustering
• B) Regression
• C) Collaborative filtering
• D) Anomaly detection
• Answer: C
31. Which of the following would be considered a feature in a dataset?
• A) The target label
• B) An algorithm
• C) A variable used for prediction
• D) The test set
• Answer: C
32. Which is the most reliable method to evaluate model performance on smaller datasets?
• A) Simple train-test split
• B) Leave-one-out cross-validation
• C) Randomized testing
• D) Bootstrap aggregation
• Answer: B
33. Cross-validation is typically used to:
• A) Build the model
• B) Split data into train and test sets
• C) Test the model with multiple subsets
• D) Apply unsupervised learning
• Answer: C
34. What is one major drawback of cross-validation compared to train-test split?
• A) It uses less data.
• B) It takes more time and computational resources.
• C) It produces less accurate results.
• D) It can only be applied to classification problems.
• Answer: B
35. Which validation method involves using every data point for testing at least once?
• A) K-fold cross-validation
• B) Simple validation
• C) Random split
• D) Hold-out validation
• Answer: A
36. What is the primary advantage of using cross-validation?
• A) Requires fewer computational resources
• B) More accurate representation of model performance
• C) Faster training of the model
• D) Higher accuracy for large datasets
• Answer: B
37. What is the goal of hyperparameter tuning?
• A) Choosing the right model
• B) Optimizing algorithm performance
• C) Collecting more data
• D) Scaling the data
• Answer: B
Metrics of Model Quality
38. What does MAPE stand for?
• A) Mean Absolute Prediction Error
• B) Mean Absolute Percentage Error
• C) Mean Adjusted Prediction Error
• D) Minimum Absolute Prediction Estimate
• Answer: B
39. Which error metric penalizes large errors more than small errors?
• A) RMSE
• B) MSE
• C) Accuracy
• D) Precision
• Answer: B
40. Which error metric would you use to compare different regression models?
• A) Classification accuracy
• B) RMSE
• C) ROC-AUC score
• D) F1-Score
• Answer: B
41. Which of the following is most important when evaluating a model’s accuracy on unseen
data?
• A) Precision
• B) Validation data
• C) Recall
• D) Feature engineering
• Answer: B
42. Which metrics are commonly used to evaluate binary classification models?
• A) Precision and recall
• B) RMSE
• C) MSE
• D) MAE
• Answer: A
43. Which evaluation metric balances precision and recall in a classification problem?
• A) F1-Score
• B) Accuracy
• C) RMSE
• D) Cross-validation
• Answer: A
44. What does MAE stand for in machine learning?
• A) Model Accuracy Estimate
• B) Mean Absolute Error
• C) Maximum Accuracy Estimate
• D) Minimum Adjustment Error
• Answer: B
45. Which error metric is less sensitive to outliers in regression problems?
• A) RMSE
• B) MAE
• C) MSE
• D) Cross-entropy
• Answer: B
46. Which evaluation metric is best for highly imbalanced classification datasets?
• A) Accuracy
• B) F1-Score
• C) RMSE
• D) MAE
• Answer: B
Practical Applications and AI Techniques
47. Which AI project involves recognizing human activities using smartphone data?
• A) Stock Prices Predictor
• B) Human Activity Recognition
• C) Student Results Predictor
• D) Sentiment Analysis
• Answer: B
48. Which of the following best describes anomaly detection?
• A) Grouping similar data points
• B) Identifying unusual patterns in data
• C) Predicting continuous outcomes
• D) Labeling data based on features
• Answer: B
49. In AI, what is a common use of clustering algorithms?
• A) Predicting future outcomes
• B) Grouping similar data points without labels
• C) Detecting anomalies
• D) Improving model accuracy
• Answer: B
1. Assertion (A): The Capstone Project integrates all learning from an academic program.
Reason (R): It focuses solely on individual work rather than collaboration.
A) Both A and R are true, and R is the correct explanation of A.
B) Both A and R are true, but R is not the correct explanation of A.
C) A is true, but R is false.
D) A is false, but R is true.
Answer: C

2. Assertion (A): Data gathering is a critical step in an AI project cycle.
Reason (R): Without proper data, the AI model cannot be trained effectively.
A) Both A and R are true, and R is the correct explanation of A.
B) Both A and R are true, but R is not the correct explanation of A.
C) A is true, but R is false.
D) A is false, but R is true.
Answer: A

3. Assertion (A): AI development is always suitable for every type of problem.
Reason (R): AI techniques are applied when a pattern exists in the data.
A) Both A and R are true, and R is the correct explanation of A.
B) Both A and R are true, but R is not the correct explanation of A.
C) A is true, but R is false.
D) A is false, but R is true.
Answer: D

4. Assertion (A): Data scientists use training sets to evaluate model performance.
Reason (R): Test sets are used to adjust models after training is complete.
A) Both A and R are true, and R is the correct explanation of A.
B) Both A and R are true, but R is not the correct explanation of A.
C) A is true, but R is false.
D) A is false, but R is true.
Answer: C

5. Assertion (A): Cross-validation ensures more reliable model evaluation than a train-test split.
Reason (R): Cross-validation evaluates models using different data folds, making the process more computationally efficient.
A) Both A and R are true, and R is the correct explanation of A.
B) Both A and R are true, but R is not the correct explanation of A.
C) A is true, but R is false.
D) A is false, but R is true.
Answer: C

6. Assertion (A): RMSE (Root Mean Squared Error) is a commonly used metric for evaluating regression models.
Reason (R): RMSE penalizes larger errors more significantly than smaller errors, making it sensitive to outliers.
A) Both A and R are true, and R is the correct explanation of A.
B) Both A and R are true, but R is not the correct explanation of A.
C) A is true, but R is false.
D) A is false, but R is true.
Answer: A

7. Assertion (A): The final stage in model evaluation is deployment.
Reason (R): Deployment occurs after thorough testing, validation, and refinement of the model.
A) Both A and R are true, and R is the correct explanation of A.
B) Both A and R are true, but R is not the correct explanation of A.
C) A is true, but R is false.
D) A is false, but R is true.
Answer: A

8. Assertion (A): MSE (Mean Squared Error) penalizes large errors more severely than RMSE.
Reason (R): MSE focuses on the average squared difference between predicted and actual values.
A) Both A and R are true, and R is the correct explanation of A.
B) Both A and R are true, but R is not the correct explanation of A.
C) A is true, but R is false.
D) A is false, but R is true.
Answer: B

9. Assertion (A): Data preprocessing is not necessary if the dataset is large.
Reason (R): Large datasets inherently contain all necessary information and require no adjustments.
A) Both A and R are true, and R is the correct explanation of A.
B) Both A and R are true, but R is not the correct explanation of A.
C) A is true, but R is false.
D) A is false, but R is true.
Answer: D

CASE STUDY BASED QUESTIONS:


1. A team of students is working on a stock price prediction model as part of their Capstone
Project. They are facing issues because the stock prices show a lot of volatility, and the
patterns are not clear. The team is unsure about how to proceed with building their AI
model.
Question: What should the team do first before applying any AI model?
Answer: The team should first check if there is any identifiable pattern in the stock price
data. If no pattern exists, AI techniques should not be applied. However, if a pattern is
present, the team can proceed with data gathering and feature definition to build an AI
model for stock price prediction.
2. A student team is tasked with building a sentiment analyzer that classifies text as positive,
negative, or neutral. They collected a dataset of tweets but discovered that the data includes
irrelevant information like URLs and emojis.
Question: How should the team handle the data before building their AI model?
Answer: The team should preprocess the data by removing irrelevant information like
URLs, emojis, and special characters. They should clean the dataset to focus on the text
content, which will help in improving the accuracy of the sentiment analysis model.
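A minimal Python sketch of this cleaning step, assuming each tweet is a plain string. The helper name `clean_tweet` and the regular expressions are illustrative assumptions, not a complete preprocessing pipeline: the character filter keeps only ASCII letters, digits, whitespace and apostrophes, so it also drops accented letters, and hashtags or slang get no special handling.

```python
import re

# Illustrative patterns (assumptions, not a complete tweet-cleaning pipeline).
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not an ASCII letter, digit, whitespace or apostrophe;
# this removes emojis, punctuation and other special characters.
NON_TEXT_PATTERN = re.compile(r"[^A-Za-z0-9\s']")


def clean_tweet(text: str) -> str:
    """Remove URLs, emojis and special characters, then tidy spacing and case."""
    text = URL_PATTERN.sub(" ", text)
    text = NON_TEXT_PATTERN.sub(" ", text)
    # Collapse runs of whitespace into single spaces and lower-case the result.
    return " ".join(text.split()).lower()


print(clean_tweet("Loved the movie!!! 😍 https://t.co/abc"))  # loved the movie
```

URLs are removed before the character filter runs; otherwise the filter would break a URL into leftover word fragments such as "https" and "abc".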

3. A team is working on a project to address the issue of crop yield prediction in agriculture.
They collected a large dataset but are unsure which AI model to use for this type of prediction.
Question: What type of model should the team consider for predicting crop yields, and why?
Answer: The team should consider using a regression model because crop yield prediction
is a continuous variable problem. Regression models are suitable for predicting “how much”
or “how many” based on the input data.
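To make the "how much" idea concrete, here is a sketch of simple linear regression fitted by ordinary least squares in plain Python. The single input (rainfall) and the yield figures are hypothetical; a real crop-yield model would use many features and would more likely rely on a library such as scikit-learn.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: seasonal rainfall (mm) and crop yield (tonnes per hectare).
rainfall = [400, 500, 600, 700, 800]
crop_yield = [2.0, 2.5, 3.0, 3.5, 4.0]
slope, intercept = fit_simple_linear_regression(rainfall, crop_yield)
print(f"predicted yield at 650 mm: {slope * 650 + intercept:.2f}")  # 3.25
```

The output is a continuous number rather than a category, which is what makes this a regression rather than a classification problem.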

4. A group is predicting brain weights based on head size using linear regression. After running
their model, they calculated an RMSE of 73.
Question: How should the team interpret this RMSE value, and what should be their next step?
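Answer: An RMSE of 73 means that, on a root-mean-square basis, the predictions differ from the
actual brain weights by about 73 (in the same units as the brain weight). Because errors are squared
before averaging, a few large errors raise the RMSE more than many small ones. For the dataset used
in this example, an RMSE significantly lower than 180 is considered good, so their model’s performance
is acceptable. However, if the team wants to improve the model, they could consider tweaking
the hyperparameters or refining the feature set.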

5. A student team is developing a recommendation system for improving educational resources
in schools. They want to recommend learning materials based on students’ learning habits.
Question: Which AI technique should the team use to build the recommendation system, and
why?
Answer: The team should use a recommendation model, which can suggest learning
materials based on patterns in student behavior. Recommendation systems work by
analyzing previous choices or habits and predicting what a user may prefer next.

6. A team is working on predicting movie ticket prices based on factors such as location, movie
type, and time of day. They have used a dataset but are not sure if their model is performing well.
Question: What metrics should the team use to evaluate their model’s performance?
Answer: The team should use regression evaluation metrics such as Mean Squared Error
(MSE) and Root Mean Squared Error (RMSE). These metrics will give them insights into
how accurate their predictions are by comparing predicted values to actual values.

7. A team is tasked with using AI to predict patient recovery times based on medical history and
treatment data. However, the data has missing values.
Question: What steps should the team take to handle the missing data before applying their AI
model?
Answer: The team should use data imputation techniques, such as replacing missing values
with the mean or median of the affected column (feature), or using more advanced techniques like
K-nearest neighbors imputation. This will ensure that their model has complete data to work with.
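A small sketch of mean/median imputation in plain Python, assuming missing values are stored as `None` in a single numeric column. The helper `impute` and the recovery-time numbers are hypothetical; scikit-learn offers ready-made `SimpleImputer` and `KNNImputer` classes for real projects.

```python
from statistics import mean, median


def impute(column, strategy="median"):
    """Return a copy of a numeric column with None replaced by the column's mean or median."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in column]


# Hypothetical recovery times in days; None marks a missing value.
recovery_days = [10, None, 14, 30, None, 12]
print(impute(recovery_days))          # [10, 13.0, 14, 30, 13.0, 12]
print(impute(recovery_days, "mean"))  # [10, 16.5, 14, 30, 16.5, 12]
```

The median (13.0) is pulled far less by the unusually long 30-day recovery than the mean (16.5), which is why the median is often the safer default when a column has extreme values.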

8. A group of students is working on a project to classify human activities using smartphone
sensors (like accelerometers). The data includes features such as time, accelerometer readings,
and gyroscope readings.
Question: What type of AI model should the team use for this classification task?
Answer: The team should use a classification model since their task involves categorizing
data into different activities (e.g., walking, running, sitting). Models like Decision Trees,
Random Forest, or Neural Networks would be suitable for this type of problem.

QUESTION-ANSWERS:
1. What is a Capstone Project in the context of AI education?
Answer: A Capstone Project is the final project of an academic program where students
integrate all their learning and apply it to solve real-world problems. It involves teamwork,
discussion, research, and hands-on activities.

2. What are the key steps in the AI Project Cycle?


Answer: The six key steps are: 1) Problem definition, 2) Data gathering, 3) Feature
definition, 4) AI model construction, 5) Evaluation and refinements, 6) Deployment.

3. Why is “problem definition” important in an AI project?


Answer: Problem definition is crucial because it sets the direction for the entire AI project.
It involves understanding if there is a pattern in the data, which is fundamental for deciding
whether AI techniques should be applied.

4. What is Design Thinking in AI problem-solving?


Answer: Design Thinking is a solution-based approach to problem-solving, which involves
five stages: Empathize, Define, Ideate, Prototype, and Test. It is useful in tackling complex,
ill-defined problems.

5. What is time series decomposition?


Answer: Time series decomposition involves breaking down a time series into components
such as level, trend, seasonality, and noise to better understand the data for analysis and
forecasting.

6. What is the main advantage of problem decomposition in computational tasks?


Answer: Problem decomposition breaks complex problems into smaller, manageable
pieces, making coding, debugging, and problem-solving more efficient.
7. Why is data gathering essential in an AI project?
Answer: Data gathering is essential because AI models need relevant and accurate data to
train on. Without proper data, the model cannot produce valid predictions.

8. What is RMSE, and why is it important in AI models?


Answer: RMSE (Root Mean Squared Error) measures the accuracy of an AI model by
calculating the square root of the average squared differences between predicted and actual
values. It is important because it penalizes larger errors more heavily.

9. What is cross-validation, and how does it improve model performance evaluation?


Answer: Cross-validation is a technique where the dataset is divided into several folds, and
the model is trained and tested on different subsets of data. It provides a more reliable
measure of model performance compared to a simple train-test split.

10. What is the purpose of a recommendation model in AI?


Answer: A recommendation model suggests items or actions to users based on patterns in
their behavior or preferences. It is commonly used in applications like e-commerce and
streaming services.
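One common way to build such a model is user-based collaborative filtering: find users whose past ratings resemble the target user's, then suggest items those users liked. The sketch below is an illustration under stated assumptions, not a production recommender: the ratings table is hypothetical, unrated items count as 0 when computing cosine similarity, and unseen items are ranked by the sum of similarity-weighted ratings.

```python
from math import sqrt

# Hypothetical ratings (1-5) that students gave to learning materials.
ratings = {
    "asha": {"video_algebra": 5, "quiz_algebra": 4, "notes_geometry": 1},
    "ravi": {"video_algebra": 4, "quiz_algebra": 5, "video_physics": 4},
    "meena": {"notes_geometry": 5, "notes_physics": 5, "video_physics": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users' rating dicts, treating unrated items as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=1):
    """Rank items the user has not rated by similarity-weighted ratings from other users."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("asha", ratings, top_n=2))  # ['video_physics', 'notes_physics']
```

Asha's ratings are much closer to Ravi's than to Meena's, so Ravi's high rating of `video_physics` dominates the ranking even though Meena rated `notes_physics` highly.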

11. What are the key components of time series data?


Answer: The key components of time series data are: 1) Level (average value), 2) Trend
(increasing or decreasing pattern), 3) Seasonality (repeating cycles), and 4) Noise (random
variation).

12. What is the goal of AI model construction?


Answer: The goal of AI model construction is to build an algorithm that can learn from the
data and make accurate predictions or decisions based on the problem being addressed.

13. What is the difference between regression and classification in AI?


Answer: Regression predicts continuous values (e.g., house prices), while classification
predicts discrete categories (e.g., spam or not spam).

14. How is data preprocessed for AI projects?


Answer: Data preprocessing involves cleaning the data (removing irrelevant or incorrect
data), handling missing values, normalizing or standardizing data, and transforming it into a
format suitable for modeling.
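The two scaling steps mentioned here can be sketched in a few lines of plain Python. Min-max normalization rescales a feature linearly to the range [0, 1]; standardization converts values to z-scores with mean 0 and standard deviation 1 (the population standard deviation is assumed here). The function names and the height values are hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to mean 0 and scale to (population) standard deviation 1, i.e. z-scores."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


heights_cm = [150, 160, 170, 180, 190]  # hypothetical feature values
print(min_max_normalize(heights_cm))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(heights_cm))
```

Scaling like this matters for algorithms that compare distances or use gradient descent, where a feature measured in large units would otherwise dominate one measured in small units.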

15. What is MSE, and how is it different from RMSE?


Answer: MSE (Mean Squared Error) calculates the average of the squared differences
between predicted and actual values. RMSE is the square root of MSE. RMSE is often preferred
for reporting because it is in the same units as the target variable, which makes it easier
to interpret.

16. Why is model validation important in AI?


Answer: Model validation ensures that the AI model generalizes well to new, unseen data
and helps prevent overfitting, where the model performs well on training data but poorly on
test data.

17. What is the purpose of using a training dataset in AI?


Answer: The training dataset is used to teach the AI model, allowing it to learn patterns in
the data. The model uses this data to make predictions and adjust its parameters during
training.

18. What is an anomaly detection model used for in AI?


Answer: Anomaly detection models identify data points that deviate significantly from the
norm, which can be useful in detecting fraud, equipment malfunctions, or unusual behavior.
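A simple statistical way to flag such points is the z-score: how many standard deviations a value lies from the mean. The sketch below is one basic technique among many, with hypothetical payment amounts. One caveat on small samples: with n values, a z-score computed with the population standard deviation can never exceed √(n−1), so with 10 values a threshold of 3.0 would never fire, which is why the example uses 2.0.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily card payments in rupees; one is far outside the usual range.
amounts = [120, 135, 110, 128, 140, 125, 118, 132, 5000, 122]
print(zscore_anomalies(amounts, threshold=2.0))  # [8]
```

Because the outlier itself inflates the mean and standard deviation, more robust variants (for example, ones based on the median) are often preferred in practice.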

19. Why is the business understanding stage crucial in data science projects?
Answer: The business understanding stage is crucial because it defines the problem,
objectives, and success criteria from a business perspective, ensuring the solution aligns
with business goals.

20. What is the significance of model deployment in an AI project?


Answer: Model deployment is the final stage in an AI project where the trained model is
integrated into a production environment to make real-time predictions or decisions.

21. How does the “empathize” stage in Design Thinking help in AI projects?
Answer: In the empathize stage, developers focus on understanding the user’s needs and
challenges, which helps in designing AI solutions that are user-centric and address real-
world problems.

22. What is the purpose of using a prototype in the Design Thinking process?
Answer: The prototype is a preliminary model used to test and explore ideas before final
implementation. It helps in identifying potential issues and refining solutions early in the
development process.

23. How is the concept of clustering applied in AI?


Answer: Clustering is used to group data points based on their similarities without
predefined labels. It is commonly used in customer segmentation, image recognition, and
market analysis.
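K-means is one widely used clustering algorithm. Here is a plain-Python sketch for 2-D points, assuming the number of clusters and their starting centroids are chosen by hand; library implementations such as scikit-learn's `KMeans` add smarter initialization and multiple restarts. The customer data is hypothetical.

```python
def assign(points, centroids):
    """Group each 2-D point with its nearest centroid (squared Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for x, y in points:
        distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
        clusters[distances.index(min(distances))].append((x, y))
    return clusters


def kmeans(points, centroids, max_iter=10):
    """Plain k-means: alternate assignment and centroid updates until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical customers: (visits per month, average spend in hundreds of rupees).
customers = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(customers, [(0, 0), (10, 10)])
print(clusters)  # [[(1, 2), (2, 1), (1, 1)], [(8, 9), (9, 8), (9, 9)]]
```

No labels are given to the algorithm; the two customer segments emerge only from how close the points are to each other.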
24. What is the primary objective of feature engineering in AI?
Answer: The primary objective of feature engineering is to transform raw data into features
that better represent the underlying problem, thereby improving the performance of AI
models.

25. Why is it important to avoid overfitting in AI models?


Answer: Overfitting occurs when a model learns the noise or random fluctuations in the
training data rather than the actual pattern. This leads to poor performance on new data,
making it essential to avoid overfitting for reliable predictions.

26. What is the role of gradient descent in machine learning?


Answer: Gradient descent is an optimization algorithm used to minimize the loss function
in machine learning models. It iteratively adjusts the model parameters to reduce prediction
error.
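As an illustration, here is a sketch of gradient descent fitting a straight line y ≈ w·x + b by minimizing MSE. The learning rate, number of epochs and data points are assumptions chosen so this small example converges; real problems usually need them tuned (they are hyperparameters).

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ≈ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors of the current model on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # approaches 2 and 1
```

If the learning rate is too large the updates overshoot and the loss grows instead of shrinking; if it is too small, many more epochs are needed.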

QUESTION AND ANSWERS:


1. Define Capstone Project.
The final project of an academic program, typically integrating all of the learning from the
program, is called the Capstone Project. In a capstone project, students research a topic
independently to gain a deep understanding of the subject matter. It allows the student to
integrate all their knowledge and demonstrate it through a comprehensive project.

2. Give examples of Capstone Projects.


i. Stock Prices Predictor
ii. Develop A Sentiment Analyzer
iii. Movie Ticket Price Predictor
iv. Student Results Predictor
v. Human Activity Recognition using Smartphone Data Set
vi. Classifying humans and animals in a photo

3. What are the different steps that the AI Project follows?


AI project follows the following six steps:
i. Problem definition i.e. Understanding the problem
ii. Data gathering
iii. Feature definition
iv. AI model construction
v. Evaluation & refinements
vi. Deployment

4. Which five types of questions should be answered during Understanding the AI problem?
i. Which category? (Classification)
ii. How much or how many? (Regression)
iii. Which group? (Clustering)
iv. Is this unusual? (Anomaly Detection)
v. Which option should be taken? (Recommendation)

5. Define Design Thinking.


Design Thinking is a design methodology that provides a solution-based approach to
solving problems. It’s extremely useful in tackling complex problems that are ill-defined or
unknown.

6. What are the five stages of Design Thinking?


The five stages of Design Thinking are as follows: Empathize, Define, Ideate, Prototype, and
Test.

7. What are the steps of Problem Decomposition?
Or
How do you break a problem down into smaller units before coding?
The steps of Problem Decomposition are given below:
i. Understand the problem and then restate the problem in your own words
• Know what the desired inputs and outputs are
• Ask questions for clarification
ii. Break the problem down into a few large pieces. Write these down, either on paper
or as comments in a file.
iii. Break complicated pieces down into smaller pieces. Keep doing this until all of the
pieces are small.
iv. Code one small piece at a time.
• Think about how to implement it
• Write the code/query
• Test it… on its own.
• Fix problems, if any
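The steps above can be illustrated on a small, hypothetical task: "given lines like `name,score`, report the class average and the top scorer". The task and every function name here are made up for the example; the point is that each small piece is written and tested on its own before the pieces are combined.

```python
# Hypothetical task: "Given lines like 'name,score', report the class average and the top scorer."
# Steps ii and iii: break it into small pieces, each a function that can be tested on its own.


def parse_line(line):
    """Piece 1: turn 'Asha,87' into ('Asha', 87.0)."""
    name, score = line.split(",")
    return name.strip(), float(score)


def average(scores):
    """Piece 2: mean of a list of numbers."""
    return sum(scores) / len(scores)


def top_scorer(records):
    """Piece 3: name of the record with the highest score."""
    return max(records, key=lambda record: record[1])[0]


# Step iv: test each piece on its own before combining them.
assert parse_line("Asha, 87") == ("Asha", 87.0)
assert average([80, 90]) == 85
assert top_scorer([("Asha", 87.0), ("Ravi", 92.0)]) == "Ravi"


def report(lines):
    """Combine the tested pieces into the full solution."""
    records = [parse_line(line) for line in lines]
    return average([score for _, score in records]), top_scorer(records)


print(report(["Asha,87", "Ravi,92", "Meena,78"]))
```

If the combined `report` ever misbehaves, the already-tested pieces narrow down where the bug can be.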

8. Imagine that you want to create your first app. This is a complex problem. How would
you decompose the task of creating an app?
To decompose this task, you would need to know the answer to a series of smaller
problems:
• what kind of app do you want to create?
• what will your app look like?
• who is the target audience for your app?
• what will the graphics look like?
• what audio will you include?
• what software will you use to build your app?
• how will the user navigate your app?
• how will you test your app?

9. Explain time series decomposition.


Time series decomposition involves thinking of a series as a combination of level, trend,
seasonality, and noise components. Decomposition provides a useful abstract model for
thinking about time series generally and for better understanding problems during time series
analysis and forecasting.

10. What are the components of Time series decomposition?


• Level: The average value in the series.
• Trend: The increasing or decreasing value in the series.
• Seasonality: The repeating short-term cycle in the series.
• Noise: The random variation in the series.
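A minimal sketch of classical additive decomposition in plain Python, assuming the series is additive (value = trend + seasonality + noise), the seasonal period is odd so a simple centred moving average can estimate the trend, and the level is folded into the trend component. The series is hypothetical and noise-free so the recovered components are easy to check; libraries such as statsmodels provide a full `seasonal_decompose` function.

```python
def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Sketch assumptions: the period is odd (so a simple centred moving average
    works) and the level is folded into the trend component.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full cycles of data")
    half = period // 2

    # Trend: centred moving average over one full cycle; None at the edges.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period

    # Seasonality: average the detrended values at each position in the cycle,
    # then shift them so the seasonal effects sum to zero over one cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period
    seasonal = [raw[t % period] - offset for t in range(n)]

    # Noise (residual): whatever the trend and seasonality do not explain.
    residual = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


# Hypothetical series: level 10, trend +0.5 per step, repeating cycle (+2, -1, -1), no noise.
cycle = [2, -1, -1]
series = [10 + 0.5 * t + cycle[t % 3] for t in range(12)]
trend, seasonal, residual = decompose_additive(series, period=3)
print([round(s, 6) for s in seasonal[:3]])
```

Averaging over exactly one cycle cancels the seasonal pattern, which is why the moving average isolates the trend.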

11. What are different Analytic approaches?


i. Descriptive
ii. Diagnostic
iii. Predictive
iv. Prescriptive
v. Classification

12. What type of Questions can be asked for the following approaches:
i. Descriptive
ii. Diagnostic
iii. Predictive
iv. Prescriptive
v. Classification
• Descriptive: The question is to show relationships.
• Diagnostic: The question is why an occurrence or anomaly occurred within your data.
• Predictive: The question is to determine the probabilities of an action.
• Prescriptive: The question is how to solve a problem.
• Classification: The question requires a yes/no answer.

13. Explain Data Gathering Phase.
OR
How Data is collected for AI Projects?
• Identifying the necessary data content, formats and sources for initial data
collection.
• The data requirements are revised and decisions are made as to whether or not the
collection requires more or less data.
• Techniques such as descriptive statistics and visualization can be applied to the
data set, to assess the content, quality, and initial insights about the data.
• Gaps in data will be identified and plans to either fill or make substitutions will have to be
made.

14. Explain Data Modelling.


OR
Differentiate between Predictive and Descriptive model.
Data Modeling focuses on developing models that are either descriptive or predictive.
• An example of a descriptive model might examine things like: if a person did this, then they're likely to prefer that.
• A predictive model tries to yield yes/no, or stop/go type outcomes. These models
are based on the analytic approach that was taken, either statistically driven or
machine learning driven.

15. Define training set.


A training set is a set of historical data in which the outcomes are already known. The
training set acts like a gauge to determine if the model needs to be calibrated.

16. What are the necessary things to be done for the success of Data modelling stage?
• First, understand the question at hand.
• Second, select an analytic approach or method to solve the problem.
• Third, obtain, understand, prepare, and model the data.
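17. Differentiate between Cross-Validation and Train-Test Split
• A train-test split evaluates the model once, on a single held-out test set. Cross-validation repeats training and testing over several folds, so every data point is used for testing exactly once.
• Cross-validation gives a more reliable estimate of model performance and is especially useful for datasets with limited rows; a train-test split is usually adequate for large datasets.
• Cross-validation takes more time and computational resources, because the model is trained once per fold; a train-test split trains the model only once.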
18. Explain Train-Test Split Evaluation
The train-test split is a technique for evaluating the performance of a machine learning
algorithm.
• It can be used for classification or regression problems
• It can be used for any supervised learning algorithm.
• The procedure involves taking a dataset and dividing it into two subsets.
• The first subset is used to fit the model and is referred to as the training dataset.
• The second subset is not used to train the model; instead, the input element of the dataset is
provided to the model, then predictions are made and compared to the expected values. This
second dataset is referred to as the test dataset.
19. How to Configure the Train-Test Split?
The procedure has one main configuration parameter, which is the size of the train and test sets.
This is most commonly expressed as a proportion between 0 and 1 for either the train or test
dataset. For example, a training set size of 0.67 (67 percent) means that the remaining
proportion, 0.33 (33 percent), is assigned to the test set.
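A plain-Python sketch of a train-test split with a 0.67 training proportion. The function name `split_train_test` and the stand-in rows are hypothetical; in practice scikit-learn's `train_test_split` (in `sklearn.model_selection`) does the same job through its `train_size`/`test_size` and `random_state` parameters.

```python
import random


def split_train_test(rows, train_size=0.67, seed=1):
    """Shuffle rows and split them into (train, test) using the given training proportion."""
    if not 0 < train_size < 1:
        raise ValueError("train_size must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 dataset rows
train, test = split_train_test(rows, train_size=0.67)
print(len(train), len(test))  # 67 33
```

Shuffling before cutting matters: if the rows are ordered (for example by date or by class), taking the first 67 percent without shuffling would give an unrepresentative test set.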

20. How to choose a split percentage in Train-Test Split?


• Computational cost in training the model.
• Computational cost in evaluating the model.
• Training set representativeness.
• Test set representativeness.

21. What are the prerequisites for Train and Test Data?
We will need the following Python libraries for this tutorial:
• Pandas
• Scikit-learn (imported in Python as sklearn)
We can install these with pip:
1. pip install pandas
2. pip install scikit-learn
22. Explain Cross validation
Cross validation is a technique used in machine learning to evaluate the performance of
a model on unseen data. It involves dividing the available data into multiple folds or
subsets, using one of these folds as a validation set, and training the model on the
remaining folds. This process is repeated multiple times, each time using a different fold
as the validation set. Finally, the results from each validation step are averaged to produce
a more robust estimate of the model’s performance.
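The procedure can be sketched in plain Python. To keep the focus on the folds, the "model" here is a deliberately trivial, hypothetical baseline that always predicts the mean of its training targets, scored by MSE on each validation fold. The folds are contiguous and unshuffled, which is an assumption of this sketch; scikit-learn provides `KFold` and `cross_val_score` for real use.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Spread any remainder so fold sizes differ by at most one row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, validation
        start += size


def cross_val_mse(ys, k=5):
    """Average validation MSE of a baseline model that always predicts the training-fold mean."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        # "Train": the only parameter of this toy model is the mean of the training targets.
        prediction = sum(ys[i] for i in train_idx) / len(train_idx)
        # "Validate": mean squared error on the held-out fold.
        fold_errors.append(sum((ys[i] - prediction) ** 2 for i in val_idx) / len(val_idx))
    # The cross-validation estimate is the average over all folds.
    return sum(fold_errors) / len(fold_errors)


targets = [3, 5, 4, 6, 5, 7, 6, 8, 7, 9]  # hypothetical target values
print(cross_val_mse(targets, k=5))
```

Each row lands in exactly one validation fold, so every data point is used for testing once and for training k − 1 times.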

23. Explain loss functions.


All the algorithms in machine learning rely on minimizing or maximizing a function,
which we call “objective function”. The group of functions that are minimized are called
“loss functions”. A loss function is a measure of how well a prediction model does in
terms of being able to predict the expected outcome.
24. What are the categories of Loss Functions?
Loss functions can be broadly categorized into 2 types: Classification and Regression Loss.
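25. Differentiate Classification Loss and Regression Loss.
Regression loss functions are used when the model predicts a quantity (for example, MSE), while classification loss functions are used when the model predicts a label (for example, cross-entropy, also called log loss).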
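To contrast the two categories, here is a sketch of one common loss from each: mean squared error for a regression model that outputs numbers, and binary cross-entropy (log loss) for a classifier that outputs the probability of class 1. The function names and values are illustrative; the small clipping constant `eps` is an assumption added to avoid taking log(0).

```python
from math import log


def mse_loss(y_true, y_pred):
    """Regression loss: mean squared error between actual and predicted quantities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Classification loss (log loss) for 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(labels)


print(mse_loss([3.0, 5.0], [2.5, 5.5]))                       # 0.25
print(round(binary_cross_entropy([1, 0], [0.9, 0.2]), 3))     # 0.164
```

Both are minimized by training; log loss grows sharply when the model is confident and wrong (for example, predicting 0.01 for a true label of 1).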

26. Study the following graph and answer the given questions:

i. What do the red dots represent?
The red dots are the actual values.
ii. What does the blue line represent?
The blue line is the set of predicted values drawn by our model.
iii. What does line X represent?
Line X represents the error, i.e. the distance between an actual value and the corresponding predicted value.
iv. How can we calculate RMSE using the graph?
Take the distance between each actual value and the predicted line, square each distance, take the
mean of those squared distances, and finally take the square root. This gives us the RMSE of our model.
v. What is a good model based on RMSE?
For the dataset used in this example, a good model should have an RMSE value less than 180.
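The RMSE calculation can be sketched in Python: square each error, take the mean, then take the square root. The points and the fitted line y = 2x + 1 below are hypothetical stand-ins for the red dots and blue line of the graph.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: square each error, average the squares, then take the square root."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical points (the "red dots") and a fitted line y = 2x + 1 (the "blue line").
xs = [1, 2, 3, 4]
actual = [3.5, 4.5, 7.5, 8.5]
predicted = [2 * x + 1 for x in xs]                   # [3, 5, 7, 9]
errors = [a - p for a, p in zip(actual, predicted)]   # vertical distances: [0.5, -0.5, 0.5, -0.5]
print(rmse(actual, predicted))                        # 0.5
```

Squaring before averaging stops positive and negative errors from cancelling out: the raw errors here sum to zero, yet the RMSE correctly reports a typical error of 0.5.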
27. Explain MSE.
Mean Squared Error (MSE) is the most commonly used regression loss function. MSE is the average
of the squared differences between our target variable and the predicted values.

28. When should we use mean squared error?


Use MSE when you are doing regression, you believe that your target, conditioned on the input, is
normally distributed, and you want large errors to be penalized significantly (quadratically) more
than small ones.

29. Can MSE be a negative value? Give reasons.


The MSE value cannot be negative. The differences between predicted and actual values
are always squared, so every term is zero or positive, and so is their mean. MSE is zero only
when every prediction exactly matches the actual value.

30. Explain RMSE


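RMSE (Root Mean Squared Error) is the square root of the mean of the squared errors between the
test (actual) values and the predicted values. In machine learning, we use it to measure the accuracy
of a regression model; because it is in the same units as the target variable, it is easy to interpret.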
