Unit 1- Capstone Project-Answer Key
5. Assertion (A): Cross-validation ensures more reliable model evaluation than a train-test
split.
Reason (R): Cross-validation evaluates models using different data folds, making the process
more computationally efficient.
6. Assertion (A): RMSE (Root Mean Squared Error) is a commonly used metric for
evaluating regression models.
Reason (R): RMSE penalizes larger errors more significantly than smaller errors, making it
sensitive to outliers.
8. Assertion (A): MSE (Mean Squared Error) penalizes large errors more severely than
RMSE.
Reason (R): MSE focuses on the average squared difference between predicted and actual
values.
3. A team is working on a project to address the issue of crop yield prediction in agriculture.
They collected a large dataset but are unsure which AI model to use for this type of prediction.
Question: What type of model should the team consider for predicting crop yields, and why?
Answer: The team should consider using a regression model because crop yield prediction
is a continuous variable problem. Regression models are suitable for predicting “how much”
or “how many” based on the input data.
4. A group is predicting brain weights based on head size using linear regression. After running
their model, they calculated an RMSE of 73.
Question: How should the team interpret this RMSE value, and what should be their next step?
Answer: An RMSE of 73 means the model's predictions typically differ from the actual
brain weights by about 73 units (RMSE is expressed in the same units as the target). A good
model should have an RMSE significantly lower than 180, so their model’s performance is
acceptable. However, if the team wants to improve the model, they could consider tuning
the hyperparameters or refining the feature set.
6. A team is working on predicting movie ticket prices based on factors such as location, movie
type, and time of day. They have used a dataset but are not sure if their model is performing well.
Question: What metrics should the team use to evaluate their model’s performance?
Answer: The team should use regression evaluation metrics such as Mean Squared Error
(MSE) and Root Mean Squared Error (RMSE). These metrics will give them insights into
how accurate their predictions are by comparing predicted values to actual values.
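The two metrics named in the answer can be computed by hand. The following is a minimal sketch; the ticket prices and the helper names mse and rmse are made up for illustration and do not come from any real dataset.

```python
import math

def mse(actual, predicted):
    """Mean Squared Error: average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of MSE, in the target's own units."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical ticket prices (actual vs. predicted), for illustration only
actual = [10.0, 12.0, 15.0, 8.0]
predicted = [11.0, 12.0, 13.0, 9.0]

print("MSE:", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Because RMSE is in the same units as the ticket price, it is usually the easier of the two to explain to a non-technical audience.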
7. A team is tasked with using AI to predict patient recovery times based on medical history and
treatment data. However, the data has missing values.
Question: What steps should the team take to handle the missing data before applying their AI
model?
Answer: The team should use data imputation techniques, such as replacing missing values
with the mean or median of the dataset, or using more advanced techniques like K-nearest
neighbors imputation. This will ensure that their model has complete data to work with.
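The imputation options mentioned in the answer might look like this in pandas and scikit-learn. This is only a sketch: the column names and values are invented, and n_neighbors=2 is an arbitrary choice for such a tiny table.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical patient data with missing values (NaN), for illustration only
df = pd.DataFrame({
    "age": [30.0, 40.0, np.nan, 50.0, 60.0],
    "prior_visits": [1.0, 2.0, 3.0, np.nan, 5.0],
    "treatment_days": [5.0, 6.0, 7.0, 9.0, 10.0],
})

# Option 1: replace each missing value with its column mean
mean_filled = df.fillna(df.mean())

# Option 2: replace each missing value with its column median
median_filled = df.fillna(df.median())

# Option 3: K-nearest neighbors imputation (average of the 2 most similar rows)
knn = KNNImputer(n_neighbors=2)
knn_filled = pd.DataFrame(knn.fit_transform(df), columns=df.columns)

print(mean_filled)
print(median_filled)
print(knn_filled)
```

The median is often preferred over the mean when a column contains outliers, since a few extreme values do not shift it as much.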
QUESTION-ANSWERS:
1. What is a Capstone Project in the context of AI education?
Answer: A Capstone Project is the final project of an academic program where students
integrate all their learning and apply it to solve real-world problems. It involves teamwork,
discussion, research, and hands-on activities.
19. Why is the business understanding stage crucial in data science projects?
Answer: The business understanding stage is crucial because it defines the problem,
objectives, and success criteria from a business perspective, ensuring the solution aligns
with business goals.
21. How does the “empathize” stage in Design Thinking help in AI projects?
Answer: In the empathize stage, developers focus on understanding the user’s needs and
challenges, which helps in designing AI solutions that are user-centric and address real-
world problems.
22. What is the purpose of using a prototype in the Design Thinking process?
Answer: The prototype is a preliminary model used to test and explore ideas before final
implementation. It helps in identifying potential issues and refining solutions early in the
development process.
8. Imagine that you want to create your first app. This is a complex problem. How would
you decompose the task of creating an app?
To decompose this task, you would need to answer a series of smaller questions:
• What kind of app do you want to create?
• What will your app look like?
• Who is the target audience for your app?
• What will the graphics look like?
• What audio will you include?
• What software will you use to build your app?
• How will the user navigate your app?
• How will you test your app?
12. What type of Questions can be asked for the following approaches:
i. Descriptive
ii. Diagnostic
iii. Predictive
iv. Prescriptive
v. Classification
i. Descriptive: The question is to show relationships.
ii. Diagnostic: The question is why an occurrence or anomaly occurred within your data.
iii. Predictive: The question is to determine the probabilities of an action.
iv. Prescriptive: The question is how to solve a problem.
v. Classification: The question requires a yes/no answer.
13. Explain Data Gathering Phase.
OR
How Data is collected for AI Projects?
• Identifying the necessary data content, formats and sources for initial data
collection.
• The data requirements are revised and decisions are made as to whether or not the
collection requires more or less data.
• Techniques such as descriptive statistics and visualization can be applied to the
data set, to assess the content, quality, and initial insights about the data.
• Gaps in data will be identified and plans to either fill or make substitutions will have to be
made.
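As a rough illustration of how descriptive statistics can expose gaps and give a first look at data quality, here is a small pandas sketch. The crop dataset and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical collected data with a few gaps, for illustration only
df = pd.DataFrame({
    "rainfall_mm": [120.0, 95.0, np.nan, 150.0, 110.0],
    "yield_tonnes": [3.1, 2.7, 2.9, np.nan, 3.0],
})

# Descriptive statistics: count, mean, std, min, quartiles, max per column
summary = df.describe()

# Number of missing values per column, to identify gaps in the data
missing_per_column = df.isna().sum()

print(summary)
print(missing_per_column)
```

A count lower than the number of rows in the summary is a quick sign that a column has gaps that need to be filled or substituted.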
16. What are the necessary things to be done for the success of Data modelling stage?
• First, understand the question at hand.
• Second, select an analytic approach or method to solve the problem.
• Third, obtain, understand, prepare, and model the data.
17. Differentiate between Cross-Validation and Train-Test Split
18. Explain Train-Test Split Evaluation
The train-test split is a technique for evaluating the performance of a machine learning
algorithm.
• It can be used for classification or regression problems
• It can be used for any supervised learning algorithm.
• The procedure involves taking a dataset and dividing it into two subsets.
• The first subset is used to fit the model and is referred to as the training dataset.
• The second subset is not used to train the model; instead, the input element of the dataset is
provided to the model, then predictions are made and compared to the expected values. This
second dataset is referred to as the test dataset.
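The procedure described above can be sketched with scikit-learn as follows. The data is synthetic, and the choice of a linear regression model and a 20 percent test set are assumptions made only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): y is roughly 3x + 2 plus random noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)

# Divide the dataset into a training subset and a test subset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the model on the training dataset only
model = LinearRegression().fit(X_train, y_train)

# Give the test inputs to the model and compare predictions to expected values
y_pred = model.predict(X_test)
test_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("Test RMSE:", test_rmse)
```

The key point is that the test rows are never shown to the model during fitting, so the test error estimates performance on unseen data.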
19. How to Configure the Train-Test Split?
The procedure has one main configuration parameter, which is the size of the train and test sets.
This is most commonly expressed as a fraction between 0 and 1 for either the train or test
dataset. For example, a training set with the size of 0.67 (67 percent) means that the remaining
0.33 (33 percent) is assigned to the test set.
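A small sketch of this configuration using scikit-learn's train_test_split is shown below. The list of 100 integers stands in for a real dataset, and random_state=1 is an arbitrary seed chosen so the split is repeatable.

```python
from sklearn.model_selection import train_test_split

# Placeholder dataset of 100 samples (assumption, for illustration only)
data = list(range(100))

# test_size=0.33 assigns 33 percent to the test set and the rest to training
train, test = train_test_split(data, test_size=0.33, random_state=1)

print("Train size:", len(train))
print("Test size:", len(test))
```

The same split could be requested from the other side with train_size=0.67; specifying one of the two sizes is enough.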
21. What are the prerequisites for Train and Test Data?
We will need the following Python libraries for this tutorial:
• pandas
• scikit-learn (imported in code as sklearn)
We can install these with pip:
1. pip install pandas
2. pip install scikit-learn
22. Explain Cross validation
Cross validation is a technique used in machine learning to evaluate the performance of
a model on unseen data. It involves dividing the available data into multiple folds or
subsets, using one of these folds as a validation set, and training the model on the
remaining folds. This process is repeated multiple times, each time using a different fold
as the validation set. Finally, the results from each validation step are averaged to produce
a more robust estimate of the model’s performance.
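The fold-by-fold process described above can be sketched with scikit-learn's KFold and cross_val_score. The synthetic data, the linear regression model, and the choice of 5 folds are assumptions made for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (assumption): y is roughly 3x + 2 plus random noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)

# Split the data into 5 folds; each fold is used once as the validation set
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scikit-learn reports RMSE as a negative score, so flip the sign
scores = cross_val_score(
    LinearRegression(), X, y, cv=cv, scoring="neg_root_mean_squared_error"
)
fold_rmse = -scores

# Average the per-fold results for a more robust performance estimate
mean_rmse = fold_rmse.mean()
print("RMSE per fold:", fold_rmse)
print("Mean RMSE:", mean_rmse)
```

Note that the model is trained five times here, once per fold, so cross-validation costs more computation than a single train-test split.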
26. Study the following graph and answer the given questions: