Capstone Project Question Answers
Capstone Project Question Answers
48. What’s the benefit of cross validation over train test split?
A) Faster
B) More accurate
C) Less data needed
D) Simpler code
Answer: B
3. How do you break down a complex problem into smaller parts? Give an example.
Answer: To break down a problem:
1. Understand it and restate it simply.
2. Split it into big pieces.
3. Break those into smaller, manageable tasks.
4. Code and test each task.
Example: To create an app, decide its purpose (e.g., a weather app), design its
look, pick features (like temperature display), code each part, and test it separately
before combining.
4. What’s the difference between a training set and a test set? Why are both
important?
Answer: A training set is data used to build and teach the model, while a test set is
separate data used to check how well the model works. Both are important because
the training set helps the model learn patterns, and the test set shows if it can predict
accurately on new, unseen data, ensuring it’s not just memorizing.
5. Describe the train test split method and how it’s done in Python.
Answer: Train test split divides data into two parts: one to train the model (e.g.,
80%) and one to test it (e.g., 20%). In Python, using
`sklearn.model_selection.train_test_split`, you load data (e.g., with pandas), split it
into features (X) and labels (y), then use `train_test_split(X, y, test_size=0.2)` to get
`X_train`, `X_test`, `y_train`, and `y_test`.
6. What is cross validation, and why is it better than a simple traintest split?
Answer: Cross validation tests a model by splitting data into multiple folds (e.g., 5),
using each fold as a test set while training on the rest, then averaging the results. It’s
better than traintest split because it uses all data for testing, reducing randomness
and giving a more reliable measure of model quality, especially for small datasets.
9. What are the components of time series data? Give an example from the airline
passengers dataset.
Answer: Time series data has four components:
1. Level: Average value.
2. Trend: Longterm increase/decrease.
3. Seasonality: Repeating cycles.
4. Noise: Random changes.
Example: In the airline passengers dataset (19491960), there’s a rising trend
(more passengers over time), seasonality (peaks each year), and noise (random
fluctuations).
10. Discuss a Capstone Project idea like the Sentiment Analyzer and how you’d
approach it.
Answer: A Sentiment Analyzer predicts if text (e.g., reviews) is positive, negative,
or neutral. Approach:
1. Problem: Define it as classifying text emotions.
2. Data: Gather reviews or tweets.
3. Features: Use words or phrases as inputs.
4. Model: Build a classification model (e.g., with sklearn).
5. Test: Split data, train, and check accuracy with metrics like MSE.
6. Deploy: Use it to analyze new text. This integrates AI skills to solve a practical
problem.