AI Capstone Project - Notes-Part2
DOHA, QATAR
Grade-12
Reference: Data Science Methodology 101. How can a Data Scientist organize his… | by Nunzio Logallo | Towards Data Science
1. Business understanding
• What problem are you trying to solve?
• Every project, whatever its size, begins with the understanding of the
business.
• Business partners who need the analytics solution play a critical role
in this phase by defining the problem, the project objectives, and the
solution requirements from a business perspective.
2. Analytic approach
• How can you use the data to answer the question?
• The problem must be expressed in the context of statistical learning to
identify the appropriate machine learning techniques to achieve the
desired result.
3. Data Requirements
• What data do you need to answer the question?
• The analytic approach determines the data requirements: specific content, formats, and data representations, based on domain knowledge.
4. Data collection
• Where is the data coming from (identify all sources) and how
will you get it?
• The Data Scientist identifies and collects data
resources (structured, unstructured and semi-structured) that
are relevant to the problem area.
• If the data scientist finds gaps in the data collection, he may need
to review the data requirements and collect more data.
5. Data understanding
• Is the data that you collected representative of the problem to be
solved?
• Descriptive statistics and visualization techniques can help a data scientist understand the content of the data, assess its quality, and obtain initial insights about the data (see the sketch below).
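As an illustration only, a minimal data understanding sketch in pandas is given below; the CSV file name and its columns are hypothetical placeholders, not part of these notes.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset: the file name and columns are placeholders.
df = pd.read_csv("customer_data.csv")

# Descriptive statistics for every numeric column (count, mean, std, min, quartiles, max).
print(df.describe())

# Quality checks: column types, non-null counts, and missing values per column.
df.info()
print(df.isna().sum())

# Quick visual overview of the distributions to obtain initial insights.
df.hist(figsize=(10, 6))
plt.show()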
6. Data preparation
• What additional work is required to manipulate and work with the
data?
• The Data preparation step includes all the activities used to create
the data set used during the modeling phase.
• This includes cleansing data, combining data from multiple
sources, and transforming data into more useful variables.
• In addition, feature engineering and text analysis can be used to derive new structured variables that enrich the set of predictors and improve model accuracy (a small data preparation sketch follows this list).
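A minimal sketch of the data preparation activities listed above, using pandas; the file names and column names are hypothetical examples, not taken from these notes.

import pandas as pd

# Hypothetical files and columns, used only to illustrate the activities above.
orders = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")

# Cleansing: remove duplicate rows and fill missing values.
orders = orders.drop_duplicates()
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Combining data from multiple sources.
data = orders.merge(customers, on="customer_id", how="left")

# Transforming data into more useful variables (simple feature engineering).
data["order_month"] = pd.to_datetime(data["order_date"]).dt.month
data["is_large_order"] = (data["amount"] > data["amount"].quantile(0.90)).astype(int)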
7. Model Training
• In what way can the data be visualized to get the answer that is required?
• From the first version of the prepared data set, data scientists use a training dataset (historical data in which the desired result is known) to develop predictive or descriptive models.
• The modeling process is very iterative.
8. Model Evaluation
• Does the model used really answer the initial question or does it
need to be adjusted?
• The Data Scientist evaluates the quality of the model and verifies that
the business problem is handled in a complete and adequate manner.
9. Deployment
• Can you put the model into practice?
• Once a satisfactory model has been developed and approved by the business sponsors, it is deployed in the production environment or in a comparable test environment.
10. Feedback
• Can you get constructive feedback into answering the question?
• By collecting the results of the implemented model, the
organization receives feedback on the performance of the model
and its impact on the implementation environment.
• The procedure involves taking a dataset and dividing it into two subsets.
• The first subset is used to fit the model and is referred to as the training
dataset.
• The second subset is not used to train the model; instead, it is used to evaluate the fitted machine learning model. It is referred to as the test dataset.
3. How will you configure the train-test split procedure?
• The procedure has one main configuration parameter, which is the size of the train and test sets.
• This is most commonly expressed as a fraction between 0 and 1 for either the train or the test set.
• For example, a training set size of 0.67 (67 percent) means that the remaining 0.33 (33 percent) is assigned to the test set (see the sketch after this list).
• There is no optimal split percentage.
Nevertheless, common split percentages include:
• Train: 80%, Test: 20%
• Train: 67%, Test: 33%
• Train: 50%, Test: 50%
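A minimal sketch of the procedure with scikit-learn's train_test_split; the synthetic dataset and the 67/33 split are illustrative assumptions, not part of these notes.

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data, used only so the example is self-contained.
X, y = make_regression(n_samples=1000, n_features=5, noise=10, random_state=42)

# test_size=0.33 assigns 33% of the rows to the test set, leaving 67% for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Fit on the training set only, then evaluate on the unseen test set.
model = LinearRegression().fit(X_train, y_train)
print("R^2 on the test set:", model.score(X_test, y_test))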
4. What are the considerations for choosing the split percentage in the train-test split procedure?
• Computational cost in training the model.
• Computational cost in evaluating the model.
• Training set representativeness.
• Test set representativeness.
5. Explain cross-validation.
• It is a resampling technique for evaluating machine learning models on a sample of data.
• The process includes a parameter k, which specifies the number of groups into which a given data sample should be divided.
• The process is referred to as k-fold cross-validation.
• More reliable, though it takes longer to run.
• For example, we could have 5 folds or experiments (k=5). We divide the data into 5 pieces, each being 20% of the full dataset.
• During the first iteration (Experiment 1), the first fold (piece) is used as the holdout set (test data/validation data) and everything else as training data.
• We repeat this process, using every fold once as the holdout. Putting this together, 100% of the data is used as a holdout at some point (a k-fold sketch follows this list).
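A minimal k-fold sketch with scikit-learn, assuming k=5 and a synthetic dataset for illustration.

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data, used only so the example is self-contained.
X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)

# k=5: the data is divided into 5 folds; each fold serves exactly once as the holdout set.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=kfold)

print("Score per fold (5 experiments):", scores)
print("Mean score across folds:", scores.mean())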
6. Explain the difference between cross-validation and train-test split.
• On small datasets, the extra computational burden of running cross-validation isn't a big deal, so if your dataset is smaller, you should run cross-validation.
• If your dataset is larger, you can use the train-test split method.
RMSE: RMSE is calculated as the square root of the MSE. Root Mean Square Deviation (RMSD) is another name for the Root Mean Square Error.
• An RMSE value of 0 implies that the model is perfectly fitted. The model and its predictions perform better when the RMSE is low, while a higher RMSE indicates a larger discrepancy between the predictions and the ground truth.
• A commonly quoted rule of thumb is that the RMSE of a good model should be less than 180; note that RMSE is on the same scale as the target variable, so such a threshold is only meaningful for a particular dataset (the formulas are given below).
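For reference, the standard formulas, written here for n observations with actual values y_i and predictions \hat{y}_i (standard notation, not taken from these notes):

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}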
9. What is a loss function? What are the different categories of loss functions?
• All the algorithms in machine learning rely on minimizing or
maximizing a function, which we call “objective function”.
• The functions that are minimized are called “loss functions”.
• A loss function is a measure of how good a prediction model does in
terms of being able to predict the expected outcome.
• Loss functions can be broadly categorized into 2 types: classification loss and regression loss.
• Regression functions predict a quantity, and classification functions predict a label (see the sketch below).
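A minimal sketch of one loss from each category, using small made-up arrays (the values are illustrative only):

import numpy as np

# Regression loss: mean squared error between true and predicted quantities.
y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 3.0])
mse = np.mean((y_true - y_pred) ** 2)
print("Regression loss (MSE):", mse)

# Classification loss: binary cross-entropy between true labels and predicted probabilities.
labels = np.array([1, 0, 1])
probs = np.array([0.9, 0.2, 0.7])
bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
print("Classification loss (binary cross-entropy):", bce)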
x     Y
40    42
42    45
44    47
46    44
48    50
50    48
52    49
54    50
58    55
60    58
Regression line equation: Y = 0.681x + 15.142. Calculate MSE and RMSE from the above information (a worked sketch follows).
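A worked sketch of the calculation in Python, using the table and regression line above; the printed results follow directly from those values.

import numpy as np

# Data from the table above.
x = np.array([40, 42, 44, 46, 48, 50, 52, 54, 58, 60])
y = np.array([42, 45, 47, 44, 50, 48, 49, 50, 55, 58])

# Predicted values from the given regression line Y = 0.681x + 15.142.
y_pred = 0.681 * x + 15.142

# MSE is the mean of the squared residuals; RMSE is its square root.
mse = np.mean((y - y_pred) ** 2)
rmse = np.sqrt(mse)

print("MSE:", round(mse, 3))    # approx. 2.774
print("RMSE:", round(rmse, 3))  # approx. 1.666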