Data Science Exercises
ENGR. JOHN C. PLACENTE | ALL RIGHTS RESERVED | DO NOT SHARE WITH ANYONE
NIEVGEN DATA SCIENCE DEPARTMENT | CDSA
OBJECTIVE:
1. Create a repository CDSA-CMDI.
2. Save your process.
3. Name the process CDSA-EX01PROCESS
ACTIVITY OUTPUT: Write the step-by-step procedure for creating the repository and saving your process.
Include screenshots.
1. There are more male customers than female. True or false? Show evidence.
FEMALE
2. There are more customers aged 50 and above. True or false? Show evidence. TRUE
4. The maximum cost of goods sold (COGS) is PHP 39,134.69. True or false? Show evidence.
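Each true/false claim above reduces to a count or an aggregate over one column. A minimal sketch in Python, assuming hypothetical column names (gender, age, cogs) and made-up rows rather than the actual exercise data:

```python
from collections import Counter

# Hypothetical sample rows; the real exercise file and its column names may differ.
customers = [
    {"gender": "F", "age": 52, "cogs": 1200.50},
    {"gender": "M", "age": 34, "cogs": 845.00},
    {"gender": "F", "age": 61, "cogs": 5000.00},
    {"gender": "F", "age": 45, "cogs": 310.25},
    {"gender": "M", "age": 58, "cogs": 2750.00},
]

# Claim 1: compare customer counts by gender.
gender_counts = Counter(row["gender"] for row in customers)
more_male = gender_counts["M"] > gender_counts["F"]

# Claim 2: compare customers aged 50 and above against those below 50.
aged_50_up = sum(1 for row in customers if row["age"] >= 50)
more_50_up = aged_50_up > len(customers) - aged_50_up

# Claim 4: find the maximum cost of goods sold (COGS).
max_cogs = max(row["cogs"] for row in customers)

print(dict(gender_counts), more_male)
print(aged_50_up, more_50_up)
print(max_cogs)
```

The printed counts and the maximum are the kind of evidence each question asks for; in the course tool the same numbers would come from the statistics or aggregation view.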
By: AGE
Result:
Questions:
1. What are the ages of customers from Mindanao?
Result:
AYO-AYO BANK OF ASIA is studying its business performance in the Philippines. A group of data scientists
now studies the data sets given to them by the corporation. Open the data set and provide the company
with an insight. What do you see and what can you share as a data scientist?
FILE: CDSA_EX03.xlsx
FILE: CDSA_EX03B.xlsx
RESULT:
OBJECTIVE:
1. Remove duplicate data from the file.
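As a cross-check outside the course tool, duplicate removal can be sketched in a few lines of Python. The rows below are hypothetical, and the sketch assumes a "duplicate" means a row whose values all match an earlier row:

```python
# Hypothetical rows; a real run would read them from the exercise file.
rows = [
    ("C001", "Luzon", 35),
    ("C002", "Visayas", 41),
    ("C001", "Luzon", 35),     # exact duplicate of the first row
    ("C003", "Mindanao", 29),
    ("C002", "Visayas", 41),   # exact duplicate of the second row
]

def remove_duplicates(records):
    """Return records with exact duplicates dropped, keeping first occurrences in order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique

deduped = remove_duplicates(rows)
print(len(rows), "->", len(deduped))
```

Comparing the row count before and after is a quick way to confirm how many duplicates were dropped.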
RESULT:
RESULT:
OBJECTIVE: USE THE FILTER EXAMPLES OPERATOR TO LOOK FOR MISSING DATA.
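The idea behind this filter is to keep only the rows that have at least one missing attribute. A minimal Python sketch of the same logic, assuming hypothetical rows in which None marks a missing value:

```python
# Hypothetical rows; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "region": "Luzon"},
    {"id": 2, "age": None, "region": "Visayas"},
    {"id": 3, "age": 51, "region": None},
    {"id": 4, "age": 47, "region": "Mindanao"},
]

def has_missing(row):
    """True when at least one attribute of the row is missing."""
    return any(value is None for value in row.values())

# Split the data into rows with gaps and fully populated rows.
missing_rows = [row for row in rows if has_missing(row)]
complete_rows = [row for row in rows if not has_missing(row)]

print([row["id"] for row in missing_rows])
```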
RESULTS:
RESULT:
A. Using CDSA_EX06.xlsx
Compute for CSAT if
RESULT:
A. Using CDSA_EX06.xlsx
RESULT:
DECISION TREES
Example:
STUDY RESULTS:
Level 2:
Level 3:
Level 4:
EXERCISE #16: DETERMINE FACTORS THAT AFFECT HIGH SUGAR LEVEL (EXAMPLE 2)
*** NOTE: Use the first 30 CLEAN data points as the baseline for UCL and LCL.
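The note does not say which control chart is intended, so the following Python sketch makes an assumption: the limits are the baseline mean plus or minus 3 sample standard deviations (individuals charts in practice often derive sigma from the average moving range instead). The data values are hypothetical.

```python
import statistics

def control_limits(values, baseline_size=30):
    """Compute (LCL, center, UCL) from the first `baseline_size` points.

    Assumption: limits are the baseline mean +/- 3 sample standard deviations.
    """
    baseline = values[:baseline_size]
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma

# Hypothetical measurements: 30 clean baseline points followed by new points.
baseline_points = [100 + (i % 5) - 2 for i in range(30)]  # cycles through 98..102
new_points = [101, 99, 115, 100]
series = baseline_points + new_points

lcl, center, ucl = control_limits(series)

# Flag only the points after the baseline that fall outside the limits.
out_of_control = [x for x in series[30:] if x < lcl or x > ucl]
print(round(lcl, 2), center, round(ucl, 2), out_of_control)
```

Freezing the limits on the first 30 clean points means later shifts are judged against a stable reference instead of inflating the limits themselves.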
RESULT
LINEAR REGRESSION
USE LINEAR REGRESSION to determine the relationship between a response and a factor, such as SALES AND
MARKETING.
EXAMPLE: For the dataset CDSA_EX18.XLSX, can you determine whether marketing is effective in
increasing sales? Also predict sales: if, for example, I spend PHP 1,000,000 on marketing, how much are
the predicted sales?
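To illustrate the mechanics, here is a minimal ordinary least squares sketch in Python. The figures are hypothetical, not the CDSA_EX18 data, and the helper name fit_line is introduced only for this example.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical marketing spend and sales in PHP (not the CDSA_EX18 data).
marketing = [200_000, 400_000, 600_000, 800_000]
sales = [1_100_000, 1_900_000, 3_100_000, 3_900_000]

intercept, slope = fit_line(marketing, sales)

# Plug the planned marketing spend into the fitted line.
predicted_sales = intercept + slope * 1_000_000
print(f"sales = {intercept:,.0f} + {slope:.2f} * marketing")
print(f"predicted sales at PHP 1,000,000: {predicted_sales:,.0f}")
```

A positive slope suggests sales rise with marketing spend, but whether the effect is statistically significant still depends on the p-value reported by the regression tool.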
SHOW RESULTS:
POLYNOMIAL REGRESSION
USE POLYNOMIAL REGRESSION to determine the relationship between a response and a factor, such as SALES
AND MARKETING.
EXAMPLE: For the dataset LINEARREGRESSION01.XLSX, determine the relationship between YEARS OF
SERVICE AND SALARY.
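A polynomial fit is still a least-squares problem; only the set of predictors changes (x, x^2, and so on). The Python sketch below fits a degree-2 polynomial through the normal equations. The data are hypothetical, and the helpers solve and fit_polynomial are names introduced for this example only.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_polynomial(x, y, degree=2):
    """Least-squares polynomial fit; returns coefficients [c0, c1, ..., c_degree]."""
    size = degree + 1
    # Normal equations: sum over j of sum(x^(i+j)) * c_j = sum(x^i * y)
    a = [[sum(xi ** (i + j) for xi in x) for j in range(size)] for i in range(size)]
    b = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(size)]
    return solve(a, b)

# Hypothetical data (not LINEARREGRESSION01.XLSX): salary in thousand PHP.
years = [1, 2, 3, 4, 5, 6]
salary = [22, 27, 36, 49, 66, 87]  # generated from 21 - 1*x + 2*x^2

c0, c1, c2 = fit_polynomial(years, salary)
predicted_at_7 = c0 + c1 * 7 + c2 * 7 ** 2
print(round(c0, 3), round(c1, 3), round(c2, 3), round(predicted_at_7, 3))
```

Because the sample data were generated from an exact quadratic, the fitted coefficients should recover it closely; real salary data would leave residual error.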
SHOW RESULTS:
EXAMPLE DATA:
Use general linear modelling to determine the relationship of factors to response MSAT (Marriage
Satisfaction).
DATA: CDSA_EX12.xlsx
RESULTS:
Use general linear modelling to determine the relationship of factors to response MSAT (Marriage
Satisfaction).
DATA: CDSA_EX12.xlsx
USE RAPIDMINER
RESULTS:
Analysis models are measured via ABSOLUTE ERROR, CORRELATION, AND SQUARED
CORRELATION, along with other available measures.
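To make these measures concrete, the Python sketch below computes them by hand for a small set of hypothetical actual and predicted values. It assumes absolute error means the average absolute deviation between prediction and actual value, and correlation means the Pearson correlation between the two.

```python
import math

def regression_metrics(actual, predicted):
    """Return mean absolute error, Pearson correlation, and squared correlation."""
    n = len(actual)
    absolute_error = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    correlation = cov / math.sqrt(var_a * var_p)
    return {
        "absolute_error": absolute_error,
        "correlation": correlation,
        "squared_correlation": correlation ** 2,
    }

# Hypothetical actual and predicted MSAT scores (not the CDSA_EX12 data).
actual = [3.0, 4.0, 5.0, 6.0]
predicted = [2.5, 4.5, 4.5, 6.5]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A lower absolute error and a correlation closer to 1 both indicate a better fit, which is why the exercises compare models on these numbers.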
EXERCISE #22: MEASURE PERFORMANCE OF THE GENERAL LINEAR MODEL (USING PERFORMANCE)
Use general linear modelling to determine the relationship of factors to response MSAT (Marriage
Satisfaction). Then determine the model performance using OPERATOR: PERFORMANCE (REGRESSION)
DATA: CDSA_EX12.xlsx
USE RAPIDMINER
SHOW RESULTS:
Use NEURAL NETWORK to determine the relationship of factors to response MSAT (Marriage
Satisfaction). Then determine the model performance using OPERATOR: PERFORMANCE (REGRESSION)
DATA: CDSA_EX12.xlsx
USE RAPIDMINER
RESULTS:
Use DEEP LEARNING to determine the relationship of factors to response MSAT (Marriage Satisfaction).
Then determine the model performance using OPERATOR: PERFORMANCE (REGRESSION)
DATA: CDSA_EX12.xlsx
USE RAPIDMINER
EXERCISE #25: MEASURE PERFORMANCE OF THE DECISION TREE FOREST (USING PERFORMANCE)
Use RANDOM FOREST to determine the relationship of factors to response MSAT (Marriage Satisfaction).
Then determine the model performance using OPERATOR: PERFORMANCE (REGRESSION)
DATA: CDSA_EX12.xlsx
USE RAPIDMINER
GIVEN the file CDSA_EX14, determine the LOAN SCORE PREDICTION model and its performance if the
following models are used:
A. DECISION TREE
B. GLM
C. NEURAL NET
D. RANDOM FOREST
E. DEEP LEARNING
F. GLM VIA EXCEL
Create a table and show absolute error, relative error, correlation and squared correlation.
Determine which factors affect the loan score.
CDAA
DBMA
CDRA
CDSA
https://www.nievgen.com/CDAA.html
ONE SAMPLE T
Given the file CDSA_E16.xlsx, determine if the Absolute Percentage Error meets the 3.0% error
criterion/requirement of the company.
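A one-sample t test compares the sample mean against a fixed target. The Python sketch below assumes that "meets the criterion" means the mean Absolute Percentage Error is below 3.0%, tested one-sided at alpha = 0.05. The sample values are hypothetical, and the critical value is hard-coded from a standard t table for this sample size.

```python
import math
import statistics

def one_sample_t(sample, target):
    """Return (t statistic, degrees of freedom) for H0: mean == target."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return (mean - target) / std_err, n - 1

# Hypothetical absolute percentage errors in percent (not the exercise data).
ape = [2.1, 2.8, 2.5, 3.1, 2.2, 2.6, 2.9, 2.4, 2.7, 2.3]

t_stat, df = one_sample_t(ape, target=3.0)

# One-sided test at alpha = 0.05 with H1: mean APE < 3.0.
# 1.833 is the standard t-table critical value for df = 9.
T_CRITICAL = 1.833
meets_criterion = t_stat < -T_CRITICAL
print(round(t_stat, 3), df, meets_criterion)
```

With a different sample size the critical value changes, so in practice the p-value reported by the statistics tool is the safer basis for the decision.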
SHOW RESULT:
TWO SAMPLE T
Given the file CDSA_E17.xlsx, compare the model performance between NEURAL NET and DEEP
LEARNING. Which one is better?
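Comparing two models on independent samples of errors is a two-sample t test. The Python sketch below uses Welch's version, which does not assume equal variances; that choice, the error values, and the helper name welch_t are assumptions for illustration.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical absolute errors per model (not the exercise data).
neural_net = [4.0, 5.0, 6.0, 5.0, 5.0]
deep_learning = [3.0, 4.0, 3.0, 2.0, 3.0]

t_stat, df = welch_t(neural_net, deep_learning)

# Two-sided test at alpha = 0.05; 2.306 is the t-table critical value for df = 8,
# which is what these sample values happen to produce.
T_CRITICAL = 2.306
significantly_different = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), round(df, 2), significantly_different)
```

When the difference is significant, the model with the lower mean error is the better one; here a positive t statistic means the first sample has the higher mean error.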
SHOW RESULT:
PAIRED T
Given the file CDSA_E19.xlsx, determine whether the employee performance survey results improved.
Use SPCforExcel in the process.
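As a cross-check on the SPCforExcel output, the paired t statistic can be computed by hand: it is a one-sample t test on the per-employee differences. The Python sketch below uses hypothetical before and after scores and tests one-sided for improvement at alpha = 0.05.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic on the differences (after - before) and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical survey scores for the same five employees (not the exercise data).
before = [70, 65, 80, 75, 60]
after = [74, 68, 85, 77, 66]

t_stat, df = paired_t(before, after)

# One-sided test at alpha = 0.05 with H1: mean difference > 0.
# 2.132 is the standard t-table critical value for df = 4.
T_CRITICAL = 2.132
improved = t_stat > T_CRITICAL
print(round(t_stat, 3), df, improved)
```

Pairing matters because each employee serves as their own control, which removes between-employee variation from the comparison.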
SHOW RESULT:
Given the file CDSA_E20.xlsx, determine whether the MODELS ARE EQUAL OR NOT.
SHOW RESULT: