Data Mining Process

CRISP-DM is an iterative data science process consisting of six steps: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. It involves understanding the problem domain and objectives, preparing the data through activities such as data cleaning and feature selection, building a model by running algorithms on training data, evaluating the model on test data, and deploying the final model for use.

Uploaded by

sanee.yadav

Data Science Process

CRISP-DM process
• The methodical discovery of useful relationships and patterns in data is enabled by a set of iterative activities collectively known as the data science process:
  • Understanding the problem
  • Preparing the data samples
  • Developing the model
  • Applying the model on a dataset
  • Deploying and maintaining the model

[Figure: CRISP-DM cycle around the data: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment]
Process
[Figure: the five-step data science process]
1. Prior Knowledge: business understanding and data understanding
2. Preparation: prepare data
3. Modeling: build the model from training data using algorithms, then apply the model to test data and evaluate performance
4. Application: deployment
5. Knowledge: knowledge and actions


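1. Prior Knowledge
• Prior knowledge refers to information that is already known about a subject
• Gaining information on:
  • Objective of the problem
  • Subject area of the problem
  • Data

Example: for the lending example, a simple data set of ten points
• Terminology used:
  • Dataset: a collection of data with a defined structure
  • Data point: a single record in the dataset
  • Attribute: a single property of a data point
  • Label: the special attribute to be predicted
  • Identifier: a special attribute used to locate a record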
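The terminology above can be made concrete with a small sketch. The ten records below are invented for illustration and are not taken from the slides; the field names `borrower_id`, `credit_score`, and `interest_rate` are likewise assumptions chosen to fit a lending example.

```python
# Hypothetical lending data set: all values are invented for illustration.
# The whole list is the dataset; each dict is one data point;
# each key is an attribute.
IDENTIFIER = "borrower_id"   # assumed name: locates a record, not used for prediction
LABEL = "interest_rate"      # assumed name: the attribute to be predicted

dataset = [
    {"borrower_id": 1, "credit_score": 500, "interest_rate": 7.31},
    {"borrower_id": 2, "credit_score": 600, "interest_rate": 6.70},
    {"borrower_id": 3, "credit_score": 700, "interest_rate": 5.95},
    {"borrower_id": 4, "credit_score": 700, "interest_rate": 6.40},
    {"borrower_id": 5, "credit_score": 800, "interest_rate": 5.40},
    {"borrower_id": 6, "credit_score": 800, "interest_rate": 5.70},
    {"borrower_id": 7, "credit_score": 750, "interest_rate": 5.90},
    {"borrower_id": 8, "credit_score": 550, "interest_rate": 7.00},
    {"borrower_id": 9, "credit_score": 650, "interest_rate": 6.50},
    {"borrower_id": 10, "credit_score": 825, "interest_rate": 5.70},
]

# Input attributes are everything except the identifier and the label.
input_attributes = [k for k in dataset[0] if k not in (IDENTIFIER, LABEL)]

# One data point and the label values for the whole dataset.
data_point = dataset[0]
labels = [row[LABEL] for row in dataset]

print(f"{len(dataset)} data points, input attributes: {input_attributes}")
```

Keeping the identifier out of the input attributes matters because an ID carries no predictive information about the label.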
2. Data Preparation

Data Exploration
Data quality
Handling missing values
Data type conversion
Transformation
Outliers
Feature selection
Sampling
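Several of the preparation activities listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a prescribed method: the raw values are invented, median imputation and the 1.5 × IQR rule are just one common choice each, and feature selection is not shown.

```python
import random
import statistics

# Hypothetical raw credit scores: stored as strings, one missing (None),
# one implausible value. All values are invented for illustration.
raw_scores = ["500", "600", None, "700", "800", "9000", "750", "550", "650", "825"]

# Data type conversion: strings to floats, keeping None for missing entries.
scores = [float(s) if s is not None else None for s in raw_scores]

# Handling missing values: impute with the median of the observed values.
observed = [s for s in scores if s is not None]
median = statistics.median(observed)
scores = [median if s is None else s for s in scores]

# Outliers: drop values more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [s for s in scores if low <= s <= high]

# Transformation: min-max scaling to the range [0, 1].
smallest, largest = min(cleaned), max(cleaned)
scaled = [(s - smallest) / (largest - smallest) for s in cleaned]

# Sampling: draw a reproducible random subset of the cleaned values.
sample = random.Random(42).sample(cleaned, k=5)

print(f"kept {len(cleaned)} of {len(scores)} values after outlier removal")
```

The order of the steps is a design choice: imputing before outlier removal means the median is computed with the implausible value still present, which the median tolerates better than the mean would.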
3. Modeling
Training Data → Build model
Test Data → Evaluation
Final Model

3. Modeling: Splitting training and test data sets
Training Data
Test Data

3. Modeling: Evaluation of test dataset
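The modeling step (split, build, evaluate) can be sketched as follows. The data pairs are invented, the 70/30 split ratio is an assumption, and a one-variable least-squares line stands in for whatever algorithm the slides had in mind; mean absolute error is likewise only one possible evaluation measure.

```python
import random
import statistics

# Hypothetical (credit_score, interest_rate) pairs; values are invented.
data = [(500, 7.31), (600, 6.70), (700, 5.95), (700, 6.40), (800, 5.40),
        (800, 5.70), (750, 5.90), (550, 7.00), (650, 6.50), (825, 5.70)]

# Splitting: shuffle with a fixed seed, then hold out 30% as test data.
shuffled = data[:]
random.Random(0).shuffle(shuffled)
split = len(shuffled) * 7 // 10
train, test = shuffled[:split], shuffled[split:]

# Build model: least-squares fit of y = a + b * x on the training data only.
xs = [x for x, _ in train]
ys = [y for _, y in train]
x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
b = (sum((x - x_mean) * (y - y_mean) for x, y in train)
     / sum((x - x_mean) ** 2 for x in xs))
a = y_mean - b * x_mean

def predict(credit_score):
    """Predicted interest rate for a given credit score."""
    return a + b * credit_score

# Evaluation: mean absolute error on the held-out test data.
mae = statistics.fmean([abs(predict(x) - y) for x, y in test])

print(f"slope={b:.5f} intercept={a:.3f} test MAE={mae:.3f}")
```

The test records never influence the fitted coefficients, so the error measured on them estimates how the model behaves on unseen data.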


4. Application

Product readiness
Technical integration
Model response time
Remodeling
Assimilation
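Of the items above, model response time is the easiest to check directly. The sketch below times a stand-in model; `predict` and its coefficients are hypothetical, and a real deployment would also include network and data-access overhead that this measurement ignores.

```python
import time

def predict(credit_score):
    # Hypothetical stand-in for a deployed model; coefficients are invented.
    return 10.0 - 0.005 * credit_score

def mean_response_time(model, inputs, repeats=1000):
    """Average wall-clock seconds per prediction over `repeats` passes."""
    start = time.perf_counter()
    for _ in range(repeats):
        for value in inputs:
            model(value)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(inputs))

latency = mean_response_time(predict, [500, 650, 800])
print(f"mean response time: {latency * 1e6:.2f} microseconds per prediction")
```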
5. Knowledge

Posterior knowledge

Kotu, V., & Deshpande, B. (2014). Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner. Morgan Kaufmann.
