Unit-4 Introduction To Data Mining

Data mining is an information extraction activity that aims to discover hidden facts contained within large databases. Some basic data mining tasks include classification, regression, clustering, pattern mining, summarization, and link analysis. Data preprocessing is an important step in the KDD process and involves cleaning data by filling in missing values, smoothing noisy data, identifying outliers, and resolving inconsistencies.

Uploaded by

Shaheen Mondal

Unit-4

Introduction to Data Mining

Data Mining is an information extraction activity whose goal is to discover hidden facts contained in large databases.

Data Mining Models and Tasks
BASIC TASKS

 Classification: Classification is a data mining technique used to assign data items to predefined groups (classes).

 For example, you may wish to use classification to predict whether the weather on a particular day will be “sunny”, “rainy” or “cloudy”. Popular classification techniques include decision trees and neural networks.

Classification

 Given old data about customers and payments, predict a new applicant’s loan eligibility.

[Figure: data about previous customers (age, salary, profession, location, customer type) is fed to a classifier, which produces decision rules (e.g., Salary > 5 L, Prof. = Exec). The rules label a new applicant’s data as good/bad.]
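
The loan example can be sketched as a one-rule classifier. This is a minimal illustration, not the method behind the figure: the customer records and the helper names `learn_threshold` and `classify` are hypothetical, and only salary is used, whereas the slide also lists age, profession, location and customer type.

```python
# Hypothetical records of previous customers: (salary in lakhs, repayment label).
previous_customers = [
    (2.0, "bad"), (3.5, "bad"), (4.0, "bad"),
    (6.0, "good"), (7.5, "good"), (9.0, "good"),
]

def learn_threshold(records):
    """Pick the salary cut-off that misclassifies the fewest training records."""
    salaries = sorted(s for s, _ in records)
    # Candidate cut-offs are the midpoints between consecutive salaries.
    candidates = [(a + b) / 2 for a, b in zip(salaries, salaries[1:])]

    def errors(t):
        # A record is an error when "salary > t" disagrees with its label.
        return sum((s > t) != (label == "good") for s, label in records)

    return min(candidates, key=errors)

def classify(salary, threshold):
    """Apply the learned decision rule to a new applicant."""
    return "good" if salary > threshold else "bad"

threshold = learn_threshold(previous_customers)
print(threshold, classify(8.0, threshold))
```

With this toy data the learned rule happens to be "Salary > 5", the same shape as the rule shown in the figure.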
DATA MINING TASKS (contd.)
 Regression: Used to predict a value for an individual on the basis of information gained from a previous sample of similar individuals.

Example:
 A person wants to save for the future. His future savings depend on his current savings and several past values, so he uses a linear regression formula to predict them.
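
A minimal sketch of the savings example, assuming a simple least-squares line through yearly savings figures. The numbers and the helper name `fit_line` are hypothetical and only illustrate the formula.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past savings (in thousands) for years 1 to 5.
years = [1, 2, 3, 4, 5]
savings = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(years, savings)
predicted = slope * 6 + intercept  # predicted savings for year 6
print(slope, intercept, predicted)
```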

 Clustering: Clustering is a data mining technique used to place data elements into related groups without advance knowledge of the group definitions.

Example: A department store chain creates special catalogues targeted to various types of customer groups based on attributes such as income, location, etc.
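
One way to picture the catalogue example is k-means on a single attribute. The sketch below assumes customers are grouped by income only, with hypothetical income values and hand-picked starting centroids. The slide does not name a specific clustering algorithm, so k-means is an assumption here.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for one numeric attribute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to the nearest current centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer incomes (in thousands); no group labels are given in advance.
incomes = [20, 22, 25, 80, 85, 90]
centroids, clusters = kmeans_1d(incomes, centroids=[20, 90])
print(centroids, clusters)
```

The groups emerge from the data itself, which is the difference from classification, where the classes are known beforehand.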

 Pattern mining is a data mining method that involves finding existing patterns in data. In this context, patterns often mean association rules. The original motivation for searching for association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behavior in terms of the purchased products.

 For example, an association rule “cold drink ⇒ potato chips (80%)” states that four out of five customers who bought a cold drink also bought potato chips.
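
The 80% in the rule above is its confidence: the share of transactions containing the left-hand side that also contain the right-hand side. A minimal sketch with hypothetical transactions (the helper names `support` and `confidence` are illustrative):

```python
# Hypothetical supermarket transactions, each a set of purchased items.
transactions = [
    {"cold drink", "potato chips"},
    {"cold drink", "potato chips", "bread"},
    {"cold drink", "potato chips"},
    {"cold drink", "potato chips", "butter"},
    {"cold drink"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

conf = confidence({"cold drink"}, {"potato chips"}, transactions)
print(round(conf, 2))
```

Here five transactions contain a cold drink and four of those also contain potato chips, which gives the four-out-of-five figure.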

 Summarization maps data into subsets with associated simple descriptions (characterization or generalization).
 Example: GATE score

 Link Analysis uncovers relationships among data.
 Association Rules
 Sequential Analysis determines sequential patterns.
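
The summarization task above can be illustrated by describing each subset of the data with a few simple statistics. The sketch assumes exam scores grouped by a hypothetical `branch` field; the scores are made up and are not taken from any real GATE data.

```python
from statistics import mean

# Hypothetical (branch, score) records.
scores = [("CS", 62.0), ("CS", 48.0), ("EE", 55.0), ("EE", 35.0), ("CS", 70.0)]

# Map the data into subsets, one per branch.
groups = {}
for branch, score in scores:
    groups.setdefault(branch, []).append(score)

# Attach a simple description to each subset.
summary = {
    branch: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
    for branch, vals in groups.items()
}
print(summary)
```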

Data Mining Application: Marketing
 Sales Analysis
• associations between product sales:
 bread and butter
 toothpaste and toothbrush

 Customer Profiling
• data mining can tell you what types of customers buy what products
 Identifying Customer Requirements
• identify the best products for different customers
• use prediction to find what factors will attract new customers
Data Mining Application: Fraud Detection
• Association rule mining can detect a group of people who stage accidents to collect on insurance
• A data-mining application can be used to detect suspicious money transactions
• Data mining can be used to help commercial lending decisions and to prevent fraud

Data Preprocessing

Why Data Preprocessing?
 Data in the real world is dirty
 incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
 e.g., occupation=“ ”
 noisy: containing errors or outliers
 e.g., Salary=“-10”
 inconsistent: containing discrepancies in codes or names
 e.g., Age=“42” Birthday=“03/07/1997”
 e.g., was rating “1,2,3”, now rating “A, B, C”
 e.g., discrepancy between duplicate records
13 Data Mining: Concepts and Techniques
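
The three kinds of dirty data listed above can be checked mechanically. The sketch below is illustrative only: the record layout and the helper name `find_problems` are hypothetical, the birthday is assumed to be in day/month/year order (the slide does not say), and the reference date is fixed so that the result is reproducible.

```python
from datetime import date, datetime

def find_problems(record, today):
    """Return a list of data-quality problems found in one record."""
    problems = []
    # Incomplete: attribute value missing or blank.
    if not record.get("occupation", "").strip():
        problems.append("incomplete: occupation missing")
    # Noisy: value outside the plausible range.
    if record["salary"] < 0:
        problems.append("noisy: negative salary")
    # Inconsistent: stored age disagrees with the age implied by the birthday.
    born = datetime.strptime(record["birthday"], "%d/%m/%Y").date()
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age = today.year - born.year - (0 if had_birthday else 1)
    if age != record["age"]:
        problems.append("inconsistent: age does not match birthday")
    return problems

# The example values from the slide combined into one hypothetical record.
record = {"occupation": " ", "salary": -10, "age": 42, "birthday": "03/07/1997"}
problems = find_problems(record, today=date(2015, 8, 10))
print(problems)
```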
Why Is Data Dirty?

 Incomplete data may come from
 “Not applicable” data value when collected
 Different considerations between the time when the data was collected and when it is analyzed
 Human/hardware/software problems
 Noisy data (incorrect values) may come from
 Faulty data collection instruments
 Human or computer error at data entry
 Errors in data transmission
 Inconsistent data may come from
 Different data sources
 Functional dependency violation (e.g., modify some linked data)
 Duplicate records also need data cleaning
14 Data Mining: Concepts and Techniques August 10, 2015
Why Is Data Preprocessing Important?

 No quality data, no quality mining results!
 Quality decisions must be based on quality data
 e.g., duplicate or missing data may cause incorrect or even misleading statistics.
 Data warehouse needs consistent integration of quality data
 Data extraction, cleaning, and transformation make up the majority of the work of building a data warehouse


Multi-Dimensional Measure of Data Quality
 Properties of a well-accepted multidimensional view:
 Accuracy
 Completeness
 Consistency
 Timeliness
 Believability
 Value added
 Interpretability
 Accessibility


Major Tasks in Data Preprocessing
 Data cleaning
 Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
 Data integration
 Integration of multiple databases, data cubes, or files
 Data transformation
 Normalization and aggregation
 Data reduction
 Obtains reduced representation in volume but produces the same or similar analytical results
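
Two of the tasks above, filling in missing values (cleaning) and normalization (transformation), can be sketched in a few lines. The slides do not prescribe particular techniques, so mean imputation and min-max scaling are assumptions here, and the salary values are hypothetical.

```python
def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the known ones."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def min_max(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical salary column with one missing value.
salaries = [30.0, None, 50.0, 70.0]
cleaned = fill_missing(salaries)
scaled = min_max(cleaned)
print(cleaned, scaled)
```

Note that `min_max` as written assumes the column contains at least two distinct values; a constant column would need special handling.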
Forms of Data Preprocessing


KDD Process

The KDD process
"KDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data".

Steps: The process operates on the following basic steps:
 (i) identifying the goal from the user's point of view (based on the relevant knowledge about the domain),
 (ii) creating a target data set,
 (iii) data preprocessing,
 (iv) data reduction and projection,
 (v) matching the goals of the KDD process to a particular data mining method,
 (vi) exploratory analysis,
 (vii) data mining,
 (viii) interpreting mined patterns,
 (ix) acting on the discovered knowledge.

 These steps can be divided into three tasks:
 the preprocessing of data (steps i - vi),
 the mining of data (step vii) and
 the postprocessing of data (steps viii - ix).

 The domain knowledge helps the process to focus on the research content.

Fig.: The KDD Process

KDD Process Ex: Web Log
 Selection:
 Select log data (dates and locations) to use
 Preprocessing:
 Remove identifying URLs
 Remove error logs
 Transformation:
 Sessionize (sort and group)
 Data Mining:
 Identify and count patterns
 Construct data structure
 Interpretation/Evaluation:
 Identify and display frequently accessed sequences.
 Potential User Applications:
 Cache prediction
 Personalization
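
The web-log stages above can be sketched end to end. Everything in this example is hypothetical: the log layout `(user, timestamp, url, status)`, the URLs, and the choice to mine consecutive page pairs as the "patterns". The "remove identifying URLs" step is left out because the slide does not say how such URLs are recognized.

```python
from collections import Counter, defaultdict

# Selection: hypothetical log entries (user, timestamp, url, HTTP status).
log = [
    ("u1", 1, "/home", 200), ("u1", 2, "/products", 200),
    ("u2", 1, "/home", 200), ("u1", 3, "/cart", 200),
    ("u2", 2, "/products", 200), ("u2", 3, "/missing", 404),
]

# Preprocessing: remove error entries (status 400 and above).
clean = [entry for entry in log if entry[3] < 400]

# Transformation: sessionize by sorting on (user, time) and grouping per user.
sessions = defaultdict(list)
for user, ts, url, _ in sorted(clean, key=lambda e: (e[0], e[1])):
    sessions[user].append(url)

# Data mining: count consecutive page pairs across all sessions.
pair_counts = Counter()
for pages in sessions.values():
    pair_counts.update(zip(pages, pages[1:]))

# Interpretation/evaluation: report the most frequently accessed sequence.
most_common = pair_counts.most_common(1)
print(most_common)
```

A frequent pair such as "/home" followed by "/products" is the kind of pattern that cache prediction could use to prefetch the second page.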

KDD Issues
 Human Interaction
 Outliers
 Interpretation
 Visualization
 Large Datasets
 High Dimensionality
 Multimedia Data
 Missing Data
 Irrelevant Data
 Noisy Data
 Changing Data
 Integration
 Application
