10 Data Mining

Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of extracting valuable information from large datasets to identify patterns and trends for data-driven decision-making. It involves several steps including data cleaning, integration, selection, transformation, mining, evaluation, and presentation. While data mining offers advantages such as cost efficiency and improved decision-making, it also has disadvantages like potential misuse of customer data and the complexity of some analytical tools.

Uploaded by

ishmaelkipruto704


Data Mining

• Data mining is one of the most useful techniques that
help entrepreneurs, researchers, and individuals to
extract valuable information from huge sets of data.
• Data mining is also called Knowledge Discovery in
Databases (KDD).
Definition
Data mining is the process of extracting information from
huge sets of data to identify patterns, trends, and useful
data that allow a business to make data-driven decisions.
Data Mining
The knowledge discovery process includes:
1. Data cleaning
2. Data integration
3. Data selection
4. Data transformation
5. Data mining
6. Pattern evaluation
7. Knowledge presentation
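The seven steps above can be sketched end to end on a toy dataset. This is only a minimal illustration: the two record sources (`branch_a`, `branch_b`), the field names, and the "pattern" being mined (average rent per city) are all hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical raw records from two sources (illustrative data only).
branch_a = [
    {"customer": "C1", "city": "Leeds", "rent": "450"},
    {"customer": "C2", "city": "leeds ", "rent": None},   # missing value
    {"customer": "C3", "city": "York", "rent": "600"},
]
branch_b = [
    {"customer": "C4", "city": "York", "rent": "700"},
    {"customer": "C1", "city": "Leeds", "rent": "450"},   # duplicate of C1
]

def clean(records):
    """1. Data cleaning: drop records with missing rent, normalise city names."""
    return [
        {**r, "city": r["city"].strip().title()}
        for r in records
        if r["rent"] is not None
    ]

def integrate(*sources):
    """2. Data integration: merge sources, keeping one record per customer."""
    merged = {}
    for source in sources:
        for r in source:
            merged.setdefault(r["customer"], r)
    return list(merged.values())

def select(records, fields):
    """3. Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """4. Data transformation: convert rent from text to a number."""
    return [{**r, "rent": float(r["rent"])} for r in records]

def mine(records):
    """5. Data mining: here, simply the average rent per city."""
    rents_by_city = defaultdict(list)
    for r in records:
        rents_by_city[r["city"]].append(r["rent"])
    return {city: sum(v) / len(v) for city, v in rents_by_city.items()}

def evaluate(patterns, threshold):
    """6. Pattern evaluation: keep only patterns judged interesting."""
    return {city: avg for city, avg in patterns.items() if avg >= threshold}

def present(patterns):
    """7. Knowledge presentation: format results for a reader."""
    return [f"{city}: average rent {avg:.2f}" for city, avg in sorted(patterns.items())]

data = integrate(clean(branch_a), clean(branch_b))
data = transform(select(data, ["city", "rent"]))
patterns = mine(data)
report = present(evaluate(patterns, threshold=500))
print("\n".join(report))
```

Each step is a separate function so the order of the knowledge discovery process stays visible; in practice the "mining" step would be one of the techniques described later rather than a simple average.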
Data Mining – Other Definitions
1. Data mining is the process of investigating hidden patterns of
information from various perspectives and categorizing it into
useful data. This data is collected and assembled in areas such as
data warehouses, where efficient analysis and data mining
algorithms support decision making and other data requirements,
eventually cutting costs and generating revenue.
2. Data mining is the act of automatically searching large stores
of information to find trends and patterns that go beyond simple
analysis procedures.
3. Data mining uses complex mathematical algorithms to segment
data and evaluate the probability of future events.
4. Data mining is a process used by organizations to extract specific
data from huge databases to solve business problems. It primarily
turns raw data into useful information.
Data Mining vs Data Science
• Data Mining is similar to Data Science carried out by a
person, in a specific situation, on a particular data set,
with an objective.
• This process includes various types of services such as:
– text mining,
– web mining,
– audio and video mining,
– pictorial data mining, and
– social media mining
• This is usually done through software that is simple or
highly specific.
Advantages of Data Mining
1. Data mining enables organizations to obtain
knowledge-based data.
2. Data mining enables organizations to make profitable
modifications in operation and production.
3. Compared with other statistical data applications, data mining is
cost-efficient.
4. Data mining helps the decision-making process of an
organization.
5. It facilitates the automated discovery of hidden patterns as well
as the prediction of trends and behaviors.
6. It can be introduced into new systems as well as existing
platforms.
7. It is a quick process that makes it easy for new users to analyze
enormous amounts of data in a short time.
Disadvantages of Data Mining
1. There is a risk that organizations may sell useful
customer data to other organizations for money.
– For example, American Express has reportedly sold its customers'
credit card purchase data to other organizations.
2. Many data mining analytics tools are difficult to operate
and need advanced training to use.
3. Different data mining tools operate in distinct ways
due to the different algorithms used in their design.
Therefore, selecting the right data mining tool is a
very challenging task.
4. Data mining techniques are not perfectly precise, which may
lead to severe consequences in certain conditions.
Data Mining Applications
• Data mining is primarily used by organizations with
intense consumer demands, such as:
– retail,
– communication,
– financial, and
– marketing companies.
• These organizations use it to determine:
– prices,
– consumer preferences,
– product positioning, and
– the impact on sales, customer satisfaction, and corporate profits.
• Data mining enables a retailer to use point-of-sale
records of customer purchases to develop products and
promotions that help the organization attract
customers.
Data Mining Techniques
• There are four main techniques:
– Predictive Modelling
– Database Segmentation
– Link Analysis
– Deviation Detection
• Many applications may work well when a combination of
several operations is used.
1. Predictive Modelling
– This technique uses observations to form a model of the
important characteristics of some phenomenon.
– It can be used to analyse an existing
database to determine some essential characteristics of
the data set.
– It uses supervised learning.
• There are two main techniques used in predictive
modelling:
– Classification
– Regression
a. Classification
– It is used to establish a specific predetermined class for
each record in a database from a finite set of possible class
values, e.g. if a customer has rented for more than 2 years
and is more than 25 years old, then they are most likely to
buy property.
– It can use classifiers such as neural networks,
decision trees, naïve Bayes, etc.
b. Value prediction (Regression)
– It is used to estimate a continuous numeric value that is
associated with a database record.
– It uses statistical techniques, e.g. linear or non-linear
regression.
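Value prediction with linear regression can be sketched as follows. This is a minimal ordinary least squares fit of a straight line; the data (property size against monthly rent) and the function name `fit_line` are invented for illustration and are not from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical records: property size in square metres -> monthly rent.
sizes = [30, 50, 70, 90]
rents = [400, 600, 800, 1000]

slope, intercept = fit_line(sizes, rents)
predicted = slope * 60 + intercept  # estimate for an unseen 60 m^2 property
print(f"rent = {slope:.1f} * size + {intercept:.1f}")
print(f"predicted rent for 60 m^2: {predicted:.1f}")
```

The toy data lies exactly on a line, so the fit is exact; real records would scatter around the fitted line, and the prediction would be an estimate rather than a certainty.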
Classification Example: Tree Induction
• Customer renting property > 2 years?
– No: Rent property
– Yes: Customer age > 25 years?
    – No: Rent property
    – Yes: Buy property

Source: Connolly and Begg
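The decision tree in this example can be written directly as a classifier. The sketch below hand-codes the two tests from the tree; actual tree induction would learn these tests from labelled records, which is not shown here. The function name `classify` and the sample customers are hypothetical.

```python
def classify(years_renting, age):
    """Return the predicted class for one customer record."""
    if years_renting > 2:
        if age > 25:
            return "Buy property"
        return "Rent property"
    return "Rent property"

# Hypothetical customer records (illustrative data only).
customers = [
    {"name": "A", "years_renting": 1, "age": 40},
    {"name": "B", "years_renting": 3, "age": 22},
    {"name": "C", "years_renting": 5, "age": 31},
]

predictions = {
    c["name"]: classify(c["years_renting"], c["age"]) for c in customers
}
for name, label in predictions.items():
    print(name, "->", label)
```

Only customer C passes both tests, so only C is assigned the class "Buy property".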

2. Database Segmentation (Cluster Analysis)
– This technique creates clusters by partitioning a database
into an unknown number of segments (or clusters) of
records which share a number of properties, i.e. are
homogeneous.
– It uses unsupervised learning to discover sub-populations
in the database.
• The two main techniques in database segmentation
are:
– Demographic clustering
– Neural clustering
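To make the idea of segmentation concrete, here is a minimal sketch using k-means, a common clustering algorithm swapped in for illustration. It is not one of the two techniques named above, and unlike the description above it needs the number of clusters to be chosen in advance (here, via the starting centroids). The data points (age, monthly rent in hundreds) are invented.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical records: (customer age, monthly rent in hundreds).
points = [(22, 4), (25, 5), (24, 4.5), (45, 9), (50, 10), (48, 9.5)]
start = [points[0], points[3]]  # assumed starting centroids, one per segment

centroids, clusters = kmeans(points, start)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No class labels are supplied, which is what makes this unsupervised: the two sub-populations (younger, lower-rent customers and older, higher-rent customers) emerge from the distances between records.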
Segmentation: Scatterplot Example (figure not reproduced)

Source: Connolly and Begg

3. Link Analysis (Association Rule Analysis)
– This technique is used to establish associations between
individual records (or sets of records) in a database,
e.g. 'when a customer rents property for more than two years
and is more than 25 years old, then in 40% of cases, the
customer will buy the property'.
• The main techniques used in link analysis are:
– Association discovery
– Sequential pattern discovery
– Similar time sequence discovery
a. Association discovery – finds items which imply the
presence of other items in the same event.
b. Sequential pattern discovery – finds cases where the presence
of one set of items implies the presence of another over a period
of time (e.g. long-term customer buying behaviour).
c. Similar time sequence discovery – discovers links
between two sets of data that are time-dependent, e.g.
buying property -> buying household goods within 2
months.
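Association discovery is usually quantified with two measures, support and confidence. The sketch below computes both for a single rule over a handful of invented shopping baskets; the function names and the data are assumptions made for illustration, and a real tool would also search for candidate rules rather than test one supplied by hand.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent. Assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical events: each set is the items bought in one transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]

rule_support = support(transactions, {"bread", "butter"})
rule_confidence = confidence(transactions, {"bread"}, {"butter"})
print(f"support(bread, butter)    = {rule_support:.2f}")
print(f"confidence(bread->butter) = {rule_confidence:.2f}")
```

A statement such as "in 40% of cases the customer will buy the property" is a confidence value in this sense: the share of records matching the rule's condition that also match its conclusion.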
4. Deviation Detection
– This technique is used to identify records, or
'outliers', which deviate from some known
expectation or norm (values that are out of the
ordinary).
– It can be done either statistically (e.g. linear
regression) or by visualisation (e.g. graphically).
Deviation Detection: Visualisation Example (figure not reproduced)

Source: Connolly and Begg
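A simple statistical form of deviation detection can be sketched as follows. Note the swap: instead of the regression-based approach mentioned above, this uses a plainer test that flags values lying more than a chosen number of standard deviations from the mean. The threshold of 2 and the rent figures are assumptions made for illustration.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values further than `threshold` standard deviations
    from the mean (the 'norm' in this simple sketch)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]

# Hypothetical monthly rents; one record is far out of the ordinary.
rents = [450, 470, 480, 500, 510, 520, 1500]

outliers = find_outliers(rents)
print("outliers:", outliers)
```

One caveat of this approach is that extreme values inflate the mean and standard deviation they are measured against, so the threshold usually needs tuning to the data.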

Mining and Warehousing
• A data warehouse is the ideal data source for data
mining.
• Data mining needs a single, separate, clean, integrated,
self-consistent data source. A data warehouse:
– Is populated with clean, consistent data
– Contains multiple sources, which allows as many
inter-relationships as possible to be discovered
– Provides query capabilities that allow the selection of
relevant subsets of records and fields
– Has the capability to go back to the data source, i.e. provides a
way for data mining results to prompt further investigation
of uncovered patterns.
Further Reading
• Connolly and Begg, chapters 31 to 34.
• W. H. Inmon, Building the Data Warehouse, New
York, Wiley and Sons, 1993.
• P. Beynon-Davies, Database Systems (2nd ed),
Macmillan Press, 2000, chs 34, 35 & 36.
