Data Mining
Data mining refers to the analysis of the large quantities of data that are stored in
computers; it is the extraction, or "mining", of knowledge from large amounts of data.
Data mining is a process used by companies to turn raw data into useful information.
By using software to look for patterns in large batches of data, businesses can learn
more about their customers to develop more effective marketing strategies, increase
sales and decrease costs. Data mining depends on effective data collection,
warehousing, and computer processing.
A BIG PICTURE OF DATA MINING
EXAMPLES OF DATA MINING
Data mining is widely used by banking firms in soliciting credit card customers, by
insurance and telecommunication companies in detecting fraud, by telephone companies
and credit card issuers in identifying those potential customers most likely to churn,
and by manufacturing firms in quality control, among many other applications.
EXAMPLES OF DATA MINING
Data mining can be used by businesses in many ways. Three examples are:
1. Customer profiling, identifying those subsets of customers most profitable to the
business;
2. Targeting, determining the characteristics of profitable customers who have been
captured by competitors;
3. Market-basket analysis, determining product purchases by consumer, which can be
used for product positioning and for cross-selling.
WHY DO WE NEED DATA MINING
Fraud detection
Potential clients
Quality control
Product positioning
Cross-selling
WHAT IS NEEDED TO DO DATA MINING
Data mining requires identification of a problem, along with collection of data that
can lead to better understanding, and computer models to provide statistical or other
means of analysis.
Data mining tools need to be versatile, scalable, capable of accurately predicting
responses between actions and results, and capable of automatic implementation.
Versatile refers to the ability of the tool to apply a wide variety of models. Scalable
means that if a tool works on a small data set, it should also work on larger data
sets.
DATA MINING TECHNIQUES
Association
Classification
Clustering
Prediction
ASSOCIATION
In association, the relationship of a particular item in a data transaction to other items in the same
transaction is used to predict patterns.
For example, if a customer purchases a laptop PC (X), then he or she also buys a mouse (Y) in 60% of
the cases. This pattern occurs in 5.6% of laptop PC purchases. An association rule in this situation can
be “X implies Y, where 60% is the confidence factor and 5.6% is the support factor.” When the
confidence factor and support factor are represented by linguistic variables “high” and “low,”
respectively, the association rule can be written in the fuzzy logic form, such as: “where the support
factor is low, X implies Y is high.” In the case of many qualitative variables, fuzzy association is a
necessary and promising technique in data mining.
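The support and confidence factors above can be computed directly from transaction data. A minimal sketch, using a made-up set of transactions (the items and figures are hypothetical, not the 60%/5.6% example):

```python
# Compute support and confidence for the rule "laptop implies mouse".
# The transaction data below is hypothetical, for illustration only.
transactions = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"laptop"},
    {"mouse", "keyboard"},
    {"laptop", "mouse"},
    {"keyboard"},
    {"laptop", "bag"},
    {"bag"},
    {"laptop", "mouse"},
    {"mouse"},
]

def support(itemset, transactions):
    """Fraction of all transactions containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"laptop", "mouse"}, transactions)       # support factor
rule_confidence = confidence({"laptop"}, {"mouse"}, transactions)  # confidence factor
```

With these ten transactions the rule "laptop implies mouse" has support 0.4 and confidence 4/6, the same kind of figures quoted in the example above.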
CLASSIFICATION
These methods are intended for learning functions that map each item of the selected data into one of a predefined
set of classes. Given the set of predefined classes, a number of attributes, and a “learning (or training) set,”
classification methods can automatically predict the class of other, previously unclassified data. Two key research
problems related to classification results are the evaluation of misclassification and prediction power. Mathematical
techniques that are often used to construct classification methods are
binary decision trees,
neural networks,
linear programming, and
statistics.
BINARY DECISION TREE
By using decision trees, a tree induction model with a “Yes–No” format can be built to split data into
different classes according to their attributes.
How well a model fits the data can be measured by either statistical estimation or information entropy.
Decision trees clearly lay out the problem so that all options can be challenged.
They allow us to fully analyze the possible consequences of a decision.
They provide a framework to quantify the values of outcomes and the probabilities of achieving them.
However, the classification obtained from tree induction may not produce an optimal solution where
prediction power is limited.
HOW TO CONSTRUCT A DECISION TREE
A decision tree starts at a ‘root node’ (or ‘the root’); the intermediate decision
points are called ‘internal nodes’ (or just ‘nodes’); the terminal points are called
‘leaf nodes’ (or ‘leaves’).
DECISION TREE
Leaf nodes have arrows pointing to them, but there are no arrows pointing away from them.
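A tree in the “Yes–No” format described above can be represented and traversed with a short sketch. The attributes, thresholds, and class labels below are invented for illustration only:

```python
# A tiny hand-built decision tree in the "Yes-No" format.
# Internal nodes ask a question; leaf nodes hold a class label.
# The attributes and split values are hypothetical.
tree = {
    "question": ("income", 40000),          # is income > 40000?
    "yes": {
        "question": ("years_employed", 2),  # is years_employed > 2?
        "yes": {"leaf": "approve"},
        "no": {"leaf": "review"},
    },
    "no": {"leaf": "decline"},
}

def classify(record, node):
    """Walk from the root node to a leaf, following Yes-No answers."""
    while "leaf" not in node:
        attribute, threshold = node["question"]
        node = node["yes"] if record[attribute] > threshold else node["no"]
    return node["leaf"]

print(classify({"income": 50000, "years_employed": 5}, tree))  # approve
```

A tree induction algorithm would learn the questions and thresholds from a training set; here they are fixed by hand to show how the splits route a record to a leaf.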
NEURAL NETWORKS
[Figure: a feed-forward neural network with an input layer, hidden layers, and an
output layer. Neurons, which perform most of the computations required by the network,
are connected by weighted channels; each neuron applies an activation function to its
weighted inputs. Forward propagation produces the actual output, and back propagation
uses the error between the actual and desired output to adjust the weights.]
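The forward-propagation step can be sketched in a few lines. The layer sizes and weights below are invented for illustration; a real network would learn them through back propagation:

```python
import math

def sigmoid(x):
    """Activation function: squashes a weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Forward propagation: each layer is a list of neurons, and each
    neuron is a (weights, bias) pair acting on the previous layer."""
    activations = inputs
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations

# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = [([0.8, 0.2], 0.0), ([0.4, 0.9], 0.0)]
output = [([0.3, 0.5], 0.0)]
result = forward([1.0, 0.0], [hidden, output])
```

Back propagation would then compare `result` against the desired output and push the error backwards to update each weight; that step is omitted here.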
LINEAR PROGRAMMING
In linear programming approaches, the classification problem is viewed as a special form of linear program.
Given a set of classes and a set of attribute variables, one can define a cutoff limit (or boundary) separating the
classes.
Then each class is represented by a group of constraints with respect to a boundary in the linear program.
The objective function in the linear programming model can minimize the overlapping rate across classes and
maximize the distance between classes.
The linear programming approach results in an optimal classification.
However, the computation time required may exceed that of statistical approaches.
STATISTICAL
Various statistical methods, such as linear discriminant regression, quadratic discriminant regression,
and logistic discriminant regression are very popular and are commonly used in real business
classifications.
Even though statistical software has been developed to handle a large amount of data, statistical
approaches have a disadvantage in efficiently separating multiclass problems in which a pair-wise
comparison (i.e., one class versus the rest of the classes) has to be adopted.
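The pair-wise (one class versus the rest) decomposition mentioned above can be sketched directly; the class labels here are invented:

```python
# Decompose a multiclass problem into binary one-vs-rest problems,
# the pair-wise comparison statistical classifiers must adopt.
# The labels are hypothetical.
labels = ["gold", "silver", "bronze", "gold", "silver"]

def one_vs_rest(labels):
    """For each class, relabel the data as 1 (that class) vs 0 (the rest),
    yielding one binary classification problem per class."""
    problems = {}
    for cls in sorted(set(labels)):
        problems[cls] = [1 if y == cls else 0 for y in labels]
    return problems

problems = one_vs_rest(labels)
```

Each binary problem would then be fed to a two-class method (e.g. logistic regression), which is why multiclass data multiplies the work for statistical approaches.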
CLUSTERING
Cluster analysis takes ungrouped data and uses automatic techniques to put this data
into groups.
Clustering is unsupervised, and does not require a learning set.
It shares common methodological ground with classification: most of the mathematical
models mentioned earlier with regard to classification can be applied to cluster
analysis as well.
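One common automatic grouping technique is k-means. A minimal sketch on one-dimensional data (the points and starting centers are invented), showing that no learning set is needed:

```python
# A minimal k-means clustering sketch on 1-D data. Unsupervised:
# the points carry no class labels. The data are hypothetical.
def kmeans(points, centers, iterations=10):
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

centers, groups = kmeans([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 10.0])
```

The two centers settle near 1.0 and 9.0, splitting the ungrouped data into its two natural clusters.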
PREDICTION
Sequential pattern analysis seeks to find similar patterns in data transactions over a business period.
These patterns can be used by business analysts to identify relationships among data.
The mathematical models behind Sequential Patterns are logic rules, fuzzy logic, and so on.
As an extension of Sequential Patterns, Similar Time Sequences are applied to discover sequences similar to a
known sequence over both past and current business periods.
In the data mining stage, several similar sequences can be studied to identify future trends in transaction
development.
This approach is useful in dealing with databases that have time-series characteristics.
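Discovering sequences similar to a known sequence can be sketched with a simple distance measure; the sales-like figures and store names below are hypothetical:

```python
# Rank candidate time sequences by similarity to a known sequence,
# using Euclidean distance. The figures are hypothetical.
import math

def distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

known = [10, 12, 15, 14, 18]
candidates = {
    "store_a": [11, 12, 16, 13, 19],
    "store_b": [30, 28, 25, 27, 24],
}

# Most similar sequence first.
ranked = sorted(candidates, key=lambda name: distance(known, candidates[name]))
```

Real similar-time-sequence methods use more robust measures (e.g. allowing shifts and scaling), but the idea of ranking sequences against a known one is the same.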
DATA MINING PROCESS
In order to systematically conduct data mining analysis, a general process is usually followed.
CRISP-DM – an industry standard process consisting of a sequence of steps that are usually involved in a data
mining study.
SEMMA – a process specific to SAS. Although not every step of either approach is needed in every analysis,
both provide good coverage of the steps needed, starting with data exploration and moving through data
collection, data processing, analysis, inference, and implementation.
CRISP-DM
There is a Cross-Industry Standard Process for Data Mining (CRISP-DM) widely used by industry
members.
This model consists of six phases intended as a cyclical process.
Business Understanding
Business understanding includes determining business objectives, assessing the current situation,
establishing data mining goals, and developing a project plan.
CRISP-DM
Data Understanding
Once business objectives and the project plan are established, data understanding considers data requirements.
This step can include initial data collection, data description, data exploration, and the verification of data quality.
Data exploration such as viewing summary statistics (which includes the visual display of categorical variables)
can occur at the end of this phase.
Models such as cluster analysis can also be applied during this phase, with the intent of identifying patterns in the
data.
CRISP-DM
Data Preparation
Once the data resources available are identified, they need to be selected, cleaned, built into the form
desired, and formatted.
Data cleaning and data transformation in preparation of data modeling needs to occur in this phase.
Data exploration at a greater depth can be applied during this phase, and additional models utilized,
again providing the opportunity to see patterns based on business understanding.
CRISP-DM PROCESS
CRISP-DM
Modeling
Data mining software tools such as visualization (plotting data and establishing relationships) and
cluster analysis (to identify which variables go well together) are useful for initial analysis.
Tools such as generalized rule induction can develop initial association rules.
Once greater data understanding is gained (often through pattern recognition triggered by viewing
model output), more detailed models appropriate to the data type can be applied.
The division of data into training and test sets is also needed for modeling.
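The division of data into training and test sets can be sketched with a simple random split; the 80/20 proportion and the records are invented for illustration:

```python
# Split records into a training set (for fitting models) and a
# test set (for evaluating them). Proportion is hypothetical.
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records, then cut off a test portion."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

records = list(range(10))
train, test = train_test_split(records)
```

Shuffling before the cut matters: data that arrive sorted (e.g. by date or class) would otherwise give a test set unrepresentative of the whole.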
Evaluation
Model results should be evaluated in the context of the business objectives established in the first phase (business
understanding).
This will lead to the identification of other needs (often through pattern recognition), frequently reverting to prior
phases of CRISP-DM.
Gaining business understanding is an iterative procedure in data mining, where the results of various visualization,
statistical, and artificial intelligence tools show the user new relationships that provide a deeper understanding of
organizational operations.
CRISP-DM
Deployment
Data mining can be used both to verify previously held hypotheses and for knowledge discovery (the identification
of unexpected and useful relationships).
Through the knowledge discovered in the earlier phases of the CRISP-DM process, sound models can be obtained
that may then be applied to business operations for many purposes, including prediction or identification of key
situations.
These models need to be monitored for changes in operating conditions, because what might be true today may
not be true a year from now.
If significant changes do occur, the model should be redone.
It’s also wise to record the results of data mining projects so documented evidence is available for future studies.