CS423 Data Warehousing and Data Mining: Dr. Hammad Afzal
Chapter 1
Introduction
OHT 1: 15%
OHT 2: 15%
Quizzes: 10%
Total 4
All announced
Assignment: 10%
Semester Project
Syndicate Members: 1-3
Will be announced after 1st OHT
RESOURCES
Text Book:
1. Data Mining: Concepts and Techniques
By Jiawei Han.
3rd Edition
Reference:
Will be provided.
RESOURCES
Grading Scheme:
WHY DATA MINING?
We are drowning in data, but starving for knowledge!
WHAT IS DATA MINING?
DATA MINING: CONFLUENCE OF MULTIPLE
DISCIPLINES
ALTERNATIVE NAMES
Information Harvesting
Knowledge Mining
Data Mining
Knowledge Discovery in Databases
Data Dredging
Data Archaeology
Data Pattern Processing
Database Mining
Knowledge Extraction
Siftware
Recommender systems
MARKET ANALYSIS AND MANAGEMENT
Where does the data come from?
Target marketing
Cross-market analysis
Results
25/05/2022
Dr. Hammad Afzal - Data Mining
FRAUD DETECTION & MINING
UNUSUAL PATTERNS
Clustering & model construction for frauds, Outlier analysis
Medical Insurance
Professional patients, Ring of doctors, and Ring of
references
FRAUD DETECTION & MINING
UNUSUAL PATTERNS
Clustering & model construction for frauds, Outlier analysis
Banking Industry
Fraudulent transactions
Retail industry
Analysts estimate that 38% of retail shrink is due to
dishonest employees
Anti-terrorism
KNOWLEDGE DISCOVERY (KDD) PROCESS
Figure (KDD process flow), recoverable labels: Databases, Data Cleaning, Data Integration, Task-relevant Data, Data Mining.
STEPS OF A KDD PROCESS
Data Integration:
Multiple data sources may be combined.
Data Selection:
Data relevant to the analysis task are retrieved.
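The two steps above can be sketched in Python as follows. This is only a minimal illustration: the two in-memory sources (`store_sales`, `online_sales`), their field names, and the "software purchases only" task are assumptions made for the example and do not come from the slides.

```python
# Hypothetical data sources; records and field names are illustrative only.
store_sales = [
    {"customer": "C1", "item": "computer", "amount": 900},
    {"customer": "C2", "item": "printer", "amount": 150},
]
online_sales = [
    {"customer": "C3", "item": "software", "amount": 60},
    {"customer": "C1", "item": "software", "amount": 80},
]


def integrate(*sources):
    """Data integration: combine records from multiple sources into one list."""
    combined = []
    for source in sources:
        combined.extend(source)
    return combined


def select(records, predicate):
    """Data selection: keep only the records relevant to the analysis task."""
    return [r for r in records if predicate(r)]


all_sales = integrate(store_sales, online_sales)
# Assumed analysis task: study software purchases only.
software_sales = select(all_sales, lambda r: r["item"] == "software")
print(len(all_sales), len(software_sales))
```

In a real setting the sources would typically be databases or files with differing schemas, so integration would also involve reconciling field names and resolving duplicates.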
DATA MINING IN BUSINESS
INTELLIGENCE
Figure (business intelligence layers): increasing potential to support business decisions. Recoverable labels: Decision Making (End User); Data Exploration (Statistical Summary, Querying, and Reporting).
DATA MINING FUNCTION: (1)
GENERALIZATION
Data Characterization:
Summarization of the general characteristics or features of a
target class of data.
Data collected by query.
Output can be pie charts, bar charts, data cubes.
Data Discrimination:
Comparison of the general features of a target class with those of
other (contrasting) classes.
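As a rough sketch of the difference between the two, the Python below summarizes one class and then compares it with another. The customer records, the `segment` field, and the choice of averages as the summary are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical customer records; fields and values are illustrative only.
customers = [
    {"segment": "big_spender", "age": 45, "income": 90},
    {"segment": "big_spender", "age": 50, "income": 110},
    {"segment": "budget", "age": 25, "income": 30},
    {"segment": "budget", "age": 35, "income": 40},
]


def characterize(records, target):
    """Data characterization: summarize the general features of one class."""
    rows = [r for r in records if r["segment"] == target]
    return {
        "count": len(rows),
        "avg_age": mean(r["age"] for r in rows),
        "avg_income": mean(r["income"] for r in rows),
    }


def discriminate(records, target, contrast):
    """Data discrimination: compare the target class with a contrasting class."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: t[key] - c[key] for key in ("avg_age", "avg_income")}


summary = characterize(customers, "big_spender")
difference = discriminate(customers, "big_spender", "budget")
print(summary)
print(difference)
```

The `summary` dictionary is the kind of result that would feed a bar chart or pie chart; `difference` shows how far the target class sits from the contrasting one on each feature.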
DATA MINING FUNCTION: (2) ASSOCIATION
ANALYSIS
Frequent patterns (or frequent itemsets).
Patterns that appear frequently in data.
Association:
A typical association rule:
buys(X, computer) -> buys(X, software)
computer -> software [0.5%, 75%] (support, confidence)
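Support and confidence for a rule such as computer -> software can be computed directly from transaction data. The sketch below uses four made-up transactions, so its numbers will not match the [0.5%, 75%] figures on the slide; the function names are also just illustrative.

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


s = support({"computer", "software"}, transactions)
c = confidence({"computer"}, {"software"}, transactions)
print(f"computer -> software [support={s:.0%}, confidence={c:.0%}]")
```

Support measures how often the items occur together across all transactions, while confidence measures how often the consequent appears among the transactions that contain the antecedent.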
Typical classification methods:
Decision trees, naïve Bayesian classification, support vector machines,
neural networks, rule-based classification, pattern-based classification,
logistic regression, …
DATA MINING FUNCTION: (3) CLASSIFICATION
Typical applications:
Credit card fraud detection, direct marketing, classifying stars,
diseases, web-pages, …
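To make the idea of learning a classifier from labeled examples concrete, here is a minimal sketch of a decision stump (a decision tree with a single split), applied to a toy fraud-detection task. The transaction amounts, the labels, and the single-feature setup are assumptions for illustration; real fraud models use many features and far more data.

```python
def train_stump(amounts, labels):
    """Learn a one-level decision tree: predict 'fraud' when amount > threshold.

    Tries every observed amount as a threshold and keeps the one with the
    fewest training errors.
    """
    best_threshold, best_errors = None, len(labels) + 1
    for threshold in sorted(set(amounts)):
        errors = sum(
            (a > threshold) != (label == "fraud")
            for a, label in zip(amounts, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, amount):
    """Classify a new transaction with the learned threshold."""
    return "fraud" if amount > threshold else "normal"


# Hypothetical labeled training data (the class label is known).
amounts = [20, 35, 50, 900, 1200]
labels = ["normal", "normal", "normal", "fraud", "fraud"]

threshold = train_stump(amounts, labels)
print(threshold, predict(threshold, 700), predict(threshold, 40))
```

The key point is that the labels are given during training, which is what makes this supervised learning.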
DATA MINING FUNCTION: (3) REGRESSION
Similar to classification,
but predicts ordered values (often numeric data) rather than class labels.
A common form is the linear model:
y = mx + c,
where x and y are variables, m is the slope, and c is the intercept.
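The slope m and intercept c of the linear form above can be estimated from data by ordinary least squares. The sketch below is one simple way to do it in plain Python; the function name `fit_line` and the sample points are assumptions chosen so the fit is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

m, c = fit_line(xs, ys)
prediction = m * 5 + c  # predict y for a new x = 5
print(m, c, prediction)
```

Unlike a classifier, the fitted model returns a number on a continuous scale, here the predicted y for an unseen x.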
DATA MINING FUNCTION: (4)
CLUSTER ANALYSIS
Unsupervised learning (i.e., the class label is unknown)
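One widely used clustering method is k-means; the slides do not name a specific algorithm, so it is used here only as an example. The sketch below groups one-dimensional values without any class labels. The data, the initial centers, and the function name `kmeans_1d` are assumptions for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D values, starting from the given initial centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: nothing moved
            break
        centers = new_centers
    return centers, clusters


# Hypothetical unlabeled data: customer ages with two visible groups.
ages = [21, 23, 22, 48, 51, 50]
centers, clusters = kmeans_1d(ages, centers=[21, 51])
print(centers, clusters)
```

No labels are supplied at any point; the groups emerge only from the distances between the values, which is what distinguishes clustering from classification.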
DATA MINING FUNCTION: (5)
OUTLIER ANALYSIS
Outlier analysis
Outlier: A data object that does not comply with the general behavior of the
data
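A simple way to flag such objects in numeric data is the z-score: how many standard deviations a value lies from the mean. The sketch below is one possible approach, not the only one; the threshold of 2.0 and the transaction amounts are assumptions chosen for this small example.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; one value clearly departs from the rest.
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(amounts)
print(outliers)
```

With very small samples the largest possible z-score is limited, so a lower threshold such as 2.0 is used here; on larger datasets 3.0 is a more common choice.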
COURTESY
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign & Simon Fraser University.