CS423 Data Warehousing and Data Mining: Dr. Hammad Afzal

The document discusses an introduction to data warehousing and data mining. It covers topics like the need for data mining due to large amounts of data, definitions of data mining, related disciplines, applications, and steps in the knowledge discovery process. It also discusses functions of data mining like generalization and association analysis. The document is the syllabus for a course on data warehousing and data mining.

Uploaded by

Zafar Iqbal

CS423

DATA WAREHOUSING AND DATA MINING

Chapter 1
Introduction

Dr. Hammad Afzal

[email protected]

Department of Computer Software Engineering

National University of Sciences and Technology (NUST)
RESOURCES
 Lecture Slides will be available on LMS

 Additional references shall be provided (if any)

 OHT 1 : 15%
 OHT 2: 15%

 Quizzes: 10%
 Total 4
 All announced

 Assignment: 10%
 Semester Project
 Syndicate Members: 1-3
 Will be announced after 1st OHT
RESOURCES

Text Book:
 1. Data Mining Concepts and Techniques
 By Jiawei Han.
 3rd Edition

Reference:
 Will be provided.

RESOURCES
 Grading Scheme:

 PEC - Washington Accord

 Outcome based Learning (OBE)

 Course Learning Objectives (CLOs)

WHY DATA MINING?

 The Explosive Growth of Data: from terabytes to petabytes

 Data collection and data availability

 Automated data collection tools, database systems, Web, computerized
society

 Major sources of abundant data

 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation, …
 Society and everyone: news, digital cameras, YouTube

 We are drowning in data, but starving for knowledge!
WHAT IS DATA MINING?

 Data mining (knowledge discovery from data)

 Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
DATA MINING: CONFLUENCE OF MULTIPLE
DISCIPLINES

[Figure: Data Mining at the center of multiple disciplines: Machine Learning, Pattern Recognition, Statistics, Applications, Visualization, Algorithm, Database Technology, High-Performance Computing]
ALTERNATIVE NAMES

Information Harvesting
Knowledge Mining
Data Mining
Knowledge Discovery in Databases
Data Dredging
Data Archaeology
Data Pattern Processing
Database Mining
Knowledge Extraction
Siftware

The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of stored data, using pattern recognition technologies and statistical and mathematical techniques
APPLICATIONS OF DATA MINING
 Web page analysis
 From web page classification, clustering to PageRank

 Recommender systems

 Basket data analysis to targeted marketing

 Biological and medical data analysis

MARKET ANALYSIS AND MANAGEMENT
 Where does the data come from?

 Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies

 Target marketing

 Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.

 Determine customer purchasing patterns over time

 Cross-market analysis

 Associations/correlations between product sales, and prediction based on such associations
REAL EXAMPLE FROM THE NBA
 Play-by-play information recorded by teams
 Who is on the court
 Who shoots

 Results

 Coaches want to know what works best

 Plays that work well against a given team
 Good/bad player matchups

 Advanced Scout (from IBM Research) is a data mining tool to answer these questions
NEW TRENDS IN MARKETING: TARGET
CUSTOMERS, INTEGRATE DIFFERENT RESOURCES

25/05/2022
Dr. Hammad Afzal - Data Mining
FRAUD DETECTION & MINING
UNUSUAL PATTERNS
 Clustering & model construction for frauds, Outlier analysis

 Applications: Health care, retail, credit card service, telecomm.

 Money laundering: suspicious monetary transactions

 Medical Insurance
 Professional patients, ring of doctors, and ring of references
FRAUD DETECTION & MINING
UNUSUAL PATTERNS
 Clustering & model construction for frauds, Outlier analysis

 Banking Industry
 Fraudulent transactions

 Retail industry
 Analysts estimate that 38% of retail shrink is due to dishonest employees

 Anti-terrorism

KNOWLEDGE DISCOVERY (KDD) PROCESS

 This is a view from typical database systems and data warehousing communities

[Figure: KDD process flow: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
STEPS OF A KDD PROCESS

 Data cleaning and preprocessing: (may take 60% of effort!)

 To remove noise and inconsistent data.

 Data Integration:
 Multiple data sources may be combined.

 Data Selection:
 Where data relevant to analysis task are retrieved.

 Data reduction and transformation

 Find useful features, dimensionality/variable reduction, invariant representation.
STEPS OF A KDD PROCESS
 Data mining: search for patterns of interest
 Choosing functions of data mining
 Summarization, classification, regression, association, clustering.

 Pattern evaluation and knowledge presentation

 Visualization, Removing redundant patterns, etc.
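
The KDD steps listed on these two slides can be sketched end to end on a toy dataset. This is only a minimal illustration in plain Python: the records, the thresholds, and the function names (clean, integrate, select, mine, evaluate) are assumptions made up for the example, not something taken from the slides or the textbook.

```python
from collections import Counter

# Hypothetical raw data from two sources (all names and values are made up).
store_a = [{"id": 1, "city": "Lahore", "amount": 120.0},
           {"id": 2, "city": None, "amount": 80.0},       # missing value
           {"id": 3, "city": "Karachi", "amount": -5.0}]  # inconsistent amount
store_b = [{"id": 4, "city": "Lahore", "amount": 200.0},
           {"id": 5, "city": "Karachi", "amount": 60.0}]

def clean(records):
    """Data cleaning: drop records with missing or inconsistent fields."""
    return [r for r in records if r["city"] is not None and r["amount"] >= 0]

def integrate(*sources):
    """Data integration: combine multiple sources into one collection."""
    return [r for src in sources for r in src]

def select(records, min_amount):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r["amount"] >= min_amount]

def mine(records):
    """Data mining (summarization): count purchases per city."""
    return Counter(r["city"] for r in records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {k: v for k, v in patterns.items() if v >= min_count}

data = integrate(clean(store_a), clean(store_b))
relevant = select(data, min_amount=50.0)
patterns = evaluate(mine(relevant), min_count=2)
print(patterns)
```

In a real project each stage would be far larger (the slides note that cleaning and preprocessing alone may take 60% of the effort); the point here is only the order in which the stages feed each other.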

DATA MINING IN BUSINESS
INTELLIGENCE
[Figure: pyramid of layers; the potential to support business decisions increases from bottom to top. Layers from top to bottom, with the role associated with each:]
 Decision Making (End User)
 Data Presentation: Visualization Techniques (Business Analyst)
 Data Mining: Information Discovery (Data Analyst)
 Data Exploration: Statistical Summary, Querying, and Reporting (Data Analyst)
 Data Preprocessing/Integration, Data Warehouses (DBA)
 Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)
DATA MINING: ON WHAT KINDS
OF DATA?
 Database-oriented data sets and applications
 Relational database, data warehouse, transactional database

 Advanced data sets and advanced applications

 Data streams and sensor data
 Time-series data, temporal data, sequence data (incl. bio-sequences)
 Structure data, graphs, social networks and multi-linked data
 Spatial data and spatiotemporal data
 Multimedia database
 Text databases
 The World-Wide Web
For Details: See Book
DATA MINING FUNCTION: (1)
GENERALIZATION
 Class/Concept Description: Characterization and Discrimination

 Data mining can be used to describe individual classes and concepts in summarized, concise and precise form.

 Two techniques are used for this purpose: characterization and discrimination
DATA MINING FUNCTION: (1)
GENERALIZATION
 Data Characterization:
 Summarization of general characteristics or features of
target class of data.
 Data collected by query.
 Output can be pie charts, bar charts, data cubes.

 Data Discrimination:
 Comparison of general features of target class with other
classes.
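
The difference between the two techniques can be shown on a handful of records. The sketch below is only an illustration under assumptions: the customer records, the "segment" label, and the helper names characterize and discriminate are all hypothetical.

```python
from statistics import mean

# Hypothetical customer records; "segment" plays the role of the class label.
customers = [
    {"segment": "big_spender", "age": 45, "income": 90},
    {"segment": "big_spender", "age": 50, "income": 110},
    {"segment": "budget", "age": 25, "income": 30},
    {"segment": "budget", "age": 35, "income": 40},
]

def characterize(records, target):
    """Characterization: summarize general features of the target class."""
    rows = [r for r in records if r["segment"] == target]
    return {"avg_age": mean(r["age"] for r in rows),
            "avg_income": mean(r["income"] for r in rows)}

def discriminate(records, target, contrast):
    """Discrimination: compare the target class summary with another class."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {k: t[k] - c[k] for k in t}

summary = characterize(customers, "big_spender")
difference = discriminate(customers, "big_spender", "budget")
print(summary, difference)
```

The summary describes one class on its own, while the difference only makes sense relative to a contrasting class, which is the distinction the slide draws.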
DATA MINING FUNCTION: (2) ASSOCIATION
ANALYSIS
 Frequent patterns (or frequent itemsets).
 Patterns that appear frequently in data.

 Many kinds of patterns, e.g. frequent itemsets, frequent sequences

 Association
 A typical association rule:
 buys(X, computer) -> buys(X, software)
 computer -> software [0.5%, 75%] (support, confidence)

 Strength of rule measured through support and confidence
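
Support and confidence for a rule such as computer -> software can be computed directly from transaction counts. The following is a small sketch on made-up basket data; the transactions and the helper names count, support, and confidence are assumptions for illustration, and the resulting numbers are not the [0.5%, 75%] figures quoted on the slide.

```python
# Hypothetical market-basket transactions (one set per customer purchase).
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "software"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = count(antecedent | consequent, transactions)
    return both / count(antecedent, transactions)

rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(f"support={rule_support:.0%}, confidence={rule_confidence:.0%}")
```

Here three of the five baskets contain both items (support 60%), and three of the four baskets with a computer also contain software (confidence 75%).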

DATA MINING FUNCTION: (3)
CLASSIFICATION
 Classification : A process that describes and distinguishes data
classes or concepts.
 Construct models (functions) based on some training examples
 Describe and distinguish classes or concepts for future prediction
 Predict some unknown class labels

 Typical methods
 Decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression, …
DATA MINING FUNCTION: (3) CLASSIFICATION

 Typical applications:
 Credit card fraud detection, direct marketing, classifying stars,
diseases, web-pages, …
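
As a rough illustration of building a model from training examples and then predicting unknown labels, the sketch below trains a decision stump, i.e. a decision tree with a single split. Everything in it is an assumption for the example: the transaction amounts, the two labels, and the simplification that fraud lies above the learned threshold.

```python
# Hypothetical training examples: (transaction amount, class label).
training = [(20, "legit"), (35, "legit"), (50, "legit"),
            (400, "fraud"), (650, "fraud"), (900, "fraud")]

def predict(threshold, x):
    """Predict a class label; assumes 'fraud' lies above the threshold."""
    return "fraud" if x > threshold else "legit"

def train_stump(examples):
    """Learn a one-level decision tree: choose the threshold on the single
    feature that misclassifies the fewest training examples."""
    values = sorted(x for x, _ in examples)
    # Candidate thresholds are midpoints between consecutive feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum(predict(threshold, x) != y for x, y in examples)

    return min(candidates, key=errors)

threshold = train_stump(training)
print(threshold, predict(threshold, 30), predict(threshold, 700))
```

Real classifiers from the list of typical methods use many features and deeper models, but the train-then-predict shape is the same.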

DATA MINING FUNCTION: (3) REGRESSION

 Similar to classification,
 but is applied to ordered data (often numeric data).
 Usually in the form:
 y = mx + c,
 where x and y are variables, m is the slope and c is the intercept.

 Example: Geological surveys
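
One common way to estimate m and c from data is ordinary least squares; the slide does not name a fitting method, so treat that choice as an assumption. The measurements below are hypothetical and were generated from y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope m and intercept c
    for the model y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Hypothetical survey measurements, generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
predicted = m * 5.0 + c  # predict y for an unseen x
print(m, c, predicted)
```

Unlike classification, the prediction is a number on a continuous scale rather than a class label.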

DATA MINING FUNCTION: (4)
CLUSTER ANALYSIS
 Unsupervised learning (i.e., Class label is unknown)

 Group data to form new categories (i.e., clusters), e.g., cluster houses to find distribution patterns

 Principle: Maximizing intra-class similarity & minimizing interclass similarity
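
The grouping principle can be illustrated with k-means, one widely used clustering method; the slide does not name an algorithm, so k-means is an assumption here. The house prices and the starting centers are made up, and the data is one-dimensional to keep the sketch short.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical house prices (arbitrary units) with no class labels.
prices = [10, 12, 11, 50, 52, 49]
centers, clusters = kmeans_1d(prices, centers=[10, 50])
print(centers, clusters)
```

No labels are given to the algorithm; the two groups emerge only from how close the values are to each other.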

DATA MINING FUNCTION: (5)
OUTLIER ANALYSIS
 Outlier analysis
 Outlier: A data object that does not comply with the general behavior of the data

 Noise or exception? One person’s garbage could be another person’s treasure

 Methods: by-product of clustering.

 Useful in fraud detection, rare events analysis
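
A simple way to flag objects that do not follow the general behavior is a z-score test. Note that this is a statistical technique swapped in for brevity, not the clustering by-product mentioned on the slide; the transaction amounts and the cutoff of 2 standard deviations are assumptions.

```python
from statistics import mean, pstdev

def find_outliers(values, max_z=2.0):
    """Flag values lying more than max_z population standard deviations
    from the mean (a basic z-score test)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > max_z]

# Hypothetical transaction amounts; one is far from the rest.
amounts = [100, 102, 98, 101, 99, 500]
outliers = find_outliers(amounts)
print(outliers)
```

Whether the flagged value is noise to discard or a fraud case worth investigating is a judgment the method itself cannot make.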

COURTESY

 Slides are prepared using material from the website of
 Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign & Simon Fraser University.
Data Mining: Concepts and Techniques



 Course Slides: Infolabs Stanford University

 Course Slides: Purdue University
