Intro Data Mining

The document provides an introduction to data mining concepts and techniques. It discusses why data mining is needed due to the explosive growth of data. It defines data mining as the extraction of interesting and potentially useful patterns from large amounts of data. The document presents a multi-dimensional view of data mining, covering the types of data, knowledge, techniques and applications involved. It also describes some common data mining functions like association analysis, classification, cluster analysis and outlier analysis.

Uploaded by

Nafiz Islam

August 19, 2020 Data Mining: Concepts and Techniques 1

Data Mining:
Concepts and Techniques

— Chapter 1 —

Chapter 1. Introduction
• Why Data Mining?
• What Is Data Mining?
• A Multi-Dimensional View of Data Mining
• What Kinds of Data Can Be Mined?
• What Kinds of Patterns Can Be Mined?
• What Technologies Are Used?
• What Kinds of Applications Are Targeted?

Why Data Mining?

• The explosive growth of data: from terabytes to petabytes
  ◦ Data collection and data availability
    ▪ Automated data collection tools, database systems, the Web, computerized society
  ◦ Major sources of abundant data
    ▪ Business: Web, e-commerce, transactions, stocks, …
    ▪ Science: remote sensing, bioinformatics, scientific simulation, …
    ▪ Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets

Evolution of Database Technology
• 1960s:
  ◦ Data collection, database creation, IMS and network DBMS
• 1970s:
  ◦ Relational data model, relational DBMS implementation
• 1980s:
  ◦ RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  ◦ Application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
  ◦ Data mining, data warehousing, multimedia databases, and Web databases
• 2000s:
  ◦ Stream data management and mining
  ◦ Data mining and its applications
  ◦ Web technology (XML, data integration) and global information systems

What Is Data Mining?

• Data mining (knowledge discovery from data)
  ◦ Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
• Alternative names
  ◦ Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Watch out: Is everything “data mining”?
  ◦ No: simple search and query processing, for example, are not data mining

Knowledge Discovery (KDD) Process
• This is a view from typical database systems and data warehousing communities
• Data mining plays an essential role in the knowledge discovery process

[Figure: the KDD process, read from the source data up to the evaluated patterns]
1. Databases
2. Data Cleaning (remove noise and inconsistent data) and Data Integration (combine multiple data sources)
3. Data Warehouse
4. Selection (retrieve data relevant to the analysis) and Data Transformation
5. Task-relevant Data
6. Data Mining
7. Pattern Evaluation (identify interesting patterns)
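The stages of the KDD process can be sketched as a chain of small functions. This is only a toy illustration under stated assumptions: the records, the field names, the helper functions and the `min_count` threshold are all hypothetical, and the "mining" step is a plain frequency count rather than any particular algorithm from the chapter.

```python
from collections import Counter

# Hypothetical raw records from two source "databases" (illustrative data only).
store_a = [
    {"customer": "c1", "item": "milk", "amount": 2.5},
    {"customer": "c2", "item": "bread", "amount": None},  # noisy: missing amount
    {"customer": "c3", "item": "milk", "amount": 3.0},
]
store_b = [
    {"customer": "c4", "item": "Milk ", "amount": 2.0},   # inconsistent spelling
    {"customer": "c5", "item": "eggs", "amount": 4.0},
]


def clean(records):
    """Data cleaning: drop records with a missing amount, normalize item names."""
    return [
        {**r, "item": r["item"].strip().lower()}
        for r in records
        if r["amount"] is not None
    ]


def integrate(*sources):
    """Data integration: combine several cleaned sources into one collection."""
    return [r for source in sources for r in source]


def select(warehouse, fields):
    """Selection: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in warehouse]


def mine(task_relevant):
    """Data mining (toy version): count how often each item occurs."""
    return Counter(r["item"] for r in task_relevant)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that reach a minimum count."""
    return {item: n for item, n in patterns.items() if n >= min_count}


warehouse = integrate(clean(store_a), clean(store_b))
task_relevant = select(warehouse, ["item"])
interesting = evaluate(mine(task_relevant), min_count=2)
print(interesting)
```

Each function corresponds to one stage of the process, so a stage can be swapped out (for example, a real mining algorithm in place of `mine`) without touching the others.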
Example: A Web Mining Framework

• Web mining usually involves
  ◦ Data cleaning
  ◦ Data integration from multiple sources
  ◦ Warehousing the data
  ◦ Data cube construction
  ◦ Data selection for data mining
  ◦ Data mining
  ◦ Presentation of the mining results
  ◦ Patterns and knowledge to be used or stored in a knowledge base

[Figure: system diagram; only the following annotations were recovered]
• Focuses search towards interesting patterns
• Modules for association & correlation analysis, classification, prediction, cluster analysis
• Domain knowledge: to guide search

Multi-Dimensional View of Data Mining
• Data to be mined
  ◦ Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media, graphs & social and information networks
• Knowledge to be mined (or: data mining functions)
  ◦ Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
  ◦ Descriptive vs. predictive data mining
• Techniques utilized
  ◦ Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc.
• Applications adapted
  ◦ Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.


Data Mining: On What Kinds of Data?
• Database-oriented data sets and applications
  ◦ Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
  ◦ Data streams and sensor data
  ◦ Time-series data, temporal data, sequence data (incl. bio-sequences)
  ◦ Structured data, graphs, social networks and multi-linked data
  ◦ Object-relational databases
  ◦ Heterogeneous databases and legacy databases
  ◦ Spatial data and spatiotemporal data
  ◦ Multimedia databases
  ◦ Text databases
  ◦ The World-Wide Web
Ref: Section 1.3 (self-study)
Data Mining Function: Association and Correlation Analysis
• Frequent patterns (or frequent item sets)
  ◦ What items are frequently purchased together in your Chaldal/Lavender?
• Association, correlation vs. causality
  ◦ A typical association rule
    ▪ Diaper → Milk [0.5%, 75%] (support, confidence)
  ◦ How to mine such patterns and rules efficiently in large datasets?
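The two numbers attached to an association rule can be computed directly from a list of transactions: support is the fraction of all transactions that contain both sides of the rule, and confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side. A minimal sketch, assuming a small made-up basket data set (the transactions and function names below are illustrative and do not reproduce the 0.5% / 75% figures of the slide's example):

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing antecedent and consequent together."""
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions with the antecedent, the fraction that also have the consequent."""
    with_antecedent = count_containing(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / with_antecedent


# Hypothetical basket data, for illustration only.
transactions = [
    {"diaper", "milk", "bread"},
    {"diaper", "milk"},
    {"diaper", "eggs"},
    {"milk", "bread"},
    {"diaper", "milk", "eggs"},
]

s = support(transactions, {"diaper"}, {"milk"})     # 3 of 5 baskets
c = confidence(transactions, {"diaper"}, {"milk"})  # 3 of the 4 diaper baskets
print(f"Diaper -> Milk [support={s:.0%}, confidence={c:.0%}]")
```

This brute-force scan recounts the data for every rule, which is fine for a handful of baskets but is exactly what efficient frequent-pattern mining methods try to avoid on large datasets.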

[Figure: Correlation vs. Causality]


Data Mining Function: Classification

• Classification and label prediction (supervised learning)
  ◦ Construct models (functions) based on some training examples
  ◦ Describe and distinguish classes or concepts for future prediction
    ▪ E.g., classify countries based on (climate), or classify cars based on (gas mileage)
  ◦ Predict some unknown class labels
• Typical methods
  ◦ Decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression, …
• Typical applications:
  ◦ Credit card fraud detection, direct marketing, classifying stars, diseases, web-pages, …
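To make "construct a model from training examples, then predict unknown labels" concrete, here is a minimal sketch of a one-level decision tree (a decision stump) for the gas-mileage example. The training data, the class names "economy" and "guzzler", and the assumption that higher mileage means "economy" are all hypothetical choices made for illustration.

```python
def predict(threshold, mpg):
    """Apply the learned model: at or above the threshold means 'economy'."""
    return "economy" if mpg >= threshold else "guzzler"


def train_stump(examples):
    """Pick the mileage threshold that misclassifies the fewest training examples."""
    values = sorted({mpg for mpg, _ in examples})
    # Candidate thresholds: midpoints between consecutive distinct mileage values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(examples) + 1
    for t in candidates:
        errors = sum(1 for mpg, label in examples if predict(t, mpg) != label)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Hypothetical labelled training examples: (miles per gallon, class label).
training = [
    (12, "guzzler"), (15, "guzzler"), (18, "guzzler"),
    (30, "economy"), (35, "economy"), (42, "economy"),
]

threshold = train_stump(training)
print(threshold, predict(threshold, 33), predict(threshold, 14))
```

The learned threshold is the whole model here; a full decision tree repeats the same split search recursively on each side and over several attributes.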

[Figure: Classification]


Data Mining Function: Cluster Analysis

• Unsupervised learning (i.e., class label is unknown)
  ◦ Group data to form new categories (i.e., clusters), e.g., cluster houses to find distribution patterns
  ◦ Principle: maximizing intra-class similarity & minimizing inter-class similarity
• Many methods and applications
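As one illustration of grouping unlabelled data, the sketch below runs a basic k-means loop on house locations. k-means is chosen here only as a familiar example of a clustering method; the slide does not single it out. The coordinates, the value of `k`, and the deterministic "first k points" initialization are assumptions made to keep the example small and reproducible.

```python
import math


def kmeans(points, k, iterations=10):
    """Basic k-means: alternate between assigning points and moving centroids."""
    # Assumption: start from the first k points so the run is deterministic.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical house locations (x, y), for illustration only.
houses = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, clusters = kmeans(houses, k=2)
print(centroids)
print(clusters)
```

Points in the same cluster end up close to a shared centroid (high intra-class similarity), while the two centroids sit far apart (low inter-class similarity), which is the principle stated above.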

[Figure: Cluster Analysis]


Data Mining Function: Outlier Analysis
• Outlier analysis
  ◦ Outlier: a data object that does not comply with the general behavior of the data
  ◦ Methods: as a by-product of clustering or regression analysis, …
  ◦ Useful in fraud detection, rare events analysis
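One way outliers fall out of regression analysis is through residuals: fit a line to the data and flag the objects that lie unusually far from it. The sketch below assumes an ordinary least-squares line and a cutoff of `k` standard deviations of the residuals; the data, the function names and the choice `k=2.0` are hypothetical and would need tuning on real data.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def regression_outliers(xs, ys, k=2.0):
    """Flag points whose residual exceeds k standard deviations of all residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) > k * spread]


# Hypothetical data: y is roughly 2 * x, except for one object at x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 30, 12, 14, 16]
print(regression_outliers(xs, ys))
```

Note that the outlier itself pulls the fitted line towards it, so this simple residual test can miss outliers in small or heavily contaminated samples.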

[Figure: Outlier Analysis]


Data Mining: Confluence of Multiple Disciplines

[Figure: data mining at the center, surrounded by the disciplines it draws on]
• Machine learning
• Pattern recognition
• Statistics
• Applications
• Visualization
• Algorithm
• Database technology
• High-performance computing

Applications of Data Mining
• Web page analysis: from web page classification and clustering to the PageRank & HITS algorithms
• Collaborative analysis & recommender systems
• Basket data analysis to targeted marketing
• Biological and medical data analysis: classification, cluster analysis (microarray data analysis), biological sequence analysis, biological network analysis
• Major dedicated data mining systems/tools (e.g., SAS, MS SQL-Server Analysis Manager, Oracle Data Mining Tools)
