Lecture 1

This document provides an overview of data mining. It discusses data mining techniques like classification, clustering, association rule mining and sequential pattern mining. It describes the steps involved in a knowledge discovery process including data selection, cleaning, transformation, mining and evaluation. Examples of large datasets and applications of data mining are also presented. The document outlines the origins, functionalities and process of data mining.

Uploaded by

Subhashini Reddy
Copyright
© All Rights Reserved

Data Mining

Data Mining Overview


• Data warehouses and OLAP (Online Analytical Processing)
• Association Rules Mining
• Clustering: Hierarchical and Partition approaches
• Classification: Decision Trees and Bayesian classifiers
• Sequential Pattern Mining
• Advanced topics: graph mining, privacy-preserving data mining, outlier detection, spatial data mining
What is Data Mining?
• Data Mining is:
(1) The efficient discovery of previously unknown, valid, potentially useful, understandable patterns in large datasets
(2) The analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner
Overview of terms

• Data: a set of facts (items) D, usually stored in a database
• Pattern: an expression E in a language L that describes a subset of facts
• Attribute: a field in an item i in D
• Interestingness: a function I_{D,L} that maps an expression E in L into a measure space M
Overview of terms
• The Data Mining Task:
For a given dataset D, language of facts L, interestingness function I_{D,L} and threshold c, find the expression E such that I_{D,L}(E) > c efficiently.
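
As a minimal sketch of this task definition, assume the facts in D are market-basket transactions, the language L is the set of itemsets of at most two items, and the interestingness function is support (the fraction of transactions that contain the itemset). These choices, the toy data, and the names `support` and `mine` are illustrative assumptions, not part of the lecture.

```python
from itertools import combinations

# Toy dataset D: each fact is one transaction (a set of items). Made-up data.
D = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "beer"},
]


def support(itemset, data):
    """Assumed interestingness I_{D,L}(E): fraction of transactions containing E."""
    return sum(1 for t in data if itemset <= t) / len(data)


def mine(data, c, max_size=2):
    """Return every itemset E with at most max_size items and support(E) > c."""
    items = sorted(set().union(*data))
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), data)
            if s > c:
                result[combo] = s
    return result


patterns = mine(D, c=0.5)
for itemset, s in patterns.items():
    print(itemset, s)
```

This sketch enumerates every candidate expression, so it does not meet the "efficiently" part of the task; the number of itemsets grows exponentially with the number of items, which is why practical algorithms prune the search space.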
Knowledge Discovery
Steps of a KDD Process

• Learning the application domain
– Relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing: (may take 60% of effort!)
• Data reduction and transformation
– Find useful features, dimensionality/variable reduction.
• Choosing functions of data mining
– Summarization, classification, regression, association, clustering.
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
– Visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
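
The core steps above (selection, cleaning, transformation, mining, evaluation) can be sketched as a chain of small functions. Everything here is a hypothetical illustration: the records, the field names, the age cutoff of 40, and the 0.8 threshold are assumptions chosen only to make the flow concrete.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 23, "city": "Austin", "bought": "yes"},
    {"id": 2, "age": None, "city": "Austin", "bought": "no"},
    {"id": 3, "age": 47, "city": "Boston", "bought": "yes"},
    {"id": 4, "age": 51, "city": "Boston", "bought": "yes"},
    {"id": 5, "age": 29, "city": "Austin", "bought": "no"},
]


def select(records, fields):
    """Data selection: keep only the attributes relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records that have a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Transformation: discretize age into two coarse groups."""
    return [
        {"age_group": "under_40" if r["age"] < 40 else "40_plus",
         "bought": r["bought"]}
        for r in records
    ]


def mine(records):
    """Mining (summarization): purchase rate per age group."""
    totals, buyers = Counter(), Counter()
    for r in records:
        totals[r["age_group"]] += 1
        buyers[r["age_group"]] += r["bought"] == "yes"
    return {g: buyers[g] / totals[g] for g in totals}


def evaluate(rates, min_rate):
    """Pattern evaluation: keep only groups whose rate reaches min_rate."""
    return {g: r for g, r in rates.items() if r >= min_rate}


rates = mine(transform(clean(select(raw, ["age", "bought"]))))
interesting = evaluate(rates, min_rate=0.8)
print(rates, interesting)
```

In practice the process is iterative: a disappointing evaluation step usually sends the analyst back to selection or cleaning rather than forward to use.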

Architecture: Typical Data Mining System

Layers, from top to bottom:
• Graphical user interface
• Pattern evaluation
• Data mining engine
• Database or data warehouse server
• Data cleaning, data integration, and filtering
• Databases and data warehouse

• Knowledge base (side component)
Data Mining: On What Kinds of Data?
• Relational database
• Data warehouse
• Transactional database
• Advanced database and information repository
– Spatial and temporal data
– Time-series data
– Stream data
– Multimedia database
– Text databases & WWW

Examples of Large Datasets

• Government: IRS, NGA, …
• Large corporations
– Walmart: 20M transactions per day
– Mobil: 100 TB geological databases
– AT&T: 300M calls per day
– Credit card companies
• Scientific
– NASA, EOS project: 50 GB per hour
– Environmental datasets
Examples of Data mining Applications

1. Fraud detection: credit cards, phone cards
2. Marketing: customer targeting
3. Data Warehousing: Walmart
4. Astronomy
5. Molecular biology
How Data Mining is used

1. Identify the problem
2. Use data mining techniques to transform the data into information
3. Act on the information
4. Measure the results
The Data Mining Process
1. Understand the domain
2. Create a dataset:
– Select the interesting attributes
– Data cleaning and preprocessing
3. Choose the data mining task and the specific algorithm
4. Interpret the results, and possibly return to step 2
Origins of Data Mining

• Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
• Must address:
– Enormity of data
– High dimensionality of data
– Heterogeneous, distributed nature of data

[Figure: data mining at the overlap of statistics, AI/machine learning, and database systems]
Data Mining Functionalities

• Concept description: Characterization and discrimination
– Generalize, summarize, and contrast data characteristics
• Association (correlation and causality)
– Diaper → Beer [support 0.5%, confidence 75%]
• Classification and Prediction
– Construct models (functions) that describe and distinguish classes or concepts for future prediction
– Presentation: decision tree, classification rule, neural network
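
The two numbers attached to the association rule above are read here as support and confidence, the usual convention for such rules. The sketch below shows how both are computed; the baskets are made up for illustration and do not reproduce the slide's 0.5% and 75% figures, and `rule_stats` is a hypothetical helper name.

```python
def rule_stats(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of all transactions containing both sides
    confidence = fraction of antecedent transactions that also contain
                 the consequent
    """
    n = len(transactions)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence


# Made-up market baskets, one set of items per transaction.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "beer", "bread"},
    {"diaper", "milk"},
    {"bread", "milk"},
    {"beer", "bread"},
    {"milk"},
    {"bread"},
]

support, confidence = rule_stats({"diaper"}, {"beer"}, baskets)
print(f"Diaper -> Beer [{support:.1%}, {confidence:.1%}]")
```

With these baskets, 3 of 8 transactions contain both items and 3 of the 4 diaper transactions also contain beer, so the rule has support 37.5% and confidence 75%.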

Data Mining Functionalities

• Cluster analysis
– Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns
– Maximizing intra-class similarity & minimizing inter-class similarity
• Outlier analysis
– Outlier: a data object that does not comply with the general behavior of the data
– Useful in fraud detection, rare events analysis
• Trend and evolution analysis
– Trend and deviation: regression analysis
– Sequential pattern mining, periodicity analysis
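
To make the cluster analysis idea concrete, here is a minimal sketch that groups house prices without class labels. The slides do not name an algorithm at this point; k-means is used only as one common partition approach, and the prices, the starting centers, and the function name `kmeans_1d` are assumptions for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd-style k-means on 1-D data, starting from the given centers.

    Each pass assigns every value to its nearest center, then moves each
    center to the mean of its assigned values (an empty cluster keeps its
    old center).
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical house prices in thousands; no class labels are given.
prices = [100, 110, 120, 400, 420, 440]
centers, clusters = kmeans_1d(prices, centers=[100, 440])
print(centers, clusters)
```

Values inside each resulting group are close to their own center and far from the other one, which is the "high intra-class similarity, low inter-class similarity" goal stated above.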

Data Mining: Confluence of Multiple Disciplines

[Figure: data mining at the center, drawing on database systems, statistics, machine learning, visualization, algorithms, and other disciplines]
