
Data Mining
July 16, 2009

This document discusses the evolution of database technology from the 1960s to present day and the rise of data mining. It notes that while data is being collected in large quantities, there is a need to extract useful knowledge from it. The document then outlines the data mining process and some common data mining techniques like classification, clustering, and association rule mining. It also discusses applications of data mining in various industries.

Uploaded by

Kaouther Benali
Copyright © All Rights Reserved

27/Sep/2008



Evolution of Database Technology

YEAR     PURPOSE
1960s    Network model, batch reports
1970s    Relational data model, executive information systems
1980s    Application-specific DBMS (spatial data, scientific data, image data, …)
1990s    Terabyte data warehouses, object-oriented databases, middleware and web technology
2000s    Business process
2010s    Sensor DB systems, DBs on embedded systems, large-scale pub/sub systems
Motivation: Necessity Is the Mother of Invention
• Data explosion problem
  ◦ Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
• We are drowning in data, but starving for knowledge!
• Solution: data warehousing and data mining
  ◦ Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases



Why Data Mining?

• Data, data, data everywhere …
• I can’t find the data I need: data is scattered over the network
• I can’t get the data I need
• I can’t understand the data I need
• I can’t use the data I found



An abundance of data:
• Supermarket scanners, POS data
• Credit card transactions
• Call center records
• ATM machines
• Demographic data
• Sensor networks
• Cameras
• Web server logs
• Customer web site trails
• Geographic information systems
• National medical records
• Weather images

This data occupies:
• Terabytes: 10^12 bytes
• Petabytes: 10^15 bytes
• Exabytes: 10^18 bytes
• Zettabytes: 10^21 bytes
• Yottabytes: 10^24 bytes
• Walmart: 24 terabytes
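The prefixes above are decimal (SI) powers of ten. As a small illustration, assuming decimal units rather than the binary (1024-based) units some tools report, this hypothetical helper `describe_bytes` expresses a raw byte count in the largest listed unit:

```python
# Decimal (SI) byte units from the slide; binary units (KiB, MiB, ...) are
# deliberately not used here (an assumption of this sketch).
UNITS = [
    ("YB", 10**24),
    ("ZB", 10**21),
    ("EB", 10**18),
    ("PB", 10**15),
    ("TB", 10**12),
]

def describe_bytes(n_bytes: int) -> str:
    """Express n_bytes in the largest unit from UNITS that fits."""
    for name, size in UNITS:
        if n_bytes >= size:
            return f"{n_bytes / size:g} {name}"
    return f"{n_bytes} bytes"  # smaller than one terabyte

# Walmart's 24 terabytes from the slide, given as a raw byte count
print(describe_bytes(24 * 10**12))
```

For example, `describe_bytes(24 * 10**12)` returns `"24 TB"`.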



• Process of sorting through large amounts of data and picking out relevant information
• Process of analyzing data from different perspectives and summarizing it into useful information
• Discovering hidden value in databases
• A non-trivial process of identifying valid, novel, useful and understandable patterns in data
• Extracting or mining knowledge from large amounts of data



History Notes – Many Names of Data Mining

YEAR   NAMES                              USED BY
1960   Data fishing, data dredging        Statisticians
1990   Data mining                        DB community, business
1989   Knowledge discovery in databases   AI, machine learning community

Other names: data archaeology, information harvesting, information discovery, knowledge extraction



Data Warehousing provides the Enterprise with a memory.

Data Mining provides the Enterprise with intelligence.



Why Data Mining? (Cont.)

• A data warehouse is a single, complete and consistent store of data from a variety of different sources, made available to end users
• For example, AT&T handles billions of calls per day. Europe's Very Long Baseline Interferometer (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
• We need data mining for:
  ◦ Transforming data into information useful to users
  ◦ Presenting data in a useful format
  ◦ Providing data access to business analysts and information technology professionals
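To get a feel for the VLBI figure, the session volume can be worked out directly. This is a back-of-the-envelope sketch that assumes every telescope records continuously for the full 25 days, that 1 gigabit = 10^9 bits, and that a byte is 8 bits; none of these assumptions is stated on the slide.

```python
# Back-of-the-envelope data volume for the VLBI example.
# Assumptions (not from the slide): continuous recording for the whole
# session, 1 gigabit = 10**9 bits, 1 byte = 8 bits.
TELESCOPES = 16
BITS_PER_SECOND = 10**9           # 1 Gigabit/second per telescope
SESSION_SECONDS = 25 * 24 * 3600  # 25-day observation session

def session_bytes(telescopes: int, bits_per_second: int, seconds: int) -> int:
    """Total bytes produced by all telescopes over one session."""
    total_bits = telescopes * bits_per_second * seconds
    return total_bits // 8

total = session_bytes(TELESCOPES, BITS_PER_SECOND, SESSION_SECONDS)
print(total)            # raw byte count
print(total / 10**15)   # the same volume in petabytes
```

Under those assumptions one session yields 4.32 × 10^15 bytes, roughly 4.32 petabytes.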



Data Mining Process
• Data mining is the technique used to carry out KDD (knowledge discovery in databases).
• Data mining turns data into information, and information into knowledge.

[Figure: Data → Information → Knowledge]



Steps in Data Mining
1. Data cleaning
To remove noise and inconsistent data
2. Data integration
To integrate (combine) multiple data sources
3. Data selection
Data relevant to the analysis is selected
4. Data transformation
Data is transformed and consolidated into forms appropriate for mining by performing summary, normalization or aggregation operations (for example, converting the data into a two-dimensional form)
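Steps 1 to 4 can be sketched in plain Python on a few hypothetical customer records. The field names, the cleaning rule (drop records with missing values or negative spending) and min-max normalization as the transformation are illustrative assumptions, not methods prescribed by the slides.

```python
# Minimal sketch of steps 1-4 on hypothetical customer records.
# Field names and cleaning rules are illustrative assumptions.
source_a = [
    {"id": 1, "age": 34, "spend": 120.0},
    {"id": 2, "age": None, "spend": 80.0},   # missing value
    {"id": 3, "age": 51, "spend": -5.0},     # inconsistent: negative spending
]
source_b = [
    {"id": 1, "age": 34, "spend": 120.0},    # same customer as in source_a
    {"id": 4, "age": 27, "spend": 200.0},
]

def clean(records):
    """Step 1: drop records with missing values or negative spending."""
    return [r for r in records
            if all(v is not None for v in r.values()) and r["spend"] >= 0]

def integrate(*sources):
    """Step 2: merge sources, keeping the first record seen for each id."""
    merged = {}
    for source in sources:
        for r in source:
            merged.setdefault(r["id"], r)
    return list(merged.values())

def select(records, fields):
    """Step 3: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def min_max_normalize(records, field):
    """Step 4: rescale one attribute to the range [0, 1]."""
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero when all values are equal
    return [{**r, field: (r[field] - low) / span} for r in records]

integrated = integrate(clean(source_a), clean(source_b))
prepared = min_max_normalize(select(integrated, ["age", "spend"]), "spend")
print(prepared)
```

Cleaning runs before integration so that noisy records never reach the merged set; the duplicate customer 1 is then collapsed during integration.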



Steps in Data Mining (Cont.)
5. Data mining
Intelligent methods are applied to the data to discover knowledge or patterns

6. Pattern evaluation
Interesting patterns are identified by thresholding, keeping only the patterns whose interestingness passes a given threshold

7. Knowledge presentation
Visualization and presentation methods are used to present the mined knowledge to the user
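Steps 5 to 7 can likewise be sketched on hypothetical market-basket transactions: the mining step computes the support of each item, pattern evaluation keeps items whose support reaches a threshold (the `min_support` value of 0.5 is an arbitrary assumption), and presentation prints a plain-text bar chart.

```python
from collections import Counter

# Hypothetical transactions: each set is the items in one purchase.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk"},
]

def mine_item_support(transactions):
    """Step 5: support of each item = fraction of transactions containing it."""
    counts = Counter(item for t in transactions for item in t)
    return {item: count / len(transactions) for item, count in counts.items()}

def evaluate(patterns, min_support):
    """Step 6: keep only patterns whose support reaches the threshold."""
    return {p: s for p, s in patterns.items() if s >= min_support}

def present(patterns):
    """Step 7: plain-text bar chart, highest support first (ties alphabetical)."""
    rows = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{item:<8}{'#' * round(s * 20)} {s:.0%}" for item, s in rows)

# The 0.5 threshold is an assumption chosen for illustration.
interesting = evaluate(mine_item_support(transactions), min_support=0.5)
print(present(interesting))
```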



Data mining: the core of the knowledge discovery process

[Figure: the KDD process. Databases → Data Cleaning and Data Integration → Data Warehouse → Data Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Data Mining Tasks
1. Classification
• Classification maps data into predefined groups or classes.
• It may be represented by models such as decision trees.

Decision tree
• Flowchart-like tree structure
• Each internal node denotes a test on an attribute value
• Each branch represents an outcome of the test
• Leaves represent classes or class distributions
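A decision tree like the one described above can be represented as nested test nodes. In this minimal sketch the tree, its attributes (`age`, `student`, `credit_rating`) and the class labels are hypothetical and built by hand; learning the tree from data (for example with ID3 or C4.5) is not shown, and each leaf holds a single class rather than a class distribution.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    label: str  # class assigned at this leaf

@dataclass
class Node:
    attribute: str  # attribute tested at this node
    # outcome of the test -> subtree (another Node or a Leaf)
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)

def classify(tree, record):
    """Follow test outcomes from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label

# Hypothetical hand-built tree: attributes and labels are illustrative only.
tree = Node("age", {
    "youth": Node("student", {"yes": Leaf("buys"), "no": Leaf("does not buy")}),
    "middle_aged": Leaf("buys"),
    "senior": Node("credit_rating", {"fair": Leaf("buys"),
                                     "excellent": Leaf("does not buy")}),
})

print(classify(tree, {"age": "youth", "student": "yes"}))
```

Only the attributes on the path actually taken need to be present in the record, which mirrors how a tree tests one attribute per node.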



2. Regression
Regression maps a data item to a real-valued prediction variable.
Example: A manager wants to reach a certain level of savings before retiring. Periodically, he predicts his retirement savings from their current value and several past values, using a simple linear regression formula to predict future savings.
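The manager's savings example can be sketched with ordinary least squares, one common way to fit a simple linear regression. The yearly savings figures below are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical savings at the end of years 1-4.
years = [1, 2, 3, 4]
savings = [10_000, 12_100, 13_900, 16_000]

a, b = fit_line(years, savings)
print(f"fitted line: savings = {a:.0f} + {b:.0f} * year")
print(f"predicted savings in year 10: {a + b * 10:.0f}")
```

With these numbers the fitted line is savings = 8050 + 1980 × year, so the forecast for year 10 is 27,850.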

3. Prediction
Many real-world applications can be seen as predicting future data states based on past and current data.
Example: predicting flooding is a difficult prediction problem.



4. Clustering
Clustering is similar to classification
except that the groups are not predefined.
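Clustering can be illustrated with k-means, one widely used clustering algorithm (the slides do not name a specific method). This sketch groups hypothetical 2-D points into k clusters without any predefined labels; it assumes Python 3.8+ for `math.dist`.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Unlike classification, no class labels are supplied; the label numbers k-means returns are arbitrary and only indicate which points ended up together.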
5. Association Rule
Association refers to uncovering relationships among data.
It is used in the retail sales community to identify items (products) that are frequently purchased together.

[Figure: cartoon captioned "Bread and Jam sell together!"]
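The bread-and-jam idea is usually quantified with support (how often items occur together) and confidence (how often the consequent appears when the antecedent does). This brute-force sketch, on hypothetical baskets, only considers rules between pairs of single items; algorithms such as Apriori extend the idea to larger itemsets. The thresholds are arbitrary assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Return sorted rules (lhs, rhs, support, confidence) between single items."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions
                          for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs) in the data
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical baskets.
transactions = [
    {"bread", "jam"},
    {"bread", "jam", "milk"},
    {"bread", "butter"},
    {"milk"},
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.7)
print(rules)
```

Here bread and jam appear together in half of the baskets; every basket with jam also has bread (confidence 1.0), while only two of the three bread baskets have jam, so only jam → bread passes the 0.7 confidence threshold.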



6. Summarization
Summarization of the general characteristics or features of a target class of data.
Data characterization: the results can be presented in various forms, such as pie charts, bar charts and curves.
Data discrimination: comparison of the general features of target-class data objects with the general features of objects from one or a set of contrasting classes.
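Characterization and discrimination can be sketched as summary statistics of a target class compared against a contrasting class. The student records, the `level` and `age` fields and the choice of statistics are hypothetical; charts such as those mentioned above would be drawn from numbers like these.

```python
from statistics import mean

# Hypothetical student records; "level" is the class attribute.
students = [
    {"level": "graduate", "age": 26},
    {"level": "graduate", "age": 30},
    {"level": "undergraduate", "age": 20},
    {"level": "undergraduate", "age": 22},
]

def characterize(records, attribute):
    """Characterization: summary statistics of one attribute over a class."""
    values = [r[attribute] for r in records]
    return {"count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}

def discriminate(records, class_field, target, attribute):
    """Discrimination: the target class's summary next to that of all other classes."""
    target_rows = [r for r in records if r[class_field] == target]
    contrast_rows = [r for r in records if r[class_field] != target]
    return {"target": characterize(target_rows, attribute),
            "contrast": characterize(contrast_rows, attribute)}

print(discriminate(students, "level", "graduate", "age"))
```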
7. Outlier Analysis
A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are called outliers.
Many data mining methods discard outliers as noise or exceptions. However, in applications such as fraud detection, rare events may be more interesting than regularly occurring events.
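One simple way to flag outliers is the modified z-score based on the median absolute deviation (MAD), with the commonly used cut-off of 3.5; this is a swapped-in technique for illustration, not one named on the slide. The transaction amounts are hypothetical, with one unusually large value of the kind fraud detection looks for.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (0.6745 * |x - median| / MAD) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median: this simple rule cannot score points
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical card transaction amounts with one suspicious value.
amounts = [25, 30, 27, 31, 29, 26, 950]
print(mad_outliers(amounts))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the latter and can hide itself.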
Data Mining: Types of Data
• Relational data and transactional data
• Text
• Images, video
• Mixtures of data



Data Mining Products
• DataMind -- neurOagent
• Information Discovery -- IDIS
• SAS Institute -- SAS/Neuronets

Data Mining Software
• RapidMiner and Weka: tools for defining the data mining process
• Top 8 data mining software packages in 2008:
  ◦ Angoss Software
  ◦ Infor CRM Epiphany
  ◦ Portrait Software
  ◦ SAS
  ◦ SPSS
  ◦ ThinkAnalytics
  ◦ Unica
  ◦ Viscovery



Application Areas

INDUSTRY            APPLICATION
Finance             Credit card analysis
Insurance           Fraud analysis
Telecommunication   Call record analysis



Applications
• Financial industry, banks, businesses, e-commerce
  ◦ Stock and investment analysis
  ◦ Identify loyal customers and risky customers
  ◦ Predict customer spending
• Database analysis and decision support
  ◦ Market analysis and management
    ▪ Target marketing, customer relationship management, market basket analysis
  ◦ Risk analysis and management
    ▪ Forecasting, quality control, competitive analysis
  ◦ Fraud detection and management



Data Mining in Usage

1. Intelligent Miner
• IBM's data mining product
• Its distinctive features include the scalability of its mining algorithms and tight integration with IBM's DB2 relational database system

2. DBMiner
• Developed by DBMiner Technologies Inc.
• Its distinctive feature is data-cube-based online analytical mining



The Telecomm Slice

[Figure: data cube with dimensions Product (Household, Telecomm, Video, Audio), Regions (Europe, Far East, India) and Sales Channel (Retail, Direct, Special), illustrating the Telecomm slice]
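The cube above can be sketched as a dictionary keyed by (product, region, sales channel). Slicing fixes one dimension (here Product = Telecomm), and rolling up sums over the remaining ones. The unit-sales figures are invented for illustration and the helper names `slice_cube` and `roll_up` are hypothetical; DBMiner's actual interface is not shown.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, channel) -> units sold.
sales = {
    ("Telecomm", "Europe", "Retail"): 120,
    ("Telecomm", "Europe", "Direct"): 45,
    ("Telecomm", "India", "Retail"): 80,
    ("Video", "Europe", "Retail"): 60,
    ("Audio", "Far East", "Special"): 30,
}

def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value and drop it, giving a smaller cube."""
    return {key[:dimension] + key[dimension + 1:]: amount
            for key, amount in cube.items() if key[dimension] == value}

def roll_up(cube, keep):
    """Sum the cube over every dimension whose index is not listed in keep."""
    totals = defaultdict(int)
    for key, amount in cube.items():
        totals[tuple(key[i] for i in keep)] += amount
    return dict(totals)

telecomm = slice_cube(sales, 0, "Telecomm")  # {(region, channel): units}
by_region = roll_up(telecomm, keep=[0])      # {(region,): units}
print(telecomm)
print(by_region)
```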



Conclusion
• Data mining: discovering interesting patterns from large amounts of data
• A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
• Mining can be performed on a variety of information repositories
• Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier analysis, etc.



Thank you!