Data Mining

Uploaded by

GOURAV GHOSH
Copyright
© All Rights Reserved

DATA WAREHOUSING AND MINING

TOPIC: INTRODUCTION TO DATA MINING

BY – GOURAV GHOSH
ROLL NO. - 31154322014
What is Data Mining/KDD?

Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
What is Data Mining?

 By definition, it is the process of extracting previously unknown information from large databases and using it to make organisational decisions.
 Is concerned with the discovery of hidden knowledge.
 Usually works on large volumes of data.
 Is useful in making critical organisational decisions, particularly those of a strategic nature.
Data Mining
Data mining has been referred to by a number of names:
Data Fishing, Data Dredging (1960…):
 Used by statisticians (as a pejorative term)
Knowledge Discovery in Databases (1989…):
 Used by AI, Machine Learning Community
Business Intelligence (1990…):
 Business management term
Also data archaeology, information harvesting, information
discovery, knowledge extraction, data/pattern analysis, etc.
Data Mining: On What Kinds of Data?
Relational database
Data warehouse
Transactional database
Advanced database and information repository
 Object-relational database
 Spatial and temporal data
 Time-series data
 Stream data
 Multimedia database
 Text databases & WWW
Data Mining Functionalities

Concept description
 Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Association (correlation and causality)
 e.g., nappies and beer frequently bought together
Classification and Prediction
 Construct models that describe and distinguish classes or concepts for future prediction
 Predict some unknown or missing numerical values
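The association idea above (nappies and beer) is usually quantified with support and confidence. The following is a minimal sketch, assuming a small invented set of shopping baskets; the data and the helper names `support` and `confidence` are illustrative and do not come from the slides.

```python
# Hypothetical market-basket data, invented for illustration only.
transactions = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "crisps"},
    {"nappies", "beer", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A).

    Assumes the antecedent occurs at least once, otherwise this divides by zero.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule under test: nappies => beer
rule_support = support({"nappies", "beer"}, transactions)
rule_confidence = confidence({"nappies"}, {"beer"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule is typically reported only when both values exceed analyst-chosen thresholds; high confidence alone shows co-occurrence, not causality.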
Data Mining Functionalities (cont…)

Cluster analysis
 Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
Outlier analysis
 Outlier: a data object that does not comply with the general behavior of the data
 Noise or exception? Not necessarily: outliers are useful in fraud detection and rare event analysis
Other pattern-directed or statistical analyses
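The cluster analysis example above (grouping houses when no class label is known) can be sketched with plain k-means. This is one common clustering technique, swapped in here for illustration; the slides do not name an algorithm. The house coordinates and starting centroids are invented.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points with given starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical house locations (x, y), invented for illustration.
houses = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(houses, centroids=[houses[0], houses[3]])
print(centroids)
print(clusters)
```

The result depends on the starting centroids and on the chosen number of clusters, so in practice both are usually varied and the groupings compared.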
Data Mining is Multidisciplinary
[Diagram: Data Mining at the centre, drawing on Statistics, Pattern Recognition, Neurocomputing, Machine Learning, AI, Databases, and KDD]
Why We Need Data Mining

Data explosion problem
 Automated data collection tools and mature database technology have led to huge amounts of accumulated data
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
 Data warehousing and on-line analytical processing
 Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Potential Applications

Data analysis and decision support


 Market analysis and management
 Risk analysis and management
 Fraud detection and detection of unusual patterns
Other applications
 Text mining (email, documents) and Web mining
 Stream data mining
 DNA and bio-data analysis
Stages of KDD
[Diagram, read from bottom to top: Databases → Cleaning & Integration → Data Warehouse → Selection & Transformation → Data Mining → Evaluation & Presentation → Knowledge]
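The KDD stages in the diagram above, read from databases up to knowledge, can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the records, the function names, and the "total quantity per item" mining step are all invented stand-ins, not the method from the slides.

```python
from collections import Counter

# Hypothetical raw records from two source databases, invented for illustration.
db_a = [
    {"customer": "c1", "item": "beer", "qty": 2},
    {"customer": "c2", "item": "nappies", "qty": None},  # incomplete record
]
db_b = [
    {"customer": "c1", "item": "nappies", "qty": 1},
    {"customer": "c3", "item": "beer", "qty": 3},
]

def clean_and_integrate(*sources):
    """Cleaning & integration: merge sources and drop records with missing values."""
    return [r for src in sources for r in src
            if all(v is not None for v in r.values())]

def select_and_transform(warehouse):
    """Selection & transformation: keep only task-relevant fields as (item, qty)."""
    return [(r["item"], r["qty"]) for r in warehouse]

def mine(pairs):
    """Data mining: total quantity per item, a stand-in for a real algorithm."""
    totals = Counter()
    for item, qty in pairs:
        totals[item] += qty
    return totals

def evaluate_and_present(totals, min_qty=2):
    """Evaluation & presentation: keep patterns that pass a simple threshold."""
    return {item: qty for item, qty in totals.items() if qty >= min_qty}

warehouse = clean_and_integrate(db_a, db_b)
knowledge = evaluate_and_present(mine(select_and_transform(warehouse)))
print(knowledge)
```

Each stage takes only the previous stage's output, which mirrors the layered flow in the diagram and lets any single stage be replaced without touching the others.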
Issues and Challenges of Data Mining
Data mining methodology
 Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
 Performance: efficiency, effectiveness, and scalability
 Pattern evaluation: the interestingness problem
 Incorporation of background knowledge
 Handling noise and incomplete data
 Parallel, distributed and incremental mining methods
 Integration of the discovered knowledge with existing knowledge: knowledge fusion
Issues and Challenges of Data Mining (cont…)
User interaction
 Data mining query languages and ad-hoc mining
 Expression and visualization of resultant knowledge
 Interactive mining of knowledge at multiple levels of abstraction
Applications and social impacts
 Domain-specific data mining & invisible data mining
 Protection of data security, integrity, and privacy
Market Analysis And Management
Where does the data come from?
 Credit card transactions, loyalty cards, discount coupons, customer complaint calls, etc.
Target marketing
 Find clusters of “model” customers who share the same characteristics
 Determine customer purchasing patterns over time
Cross-market analysis
 Associations/correlations between product sales, and prediction based on such associations
Market Analysis And Management (cont…)
Customer profiling
 What types of customers buy what products (clustering or classification)
Customer requirement analysis
 Identify the best products for different customers
 Predict what factors will attract new customers
Provision of summary information
 Multidimensional summary reports
 Statistical summary information (data central tendency and variation)
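The summary information described above (reports along several dimensions, with central tendency and variation) can be sketched using Python's standard `statistics` module. The sales records and the `summarize` helper are hypothetical, invented for illustration.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

# Hypothetical sales records (region, product, amount), invented for illustration.
sales = [
    ("north", "beer", 10.0),
    ("north", "beer", 14.0),
    ("north", "nappies", 20.0),
    ("south", "beer", 8.0),
    ("south", "nappies", 22.0),
    ("south", "nappies", 26.0),
]

def summarize(records, key_index):
    """Group amounts by one dimension and report central tendency and variation."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return {
        key: {
            "mean": mean(values),      # central tendency
            "median": median(values),  # central tendency, robust to extremes
            "stdev": pstdev(values),   # variation (population standard deviation)
        }
        for key, values in groups.items()
    }

by_region = summarize(sales, 0)   # one dimension: region
by_product = summarize(sales, 1)  # another dimension: product
print(by_region)
print(by_product)
```

Summarizing the same records along different dimensions is a simplified form of the roll-up that a multidimensional report provides.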
Corporate Analysis & Risk Management
Finance planning and asset evaluation
 Cash flow analysis and prediction
 Contingent claim analysis to evaluate assets
 Cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
Resource planning
 Summarize and compare the resources and spending
Competition
 Monitor competitors and market directions
 Group customers into classes and apply a class-based pricing procedure
 Set pricing strategy in a highly competitive market
Fraud Detection & Mining Unusual Patterns
Applications: Health care, retail, credit card service, telecommunications
 Auto insurance: ring of collisions
 Money laundering: suspicious monetary transactions
 Medical insurance
    Professional patients, ring of doctors, and ring of references
    Unnecessary or correlated screening tests
 Telecommunications: phone-call fraud
    Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
 Retail industry
    Analysts estimate that 38% of retail shrink is due to dishonest employees
 Anti-terrorism
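The phone call model above (destination, duration, time of day, with deviations from an expected norm treated as suspicious) can be sketched as a simple per-subscriber profile. This is a rough illustration, assuming an invented call history and a z-score threshold chosen arbitrarily; the function names are hypothetical, and time of day is stored but not used in this simplified version.

```python
from statistics import mean, stdev

# Hypothetical call history for one subscriber:
# (destination, duration in minutes, hour of day). Invented for illustration.
history = [
    ("local", 3.0, 18), ("local", 5.0, 19), ("local", 4.0, 20),
    ("national", 6.0, 18), ("local", 2.0, 21), ("local", 4.0, 19),
]

def build_profile(calls):
    """Summarize the expected norm: usual destinations and duration statistics."""
    durations = [duration for _, duration, _ in calls]
    return {
        "destinations": {dest for dest, _, _ in calls},
        "mean_duration": mean(durations),
        "stdev_duration": stdev(durations),  # assumes durations are not all equal
    }

def is_suspicious(call, profile, z_threshold=3.0):
    """Flag a call to an unseen destination or with a duration far from the norm."""
    dest, duration, _ = call
    z = abs(duration - profile["mean_duration"]) / profile["stdev_duration"]
    return dest not in profile["destinations"] or z > z_threshold

profile = build_profile(history)
print(is_suspicious(("local", 4.5, 19), profile))
print(is_suspicious(("international", 45.0, 3), profile))
```

A flagged call is only a candidate for review; a real system would combine more attributes and tune the threshold against known fraud cases.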
THANK YOU
