Data Mining
AND MINING
TOPIC: INTRODUCTION TO DATA MINING
BY – GOURAV GHOSH
ROLL NO. - 31154322014
What is Data Mining/KDD
Concept description
Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Association (correlation and causality)
e.g., nappies and beer bought together
Classification and Prediction
Construct models that describe and distinguish classes or concepts for future prediction
Predict some unknown or missing numerical values
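To make the association bullet above concrete, here is a minimal sketch of the usual rule measures (support, confidence, lift) for a rule such as "nappies => beer". The basket data is invented purely for illustration, and the function names are assumptions, not part of any particular library.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy baskets, invented for illustration only.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"nappies", "beer", "crisps"}),
    frozenset({"nappies", "beer"}),
    frozenset({"nappies", "milk"}),
    frozenset({"beer", "crisps"}),
    frozenset({"milk", "bread"}),
]


def support(itemset: Iterable[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent) from the baskets."""
    antecedent_support = support(antecedent, baskets)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs in the baskets")
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, baskets) / antecedent_support


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         baskets: List[FrozenSet[str]]) -> float:
    """Confidence divided by the consequent's baseline support.

    Assumes the consequent occurs in at least one basket.
    Values above 1 suggest a positive association (not causality).
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


if __name__ == "__main__":
    print("support   :", support({"nappies", "beer"}, BASKETS))
    print("confidence:", confidence({"nappies"}, {"beer"}, BASKETS))
    print("lift      :", lift({"nappies"}, {"beer"}, BASKETS))
```

A lift above 1 only indicates that the two items co-occur more often than independence would predict; it says nothing about which purchase causes the other.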
Data Mining Functionalities
Cluster analysis
Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
Outlier analysis
Outlier: a data object that does not comply with the general behavior of the data
Just noise or an exception? Not necessarily: outliers are useful in fraud detection and rare-event analysis
Other pattern-directed or statistical analyses
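As one possible illustration of outlier analysis, the sketch below flags values that sit far from the median, using the modified z-score based on the median absolute deviation (MAD). The 0.6745 scaling constant and the 3.5 cut-off are a common convention rather than anything stated in these slides, and the transaction amounts are made up.

```python
from statistics import median
from typing import List, Sequence


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[float]:
    """Return the values whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median of the absolute deviations from the median.
    """
    if not values:
        return []
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # More than half the values are identical, so the score cannot
        # be scaled; this sketch simply reports no outliers.
        return []
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]


if __name__ == "__main__":
    # Hypothetical card transaction amounts, invented for illustration.
    amounts = [52, 48, 50, 51, 49, 47, 53, 500]
    print(mad_outliers(amounts))
```

In a fraud setting the flagged value is the interesting one, which is why it is analysed rather than discarded as noise.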
Data Mining is Multidisciplinary
[Diagram: data mining and KDD draw on statistics, pattern recognition, neurocomputing, machine learning, AI, and databases]
Why We Need Data Mining
[Diagram: the KDD process, in order: Databases, Cleaning & Integration, Data Warehouse, Selection & Transformation, Data Mining, Evaluation & Presentation]
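The stages named in the KDD diagram (cleaning and integration, selection and transformation, data mining, evaluation and presentation) can be sketched as a chain of small functions. Everything below is an assumption made for illustration: the two source "databases", the record layout, and the use of simple frequency counting as a stand-in for the mining step.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Two hypothetical source databases, invented for illustration.
SALES_DB: List[Record] = [
    {"customer": "a1", "item": "Beer"},
    {"customer": "a2", "item": None},      # incomplete record
    {"customer": "a3", "item": "nappies"},
]
WEB_DB: List[Record] = [
    {"customer": "b1", "item": "beer "},
    {"customer": "b2", "item": "Nappies"},
    {"customer": "b3", "item": "milk"},
]


def clean_and_integrate(*sources: List[Record]) -> List[Record]:
    """Merge all sources and drop records that have a missing field."""
    merged = [record for source in sources for record in source]
    return [r for r in merged if all(v is not None for v in r.values())]


def select_and_transform(warehouse: List[Record]) -> List[str]:
    """Keep only the item field, normalised to trimmed lower case."""
    return [r["item"].strip().lower() for r in warehouse]


def mine(items: List[str]) -> Counter:
    """Mining step; plain frequency counting stands in for a real algorithm."""
    return Counter(items)


def evaluate_and_present(patterns: Counter, min_count: int) -> List[Tuple[str, int]]:
    """Keep patterns seen at least `min_count` times, most frequent first."""
    return [(item, n) for item, n in patterns.most_common() if n >= min_count]


if __name__ == "__main__":
    warehouse = clean_and_integrate(SALES_DB, WEB_DB)
    items = select_and_transform(warehouse)
    for item, count in evaluate_and_present(mine(items), min_count=2):
        print(f"{item}: {count}")
```

Keeping each stage as its own function mirrors the diagram: the output of one stage is the only input to the next.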
Issues and Challenges of Data Mining
Data mining methodology
Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
Performance: efficiency, effectiveness, and scalability
Pattern evaluation: the interestingness problem
Incorporation of background knowledge
Handling noise and incomplete data
Parallel, distributed and incremental mining methods
Integration of discovered knowledge with existing knowledge: knowledge fusion
Issues and Challenges of Data Mining (cont…)
User interaction
Data mining query languages and ad-hoc mining
Expression and visualization of resultant knowledge
Interactive mining of knowledge at multiple levels of abstraction
Applications and social impacts
Domain-specific data mining & invisible data mining
Protection of data security, integrity, and privacy
Market Analysis And Management
Where does the data come from?
Credit card transactions, loyalty cards, discount coupons, customer complaint calls, etc.
Target marketing
Find clusters of “model” customers who share the same characteristics
Determine customer purchasing patterns over time
Cross-market analysis
Associations/correlations between product sales, & prediction based on such associations
Market Analysis And Management (cont…)
Customer profiling
What types of customers buy what products (clustering or classification)
Customer requirement analysis
Identifying the best products for different customers
Predict what factors will attract new customers
Provision of summary information
Multidimensional summary reports
Statistical summary information (data central tendency and variation)
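The statistical summary bullet above (central tendency and variation) can be illustrated with Python's standard `statistics` module. The purchase amounts are invented, and the function name `summarize` is a hypothetical helper, not a library call.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Basic central-tendency and variation measures for one attribute."""
    return {
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Variation
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "stdev": statistics.stdev(values),        # sample standard deviation
    }


if __name__ == "__main__":
    # Hypothetical purchase amounts per customer, invented for illustration.
    purchases = [20, 35, 35, 50, 60]
    for name, value in summarize(purchases).items():
        print(f"{name:>8}: {value}")
```

`statistics.variance` and `statistics.stdev` need at least two values; for a full population rather than a sample, `pvariance` and `pstdev` are the corresponding functions.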
Corporate Analysis & Risk Management
Finance planning and asset evaluation
Cash flow analysis and prediction
Contingent claim analysis to evaluate assets
Cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
Resource planning
Summarize and compare the resources and spending
Competition
Monitor competitors and market directions
Group customers into classes and apply a class-based pricing procedure
Set pricing strategy in a highly competitive market
Fraud Detection & Mining Unusual Patterns
Applications: Health care, retail, credit card service, telecommunications
Auto insurance: ring of collisions
Money laundering: suspicious monetary transactions
Medical insurance
Professional patients, ring of doctors, and ring of references
Unnecessary or correlated screening tests
Telecommunications: phone-call fraud
Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
Retail industry
Analysts estimate that 38% of retail shrink is due to dishonest employees
Anti-terrorism
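The phone-call model on this slide (destination, duration, time of day) suggests comparing each new call with a caller's expected norm. The sketch below is one hypothetical way to do that: the `Call` and `Profile` types, the scoring rule, and the factor of 3 on duration are all assumptions chosen for illustration, not a method given in the slides.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Set


@dataclass(frozen=True)
class Call:
    destination: str     # e.g. a country or area code
    duration_min: float  # call length in minutes
    hour: int            # hour of day, 0-23


@dataclass
class Profile:
    destinations: Set[str]
    mean_duration: float
    hours: Set[int]


def build_profile(history: List[Call]) -> Profile:
    """Summarise a caller's past behaviour as the 'expected norm'."""
    return Profile(
        destinations={c.destination for c in history},
        mean_duration=mean(c.duration_min for c in history),
        hours={c.hour for c in history},
    )


def deviation_score(call: Call, profile: Profile,
                    duration_factor: float = 3.0) -> int:
    """Count how many of the three attributes fall outside the profile (0-3)."""
    score = 0
    if call.destination not in profile.destinations:
        score += 1
    if call.duration_min > duration_factor * profile.mean_duration:
        score += 1
    if call.hour not in profile.hours:
        score += 1
    return score


if __name__ == "__main__":
    # Hypothetical call history, invented for illustration.
    history = [Call("UK", 5, 18), Call("UK", 7, 19), Call("FR", 6, 20)]
    profile = build_profile(history)
    print(deviation_score(Call("UK", 6, 19), profile))   # matches the norm
    print(deviation_score(Call("XX", 45, 3), profile))   # deviates on all three
```

A real system would weight the attributes and learn thresholds from data; the point here is only that "deviation from an expected norm" needs an explicit profile to compare against.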
THANK YOU