III-IT-Data Mining Unit 1-Session 2-Part2
Data Mining
Unit I – INTRODUCTION
• Introduction – Different Kinds of Data
• Patterns Mined – Applications
• Attribute Types
• Data Preprocessing: Data Cleaning
• Data Integration
• Data Reduction
• Data Transformation
• Data Discretization
• Data Visualization
Data Mining 3
Why Data Mining? Potential Applications
• Data analysis and decision support
• Market analysis and management
• Target marketing, customer relationship management (CRM), market basket analysis, market segmentation
Why Data Mining? Potential Applications
• Other Applications
• Text mining (newsgroups, email, documents) and Web mining
• Stream data mining
• Bioinformatics and bio-data analysis
Market Analysis and Management
• Cross-market analysis
• Associations/correlations between product sales, and prediction based on such associations
• Customer profiling
• What types of customers buy what products
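As a rough illustration of the associations between product sales mentioned above, the sketch below computes support and confidence for two-item rules over a handful of hypothetical transactions. The data, the function names, and the thresholds (`min_support=0.4`, `min_confidence=0.7`) are assumptions chosen for the example. Real systems use algorithms such as Apriori or FP-growth rather than this brute-force pairing.

```python
from itertools import combinations

# Hypothetical toy transactions: each set is one customer's basket
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
]

def support_count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = count(A and C) / count(A)."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force every single-item rule A -> C that meets both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for ante, cons in ((a, b), (b, a)):
            s = support({ante, cons}, transactions)
            if s < min_support:
                continue  # pair is too rare to be interesting
            conf = confidence({ante}, {cons}, transactions)
            if conf >= min_confidence:
                rules.append((ante, cons, s, conf))
    return rules

for ante, cons, s, conf in pair_rules(transactions):
    print(f"{ante} -> {cons}: support={s:.2f}, confidence={conf:.2f}")
```

On this toy data the sketch keeps four rules. Two examples are bread → butter (support 0.6, confidence 0.75) and butter → bread (support 0.6, confidence 1.0).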
Fraud Detection & Mining Unusual Patterns
• Approaches: clustering and model construction for fraud detection, outlier analysis
• Applications: health care, retail, credit card services, telecommunications
• Medical insurance
• Professional patients and rings of doctors
• Unnecessary or correlated screening tests
• Telecommunications:
• Phone call model: destination of the call, duration, time of day or week; analyze patterns that deviate from an expected norm
• Retail industry
• Analysts estimate that 38% of retail shrink is due to dishonest employees
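The telecommunications bullet describes flagging calls that deviate from an expected norm. The sketch below shows that idea under several hypothetical choices: a single attribute (call duration in minutes), a subscriber's historical calls as the norm, and a z-score threshold of 3. The function name and the sample values are illustrative only.

```python
import statistics

def flag_unusual_calls(history, new_calls, threshold=3.0):
    """Return (call_id, z) for new calls whose duration lies more than
    `threshold` standard deviations from the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs >= 2 values
    if stdev == 0:
        raise ValueError("history has no variation; z-scores are undefined")
    flagged = []
    for call_id, minutes in new_calls:
        z = (minutes - mean) / stdev
        if abs(z) > threshold:
            flagged.append((call_id, round(z, 2)))
    return flagged

# Hypothetical call durations (minutes) for one subscriber
history = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 3.0]
new_calls = [("c1", 4.0), ("c2", 45.0), ("c3", 3.2)]
print(flag_unusual_calls(history, new_calls))  # only c2 is flagged
```

A practical fraud model would combine several attributes, such as destination and time of day or week. It would also use a more robust baseline, because the mean and standard deviation are themselves sensitive to outliers.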
Other Applications
Major Issues in Data Mining
• Mining methodology and user interaction
• Mining different kinds of knowledge
• DM should cover a wide spectrum of data analysis and knowledge discovery tasks
• Different users may want to use the same database in different ways
• Requires the development of numerous data mining techniques
• Interactive mining of knowledge at multiple levels of abstraction
• Difficult to know exactly what will be discovered
• Allow users to focus the search, refine data mining requests
• Incorporation of background knowledge
• Guide the discovery process
• Allow discovered patterns to be expressed in concise terms and at different levels of abstraction
• Data mining query languages and ad hoc data mining
• High-level query languages need to be developed
• Should be integrated with a database/data warehouse (DB/DW) query language
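Interactive mining at multiple levels of abstraction can be pictured as rolling up and drilling down along a concept hierarchy. The sketch below makes several hypothetical choices: a city → region hierarchy, toy sales records, and an optional product filter that lets the user focus the search. These names and values are illustrative, not part of any particular data mining system.

```python
from collections import defaultdict

# Hypothetical sales records: (city, product, amount)
sales = [
    ("Chennai", "laptop", 1200),
    ("Chennai", "phone", 300),
    ("Madurai", "laptop", 900),
    ("Mumbai", "phone", 500),
    ("Pune", "laptop", 1100),
]

# Hypothetical concept hierarchy: city -> region (a higher abstraction level)
city_to_region = {"Chennai": "South", "Madurai": "South", "Mumbai": "West", "Pune": "West"}

def aggregate(records, level, product=None):
    """Total sales grouped at the requested level ('city' or 'region'),
    optionally restricted to one product to focus the search."""
    if level not in ("city", "region"):
        raise ValueError("level must be 'city' or 'region'")
    totals = defaultdict(int)
    for city, item, amount in records:
        if product is not None and item != product:
            continue
        key = city if level == "city" else city_to_region[city]
        totals[key] += amount
    return dict(totals)

print(aggregate(sales, "region"))                    # {'South': 2400, 'West': 1600}
print(aggregate(sales, "region", product="laptop"))  # {'South': 2100, 'West': 1100}
print(aggregate(sales, "city"))                      # drill down to the finer level
```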
Major Issues in Data Mining
• Presentation and visualization of results
• Knowledge should be easily understood and directly usable
• High-level languages, visual representations, or other expressive forms
• Requires the DM system to adopt the above techniques
• Handling noisy or incomplete data
• Requires data cleaning methods and data analysis methods that can handle noise
• Pattern evaluation – the interestingness problem
• How to develop techniques to assess the interestingness of discovered patterns, especially with subjective measures based on user beliefs or expectations
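Two common cleaning steps for noisy or incomplete data are filling missing values and smoothing noise. The sketch below uses mean imputation for missing entries (marked `None`) and smoothing by bin means over equal-frequency bins. Both are illustrative choices among several standard methods, and the function names and sample values are hypothetical.

```python
import statistics

def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins of `bin_size`,
    and replace every value in a bin by that bin's mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([statistics.mean(bin_values)] * len(bin_values))
    return smoothed

ages = [25, None, 35, 30]
print(fill_missing_with_mean(ages))  # the missing age becomes 30, the observed mean

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))  # bins of 3 smoothed to their means 9, 22 and 29
```

Mean imputation is simple, but it shrinks the variance of the attribute. Bin smoothing trades detail for robustness to small random errors.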
Major Issues in Data Mining
• Performance Issues
• Efficiency and scalability
• Huge amounts of data
• Running time must be predictable and acceptable
• Parallel, distributed and incremental mining algorithms
• Divide the data into partitions that are processed in parallel
• Incorporate database updates without having to mine the entire data set again from scratch
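The sketch below illustrates partitioned and incremental mining using item counting, the first step of frequent-pattern mining. It splits the data into partitions, counts each partition independently, and merges the partial counts. Newly arrived transactions are then folded in without recounting the old data. This works because counts are additive. The function names and data are hypothetical. Threads are used only to show the structure; CPU-bound work in Python would normally use processes or a distributed framework instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Count in how many transactions of this partition each item appears."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # each item counted once per transaction
    return counts

def parallel_item_counts(transactions, n_partitions=3):
    """Split data into partitions, count each one independently, then merge."""
    size = max(1, -(-len(transactions) // n_partitions))  # ceiling division
    partitions = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        for partial in pool.map(count_items, partitions):
            total += partial  # partial counts are additive, so merging is a sum
    return total

def incremental_update(total, new_transactions):
    """Add counts for newly arrived transactions without rescanning old data."""
    return total + count_items(new_transactions)

# Hypothetical transaction data
data = [["milk", "bread"], ["bread"], ["beer", "chips"], ["milk"], ["bread", "milk"]]
counts = parallel_item_counts(data)
counts = incremental_update(counts, [["milk", "eggs"]])
print(counts.most_common(2))
```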
Summary
• Applications
• Major Issues