III-IT-Data Mining Unit 1-Session 2-Part2


Data Mining

Unit 1-Session 2-Part 2


CO1: Identify the types of data to be pre-processed for
the given dataset using the preprocessing
technique.
LO1.1: Describe data mining and its
functionalities
SO1.1.4: Describe the various technologies related to
data mining and different applications of data
mining.

Data Mining
Unit I – INTRODUCTION
• Introduction – Different Kinds of Data
• Patterns Mined – Applications
• Attribute Types
• Data Preprocessing: Data Cleaning
• Data Integration
• Data Reduction
• Data Transformation
• Data Discretization
• Data Visualization

Why Data Mining? Potential Applications
• Data analysis and decision support
  • Market analysis and management
    • Target marketing, customer relationship management (CRM),
      market basket analysis, market segmentation
  • Risk analysis and management
    • Forecasting, customer retention, quality control, competitive
      analysis
  • Fraud detection and detection of unusual patterns (outliers)

Why Data Mining? Potential Applications

• Other Applications
  • Text mining (news groups, email, documents) and Web mining
  • Stream data mining
  • Bioinformatics and bio-data analysis

Market Analysis and Management

• Where does the data come from?
  • Credit card transactions, discount coupons, customer
    complaint calls
• Target marketing
  • Find clusters of “model” customers who share the same
    characteristics: interest, income level, spending habits,
    etc.
  • Determine customer purchasing patterns over time
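
The idea of finding clusters of "model" customers can be sketched with a small k-means routine. This is only an illustration, not a method prescribed by the slides: the customer tuples (income in thousands, monthly spend), the starting centroids, and the function name `kmeans` are all hypothetical, and real data would normally be scaled before clustering.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid with the smallest squared Euclidean distance to p.
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # New centroid = per-dimension mean of the cluster (kept unchanged if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (income in thousands, monthly spend).
customers = [(25, 200), (30, 220), (28, 210), (70, 900), (75, 950), (72, 920)]

# Assumption: start from one low-spend and one high-spend customer.
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
print(clusters)
```

Each resulting cluster groups customers with similar income and spending, which is the kind of segment a target-marketing campaign might address.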

Market Analysis and Management

• Cross-market analysis
  • Associations/correlations between product sales, and
    prediction based on such associations
• Customer profiling
  • What types of customers buy what products
• Customer requirement analysis
  • Identifying the best products for different customers
  • Predicting what factors will attract new customers
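
Associations between product sales are commonly quantified with support and confidence. The sketch below computes both for a rule such as "bread implies milk"; the baskets and the helper names `count`, `support`, and `confidence` are made up for illustration.

```python
# Hypothetical market baskets (one set of products per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if set(itemset) <= b)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, baskets) / count(antecedent, baskets)


print(support({"bread", "milk"}, baskets))       # 3 of 5 baskets
print(confidence({"bread"}, {"milk"}, baskets))  # 3 of the 4 bread baskets
```

A rule with high support and confidence is a candidate for prediction, for example suggesting milk to customers who buy bread.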

Fraud Detection & Mining Unusual Patterns
• Approaches: clustering and model construction for frauds, outlier
  analysis
• Applications: health care, retail, credit card service,
  telecommunications
  • Medical insurance
    • Professional patients and rings of doctors
    • Unnecessary or correlated screening tests
  • Telecommunications
    • Phone call model: destination of the call, duration, time of day or
      week. Analyze patterns that deviate from an expected norm
  • Retail industry
    • Analysts estimate that 38% of retail shrink is due to dishonest
      employees
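
One simple way to flag calls that deviate from an expected norm is a z-score test on call duration. This is a minimal sketch under stated assumptions: the durations are invented, the threshold of 2.0 is an arbitrary choice, and `flag_outliers` is a hypothetical helper, not a method taken from the slides.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.0):
    """Return the values whose z-score (distance from the mean in standard deviations) exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical call durations in minutes for one subscriber.
durations = [3, 4, 5, 4, 3, 5, 4, 4, 3, 120]

print(flag_outliers(durations))
```

With very small samples a single extreme value inflates the standard deviation, so a robust statistic such as the median absolute deviation may be a better choice in practice.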
Other Applications

• Internet Web Surf-Aid
  • IBM Surf-Aid applies data mining algorithms to Web
    access logs for market-related pages to discover
    customer preferences and behavior, analyze the
    effectiveness of Web marketing, improve Web site
    organization, etc.
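
As a rough illustration of mining Web access logs for customer preference, the sketch below counts page hits and finds each visitor's most-requested page. The log format ("user path" per line), the sample lines, and the function names are assumptions made for this example; they do not describe how Surf-Aid itself works.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified access log: "<user id> <requested path>".
log_lines = [
    "u1 /products/laptops",
    "u2 /products/phones",
    "u1 /products/laptops",
    "u3 /offers",
    "u2 /products/phones",
    "u2 /products/laptops",
]


def page_hits(lines):
    """Count how often each page was requested."""
    hits = Counter()
    for line in lines:
        _user, path = line.split()
        hits[path] += 1
    return hits


def favourite_pages(lines):
    """Map each user to the page that user requested most often."""
    per_user = defaultdict(Counter)
    for line in lines:
        user, path = line.split()
        per_user[user][path] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}


print(page_hits(log_lines).most_common())
print(favourite_pages(log_lines))
```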

Major Issues in Data Mining
• Mining methodology and user interaction
  • Mining different kinds of knowledge
    • DM should cover a wide spectrum of data analysis and knowledge discovery tasks
    • Enable users to use the database in different ways
    • Requires the development of numerous data mining techniques
  • Interactive mining of knowledge at multiple levels of abstraction
    • It is difficult to know exactly what will be discovered
    • Allow users to focus the search and refine data mining requests
  • Incorporation of background knowledge
    • Guides the discovery process
    • Allows discovered patterns to be expressed in concise terms and at different
      levels of abstraction
  • Data mining query languages and ad hoc data mining
    • High-level query languages need to be developed
    • Should be integrated with a DB/DW query language

Major Issues in Data Mining
• Presentation and visualization of results
  • Knowledge should be easily understood and directly usable
  • High-level languages, visual representations, or other expressive
    forms
  • Requires the DM system to adopt the above techniques
• Handling noisy or incomplete data
  • Requires data cleaning methods and data analysis methods that can
    handle noise
• Pattern evaluation – the interestingness problem
  • How to develop techniques to assess the interestingness of
    discovered patterns, especially with subjective measures based on
    user beliefs or expectations
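
Lift is one widely used objective measure of how interesting a rule is. The sketch below is only an example with invented transactions and a hypothetical `lift` helper; subjective measures based on user beliefs or expectations are not captured by it.

```python
# Hypothetical transactions (one set of items each).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def lift(antecedent, consequent, transactions):
    """lift(A -> B) = P(A and B) / (P(A) * P(B)); a value above 1 means A and B co-occur more than chance."""
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)
    count_b = sum(1 for t in transactions if b <= t)
    count_ab = sum(1 for t in transactions if (a | b) <= t)
    return n * count_ab / (count_a * count_b)


print(lift({"bread"}, {"milk"}, transactions))    # below 1: no positive association
print(lift({"bread"}, {"butter"}, transactions))  # above 1: positive association
```

In this toy data, "bread implies milk" holds in three of four bread transactions yet has lift below 1, which shows why a single measure such as confidence can overstate a pattern's interestingness.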

Major Issues in Data Mining
• Performance Issues
  • Efficiency and scalability
    • Huge amounts of data
    • Running time must be predictable and acceptable
  • Parallel, distributed, and incremental mining algorithms
    • Divide the data into partitions and process them in parallel
    • Incorporate database updates without having to mine the entire data again from
      scratch
• Diversity of Database Types
  • Other databases that contain complex data objects, multimedia data,
    spatial data, etc.
  • Expect to have different DM systems for different kinds of data
  • Heterogeneous databases and global information systems
    • Web mining has become a very challenging and fast-evolving field in data mining
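
The partition-and-merge idea, and incremental updating, can be sketched with item counting. The partitions, the new batch, and the function name `count_items` are hypothetical; the partitions are processed one after another here, although each call is independent and could be handed to a separate worker.

```python
from collections import Counter


def count_items(partition):
    """Count, for one partition, how many transactions contain each item."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # set() so an item counts once per transaction
    return counts


# Hypothetical data set split into two partitions.
partitions = [
    [["a", "b"], ["a", "c"]],
    [["b", "c"], ["a", "b", "c"]],
]

# Step 1: count each partition independently (this step could run in parallel).
partial_counts = [count_items(p) for p in partitions]

# Step 2: merge the partial results into global counts.
total = Counter()
for partial in partial_counts:
    total.update(partial)

# Step 3: incremental update. Only the new batch is scanned, not the full data set.
new_batch = [["a", "d"]]
total.update(count_items(new_batch))

print(total)
```

This works because counts are additive across partitions; mining tasks whose results cannot simply be summed need more elaborate merge steps.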

Summary
• Applications
• Major Issues

