08 Data Mining Application

Uploaded by

josephnyangau45
Copyright
© All Rights Reserved

08 DATA MINING

Application
Data Mining Applications
• Data mining: A young discipline with broad and diverse
applications
– There still exists a nontrivial gap between generic data mining
methods and effective and scalable data mining tools for domain-
specific applications
• Some application domains (briefly discussed here)
– Data Mining for Financial data analysis
– Data Mining for Retail and Telecommunication Industries
– Data Mining in Science and Engineering
– Data Mining for Intrusion Detection and Prevention
– Data Mining and Recommender Systems
Data Mining for Financial Data Analysis (I)
• Financial data collected in banks and financial institutions
are often relatively complete, reliable, and of high quality
• Design and construction of data warehouses for
multidimensional data analysis and data mining
– View the debt and revenue changes by month, by region, by sector,
and by other factors
– Access statistical information such as max, min, total, average,
trend, etc.
• Loan payment prediction/consumer credit policy analysis
– feature selection and attribute relevance ranking
– Loan payment performance
– Consumer credit rating
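The warehouse-style views listed on this slide (debt by month, by region, by sector, with max, min, total, and average) can be sketched as a small roll-up. This is only a minimal illustration in plain Python; the records, the field names, and the `roll_up` helper are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact records: (month, region, sector, debt)
RECORDS = [
    ("2024-01", "North", "retail", 120.0),
    ("2024-01", "South", "retail", 80.0),
    ("2024-02", "North", "energy", 200.0),
    ("2024-02", "North", "retail", 100.0),
]

# Position of each dimension inside a record (illustrative layout)
FIELDS = {"month": 0, "region": 1, "sector": 2}


def roll_up(records, dimension):
    """Group debt values by one dimension and summarize each group."""
    groups = defaultdict(list)
    idx = FIELDS[dimension]
    for rec in records:
        groups[rec[idx]].append(rec[3])
    return {
        key: {
            "min": min(values),
            "max": max(values),
            "total": sum(values),
            "average": mean(values),
        }
        for key, values in groups.items()
    }


by_region = roll_up(RECORDS, "region")
by_month = roll_up(RECORDS, "month")
```

A real data warehouse would compute such summaries with OLAP operations over a fact table; the dictionary here only mimics a single roll-up along one dimension.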
Data Mining for Financial Data Analysis (II)

• Classification and clustering of customers for targeted
marketing
– multidimensional segmentation by nearest-neighbor,
classification, decision trees, etc. to identify customer
groups or associate a new customer to an appropriate
customer group
• Detection of money laundering and other financial crimes
– integration of data from multiple DBs (e.g., bank transactions,
federal/state crime history DBs)
– Tools: data visualization, linkage analysis, classification,
clustering tools, outlier analysis, and sequential pattern
analysis tools (find unusual access sequences)
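Outlier analysis, one of the tools named above, can be illustrated with a robust score. The slide does not say which outlier method is meant; the sketch below assumes a modified z-score based on the median absolute deviation (MAD), and the amounts and the `mad_outliers` helper are hypothetical.

```python
from statistics import median


def mad_outliers(amounts, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold."""
    med = median(amounts)
    abs_dev = [abs(x - med) for x in amounts]
    mad = median(abs_dev)
    if mad == 0:
        return []  # no spread, so this score cannot flag anything
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical transaction amounts for a single account
amounts = [120, 95, 130, 110, 105, 9800, 115, 100]
flagged = mad_outliers(amounts)
```

The median-based score is used here because a single very large transfer would inflate an ordinary mean and standard deviation and could hide itself.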
Data Mining for Retail & Telecomm. Industries (I)

• Retail industry: huge amounts of data on sales, customer
shopping history, e-commerce, etc.
• Applications of retail data mining
– Identify customer buying behaviors
– Discover customer shopping patterns and trends
– Improve the quality of customer service
– Achieve better customer retention and satisfaction
– Enhance goods consumption ratios
– Design more effective goods transportation and distribution
policies
• Telecomm. and many other industries: Share many similar
goals and expectations of retail data mining
Data Mining Practice for Retail Industry
• Design and construction of data warehouses
• Multidimensional analysis of sales, customers, products, time, and
region
• Analysis of the effectiveness of sales campaigns
• Customer retention: Analysis of customer loyalty
– Use customer loyalty card information to register sequences of
purchases of particular customers
– Use sequential pattern mining to investigate changes in customer
consumption or loyalty
– Suggest adjustments on the pricing and variety of goods
• Product recommendation and cross-reference of items
• Fraud analysis and the identification of unusual patterns
• Use of visualization tools in data analysis
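The loyalty-card idea on this slide (register each customer's purchase sequence, then mine sequential patterns) can be sketched as follows. This is a deliberately simplified stand-in that only counts ordered item pairs; full sequential pattern mining algorithms handle longer patterns. The card IDs, items, and the `frequent_ordered_pairs` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Count customers whose history contains item a followed later by item b."""
    support = Counter()
    for seq in sequences:
        seen = set()
        # combinations() keeps positional order, so (a, b) means a came first
        for a, b in combinations(seq, 2):
            if a != b:
                seen.add((a, b))
        support.update(seen)  # each customer counts once per pattern
    return {pair: n for pair, n in support.items() if n >= min_support}


# Hypothetical purchase sequences keyed by loyalty card
histories = {
    "card-1": ["phone", "case", "charger"],
    "card-2": ["phone", "charger"],
    "card-3": ["case", "phone", "charger"],
}
patterns = frequent_ordered_pairs(histories.values(), min_support=3)
```

Comparing the supported pairs from two time windows would be one way to look for the changes in consumption or loyalty that the slide mentions.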
Data Mining in Science and Engineering
• Data warehouses and data preprocessing
– Resolving inconsistencies or incompatible data collected in diverse
environments and different periods (e.g. eco-system studies)
• Mining complex data types
– Spatiotemporal, biological, diverse semantics and relationships
• Graph-based and network-based mining
– Links, relationships, data flow, etc.
• Visualization tools and domain-specific knowledge
• Other issues
– Data mining in social sciences and social studies: text and social
media
– Data mining in computer science: monitoring systems, software
bugs, network intrusion
Data Mining for Intrusion Detection and Prevention
• Majority of intrusion detection and prevention systems use
– Signature-based detection: use signatures, attack patterns that are
preconfigured and predetermined by domain experts
– Anomaly-based detection: build profiles (models of normal
behavior) and detect behaviors that deviate substantially from the
profiles
• How data mining can help
– New data mining algorithms for intrusion detection
– Association, correlation, and discriminative pattern analysis help
select and build discriminative classifiers
– Analysis of stream data: outlier detection, clustering, model
shifting
– Distributed data mining
– Visualization and querying tools
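Anomaly-based detection, as described on this slide, builds a profile of normal behavior and flags substantial deviations. A minimal sketch, assuming the profile is just a mean and standard deviation of one numeric feature; the feature (failed logins per hour), the three-sigma cutoff, and both helper names are hypothetical choices rather than anything prescribed by the slides.

```python
from statistics import mean, stdev


def build_profile(normal_counts):
    """Summarize normal behavior as a (mean, standard deviation) pair."""
    return mean(normal_counts), stdev(normal_counts)


def is_anomalous(value, profile, max_sigmas=3.0):
    """Flag a value that deviates from the profile by more than max_sigmas."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu  # constant baseline: any change is a deviation
    return abs(value - mu) / sigma > max_sigmas


# Hypothetical failed-login counts per hour during normal operation
baseline = [2, 3, 1, 2, 4, 3, 2, 3]
profile = build_profile(baseline)

# New observations to check against the profile
alerts = [v for v in [3, 2, 40, 5] if is_anomalous(v, profile)]
```

Unlike a signature-based rule, this check needs no prior description of the attack, but it can only be as good as the baseline it was trained on.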
Data Mining and Recommender Systems

• Recommender systems: Personalization, making product
recommendations that are likely to be of interest to a user
• Approaches: Content-based, collaborative, or their hybrid
– Content-based: Recommends items that are similar to items the
user preferred or queried in the past
– Collaborative filtering: Consider a user's social environment,
opinions of other customers who have similar tastes or preferences
• Data mining and recommender systems
– Users C × items S: extrapolate from known to unknown ratings to
predict user-item combinations
– Memory-based method often uses k-nearest neighbor approach
– Model-based method uses a collection of ratings to learn a model
(e.g., probabilistic models, clustering, Bayesian networks, etc.)
– Hybrid approaches integrate both to improve performance (e.g.,
using ensemble)
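The memory-based k-nearest-neighbor approach mentioned above can be sketched as a small user-based collaborative filter. This is one possible formulation, assuming cosine similarity over co-rated items and a similarity-weighted average; the users, items, ratings, and function names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity over the items that both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings."""
    neighbors = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    top = sorted(neighbors, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in top)
    if total_sim == 0:
        return None  # no usable neighbors for this item
    return sum(sim * rating for sim, rating in top) / total_sim


# Hypothetical user -> {item: rating} matrix with one unknown rating (ann, C)
ratings = {
    "ann": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "cat": {"A": 1, "B": 5, "C": 2},
}
prediction = predict(ratings, "ann", "C", k=2)
```

Here the prediction for ann leans toward bob's rating because bob's known ratings match hers more closely than cat's do. A model-based method would instead fit a model to the ratings once and predict from that model.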
Data Mining Trends and Research Frontiers
• Mining Complex Types of Data
• Other Methodologies of Data Mining
• Data Mining Applications
• Data Mining and Society
• Data Mining Trends
• Summary
Ubiquitous and Invisible Data Mining

• Ubiquitous Data Mining
– Data mining is used everywhere, e.g., online shopping
– Ex. Customer relationship management (CRM)
• Invisible Data Mining
– Invisible: Data mining functions are built in daily life operations
– Ex. Google search: Users may be unaware that they are examining
results returned by data mining
– Invisible data mining is highly desirable
– Invisible mining needs to consider efficiency and scalability, user
interaction, incorporation of background knowledge and
visualization techniques, finding interesting patterns, real-time, …
– Further work: Integration of data mining into existing business and
scientific technologies to provide domain-specific data mining
tools
Privacy, Security and Social Impacts of Data Mining
• Many data mining applications do not touch personal data
– E.g., meteorology, astronomy, geography, geology, biology, and other
scientific and engineering data
• Many DM studies are on developing scalable algorithms to find general or
statistically significant patterns, not touching individuals
• The real privacy concern: unconstrained access of individual records,
especially privacy-sensitive information
• Method 1: Removing sensitive IDs associated with the data
• Method 2: Data security-enhancing methods
– Multi-level security model: permit access only at the authorized level
– Encryption: e.g., blind signatures, biometric encryption, and
anonymous databases (personal information is encrypted and stored at
different locations)
– Intrusion detection is another active area of research
• Method 3: Privacy-preserving data mining methods
Privacy-Preserving Data Mining
• Privacy-preserving (privacy-enhanced or privacy-sensitive) mining:
– Obtaining valid mining results without disclosing the underlying
sensitive data values
– Often requires a trade-off between information loss and privacy
• Privacy-preserving data mining methods:
– Randomization (e.g., perturbation): Add noise to the data in order to
mask some attribute values of records
– k-anonymity and l-diversity: Alter individual records so that they
cannot be uniquely identified
• k-anonymity: Any given record is indistinguishable from at least
k − 1 other records on its quasi-identifying attributes
• l-diversity: Enforce intra-group diversity of sensitive values
– Distributed privacy preservation: Data partitioned and distributed
either horizontally, vertically, or a combination of both
– Downgrading the effectiveness of data mining: The output of data
mining may violate privacy
• Modify data or mining results, e.g., hiding some association rules or slightly
distorting some classification models
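The k-anonymity and l-diversity conditions listed on this slide can be checked on an already generalized table with a short sketch. It assumes the common formulation in which every group of records sharing the same quasi-identifier values must contain at least k records, and the simplest ("distinct") form of l-diversity. The column names, rows, and helper functions are hypothetical.

```python
from collections import defaultdict


def group_by_quasi_ids(records, quasi_ids):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[q] for q in quasi_ids)].append(rec)
    return groups


def is_k_anonymous(records, quasi_ids, k):
    """True if every quasi-identifier group holds at least k records."""
    groups = group_by_quasi_ids(records, quasi_ids)
    return all(len(g) >= k for g in groups.values())


def is_l_diverse(records, quasi_ids, sensitive, l):
    """True if every group has at least l distinct sensitive values."""
    groups = group_by_quasi_ids(records, quasi_ids)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


# Hypothetical table after generalizing age and ZIP code
rows = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "asthma"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
]
k_ok = is_k_anonymous(rows, ["age", "zip"], k=2)
l_ok = is_l_diverse(rows, ["age", "zip"], "disease", l=2)
```

In this example the table passes the k = 2 check but fails the l = 2 check, because everyone in the second group has the same sensitive value. That gap is the reason l-diversity is listed alongside k-anonymity.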
The Classification of Protection Procedures
• Privacy-Preserving Data Mining and Statistical Disclosure Control (SDC)
are two related fields with a shared interest in ensuring data privacy.
Their goal is to avoid the disclosure of sensitive or proprietary information
to third parties.
• Data-driven or general purpose protection procedures. In this case, no
specific analysis or usage is foreseen for the data. The data owner does not
know what kind of analysis will be performed by the third party.
• Computation-driven or specific purpose protection procedures. In this case
it is known beforehand which type of analysis has to be applied to the
data. As the data uses are known, protection procedures are defined
according to the intended subsequent computation.
• Results-driven protection procedures. In this case, privacy concerns relate to the
result of applying a particular data mining method to some particular data.
Trends of Data Mining
• Application exploration: Dealing with application-specific problems
• Scalable and interactive data mining methods
• Integration of data mining with Web search engines, database systems,
data warehouse systems and cloud computing systems
• Mining social and information networks
• Mining spatiotemporal, moving objects and cyber-physical systems
• Mining multimedia, text and web data
• Mining biological and biomedical data
• Data mining with software engineering and system engineering
• Visual and audio data mining
• Distributed data mining and real-time data stream mining
• Privacy protection and information security in data mining
Summary
• We present a high-level overview of mining complex data types
• Statistical data mining methods, such as regression, generalized linear
models, analysis of variance, etc., are popularly adopted
• Researchers also try to build theoretical foundations for data mining
• Visual/audio data mining has been popular and effective
• Application-based mining integrates domain-specific knowledge with
data analysis techniques and provides mission-specific solutions
• Ubiquitous data mining and invisible data mining are penetrating our
daily lives
• Privacy and data security are important issues in data mining, and
privacy-preserving data mining has been developed recently
• Our discussion on trends in data mining shows that data mining is a
promising, young field, with great, strategic importance
END