III-IT-Data Mining Unit 1-Session 2-Part1

Copyright © All Rights Reserved

Data Mining

Unit 1-Session 2-Part 1


CO1: Identify the types of data to be pre-processed for
the given dataset using preprocessing
techniques.
LO1.1: Describe data mining and its
functionalities.
SO1.1.3: Explain the kinds of patterns that can be mined in
data mining.

Unit I – INTRODUCTION
• Introduction – Different Kinds of Data
• Patterns Mined – Applications
• Attribute Types
• Data Preprocessing: Data Cleaning
• Data Integration
• Data Reduction
• Data Transformation
• Data Discretization
• Data Visualization

Kinds of Patterns Mined
(Data Mining Functionalities)
• Data mining functionalities are used to specify
the kind of patterns to be found in data mining
tasks.

• Data mining tasks can be classified into two categories:
• Descriptive: characterize the general properties of the data
• Predictive: perform inference on the current data to make predictions

What to extract?
• Users may not know in advance which kinds of
patterns in their data are interesting.

What to do?
• Have a data mining system that can mine
multiple types of patterns to handle different
user and application needs.
• Discover patterns at various granularities (levels
of abstraction).
• Allow users to guide the search for interesting
patterns.

Data Mining Functionalities
- What kinds of patterns can be mined?
• Concept/Class Description: Characterization and Discrimination
• Data can be associated with classes or concepts.
• E.g. classes of items – computers, printers, …
concepts of customers – bigSpenders, budgetSpenders, …
• How to describe these items or concepts?
• Descriptions can be derived via
• Data characterization – summarizing the general characteristics of a
target class of data.
• E.g. summarizing the characteristics of customers who spend more than $1,000 a year
at AllElectronics. The result can be a general profile of these customers, such as 40–50 years old,
employed, and having excellent credit ratings.

• Data discrimination – comparing the general features of the target class with those of one or a set of
contrasting classes
• E.g. compare the general features of software products whose sales increased by 10% in the
last year with those whose sales decreased by 30% during the same period

• Or both of the above
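Characterization and discrimination can be sketched in Python. This is a minimal illustration built on a small hypothetical customer table. The field names (`age`, `employed`, `credit`, `spend`), the records, and the helper names `characterize` and `discriminate` are all invented for this example. Only the $1,000 spending threshold comes from the AllElectronics example.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records (invented for illustration, not real AllElectronics data)
customers = [
    {"age": 45, "employed": True, "credit": "excellent", "spend": 1500},
    {"age": 42, "employed": True, "credit": "excellent", "spend": 2300},
    {"age": 48, "employed": True, "credit": "good", "spend": 1200},
    {"age": 23, "employed": False, "credit": "fair", "spend": 300},
    {"age": 31, "employed": True, "credit": "good", "spend": 650},
]


def characterize(records):
    """Data characterization: summarize the general features of one class of records."""
    ages = [r["age"] for r in records]
    credit_counts = Counter(r["credit"] for r in records)
    return {
        "count": len(records),
        "age_range": (min(ages), max(ages)),
        "mean_age": mean(ages),
        "share_employed": sum(r["employed"] for r in records) / len(records),
        "most_common_credit": credit_counts.most_common(1)[0][0],
    }


def discriminate(target, contrasting):
    """Data discrimination: pair each summary feature of the target class with the contrasting class."""
    t, c = characterize(target), characterize(contrasting)
    return {feature: (t[feature], c[feature]) for feature in t}


# Target class: customers spending more than $1,000 a year; contrasting class: everyone else
big_spenders = [r for r in customers if r["spend"] > 1000]
others = [r for r in customers if r["spend"] <= 1000]

print(characterize(big_spenders))
print(discriminate(big_spenders, others))
```

The discrimination output pairs each feature as (target, contrasting). For example, the age range is (42, 48) for the big spenders and (23, 31) for the rest.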

• Mining Frequent Patterns, Associations and Correlations
• Frequent itemset: a set of items that frequently appear
together in a transactional data set (e.g. milk and bread)
• Frequent subsequence: an ordered pattern that occurs frequently, e.g. customers tend to purchase
product A, followed by a purchase of product B
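The Python below is a rough sketch of how frequent itemsets and frequent subsequences might be counted. It uses brute-force enumeration rather than an optimized algorithm such as Apriori. The transactions, the purchase histories, the minimum count of 3, and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (each is a set of items bought together)
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]


def frequent_itemsets(transactions, min_count, max_size=2):
    """Count every itemset of up to max_size items; keep those appearing in >= min_count transactions."""
    counts = Counter()
    for t in transactions:
        for size in range(1, max_size + 1):
            # sorted() gives each itemset one canonical tuple form
            for itemset in combinations(sorted(t), size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical per-customer purchase histories, in time order
histories = [["A", "B", "C"], ["A", "C", "B"], ["B", "A"]]


def follows(history, first, then):
    """True if `then` is bought at some point after the first purchase of `first`."""
    if first not in history:
        return False
    return then in history[history.index(first) + 1:]


def subsequence_count(histories, first, then):
    """Number of customers whose history contains the ordered pattern first -> then."""
    return sum(follows(h, first, then) for h in histories)


print(frequent_itemsets(transactions, min_count=3))
print(subsequence_count(histories, "A", "B"))  # "A, followed by B"
```

On this toy data, milk and bread appear together in 3 of the 5 transactions. Two of the three customers buy A before B.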

• Association Analysis: find frequent patterns
• E.g. a sample analysis result – an association rule:
buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%]
(If a customer buys a computer, there is a 50% chance that she will also buy software; 1% of
all the transactions under analysis show that a computer and software are purchased together.)
• Association rules are discarded as uninteresting if they do not satisfy both a
minimum support threshold and a minimum confidence threshold.
• Correlation Analysis: additional analysis to find statistical correlations
between associated pairs
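The support and confidence figures of an association rule can be computed as plain ratios over the transactions. The sketch below uses eight hypothetical transactions, so its numbers differ from the 1% and 50% in the example rule. It uses lift as the correlation measure. Lift is one common choice, not necessarily the measure the slides intend. The function names are illustrative.

```python
# Hypothetical transactions (invented); each is the set of items in one purchase
transactions = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
    {"software"},
    {"computer", "printer"},
    {"printer", "software"},
    {"computer", "software"},
]


def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence / support(B): > 1 suggests positive correlation, < 1 negative, = 1 independence."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


def is_interesting(transactions, antecedent, consequent, min_support, min_confidence):
    """Keep a rule only if it meets BOTH the minimum support and minimum confidence thresholds."""
    return (
        support(transactions, antecedent | consequent) >= min_support
        and confidence(transactions, antecedent, consequent) >= min_confidence
    )


a, b = {"computer"}, {"software"}
print(support(transactions, a | b))    # 3 of 8 transactions
print(confidence(transactions, a, b))  # 3 of the 5 computer transactions
print(lift(transactions, a, b))
print(is_interesting(transactions, a, b, min_support=0.25, min_confidence=0.5))
```

On this toy data the rule passes a 25% support and 50% confidence filter. Its lift, however, is about 0.96, slightly below 1. This is the kind of case where correlation analysis adds information beyond support and confidence.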

• Classification and Prediction
• Classification
• The process of finding a model that describes and distinguishes the data classes or
concepts, for the purpose of being able to use the model to predict the class of
objects whose class label is unknown.
• The derived model is based on the analysis of a set of training data (data objects
whose class label is known).
• The model can be represented as classification (IF-THEN) rules, decision trees,
neural networks, etc.
• Prediction
• Predict missing or unavailable numerical data values
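Below is a minimal sketch of both ideas, using hypothetical training data:

- `learn_one_rule` learns IF-THEN rules with the simple 1R method (one attribute, majority class per value). It was chosen only because it is short; it is not the decision-tree or neural-network methods named above.
- `fit_line` illustrates numeric prediction with an ordinary least-squares line.

All attribute names, values, and function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (attribute values, known class label)
training = [
    ({"income": "high", "student": "no"}, "buys"),
    ({"income": "high", "student": "yes"}, "buys"),
    ({"income": "low", "student": "yes"}, "buys"),
    ({"income": "low", "student": "no"}, "no_buy"),
    ({"income": "medium", "student": "no"}, "no_buy"),
    ({"income": "medium", "student": "yes"}, "buys"),
]


def learn_one_rule(training):
    """1R: for each attribute, map each value to its majority class; keep the attribute with fewest errors."""
    best = None
    for attr in training[0][0]:
        counts = defaultdict(Counter)  # attribute value -> class label counts
        for features, label in training:
            counts[features[attr]][label] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


def classify(model, features):
    """Apply the learned rules to an object whose class label is unknown (None if the value is unseen)."""
    attr, rules, _ = model
    return rules.get(features[attr])


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x, used to predict a numeric value."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    return mean_y - b * mean_x, b


model = learn_one_rule(training)
attr, rules, _ = model
for value, label in rules.items():
    print(f"IF {attr} = {value} THEN class = {label}")
print(classify(model, {"income": "low", "student": "no"}))

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a + b * 5)  # predicted value for x = 5
```

On this data, 1R picks `student` (1 training error against 2 for `income`). It prints the rules "IF student = no THEN class = no_buy" and "IF student = yes THEN class = buys".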

• Cluster Analysis
• Class label is unknown: group data to form new classes
• Clusters of objects are formed based on the principle of maximizing
intraclass similarity and minimizing interclass similarity
• E.g. Identify homogeneous subpopulations of customers. These clusters may
represent individual target groups for marketing.
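The slides do not name a clustering algorithm. As one common example, the sketch below runs plain k-means (Lloyd's algorithm). It assigns each object to its nearest centroid and recomputes the centroids until they stop moving. The two customer features (visits per month and average spend) and the data points are hypothetical.

```python
import math
import random

# Hypothetical customers as (visits per month, average spend in $100s)
customers = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means: returns (centroids, clusters) once the centroids stop changing."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # start from k distinct data points
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty)
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(customers, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

k-means needs the number of clusters `k` up front and is sensitive to the starting centroids. The fixed `seed` only makes this illustration repeatable.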

• Outlier Analysis
• Data objects that do not comply with the general behavior or model of the data.
• Outliers are usually discarded as noise or exceptions.
• However, in applications such as fraud detection, these rare events are the ones of interest.
• E.g. Detect purchases of extremely large amounts

• Evolution Analysis
• Describes and models regularities or trends for objects whose
behavior changes over time.
• E.g. Identify stock evolution regularities for overall stocks and for the stocks of
particular companies.
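Both outlier analysis and evolution analysis can be sketched in a few lines of Python. Each uses one of many possible techniques:

- Outliers: a robust modified z-score (median and median absolute deviation), with the commonly used 3.5 cut-off.
- Evolution analysis: the price series is smoothed with a moving average, and a least-squares trend slope is fitted.

The purchase amounts, prices, thresholds, and function names are hypothetical.

```python
from statistics import median

# Hypothetical purchase amounts (in $); one is extremely large
purchases = [120, 95, 130, 110, 105, 98, 125, 5000]


def outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on median and MAD) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        # Simplification for the degenerate case where most values are identical
        return [v for v in values if v != med]
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical daily closing prices for one stock
prices = [10, 11, 10.5, 12, 12.5, 13, 12.8, 14]


def moving_average(series, window=3):
    """Smooth a time series by averaging each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


def trend_slope(series):
    """Least-squares slope of the series against time steps 0, 1, 2, ...; > 0 means an upward trend."""
    n = len(series)
    mean_t, mean_y = (n - 1) / 2, sum(series) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


print(outliers(purchases))  # flags the extreme purchase
print(moving_average(prices))
print(round(trend_slope(prices), 3))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme purchase inflates the standard deviation. That inflation can hide the very outlier being sought.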

Technologies Used

Summary
• Kinds of Patterns Mined
• Technologies Used

