Data Mining

Data mining is a process that uses statistical and machine learning techniques to extract useful information from large databases. It identifies patterns in data that can be used for prediction and descriptive purposes. Data mining is part of business intelligence and is useful when basic OLAP is not sufficient. It uses tools to clean and analyze data to find unexpected patterns. The major data mining techniques are classification, clustering, association rule learning, and sequential pattern mining. These techniques are applied in various business domains like banking, retailing, manufacturing, and marketing to gain insights into customer behavior and operations.


BUSINESS INTELLIGENCE

DATA MINING

What is Data Mining

• A process that uses statistical, mathematical, artificial intelligence and machine learning techniques (sophisticated, advanced data manipulation technology) to extract and identify useful information and subsequent knowledge from large databases.

[Diagram: Data Mining uses sophisticated data manipulation technology, identifies useful information, and deals with large databases.]

Data Mining Concepts and Applications

• Where is Data Mining in Business Intelligence?

Why do we need Data Mining?

• Users today want to perform statistical and mathematical analysis such as hypothesis testing, prediction and customer scoring models.
• A major step in managerial decision making is forecasting or estimating the results of different alternative courses of action.
• Such investigation cannot be done with basic OLAP and requires special tools: advanced business analytics, that is, data mining.

Why do we need Data Mining?

OLAP question: Which branch in the northern region has obtained the poorest customer feedback during the New Year seasons in the last three years?

Data Mining question: Which electrical product will be the most suitable to be bundled together with the sale of the newly introduced washing machine?

Major Characteristics of Data Mining

• Data are often buried deep within very large databases, which sometimes contain data from several years.
• Sophisticated tools are used to clean and synchronize data in order to get the best result.
• Miners may find an unexpected result during data mining activities, and this will require creative thinking in the users’ decision making.

DATA MINING METHODS

• Prediction Methods: using some variables to predict unknown or future values of other variables.
• Descriptive Methods: finding human-interpretable patterns describing the data.

Data Mining Tasks/Algorithms (fall into four broad categories):

• Classification
• Clustering
• Association Rule Discovery
• Sequential Pattern Discovery

Data Mining Tasks/Algorithms

1. Classification
– Also known as supervised induction; the most common of all data mining activities.
– Used to analyze the historical data stored in the database and to automatically generate a model that can predict future behavior.
– Identifies patterns in data that belong to a certain category.
– Application example: target marketing (likely customer or no hope, based on previous customers’ behavior)

Example (medical insurance company): Clients with a history of diabetes (from the maternal/paternal side) are likely to also have diabetes at a later stage of their lives.
Decision: A special premium coverage can be designed for the potential health condition.

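To make the idea of "generating a model from historical data" concrete, here is a minimal sketch of a one-rule classifier. It is an illustration only, not the method any particular tool uses: the records, the single `family_history` attribute and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical historical records: (family_history_of_diabetes, developed_diabetes)
history = [
    ("yes", "yes"), ("yes", "yes"), ("yes", "no"),
    ("no", "no"), ("no", "no"), ("no", "yes"), ("no", "no"),
]

def train_one_rule(records):
    """Learn, for each attribute value, the most common class label seen with it."""
    counts = defaultdict(Counter)
    for attribute_value, label in records:
        counts[attribute_value][label] += 1
    return {value: labels.most_common(1)[0][0] for value, labels in counts.items()}

def predict(model, attribute_value):
    """Predict the class of a new client from the learned rule table."""
    return model[attribute_value]

model = train_one_rule(history)
print(predict(model, "yes"))  # class predicted for a client with a family history
```

The categories ("yes"/"no") are known before training starts, which is what makes this supervised.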
Data Mining Algorithms (fall into four broad categories):

2. Clustering
– Partitioning a database into segments in which the members of a segment share similar qualities.
– Unlike classification, the clusters are unknown when the algorithm starts.
– Clustering techniques include optimization; the goal is to create groups so that members within each group have maximum similarity and members across groups have minimum similarity.
– Before the results of clustering techniques are used, it might be necessary for an expert to interpret and modify the information.
– Application example: market segmentation

Example: Comb the whole data to identify shared qualities/characteristics and create groups based on them, e.g. payment by credit card is more popular in urban areas than in rural areas.
Decision: Demographically, the social class determines the method of payment. This can be interpreted into business decisions/strategy.

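The optimization described above (high similarity within a group, low similarity across groups) can be sketched with k-means, one common clustering algorithm; the slides do not name a specific algorithm, so this choice is an assumption. The spending figures and starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one numeric attribute, starting from the given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly card spending per customer
spend = [20, 25, 30, 200, 210, 220]
centers, groups = kmeans_1d(spend, centers=[0, 100])
print(centers, groups)
```

No labels are supplied; the two segments emerge from the data, and an expert still has to decide what they mean (for example, "low spenders" and "high spenders").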
Classifying vs. Clustering

What is the major difference between cluster analysis and classification?

• Classification sorts cases into categories that are known in advance, using a model built from historical data.
• Cluster analysis starts with no predefined groups: it identifies the common characteristics shared by cases, sorts them so that members of the same group are strongly associated in some meaningful way, and leaves the resulting groups to be interpreted.

Data Mining Algorithms (fall into four broad categories):

3. Association
– Establishes relationships among items that occur together in a given record.
– Determines associations among items that sell together.
– Often called market basket analysis, as the primary application is the analysis of sales transactions.
– Application example: market basket analysis

Examples:
– Placing batteries in the toys
– If a customer buys bread, they are also likely to buy milk

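A rule such as "bread, therefore milk" is usually judged by two standard measures, support and confidence. The slides do not define them, so the sketch below is an addition under that assumption; the baskets are hypothetical.

```python
# Hypothetical sales transactions, one set of items per basket
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)

def support(itemset, baskets):
    """Share of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Among baskets containing the antecedent, the share that also contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

print(support({"bread", "milk"}, baskets))       # 3 of 5 baskets
print(confidence({"bread"}, {"milk"}, baskets))  # 3 of the 4 bread baskets
```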
Data Mining Algorithms (fall into four broad categories):

4. Sequence discovery
– The identification of associations over time.
– Some sequence discovery techniques keep track of the elapsed time between associated events and the frequency of occurrences.
– Application example: market basket analysis over time, customer life cycle analysis

Examples:
– Unemployed consumers who purchased a prepaid telco service are most likely to convert to postpaid upon being employed.
– The purchase of machinery will later be followed by the purchase of a maintenance service.

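The machinery example can be sketched as a search for one event followed later by another, recording the elapsed time and how often the sequence occurs. This is a simplified illustration rather than a full sequential pattern algorithm; the purchase log and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical purchase log: (customer, day, item)
events = [
    ("c1", 1, "machinery"), ("c1", 40, "maintenance"),
    ("c2", 5, "machinery"), ("c2", 65, "maintenance"),
    ("c3", 9, "machinery"),
    ("c4", 3, "maintenance"), ("c4", 20, "machinery"),
]

def followed_by(events, first, then):
    """Elapsed days, per customer, from the earliest `first` event to the next `then` event."""
    by_customer = defaultdict(list)
    for customer, day, item in sorted(events, key=lambda e: (e[0], e[1])):
        by_customer[customer].append((day, item))
    gaps = []
    for history in by_customer.values():
        start = next((d for d, i in history if i == first), None)
        if start is None:
            continue
        later = next((d for d, i in history if i == then and d > start), None)
        if later is not None:
            gaps.append(later - start)
    return gaps

gaps = followed_by(events, "machinery", "maintenance")
print(len(gaps), sum(gaps) / len(gaps))  # frequency of the sequence and mean elapsed days
```

Customer c4 is not counted because the maintenance purchase came before the machinery purchase; order matters, which is what separates sequence discovery from plain association.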
Types of data mining (two types)

1) Hypothesis-driven data mining
Begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition.
e.g. Start with a statement, "Fires during road accidents are caused by vehicle modifications made by unauthorized parties", then use data mining to validate the statement.

2) Discovery-driven data mining
Finds patterns, associations, and relationships among the data in order to uncover facts that were previously unknown or not even contemplated by an organization.

Use in Business

• Where data mining is beneficial (the intent in most of these examples is to identify a business opportunity and create a sustainable competitive advantage).

Business: Use
Banking: Forecasting levels of bad loans, fraud in credit card usage, credit card spending patterns, new loans
Retailing and sales: Predicting sales, determining correct inventory levels and distribution schedules
Manufacturing and production: Predicting when to expect machinery failures
Marketing: Predicting which customers will respond to Internet banners or buy a particular product

Use in Business

• Where data mining is beneficial (the intent in most of these examples is to identify a business opportunity and create a sustainable competitive advantage).

Business: Use
Government and defense: Forecasting threats to national security, predicting resource consumption
Health: Correlating demographics of patients with critical illnesses, so that doctors will be more prepared
Airlines: Capturing popular and unpopular routes at given times
Broadcasting: Predicting which programs are best shown at prime time, and which is the best time to slot in advertisements

Understanding Customer Behavior

• For most retail environments, three sources of customer data are most critical to data mining efforts aimed at a better understanding of behavior:
– Demographic data: salary, population
– Transaction data: purchase type, online, cash, credit
– Online interaction data: favorite sections in a website (clickstream analytics can be used to identify who did/did not buy a product, why and when)

Data Mining in Retail

• Data mining in retail usually looks at three different aspects:

1. Web analytics – Gather web statistics that track customers’ online behavior: hits, pages, sales, volume, and so on. This helps in adjusting a web site to meet customer needs.
2. Customer analytics – Transaction data from offline purchases, sales and orders made, calls for support, and demographic data. This is critical in CRM and revenue management because a better understanding allows an organization to cluster customers into groupings.
3. Optimization – Patterns can be detected and used to optimize transactions and customer interaction, for example by recommending relevant styles and complementary purchases/products to suit customer behavior.

Text Mining

• Application of data mining to nonstructured or less structured text files.
• It generates meaningful numerical indices from the unstructured text and then processes these indices using various data mining algorithms.

Data Mining: Takes advantage of the infrastructure of stored data to extract additional useful information. E.g. applying data mining to a customer database, we may discover that everyone who buys product A will also buy products B and C six months later.

Text Mining: Operates with text documents, which hold less structured information. E.g. visualizing relationships between documents such as policies, memos, emails, minutes of meetings etc. Organizations recognize this as one of the major sources of competitive advantage.

Text Mining Example

• The airline industry uses text mining software to focus on key problem areas through pattern identification, by accessing incident reports, to increase the quality of service.
– The most frequently occurring terms are identified through the documented incident reports.
– Cluster/group the terms, e.g. take the term spillage and associate it with other key terms such as coffee, tea, soup, drink.
– Can identify incidents that might lead to trouble and help management stop the issue.

Text Mining Example

• A private tertiary institution uses text mining to establish knowledge of the programs offered by its competitors, by accessing the advertisement materials produced by the competitors.
– The most frequently occurring terms are identified through the advertisements.
– Cluster/group the terms, e.g. take the term degree and associate it with other key terms such as 2+1, 3+0, accommodation, fees.
– Can identify new programs or types of facilities offered by the competitors.

Text Mining
How to mine text
1. Eliminate commonly used words (e.g. the, and, other). These are known as stop-words.
2. Replace words with their stems or roots (e.g. eliminate plurals and various conjugations). The terms phoned, phoning, and phones would be mapped to phone.
3. Consider synonyms and phrases. Synonyms need to be combined, e.g. student and pupil need to be grouped together.
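
Steps 1 to 3 can be sketched as a small preprocessing function. The word lists are hypothetical, and the `STEMS` lookup table is only a stand-in for a real stemming algorithm, which would derive the roots by rule instead of listing them.

```python
STOP_WORDS = {"the", "and", "other"}
# Stand-in for a real stemmer: maps inflected forms to their root
STEMS = {"phoned": "phone", "phoning": "phone", "phones": "phone"}
# Synonyms and plural forms grouped under one canonical term
SYNONYMS = {"pupil": "student", "pupils": "student", "students": "student"}

def preprocess(text):
    """Lowercase and tokenize, then apply stop-word removal, stemming and synonym merging."""
    tokens = [w.strip(".,;:!?").lower() for w in text.split()]
    tokens = [w for w in tokens if w and w not in STOP_WORDS]  # step 1
    tokens = [STEMS.get(w, w) for w in tokens]                 # step 2
    tokens = [SYNONYMS.get(w, w) for w in tokens]              # step 3
    return tokens

terms = preprocess("The students phoned and the pupil phones.")
print(terms)
```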
Text Mining
How to mine text
4. Calculate the weights of the remaining terms, looking at the frequency with which the words appear.
Two common measures are used for this:
1) Term frequency, tf (the actual number of times the word appears in a document), and
2) Document frequency (the number of documents in the set in which the word appears), whose inverse gives the inverse document frequency, idf.
– If the tf factor is large, the weight increases. If the word appears in many documents of the set, its idf is small and the weight decreases.
– Reason: a word that appears across most documents is likely to be a common word in the industry and says little about any one document.

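The weighting in step 4 can be sketched as follows. The slide gives no formula, so the common form weight = tf × log(N / df) is assumed here, where N is the number of documents and df is the number of documents containing the term. The tokenized incident reports are hypothetical.

```python
import math

# Hypothetical incident reports, already tokenized and cleaned
docs = [
    ["spillage", "coffee", "seat"],
    ["spillage", "tea", "seat"],
    ["delay", "seat", "seat"],
]

def tf_idf(term, doc, docs):
    """Weight of term in doc: raw term frequency times log(N / document frequency)."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0  # term never occurs in the collection
    return tf * math.log(len(docs) / df)

print(tf_idf("seat", docs[2], docs))      # appears in every report, so it carries no weight
print(tf_idf("coffee", docs[0], docs))    # appears in one report only
print(tf_idf("spillage", docs[0], docs))  # appears in two of the three reports
```

Although "seat" has the highest term frequency, it occurs in every report, so its weight collapses; the rarer "coffee" outweighs "spillage" for the same reason.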
Web Mining

• The discovery and analysis of interesting and useful information from the web, about the web, usually using a web-based tool.

Types of Web Mining

1. Web content mining – Extraction of useful information from web pages. May be used to enhance the search results produced by search engines.
2. Web structure mining – Generating information from the links included in web pages. Can be used to structure the display of a page. Can also identify the members of specific communities and their roles.
3. Web usage mining – Information generated through web page visits, transactions and web server logs. Useful for CRM and for understanding user behavior (web analytics).
