
Data Mining

1. Data mining allows companies to analyze large amounts of stored data to uncover hidden patterns and relationships that can provide useful business insights and knowledge.
2. As data storage costs decrease, companies, government agencies, and research institutions are accumulating vast amounts of data but lack the analytical capabilities to understand it. Data mining uses computer algorithms to automatically find patterns in large data sets.
3. Data mining techniques include association pattern analysis to find relationships between different data items, cluster analysis to organize data into meaningful groups, and predictive modeling to discover patterns that can be used to predict future outcomes. Uncovering useful patterns from data can help businesses target advertising, personalize websites, and predict customer behavior.

Uploaded by

chepimanca
Copyright
© Attribution Non-Commercial (BY-NC)

tutor

DATA MINING

HIDDEN MESSAGES
Have you ever thought that maybe your data is trying to tell you something? As your business expands, so do your data archives, and with gigabytes of information stored on servers, disks and drives, you may be sitting on a gold mine. Data mining services unearth patterns and trends that can help your business.

Government agencies, businesses and research firms are churning out raw data on every subject imaginable, at an ever-increasing rate. NASA’s space probes keep phoning home with data, e-businesses accumulate information about customer habits, and Web servers log every user interaction. The cost of storage keeps dropping, so there’s no difficulty finding a place to warehouse these terabytes of data. Although there may be important patterns and knowledge buried in the data, the sheer amount of information has grown beyond human analytical capacity. Data mining allows computers to take over the task of finding the patterns.

Data mining is the process of automatically extracting useful information and relationships from immense quantities of data. Scientists use it to separate signals from noise in astronomical data and to find genes within DNA sequences. Your company can use it to gain valuable knowledge about customers, site visitors and business practices. Using this knowledge, you can target advertising campaigns and evaluate their success. You can personalise Web pages and suggest related purchases. And you can predict customer behaviour, making your site more effective.

Figure 1: Affinity patterns reveal associations such as which products are purchased in the same transactions or which pages are often visited in the same session.

DEVELOPMENT OF DATA MINING
Data mining is a logical evolution in database technology. The earliest databases, which served as simple replacements for paper records, were data repositories that provided little more than the capability to summarise and report. With the development of query tools such as SQL, database managers were able to query data more flexibly. A

114 July 2001 www.DITnet.co.ae ■ www.pcmag-mideast.com



manager could, for example, determine how many cell phones were sold in Kalamazoo during June of 1980 or which salespeople brought in the most customers. Almost any quantitative question can be answered using these tools. OLAP (Online Analytical Processing) tools aid by making patterns visible. Given the correct view of the data, you might discover that trailer hitch sales in Texas are twice as high in February as in any other month, letting you know that you should adjust production to match or work on raising sales in the other months. These tools make patterns in data easier to see, but the manager still has to manipulate the data, look for patterns and decide which patterns are important. A totally unexpected or hidden pattern can go unnoticed simply because nobody thought to look for it.

Data mining automates the process of locating and extracting these hidden patterns and knowledge. In its purest form, data mining doesn’t involve looking for specific information. Rather than starting from a question or a hypothesis, data mining simply finds patterns that are already present in the data.

INTRIGUING PATTERNS
To wring knowledge from raw data, data mining software uses a wide variety of complex algorithms including neural networks, rule induction, decision trees and genetic algorithms. Typically, the software performs its analysis on a portion of the data to obtain rules and patterns, then validates the results by testing them against the held-back data. In a scientific setting, this process can reveal relationships that aren’t obvious or sift out real data from mere noise. In a business setting, this knowledge can be used to set policy, exploiting favourable patterns and avoiding bad ones.

Part of the challenge for data mining software involves generating results in terms understandable by humans. Among the more intelligible pattern types are associations, sequences and clusters. Association or affinity patterns simply identify database elements that occur together in a statistically significant fashion. For example, analysis of a huge database of customer shopping carts could reveal that nine out of ten visitors who bought calendars also bought pens. Sequence patterns are similar but with a time factor thrown in. A bank’s records might show that 80 percent of customers declaring bankruptcy had obtained three or more new credit cards within the past year.

To identify clusters, the software models a multi-dimensional space in which each of possibly thousands of dimensions represents an attribute of the data. The program then segments the data into clusters based on their proximity in this imaginary space. Further analysis, either in software or with human intervention, selects clusters that have useful characteristics. A simple example would be an analysis of a sales database that uncovers a cluster of customers who make a very large number of small purchases. The company could reduce processing costs by offering an incentive for customers to combine their small orders into fewer, larger ones.

Figure 2: After the data mining software identifies clusters of similar records, the manager isolates clusters whose characteristics are interesting.

Data mining is also used to devise predictive patterns. Given a large database of customer transactions and a specific subset that are known to be fraudulent, the software could be directed to determine what simpler characteristics distinguish the fraudulent transactions from the rest. If successful, this will yield a rule that predicts which future transactions are likely to be fraudulent, and the

company can give extra scrutiny to those dealings.

Of course, most data sets are full of patterns that don’t represent useful knowledge. You won’t be impressed if a data mining tool reports that every customer in zip code 10016 has a New York address, or that every patient in the gynaecology department is female. But the same technique might uncover a pattern of double billing or reveal new avenues for targeted advertising. Human intervention is required to distinguish patterns that are useful.

Using the most accurate data is essential. As a precursor to serious data mining, companies will usually establish a data warehouse: a collection of data designed to support management decision making. The warehouse includes data from across the enterprise at a single point in time. As much as possible, the data is cleansed of errors and redundancy, and perhaps transformed into a format suitable for the mining program.

DATA MINING AND THE WEB
Every time you click on a URL, your browser requests the corresponding Web page, which a Web server supplies, logging the transaction. Further transactions may be required, for example to download images on the page. The server’s log of low-level transactions is referred to as clickstream data. A large e-commerce site can generate millions of clickstream lines every day. Data mining software can find significant patterns in the clickstream logs alone, but that data becomes substantially more useful in combination with customer registration data. Linking clickstream entries with a specific customer lets you track that customer’s travels through your site; this alone provides a vast new realm for discovery. The richest lode for data mining is a data warehouse containing clickstream data, user profile information and all of the company’s other relevant databases.

PRACTICAL APPLICATIONS
One of the most cited publications in the data mining field is a 1991 doctoral dissertation by Usama Fayyad. As a graduate student, Fayyad worked with General Motors on extracting useful knowledge from an immense database of car repair data; the algorithms he devised became the basis for his dissertation. Fayyad went on to develop data-mining systems for NASA and Microsoft before founding digiMine in March 2000.

Data mining is not cheap. digiMine’s hosted service starts at $7,500 a month; fully installed solutions cost hundreds of thousands. To see what data mining can do for you, we’ll look at a few of the reports generated by digiMine.

Figure 1 shows a fairly simple report that reveals affinity relationships among the categories Accessories, Men’s Clothing and Women’s Clothing within the general category Cycle Shop > Mountain Bike. More than half of those who browsed accessories also browsed men’s clothing. Slightly more than half the visitors who looked at accessories also looked at women’s clothing. Over a third of the customers who toured accessories also browsed both men’s and women’s clothing. The manager viewing this data will have to dig deeper to determine the reason for these patterns. The customers could be buying accessories and clothing, in which case cross-selling makes sense. Another possibility is that the customers looked in one category, didn’t find what they wanted, and switched to the other. In that case, certain products could be placed in different or multiple categories.

As noted above, data mining software can plot immense quantities of data in multi-dimensional space to find items with similar characteristics. After the software has done the heavy lifting, the database manager studies the characteristics of each cluster and identifies those that seem useful. In Figure 2, digiMine has identified eight significant clusters within a group of over 300,000 visitors, and the manager has named several of the clusters based on their characteristics. Comparison of the characteristics of One Hit Wonders and Return Visitors might suggest techniques for turning more of the former into the latter.

Once an interesting cluster has been identified, you can study it further using more traditional forms of analysis, such as the funnel report shown in Figure 3. A funnel report identifies how many users successfully negotiate each step of a multistep process. In the figure, the process is a purchase transaction. Just a few percent of the users drop off at each step; an unusually large drop at one step would be a red flag for the Webmaster to examine the Web page or form corresponding to that step.

Of course, you’ll only get the benefit of data mining if you take action based on the knowledge it provides (and keep the data warehouse up to date). If you’ve identified a cluster of users based on certain simple characteristics, you can personalise your site for first-time visitors that fit those characteristics, or you can create targeted advertising campaigns. If the software shows that certain products are purchased together very frequently, link the two in your catalogue. If half of the customers who begin the checkout process drop out before completing the transaction, revamp the checkout system. Your company’s data is full of undiscovered gems; start digging!

Figure 3: Traditional data analysis techniques applied to the results of data mining can yield useful reports. This funnel report identifies how many users make it past each step of the checkout process.
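
The arithmetic behind a funnel report like this one is simple enough to sketch in a few lines of Python. The step names, visitor counts and the 25 percent warning threshold below are all assumptions made up for the example; they are not taken from Figure 3. The sketch works out, for each step, the share of starting users who reached it and the fraction lost since the previous step, then flags any step with an unusually large drop.

```python
def funnel_report(step_names, users_reaching):
    """Return (step, users, share_of_start, drop_from_previous) for each step.

    Hypothetical helper for illustration only.
    """
    rows = []
    start = users_reaching[0]
    previous = start
    for name, users in zip(step_names, users_reaching):
        share = users / start if start else 0.0
        drop = 1 - users / previous if previous else 0.0
        rows.append((name, users, share, drop))
        previous = users
    return rows


def flag_large_drops(rows, threshold=0.25):
    """Name the steps that lose more than `threshold` of the users from the step before."""
    return [name for name, _, _, drop in rows if drop > threshold]


# Invented checkout steps and counts, purely for demonstration.
steps = ["View cart", "Shipping details", "Payment details", "Order confirmed"]
counts = [1000, 950, 600, 580]

rows = funnel_report(steps, counts)
for name, users, share, drop in rows:
    print(f"{name:<18}{users:>6}  reached: {share:6.1%}  lost at this step: {drop:6.1%}")

flagged = flag_large_drops(rows)
print("Steps to examine:", flagged)
```

In this invented data the payment step loses over a third of the users who reached it, which is the kind of red flag that would send the Webmaster to look at the corresponding page or form.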
