Data Mining and Data Warehouse BY: Dept. of Computer Science Engineering
ABSTRACT
Data Mining is oriented more toward applications than toward the basic nature of the
underlying phenomena. For example, uncovering the nature of the underlying functions
or the specific types of interactive, multivariate dependencies between variables is not
the main goal of Data Mining. Instead, the focus is on producing a solution that can
generate useful predictions. Therefore, Data Mining accepts, among others, a "black box"
approach to data exploration or knowledge discovery and uses not only traditional
Exploratory Data Analysis techniques but also techniques such as Neural Networks, which
can generate valid predictions but are not capable of identifying the specific nature of
the interrelations between the variables on which the predictions are based.
INDEX
Contents
1. Introduction
2. What is data mining
3. Evolution of data mining
4. How does data mining work
5. Data Mining Process
6. Tools and Techniques
a. Neural Networks
b. Tree based
c. Statistical tools
d. Data surveyor
7. Applications of Data mining
8. Data Warehouses
9. Future of Data mining
10. Conclusion
11. Bibliography
1. INTRODUCTION:
Data Mining is a powerful new technology with great potential to help companies
focus on the most important information in the data they have collected about the
behavior of their customers and potential customers. It discovers information within the
data that queries and reports cannot effectively reveal.
years?"
Data Access (1980s): "What were unit sales in New England last March?" Enabled by
faster and cheaper computers with more storage, and relational databases.
• Classes: Stored data is used to locate data in predetermined groups. For example,
a restaurant chain could mine customer purchase data to determine when
customers visit and what they typically order. This information could be used to
increase traffic by having daily specials.
• Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood of a
[...] hiking shoes.
Stage 1: Exploration. This stage usually starts with data preparation which may
involve cleaning data, data transformations, selecting subsets of records and - in case of
data sets with large numbers of variables ("fields") - performing some preliminary feature
selection operations to bring the number of variables to a manageable range (depending
on the statistical methods which are being considered).
Stage 2: Model building and validation. This stage involves considering various
models and choosing the best one based on their predictive performance (i.e., explaining
the variability in question and producing stable results across samples).
Stage 3: Deployment. This final stage involves using the model selected as best
in the previous stage and applying it to new data in order to generate predictions or
estimates of the expected outcome.
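
The three stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `x` and `y`, the synthetic data, the two candidate models (a constant mean and a straight line), and the 70/30 hold-out split are all hypothetical choices, not part of the process described above.

```python
import random
import statistics

def explore(records):
    """Stage 1: clean the data by dropping records with missing fields."""
    return [(r["x"], r["y"]) for r in records
            if r.get("x") is not None and r.get("y") is not None]

def fit_mean(train):
    """Candidate model 1: always predict the mean of y."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y

def fit_line(train):
    """Candidate model 2: least-squares straight line y = a + b*x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mse(model, data):
    """Mean squared prediction error of a model on a data set."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in data)

def build_and_validate(clean, seed=0):
    """Stage 2: fit each candidate on a training split, score it on held-out data."""
    shuffled = clean[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    train, holdout = shuffled[:cut], shuffled[cut:]
    candidates = {"mean": fit_mean(train), "line": fit_line(train)}
    scores = {name: mse(model, holdout) for name, model in candidates.items()}
    best = min(scores, key=scores.get)
    return best, candidates[best], scores

def deploy(model, new_xs):
    """Stage 3: apply the selected model to new data."""
    return [model(x) for x in new_xs]

# Synthetic data (an assumption for illustration): y is roughly 2x + 1 plus noise.
rng = random.Random(42)
raw = [{"x": float(i), "y": 2.0 * i + 1.0 + rng.gauss(0, 0.5)} for i in range(40)]
raw.append({"x": None, "y": 3.0})  # one dirty record to be cleaned out

clean = explore(raw)
best_name, best_model, scores = build_and_validate(clean)
predictions = deploy(best_model, [50.0, 60.0])
print(best_name, predictions)
```

Scoring on held-out records rather than on the training records is one simple way to check that results are stable across samples.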
6. TOOLS AND TECHNIQUES
A. Neural Networks
Neural networks consist of a number of neurons that are interconnected--often in
complex ways--and then organized into layers. Neurons are very simple processing units
that compute a linear combination of a number of inputs and then perform a simple
mathematical process on the result to produce an output.
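
A single neuron of this kind can be sketched as follows. The bias term, the choice of `tanh` as the "simple mathematical process", and all weights are assumptions made for illustration; real networks learn their weights from data.

```python
import math

def neuron(inputs, weights, bias, activation=math.tanh):
    """Compute a linear combination of the inputs, then apply a simple function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def layer(inputs, weight_rows, biases, activation=math.tanh):
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

# Two inputs feed a hidden layer of two neurons, whose outputs feed one output neuron.
hidden = layer([1.0, 2.0], [[0.5, -0.25], [0.1, 0.2]], [0.0, 0.5])
output = neuron(hidden, [1.0, -1.0], 0.0)
print(hidden, output)
```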
B. Tree-based Models
Tree-based models--which include classification and regression trees--are the
most common induction tools used in data mining. Tree-based models automatically
construct decision trees from data, yielding a sequence of rules, such as "If income is
greater than $60,000, assign the customer to this segment."
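
As a minimal sketch of rule induction, the code below learns a one-level tree (a "decision stump") from data. The customer records and segment names are hypothetical, and a full classification tree would apply the same threshold search recursively to each branch.

```python
def learn_stump(rows, feature, label):
    """Find the threshold on `feature` that misclassifies the fewest rows.

    Assumes `feature` takes at least two distinct values in `rows`.
    """
    values = sorted({r[feature] for r in rows})
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        above = [r[label] for r in rows if r[feature] > t]
        below = [r[label] for r in rows if r[feature] <= t]
        # Each branch predicts its most common label.
        above_label = max(set(above), key=above.count)
        below_label = max(set(below), key=below.count)
        errors = (sum(l != above_label for l in above)
                  + sum(l != below_label for l in below))
        if best is None or errors < best[0]:
            best = (errors, t, above_label, below_label)
    return best[1:]

# Hypothetical training data.
customers = [
    {"income": 30_000, "segment": "standard"},
    {"income": 45_000, "segment": "standard"},
    {"income": 55_000, "segment": "standard"},
    {"income": 65_000, "segment": "premium"},
    {"income": 80_000, "segment": "premium"},
    {"income": 120_000, "segment": "premium"},
]

threshold, if_above, otherwise = learn_stump(customers, "income", "segment")
rule = (f"If income is greater than ${threshold:,.0f}, "
        f"assign '{if_above}', else '{otherwise}'")
print(rule)
```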
C. Statistical Tools
fitting and validation, data mining also uses more general statistical methods that conduct
automated searches for complex relationships and apply fresh data to tentative
relationships.
D. Data Surveyor
This is a data mining tool for the discovery of strategically relevant information from
large databases. Data Surveyor searches for relationships, trends and patterns. It uses
highly efficient techniques to test many potential relationships for their statistical
significance, allowing many hundreds of variables to be taken into account.
The intended user of Data Surveyor is an expert in the application domain,
e.g. an actuary, data analyst, or database marketer. Their domain
knowledge is considered to be vital during the mining process.
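
Testing a potential relationship for statistical significance can be illustrated with a Pearson chi-square test on a 2x2 table. This is a generic textbook test, not a description of how Data Surveyor works internally; the subgroup names and counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column: no relationship can be tested
    return n * (a * d - b * c) ** 2 / denom

# Critical value of the chi-square distribution, 1 degree of freedom, 5% level.
CRITICAL_5_PERCENT_1_DF = 3.841

# Hypothetical counts per candidate variable:
# (in group with claim, in group without claim, rest with claim, rest without claim)
tables = {
    "young drivers": (30, 70, 50, 350),
    "owns a garage": (20, 80, 60, 340),
}

statistic = {name: chi_square_2x2(*counts) for name, counts in tables.items()}
significant = [name for name, value in statistic.items()
               if value > CRITICAL_5_PERCENT_1_DF]
print(statistic, significant)
```

When hundreds of variables are tested this way, some will pass the 5% threshold by chance alone, so a stricter threshold is usually appropriate.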
7. APPLICATIONS
Risk analysis
Insurance companies and banks use data mining for risk analysis. An insurance
company searches its own policyholder and claims databases for relationships between
personal characteristics and claim behavior.
Direct Marketing
Data mining can also be used to discover the relationship between one's personal
characteristics,
Company X sends a mailing (1) to a number of prospects. The response (2) is, for
example, 2%. The response is analyzed using data mining techniques (3) to discover
differences between the customers who responded and those who did not. The result
consists of database subgroups that have a significantly higher response probability (4);
for example, of all young couples with double incomes, 24% replied to the last mailing.
The groups with the highest response probability are selected as targets for the next
mailing (5). Data mining thus increases the response considerably.
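
The mailing cycle can be sketched as a response-rate comparison per subgroup. The segment names, segment sizes, and the `lift` cut-off are assumptions chosen so that the overall response is 2% and one subgroup responds at 24%, matching the figures above. A real analysis would also test whether each difference is statistically significant.

```python
from collections import defaultdict

def response_rates(mailing):
    """Response rate per segment from (segment, responded) pairs."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in mailing:
        sent[segment] += 1
        responded[segment] += int(did_respond)
    return {segment: responded[segment] / sent[segment] for segment in sent}

def select_targets(mailing, lift=2.0):
    """Segments whose response rate is at least `lift` times the overall rate."""
    overall = sum(did_respond for _, did_respond in mailing) / len(mailing)
    rates = response_rates(mailing)
    chosen = [s for s, rate in rates.items() if rate >= lift * overall]
    return sorted(chosen, key=lambda s: -rates[s])

def make_segment(name, sent, responses):
    """Build synthetic mailing records for one segment."""
    return [(name, True)] * responses + [(name, False)] * (sent - responses)

# Synthetic first mailing: 2,000 prospects, 40 responses (2% overall).
mailing = (make_segment("young couples, double income", 100, 24)
           + make_segment("retirees", 900, 9)
           + make_segment("single households", 1000, 7))

overall = sum(did_respond for _, did_respond in mailing) / len(mailing)
rates = response_rates(mailing)
targets = select_targets(mailing)
print(f"overall {overall:.1%}", rates, targets)
```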
Production Quality Control
8. Data Warehouses
The drop in the price of data storage has given companies willing to make the
investment a tremendous resource: data about their customers and potential customers
stored in "data warehouses". Data warehouses are becoming part of the technology.
Data warehouses are used to consolidate data located in disparate databases [...] retrieved,
interpreted, and sorted by users [...] or other data to respond faster to markets and [...]
9. The Future of Data Mining:
In the medium term, data mining may be as common and easy to use as e-mail.
We may use these tools to find the best airfare to New York, track down the phone number
of a long-lost classmate, or find the best prices on lawn mowers.
10. CONCLUSION:
Data mining is emerging as one of the key features of many homeland
security initiatives. Often used as a means for detecting fraud, assessing risk, and product
retailing, data mining involves the use of data analysis tools to discover previously
unknown, valid patterns and relationships in large data sets. In the context of homeland
security, data mining is often viewed as a potential means to identify terrorist activities,
such as money transfers and communications, and to identify and track individual
terrorists themselves, such as through travel and immigration records.
11. BIBLIOGRAPHY:
1. www.kdnuggets.com
2. www.ultragem.com
3. info.gte.com/kdd/
4. www.google.com
5. Database Management Systems by Raghu Ramakrishnan
6. Data Mining