Data Mining and Data Warehouse BY: Dept. of Computer Science Engineering

This document discusses data mining and data warehouses. It begins with an introduction to data mining, explaining that it discovers patterns and relationships within large data sets. It then covers the evolution of data mining and how it works. The data mining process is described as involving exploration, model building and validation, and deployment. Common tools and techniques for data mining like neural networks, tree-based models, and statistical tools are overviewed. Applications of data mining like risk analysis and direct marketing are highlighted. Finally, the document discusses data warehouses as a key enabler of data mining.

Uploaded by

api-19799369
Copyright
© Attribution Non-Commercial (BY-NC)

DATA MINING AND DATA WAREHOUSE

BY

Sneha Jain Sajana


( III/IV B. Tech- CSE ) ( III/IV B. Tech- CSE)

Mail id: Mail id:


[email protected] [email protected]

Dept. of Computer Science Engineering


KONERU LAKSHMAIAH COLLEGE OF ENGINEERING
Vaddeswaram
GUNTUR

ABSTRACT

Data Mining is more oriented towards applications than towards the basic nature of the
underlying phenomena. For example, uncovering the nature of the underlying functions
or the specific types of interactive, multivariate dependencies between variables is not
the main goal of Data Mining. Instead, the focus is on producing a solution that can
generate useful predictions. Therefore, Data Mining accepts a range of approaches to data
exploration or knowledge discovery and uses not only the traditional Exploratory Data
Analysis techniques but also techniques such as Neural Networks, which can generate
valid predictions but cannot identify the specific nature of the interrelations
between the variables on which the predictions are based.

INDEX
Contents
1. Introduction
2. What is data mining
3. Evolution of data mining
4. How does data mining work
5. Data Mining Process
6. Tools and Techniques
a. Neural Networks
b. Tree based
c. Statistical tools
d. Data surveyor
7. Applications of Data mining
8. Data Warehouses
9. Future of Data mining
10. Conclusion
11. Bibliography

1. INTRODUCTION:

Data Mining is a powerful new technology with great potential to help companies
focus on the most important information in the data they have collected about the
behavior of their customers and potential customers. It discovers information within the
data that queries and reports can’t effectively reveal.

Data Mining is an analytic process designed to explore data (usually large
amounts of data, typically business or market related) in search of consistent patterns
and/or systematic relationships between variables, and then to validate the findings by
applying the detected patterns to new subsets of data. The ultimate goal of data mining is
prediction, and predictive data mining is the most common type of data mining and the one
that has the most direct business applications.

2. What is Data Mining?

Data Mining, or knowledge discovery, is the computer-assisted process of digging
through and analyzing enormous sets of data and then extracting the meaning of the data.
Data mining tools predict behaviors and future trends, allowing businesses to make
proactive, knowledge-driven decisions. Data mining tools can answer business questions
that traditionally were too time consuming to resolve. They scour databases for hidden
patterns, finding predictive information that experts may miss because it lies outside their
expectations.

3. The Evolution of Data Mining

Data mining is a natural development of the increased use of computerized
databases to store data and provide answers to business analysts.

Evolutionary Step        Business Question              Enabling Technology

Data Collection          "What was my total revenue     Computers, tapes, disks
(1960s)                  in the last five years?"

Data Access              "What were unit sales in       Faster and cheaper computers
(1980s)                  New England last March?"       with more storage, relational
                                                        databases

Data Warehousing and     "What were unit sales in       Faster and cheaper computers
Decision Support         New England last March?        with more storage, on-line
                         Drill down to Boston."         analytical processing (OLAP),
                                                        multidimensional databases,
                                                        data warehouses

Data Mining              "What's likely to happen to    Faster and cheaper computers
                         Boston unit sales next         with more storage, advanced
                         month? Why?"                   computer algorithms

4. How does Data mining work?

While large-scale information technology has been evolving separate
transaction and analytical systems, data mining provides the link between the
two. Data mining software analyzes relationships and patterns in stored
transaction data based on open-ended user queries. Generally, any of four
types of relationships are sought:

• Classes: Stored data is used to locate data in predetermined groups. For example,
a restaurant chain could mine customer purchase data to determine when
customers visit and what they typically order. This information could be used to
increase traffic by having daily specials.

• Clusters: Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or
consumer affinities.

• Associations: Data can be mined to identify associations. The beer-diaper
example is an example of associative mining.

• Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood of a
backpack being purchased based on a consumer's purchase of sleeping bags and
hiking shoes.
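
The association relationship listed above is commonly quantified with two measures, support and confidence. The following is a minimal sketch in Python, assuming a handful of made-up market baskets; the data and the helper names support and confidence are illustrative assumptions and do not come from any tool discussed in this paper.

```python
# Hypothetical market baskets (illustrative data, not taken from the paper).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# Rule under test: "customers who buy diapers also buy beer".
rule_support = support({"beer", "diapers"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule is usually considered interesting only when both numbers clear thresholds chosen by the analyst; those thresholds are a judgment call and are not specified here.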

5. THE DATA MINING PROCESS


The process of data mining consists of three stages: (1) the initial exploration, (2)
model building or pattern identification with validation/verification, and (3) deployment
(i.e., the application of the model to new data in order to generate predictions).

Stage 1: Exploration. This stage usually starts with data preparation, which may
involve cleaning data, data transformations, selecting subsets of records and, in the case of
data sets with large numbers of variables ("fields"), performing some preliminary feature
selection operations to bring the number of variables to a manageable range (depending
on the statistical methods which are being considered).
Stage 2: Model building and validation. This stage involves considering various
models and choosing the best one based on their predictive performance (i.e., explaining
the variability in question and producing stable results across samples).
Stage 3: Deployment. This final stage involves using the model selected as best
in the previous stage and applying it to new data in order to generate predictions or
estimates of the expected outcome.
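
The three stages can be sketched end to end on a toy problem. This is a minimal illustration in Python, assuming made-up records of (advertising spend, unit sales), two deliberately simple candidate models, and a hold-out split of every third record; all names and numbers are hypothetical and stand in for whatever data and models a real project would use.

```python
import statistics

# Hypothetical raw records: (advertising spend, unit sales); None marks a missing value.
raw = [(1, 12), (2, 19), (3, None), (4, 41), (5, 52), (6, 58), (7, 71), (8, 79)]

# Stage 1: exploration and preparation. Here that only means dropping incomplete records.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# Keep every third clean record aside for validation.
train = [r for i, r in enumerate(clean) if i % 3 != 2]
holdout = [r for i, r in enumerate(clean) if i % 3 == 2]


# Stage 2: build candidate models on the training records.
def fit_mean(rows):
    """Baseline model: always predict the mean outcome."""
    mean_y = statistics.mean(y for _, y in rows)
    return lambda x: mean_y


def fit_line(rows):
    """Least-squares straight line through the training records."""
    mean_x = statistics.mean(x for x, _ in rows)
    mean_y = statistics.mean(y for _, y in rows)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_abs_error(model, rows):
    return statistics.mean(abs(model(x) - y) for x, y in rows)


candidates = {"mean": fit_mean(train), "line": fit_line(train)}

# Validation: keep the candidate with the lowest error on the held-out records.
errors = {name: mean_abs_error(model, holdout) for name, model in candidates.items()}
best_name = min(errors, key=errors.get)
best_model = candidates[best_name]

# Stage 3: deployment. Apply the selected model to new, unseen inputs.
new_inputs = [9, 10]
predictions = [best_model(x) for x in new_inputs]
print(best_name, predictions)
```

Validating on records the model never saw during fitting is what "producing stable results across samples" amounts to in this sketch.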
6. TOOLS AND TECHNIQUES

A. Neural Networks
Neural networks consist of a number of neurons that are interconnected--often in
complex ways--and then organized into layers. Neurons are very simple processing units
that compute a linear combination of a number of inputs and then perform a simple
mathematical process on the result to produce an output.
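
A single neuron of this kind can be written in a few lines. The sketch below is in Python and assumes a logistic (sigmoid) function as the "simple mathematical process"; the paper does not name a specific function, and the weights, biases, and function names are illustrative assumptions.

```python
import math


def neuron(inputs, weights, bias):
    """Linear combination of the inputs plus a bias, passed through a logistic function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Two inputs feed a hidden layer of two neurons, which feeds one output neuron.
hidden = layer([0.5, -1.0], [[1.0, 2.0], [-1.0, 0.5]], [0.0, 0.1])
output = neuron(hidden, [1.5, -2.0], 0.2)
print(output)
```

Training, that is, adjusting the weights so the output matches known examples, is left out of this sketch.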

B. Tree-based Models
Tree-based models--which include classification and regression trees--are the
most common induction tools used in data mining. Tree-based models automatically
construct decision trees from data, yielding a sequence of rules, such as "If income is
greater than $60,000, assign the customer to this segment."
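
To show how such a rule can be derived from data, here is a minimal sketch in Python of a single split (a one-level tree, often called a decision stump), not a full classification and regression tree algorithm. The customer records, segment labels, and the function name best_income_split are hypothetical.

```python
# Hypothetical customers: (annual income in dollars, segment label).
customers = [
    (25_000, "standard"), (40_000, "standard"), (52_000, "standard"),
    (58_000, "standard"), (63_000, "premium"), (75_000, "premium"),
    (90_000, "premium"),
]


def best_income_split(rows):
    """Return (threshold, errors) for the rule 'income > threshold -> premium'
    that misclassifies the fewest rows."""
    incomes = sorted({income for income, _ in rows})
    # Candidate thresholds: midpoints between neighbouring income values.
    candidates = [(a + b) / 2 for a, b in zip(incomes, incomes[1:])]
    best = None
    for threshold in candidates:
        errors = sum(
            1 for income, label in rows
            if ("premium" if income > threshold else "standard") != label
        )
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


threshold, errors = best_income_split(customers)
print(f"If income is greater than ${threshold:,.0f}, assign the customer to the premium segment")
```

A full tree-based model would repeat this search on each resulting subgroup and over every available variable, producing the sequence of rules described above.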
C. Statistical Tools

Data mining employs a variety of traditional statistical methods such as cluster
analysis, discriminant analysis, logistic regression, and time series forecasting. For model
fitting and validation, data mining also uses more general statistical methods that conduct
automated searches for complex relationships and apply fresh data to tentative
relationships.

D. Data Surveyor
This is a data mining tool for the discovery of strategically relevant information from
large databases. Data Surveyor searches for relationships, trends and patterns. It uses
highly efficient techniques to test many potential relationships for their statistical
significance, allowing many hundreds of variables to be taken into account.
The intended user of Data Surveyor is an expert in the application domain,
e.g. an actuary, data analyst or database marketer. Their domain
knowledge is considered to be vital during the mining process.
7. APPLICATIONS
Risk analysis
Insurance companies and banks use data mining for risk analysis. An insurance
company searches its own policyholder and claims databases for relationships between
personal characteristics and claim behavior.

Direct Marketing
Data mining can also be used to discover the relationship between one's personal
characteristics and the probability of responding to a mailing.

Company X sends a mailing (1) to a number of prospects. The response (2) is, e.g.,
2%. The response is analyzed using data mining techniques (3), discovering differences
between the customers that did respond and those that did not respond. The result
consists of database subgroups that have a significantly higher response probability (4),
e.g. of all young couples with double incomes, 24% replied to the last mailing. The
groups with the highest response probability are selected as targets for the next mailing
(5). Data mining thus increases the response considerably.
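
The analysis step in this mailing cycle can be sketched as a comparison of response rates per subgroup against the overall rate. The Python below is a minimal illustration with made-up mailing results, sized so that the overall response is 2% and the young double-income couples respond at 24% as in the example above; the other segments and all names are hypothetical, and no statistical significance test is performed.

```python
from collections import defaultdict

# Hypothetical results of the last mailing: (customer segment, responded?).
mailing = (
    [("young couple, double income", True)] * 6
    + [("young couple, double income", False)] * 19
    + [("single, student", True)] * 1
    + [("single, student", False)] * 274
    + [("retired", True)] * 3
    + [("retired", False)] * 197
)


def response_rates(rows):
    """Response rate per segment: responders divided by mailings sent."""
    sent = defaultdict(int)
    replied = defaultdict(int)
    for segment, responded in rows:
        sent[segment] += 1
        replied[segment] += int(responded)
    return {segment: replied[segment] / sent[segment] for segment in sent}


overall = sum(responded for _, responded in mailing) / len(mailing)
rates = response_rates(mailing)

# Target the segments that respond better than average, best first.
targets = sorted(
    (segment for segment, rate in rates.items() if rate > overall),
    key=rates.get,
    reverse=True,
)
print(f"overall={overall:.1%}", targets)
```

Mailing only the selected segments next time is what raises the response rate, at the cost of reaching fewer prospects.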
Production Quality Control

Data Mining can also be used to determine those combinations of production
factors that influence the quality of the end-product. This information allows the process
engineers to explain why certain products fail the final test and to increase the quality of
the production process. To clarify the data mining process, SAS Institute has mapped out
an overall plan for data mining. This step-by-step process is referred to by the acronym

8. Data Warehouses
The drop in price of data storage has given companies willing to make the
investment a tremendous resource: data about their customers and potential customers
stored in "Data Warehouses". Data warehouses are becoming part of the technology.
Data warehouses are used to consolidate data located in disparate databases so that it can be
retrieved, interpreted, and sorted by users, helping them respond faster to markets.

9. The Future of Data Mining:

In the short term, the results of data mining will be in profitable, if mundane,
business-related areas. Micro-marketing campaigns will explore new niches. Advertising
will target potential customers with new precision.

In the medium term, data mining may be as common and easy to use as e-mail.
We may use these tools to find the best airfare to New York, root out a phone number of
a long-lost classmate, or find the best prices on lawn mowers.

10. CONCLUSION:
Data mining is emerging as one of the key features of many homeland
security initiatives. Often used as a means for detecting fraud, assessing risk, and product
retailing, data mining involves the use of data analysis tools to discover previously
unknown, valid patterns and relationships in large data sets. In the context of homeland
security, data mining is often viewed as a potential means to identify terrorist activities,
such as money transfers and communications, and to identify and track individual
terrorists themselves, such as through travel and immigration records.
11. BIBLIOGRAPHY:
1. www.kdnuggets.com
2. www.ultragem.com
3. info.gte.com/kdd/
4. www.google.com
5. Database Management Systems by Raghu Ramakrishnan
6. Data Mining
