Seminar Data Mining

This document discusses data mining and provides an overview of the topic. It defines data mining as the process of extracting patterns from large data sets using methods from statistics and artificial intelligence. The document outlines how data mining works and the key elements involved, including extracting, transforming, loading, storing, managing and analyzing data. It also describes some common data mining techniques, such as artificial neural networks, genetic algorithms, and decision trees. Issues and applications of data mining are also reviewed.

Uploaded by

Sreedevi Kovilakath

CONTENTS
Introduction
Data Mining Overview
  Data
  Information
  Knowledge
How does data mining work?
Elements of Data mining
Types of Data Mining Techniques
  Artificial neural networks:
  Genetic algorithms:
  Decision trees:
  Nearest neighbor method:
Data Mining Issues
  Data Quality
  Interoperability
  Mission Creep
  Privacy
Data Mining Uses
  Automated prediction of trends and behaviors
  Automated discovery of previously unknown patterns
Limitations
Data Mining Products
Applications
Conclusion

DATA MINING

Introduction
Data mining is the process of extracting patterns from large data sets by combining methods from
statistics and artificial intelligence with database management. It is becoming an
increasingly important tool for transforming raw data into information, and it is currently used in a wide
range of profiling practices, such as marketing, surveillance, fraud detection, and scientific
discovery. Data mining consists of more than collecting and managing data; it also includes
analysis and prediction.

Data mining is often carried out only on samples of data. The mining process will be ineffective if
the samples are not a good representation of the larger body of data. Data mining cannot discover
patterns that may be present in the larger body of data if those patterns are not present in the
sample being "mined". The discovery of a particular pattern in a particular set of data does not
necessarily mean that the pattern is found elsewhere in the larger data from which that sample was
drawn. An important part of the process is the verification and validation of patterns on other
samples of data.
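
The validation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each record is modeled as a set of items, the "pattern" is a simple rule of the form "records containing one item also contain another", and the function names rule_confidence and validate_on_holdout are hypothetical, not taken from any data mining product.

```python
import random

def rule_confidence(records, antecedent, consequent):
    """Fraction of records containing `antecedent` that also contain `consequent`."""
    having = [r for r in records if antecedent in r]
    if not having:
        return None  # the pattern cannot be assessed in this sample
    return sum(consequent in r for r in having) / len(having)

def validate_on_holdout(records, antecedent, consequent, holdout_fraction=0.5, seed=0):
    """Measure the rule on one sample, then re-check it on a separate sample."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    mined, holdout = shuffled[:cut], shuffled[cut:]
    return (rule_confidence(mined, antecedent, consequent),
            rule_confidence(holdout, antecedent, consequent))
```

If the two returned values differ widely, the pattern found in the mined sample may not hold in the larger body of data.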

Data Mining Overview

Generally, data mining, sometimes called data or knowledge discovery, is the process of
analyzing data from different perspectives and summarizing it into useful information -
information that can be used to increase revenue, cut costs, or both. Data mining software is one
of a number of analytical tools for analyzing data. It allows users to analyze data from many
different dimensions or angles, categorize it, and summarize the relationships identified.
Technically, data mining is the process of finding correlations or patterns among dozens of fields
in large relational databases.

Data, Information, and Knowledge

Data
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations
are accumulating vast and growing amounts of data in different formats and different databases.
This includes:
• operational or transactional data, such as sales, cost, inventory, payroll, and accounting
• nonoperational data, such as industry sales, forecast data, and macroeconomic data
• metadata (data about the data itself), such as logical database design or data dictionary
definitions

Information
The patterns, associations, or relationships among all this data can provide information. For
example, analysis of retail point of sale transaction data can yield information on which products
are selling and when.

Knowledge
Information can be converted into knowledge about historical patterns and future trends. For
example, summary information on retail supermarket sales can be analyzed in light of
promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or
retailer could determine which items are most susceptible to promotional efforts.

How does data mining work?

Data mining can be performed on data represented in quantitative, textual, or multimedia forms.
Data mining applications can use a variety of parameters to examine the data. They include
association (patterns where one event is connected to another event, such as purchasing a pen and
purchasing paper), sequence or path analysis (patterns where one event leads to another event),
classification (identification of new patterns), clustering (finding and visually documenting
groups of previously unknown facts), and forecasting (discovering patterns from which one can
make reasonable predictions regarding future activities).

While large-scale information technology has been evolving separate transaction and analytical
systems, data mining provides the link between the two. Data mining software analyzes
relationships and patterns in stored transaction data based on open-ended user queries. Several
types of analytical software are available: statistical, machine learning, and neural networks.
Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant
chain could mine customer purchase data to determine when customers visit and what they
typically order. This information could be used to increase traffic by having daily specials.
Clusters: Data items are grouped according to logical relationships or consumer preferences. For
example, data can be mined to identify market segments or consumer affinities.
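Associations: Data can be mined to identify associations, that is, events that tend to occur together, such as the purchase of a pen and the purchase of paper.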
Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an
outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a
consumer's purchase of sleeping bags and hiking shoes.
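
As a rough illustration of the association relationship, the sketch below counts how often pairs of items appear in the same basket and reports two common measures: support (the share of all baskets containing both items) and confidence (the share of baskets containing the first item that also contain the second). The function name pair_rules, the basket data layout, and the min_support threshold are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.3):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to report
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; names break ties so the order is stable.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))
```

For instance, with five baskets in which pen and paper appear together three times and each appears four times overall, the rule "pen, then paper" has support 0.6 and confidence 0.75.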

Elements of Data mining

Data mining consists of five major elements:
• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data using application software.
• Present the data in a useful format, such as a graph or table.
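
The five elements above can be walked through in miniature. The sketch below is an assumption-laden toy: an in-memory SQLite table stands in for the data warehouse (a real system would use a dedicated multidimensional store), the input rows are invented, and the names run_pipeline and present are hypothetical.

```python
import sqlite3

def run_pipeline(raw_rows):
    """Extract/transform raw text rows, load them, and analyze them with a query."""
    # Extract and transform: trim item names and convert text fields to numbers.
    rows = [(date, item.strip().lower(), int(qty), float(price))
            for date, item, qty, price in raw_rows]
    # Load and store: an in-memory SQLite database stands in for the warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_date TEXT, item TEXT, qty INTEGER, unit_price REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    # Provide access and analyze: summarize units and revenue per item.
    summary = conn.execute(
        "SELECT item, SUM(qty), ROUND(SUM(qty * unit_price), 2) "
        "FROM sales GROUP BY item ORDER BY item").fetchall()
    conn.close()
    return summary

def present(summary):
    """Present the result as a plain-text table."""
    lines = [f"{'item':<10}{'units':>6}{'revenue':>10}"]
    for item, units, revenue in summary:
        lines.append(f"{item:<10}{units:>6}{revenue:>10.2f}")
    return "\n".join(lines)
```

Calling present(run_pipeline(rows)) on a handful of point-of-sale rows yields a small table of units and revenue per item.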

Types of Data Mining Techniques

Different levels of analysis are available:
Artificial neural networks:
Non-linear predictive models that learn through training and resemble biological neural networks
in structure. A network consists of a set of nodes connected by directed, weighted edges. It is useful for
learning from complex data, as in handwriting, speech, and image recognition.
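
To make the "nodes connected by directed weighted edges" description concrete, here is a minimal sketch of a two-layer network that computes exclusive-or, a function that no single linear threshold can represent. The weights are hand-picked for illustration; in practice they would be learned through training, which this example does not show. The layer layout and the names step, activate, and forward are assumptions of the sketch.

```python
def step(z):
    """Threshold activation: the node outputs 1 only when its input is positive."""
    return 1 if z > 0 else 0

# Each layer maps a node name to (bias, {source node: edge weight}).
# The weights are hand-picked for this example, not learned from data.
HIDDEN = {
    "h_or": (-0.5, {"x1": 1.0, "x2": 1.0}),   # fires if at least one input is 1
    "h_and": (-1.5, {"x1": 1.0, "x2": 1.0}),  # fires only if both inputs are 1
}
OUTPUT = {
    "y": (-0.5, {"h_or": 1.0, "h_and": -1.0}),  # "or" but not "and"
}

def activate(layer, values):
    """Compute every node in a layer from the values of its source nodes."""
    return {node: step(bias + sum(w * values[src] for src, w in edges.items()))
            for node, (bias, edges) in layer.items()}

def forward(x1, x2):
    """Propagate two binary inputs through the hidden layer to the output node."""
    hidden = activate(HIDDEN, {"x1": x1, "x2": x2})
    return activate(OUTPUT, hidden)["y"]
```

The hidden layer is what makes the model non-linear: the output node combines two intermediate decisions rather than the raw inputs.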

Genetic algorithms:
Optimization techniques that use processes such as genetic combination, mutation, and natural
selection in a design based on the concepts of natural evolution.
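
A small sketch may help show how combination, mutation, and selection fit together. The example below evolves bit strings toward the all-ones string, a standard toy problem chosen here only for illustration. The population size, mutation rate, tournament selection, and the practice of carrying the best individual forward unchanged (elitism) are all assumptions of this sketch rather than features of any particular product.

```python
import random

def fitness(bits):
    """Toy objective: the number of 1s in the bit string."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Return the best individual found and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: keep the current best unchanged
        while len(next_pop) < pop_size:
            # Selection: each parent is the fittest of three random candidates.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Combination: single-point crossover of the two parents.
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best individual is always copied into the next generation, the recorded best fitness never decreases from one generation to the next.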

Decision trees:
Tree-shaped structures that represent sets of decisions. These decisions generate rules for the
classification of a dataset. Specific decision tree methods include Classification and Regression
Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID). Both provide a set of rules that
you can apply to a new (unclassified) dataset to predict which records will have a given outcome.
CART segments a dataset by creating 2-way splits, while CHAID segments using chi-square tests
to create multi-way splits. CART typically requires less data preparation than CHAID.
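
The 2-way split can be illustrated with a short sketch. It searches one numeric field for the threshold that best separates the classes, scoring each candidate with Gini impurity, the criterion commonly associated with CART for classification. This is a single split on a single field, not a full tree builder, and the names gini and best_split are assumptions of the example.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 2-way split, or (None, impurity)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

For ages [22, 25, 30, 35, 40, 50] with outcomes no, no, no, yes, yes, yes, best_split returns (32.5, 0.0), which corresponds to the rule "age above 32.5 predicts yes". A full tree would apply the same search recursively to each resulting group.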

Nearest neighbor method:

A technique that classifies each record in a dataset based on a combination of the classes of the k
record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes this method is called the
k-nearest neighbor technique. It defines a measure of proximity between instances, finds the neighbors of a new
instance, and assigns the majority class among them.
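
The three steps just described (define proximity, find neighbors, assign the majority class) map directly onto a few lines of Python. The sketch assumes numeric features, Euclidean distance as the proximity measure, and a historical dataset stored as (features, label) pairs; the function name knn_classify is hypothetical.

```python
import math
from collections import Counter

def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest historical records."""
    # Proximity: Euclidean distance between feature tuples.
    by_distance = sorted(history, key=lambda rec: math.dist(rec[0], query))
    # Neighbors: the k closest records. Vote: the most common label among them.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

An odd value of k is a common choice for two-class problems because it avoids tied votes.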

Data Mining Issues

As data mining initiatives continue to evolve, several issues arise, including, but not
limited to, data quality, interoperability, mission creep, and privacy.

Data Quality
Data quality is a multifaceted issue that represents one of the biggest challenges for data mining.
Data quality refers to the accuracy and completeness of the data. Data quality can also be affected
by the structure and consistency of the data being analyzed. The presence of duplicate records, the
lack of data standards, the timeliness of updates, and human error can significantly impact the
effectiveness of the more complex data mining techniques, which are sensitive to subtle
differences that may exist in the data. To improve data quality, it is sometimes necessary to
“clean” the data, which can involve removing duplicate records, normalizing the values used
to represent information in the database (e.g., ensuring that “no” is represented as a 0 throughout
the database, and not sometimes as a 0 and sometimes as an N), accounting for missing data
points, removing unneeded data fields, identifying anomalous data points (e.g., an individual
whose age is shown as 142 years), and standardizing data formats (e.g., changing dates so they all
use the MM/DD/YYYY format).
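
Several of these cleaning steps can be sketched together. The example below assumes records are dictionaries with the hypothetical fields name, age, opt_in, and joined, that incoming dates arrive as either MM/DD/YYYY or YYYY-MM-DD, and that an age outside 0 to 120 counts as anomalous. All of these choices are illustrative assumptions, not rules from the text.

```python
from datetime import datetime

NO_VALUES = {"0", "n", "no"}
YES_VALUES = {"1", "y", "yes"}
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")  # assumed input formats

def normalize_flag(value):
    """Map the many spellings of yes/no onto 1/0; None marks an unknown value."""
    text = str(value).strip().lower()
    if text in NO_VALUES:
        return 0
    if text in YES_VALUES:
        return 1
    return None

def normalize_date(text):
    """Rewrite a date as MM/DD/YYYY, or return None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return None

def clean(records, max_age=120):
    """Return (cleaned records without duplicates, records flagged as anomalous)."""
    seen = set()
    cleaned, anomalies = [], []
    for rec in records:
        row = {
            "name": rec["name"].strip(),
            "age": rec["age"],
            "opt_in": normalize_flag(rec["opt_in"]),
            "joined": normalize_date(rec["joined"]),
        }
        key = (row["name"].lower(), row["joined"])
        if key in seen:
            continue  # duplicate record: same person and join date
        seen.add(key)
        if row["age"] is None or not 0 <= row["age"] <= max_age:
            anomalies.append(row)  # missing or implausible age, flagged for review
        cleaned.append(row)
    return cleaned, anomalies
```

Anomalous rows are flagged rather than dropped, since deciding whether a value such as an age of 142 is an error usually requires human review.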

Interoperability
Related to data quality is the issue of interoperability of different databases and data mining
software. Interoperability refers to the ability of a computer system and/or data to work with other
systems or data using common standards or processes. Interoperability is a critical part of the
larger efforts to improve interagency collaboration and information sharing through e-government
and homeland security initiatives. For data mining, interoperability of databases and software is
important to enable the search and analysis of multiple databases simultaneously, and to help
ensure the compatibility of data mining activities of different agencies. Data mining projects that
are trying to take advantage of existing legacy databases or that are initiating first-time
collaborative efforts with other agencies or levels of government (e.g., police departments in
different states) may experience interoperability problems. Similarly, as agencies move forward
with the creation of new databases and information sharing efforts, they will need to address
interoperability issues during their planning stages to better ensure the effectiveness of their data
mining projects.

Mission Creep
Mission creep refers to the use of data for purposes other than that for which the data was
originally collected. This can occur regardless of whether the data was provided voluntarily by the
individual or was collected through other means.

Efforts to fight terrorism can, at times, take on an acute sense of urgency. This urgency can create
pressure on both data holders and officials who access the data. To leave an available resource
unused may appear to some as being negligent. Data holders may feel obligated to make any
information available that could be used to prevent a future attack or track a known terrorist.
Similarly, government officials responsible for ensuring the safety of others may be pressured to
use and/or combine existing databases to identify potential threats. Unlike physical searches, or
the detention of individuals, accessing information for purposes other than originally intended
may appear to be a victimless or harmless exercise. However, such information use can lead to
unintended outcomes and produce misleading results.

One of the primary reasons for misleading results is inaccurate data. All data collection efforts
suffer accuracy concerns to some degree. Ensuring the accuracy of information can require costly
protocols that may not be cost effective if the data is not of inherently high economic value. In
well-managed data mining projects, the original data collecting organization is likely to be aware
of the data’s limitations and account for these limitations accordingly. However, such awareness
may not be communicated or heeded when data is used for other purposes. For example, the
accuracy of information collected through a shopper’s club card may suffer for a variety of
reasons, including the lack of identity authentication when a card is issued, cashiers using their
own cards for customers who do not have one, and/or customers who use multiple cards. For the
purposes of marketing to consumers, the impact of these inaccuracies is negligible to the
individual. If a government agency were to use that information to target individuals based on
food purchases associated with particular religious observances though, an outcome based on
inaccurate information could be, at the least, a waste of resources by the government agency, and
an unpleasant experience for the misidentified individual.

Privacy
Privacy concerns focus both on the actual projects proposed and on the
potential for data mining applications to be expanded beyond their original purposes (mission
creep). For example, some experts suggest that anti-terrorism data mining applications might also
be useful for combating other types of crime. There is some disagreement over how
privacy concerns should be addressed. Some observers suggest that technical solutions are
adequate. In contrast, some privacy advocates argue in favor of creating clearer policies and
exercising stronger oversight.

Data Mining Uses

Data mining is used for a variety of purposes in both the private and public sectors. Industries
such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs,
enhance research, and increase sales.

Automated prediction of trends and behaviors

Data mining automates the process of finding predictive information in large databases. Questions
that traditionally required extensive hands-on analysis can now be answered directly from the data
quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data
on past promotional mailings to identify the targets most likely to maximize return on investment
in future mailings. Other predictive problems include forecasting bankruptcy and other forms of
default, and identifying segments of a population likely to respond similarly to given events.

Automated discovery of previously unknown patterns

Data mining tools sweep through databases and identify previously hidden patterns in one step.
An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated
products that are often purchased together. Other pattern discovery problems include detecting
fraudulent credit card transactions and identifying anomalous data that could represent data entry
keying errors.

Limitations
While data mining products can be very powerful tools, they are not self-sufficient applications.
To be successful, data mining requires skilled technical and analytical specialists who can
structure the analysis and interpret the output that is created. Consequently, the limitations of data
mining are primarily data- or personnel-related, rather than technology-related.

Although data mining can help reveal patterns and relationships, it does not tell the user the value
or significance of these patterns. These types of determinations must be made by the user.
Similarly, the validity of the patterns discovered is dependent on how they compare to “real
world” circumstances. For example, to assess the validity of a data mining application designed to
identify potential terrorist suspects in a large pool of individuals, the user may test the model
using data that includes information about known terrorists. However, while possibly re-affirming
a particular profile, it does not necessarily mean that the application will identify a suspect whose
behavior significantly deviates from the original model.

Another limitation of data mining is that while it can identify connections between behaviors
and/or variables, it does not necessarily identify a causal relationship. For example, an application
may identify that a pattern of behavior, such as the propensity to purchase airline tickets
shortly before the flight is scheduled to depart, is related to characteristics such as income, level
of education, and Internet use. However, that does not necessarily indicate that the ticket
purchasing behavior is caused by one or more of these variables. In fact, the individual’s behavior
could be affected by some additional variable(s) such as occupation (the need to make trips on
short notice), family status (a sick relative needing care), or a hobby (taking advantage of last
minute discounts to visit new destinations).

Data Mining Products

Data mining products are taking the industry by storm. The major database vendors have already
taken steps to ensure that their platforms incorporate data mining techniques. Oracle's Data
Mining Suite (Darwin) implements classification and regression trees, neural networks, k-nearest
neighbors, regression analysis and clustering algorithms. Microsoft's SQL Server also offers data
mining functionality through the use of classification trees and clustering algorithms.

Applications
• Banking: loan/credit card approval
  o Predict good customers based on old customers.
• Customer relationship management:
  o Identify those who are likely to leave for a competitor.
• Targeted marketing:
  o Identify likely responders to promotions.
• Fraud detection: telecommunications, financial transactions
  o From an online stream of events, identify fraudulent events.
• Manufacturing and production:
  o Automatically adjust knobs when process parameters change.
• Medicine: disease outcome, effectiveness of treatments
  o Analyze patient disease history to find relationships between diseases.
• Molecular/Pharmaceutical: identify new drugs
• Scientific data analysis:
  o Identify new galaxies by searching for sub-clusters.
• Web site/store design and promotion:
  o Find the affinity of visitors to pages and modify the layout.

Conclusion
Generally, data mining, sometimes called data or knowledge discovery, is the process of
analyzing data from different perspectives and summarizing it into useful information -
information that can be used to increase revenue, cut costs, or both. In the new millennium,
competitive enterprises will be mining their data with sophisticated data mining tools to find and
attract the best customers, to improve and enhance their product offerings, to maximize operating
efficiency and to cut costs and improve customer satisfaction. With time and resources in short
supply, data mining software will help enterprises maximize resources to remain competitive. With the
advancement and deployment of sophisticated data mining tools, computers can bring
knowledge to our desktops.

