
Data Mining

Uploaded by

nagarajan
Copyright
© All Rights Reserved

Data mining

What is data mining?


It is the extraction of valid, novel (previously unknown) and
understandable information or patterns from data stored in
repositories or sources such as:

• Databases
• Text files
• Social networks
• Computer simulations

The information obtained should be usable by
organizations/enterprises for business decision making.
Why data mining
• Large amounts of data are collected by
organizations such as banks and e-commerce
stores, and this data is stored or warehoused.
• Organizations often need to explore this data to
find solutions to known problems. Such solutions
may take the form of patterns found in previous
data, and the knowledge obtained can support
better decision making in organizations. This is
why data mining is needed.
Applications of Data mining
Industry              Application
Finance               Credit card analysis
Insurance             Claims and fraud analysis
Telecommunications    Call record analysis
Transport             Logistics management
Consumer goods        Promotion analysis
Scientific research   Image, video and speech analysis
Components of data mining
• Knowledge discovery
Concrete information gleaned from existing data:
information you may not have known but which is
supported by recorded facts.

• Knowledge prediction
Uses known data to forecast future trends and
events, for example stock market predictions.
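Knowledge prediction can be illustrated with the simplest possible forecasting model: a straight-line trend fitted by ordinary least squares and extended into the future. This is a minimal sketch, assuming evenly spaced observations; the function names and the monthly_sales figures are hypothetical, and real forecasting (stock market prediction included) relies on far richer models.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # slope = covariance(t, y) / variance(t)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps_ahead):
    """Extend the fitted trend `steps_ahead` periods past the last observation."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps_ahead)]


monthly_sales = [120.0, 130.0, 140.0, 150.0]  # hypothetical known data
print(forecast(monthly_sales, 2))  # → [160.0, 170.0]
```

A straight line is chosen only because it makes the idea of "using known data to forecast" visible in a few lines; it cannot capture seasonality or sudden shifts.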
Steps in data mining
1. Data Integration
This involves combining data residing in different sources and providing users
with a unified or combined view of these data.

2. Data Selection
This is the process of determining the appropriate data type and source, as well
as suitable instruments for collecting the data.

3. Data cleaning
Data cleaning is the process of detecting and correcting (or removing) corrupt or
inaccurate records from a record set, table or database. It involves identifying
incomplete, incorrect, inaccurate or irrelevant parts of the data and then
replacing, modifying or deleting the dirty data.
4. Data transformation
Data transformation converts a set of data values from
the data format of a source data system into the data
format of a destination data system.

5. Data mining
Here, techniques are applied to extract patterns
of interest on which decisions will be based.

6. Pattern evaluation
In pattern evaluation, the mined patterns are
analyzed against given measures to identify
those that are actually interesting.
7. Knowledge presentation
This is the final phase, in which the discovered
knowledge is visually represented to the user.
This phase uses visualization and other
easy-to-understand presentation techniques to help
users understand and interpret the data mining
results.
[Figure: Data mining diagram based on Knowledge Discovery in Databases (KDD)]
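The seven steps above can be sketched end to end in Python. This is a minimal illustration under stated assumptions: the two sources (`crm`, `pos`), their field names and values are hypothetical; the mining step uses simple item-pair counting (a pairs-only form of association rule mining, not the full Apriori algorithm); and the support and confidence thresholds are arbitrary choices for the example.

```python
from collections import Counter
from datetime import datetime
from itertools import combinations

# Hypothetical source 1: customer records from a CRM system, keyed by customer id.
crm = {1: {"name": "Ann"}, 2: {"name": "Ben"}, 3: {"name": "Cy"}}

# Hypothetical source 2: sales from a point-of-sale log (dates are DD/MM/YYYY).
pos = [
    {"customer_id": 1, "date": "03/01/2024", "items": "bread,milk"},
    {"customer_id": 2, "date": "04/01/2024", "items": "bread,milk,eggs"},
    {"customer_id": 3, "date": "05/01/2024", "items": "milk,eggs"},
    {"customer_id": 1, "date": "not a date", "items": "bread"},       # corrupt date
    {"customer_id": 9, "date": "06/01/2024", "items": "bread,milk"},  # unknown customer
]


def integrate(crm, pos):
    """Step 1: unified view, joining each sale with its customer's name (None if unknown)."""
    return [dict(sale, name=crm.get(sale["customer_id"], {}).get("name")) for sale in pos]

# Step 2 (data selection) is mostly a design decision; in this sketch it amounts to
# keeping only the customer, date and items attributes, which transform() does.


def clean(records):
    """Step 3: drop incomplete or invalid records."""
    cleaned = []
    for r in records:
        if r["name"] is None:  # incomplete: no matching customer
            continue
        try:
            datetime.strptime(r["date"], "%d/%m/%Y")
        except ValueError:  # corrupt: unparseable date
            continue
        cleaned.append(r)
    return cleaned


def transform(records):
    """Step 4: convert DD/MM/YYYY dates to ISO format and item strings to sets."""
    return [
        {
            "customer": r["name"],
            "date": datetime.strptime(r["date"], "%d/%m/%Y").date().isoformat(),
            "items": frozenset(r["items"].split(",")),
        }
        for r in records
    ]


def mine_pairs(baskets):
    """Step 5: count single items and item pairs that occur in the same basket."""
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))
    return item_counts, pair_counts


def evaluate(item_counts, pair_counts, n, min_support=0.5, min_confidence=0.7):
    """Step 6: keep rules lhs -> rhs that meet the support and confidence thresholds."""
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def present(rules):
    """Step 7: show the discovered rules in a readable form."""
    for lhs, rhs, support, confidence in rules:
        print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")


records = transform(clean(integrate(crm, pos)))
item_counts, pair_counts = mine_pairs([r["items"] for r in records])
rules = evaluate(item_counts, pair_counts, len(records))
present(rules)
```

With this hypothetical data, cleaning drops the record with the corrupt date and the sale with no matching customer, and the sketch prints two rules, bread -> milk and eggs -> milk, each with support 0.67 and confidence 1.00. Each step is a separate function so that any one of them (for example the mining technique) can be swapped out without touching the others.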
Advantages of data mining
• Marketing or Retailing
Data mining helps marketing companies build models
based on historical data to predict who will respond
to a new marketing campaign. Using the results,
marketers can take an appropriate approach to
selling profitable products to target customers.

Appropriate production arrangements can be made
based on marketing analysis, so that customers
can buy products frequently.
• Banking or Finance
Data mining gives financial institutions
information about loans and credit reporting.

By building a model from historical customer
data, banks and other financial institutions can
distinguish good loans from bad ones. Moreover,
data mining helps banks detect fraudulent credit
card transactions to protect credit card owners.
• Manufacturing
By applying data mining to operational engineering
data, manufacturers can detect faulty equipment
and determine optimal control parameters.

• Governments
Data mining helps government agencies dig into
and analyze financial transaction records to
build patterns that can detect money laundering
or other criminal activities.
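The fraud detection idea mentioned under Banking or Finance can be sketched with a basic statistical outlier check: flag a new transaction whose amount lies more than a chosen number of standard deviations from the cardholder's historical mean. This is a minimal sketch, not how production fraud systems work; the function name, the 3-standard-deviation threshold and the transaction amounts are hypothetical.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts whose z-score against `history` exceeds `threshold`."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two values
    if sigma == 0:
        raise ValueError("history has no variation; z-scores are undefined")
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


# Hypothetical past purchase amounts for one card.
history = [42.0, 38.5, 55.0, 47.2, 40.0, 51.3, 44.8, 39.9]
print(flag_unusual(history, [45.0, 950.0, 60.0]))  # → [950.0]
```

A z-score rule is easy to explain, but it looks only at the amount and ignores context such as merchant, location and time of day, which practical fraud models take into account.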
Disadvantages of data mining
• Privacy issues
The use of the internet through social networks, e-commerce,
forums, blogs etc. raises many privacy concerns: people are
afraid that their personal information is collected and used in
unethical ways that could cause them trouble.

• Security issues
Businesses hold information about their employees and
customers, including social security numbers, birthdays, payroll
etc. If hackers access and steal this data, the exposure of so
much personal information can create an unsafe environment,
especially if the stolen information involves finances.
• Misuse of information
Information may be exploited by unethical
people or businesses to take advantage of
vulnerable people or to discriminate against a
group of people.

Data mining results are also not always accurate,
and if inaccurate information is used for decision
making, it may have serious consequences.
Current research
• Supercomputer data mining
The aim of the project is to produce a
supercomputing data mining resource for use by the
United Kingdom academic community. The resource
uses a number of advanced machine learning and
statistical algorithms, and an ensemble machine
approach is used to exploit the large-scale
parallelism possible in supercomputing. This
purpose is embodied in the following objectives:
• To develop a massively parallel approach for
commonly used statistical and machine learning
techniques for exploratory data analysis.
• To develop a massively parallel approach to the use of
evolutionary computing techniques for feature
creation and selection.
• To develop a massively parallel approach to the use of
evolutionary computing techniques for data modeling.
• To develop a massively parallel approach to the use of
ensemble machines, consisting of many well-known
machine learning algorithms, for data modeling.
• To develop an appropriate supercomputing
infrastructure to support the use of such
advanced machine learning techniques with
large datasets.
• Medical data mining
It is estimated that 150 million people have diabetes
worldwide, and that this number may double by 2025.
There is no cure for diabetes; however, the condition can be
managed, and early treatment can minimize its
complications. A key factor in providing early
treatment is to identify those most at risk of complications
at an early stage. The data mining group of the University of
East Anglia has been working in this area for some time on
a collaborative project with St. Thomas Hospital London.
• Since 1973, St. Thomas Hospital London had stored
patient information in a computerized clinical records
system.
• In their research they identified factors that
were associated with early mortality. Current
research and teaching on outcomes in people
with diabetes identify cardiac risk factors as
the most likely indicators of early mortality.
The data mining study ran in parallel with an
independent analysis of a cohort of 1000
patients with diabetes who were re-examined
after 10 years. This analysis also identified
peripheral neuropathy as the most important
risk factor for premature death.
• Time series data mining of electricity usage
patterns
The rollout of intelligent metering is set to take place
over the next decade and will result in over 27 million
households being equipped with intelligent metering
systems that can monitor electricity consumption at
15-minute intervals and make usage data easy to
communicate.
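Time series mining of smart meter data typically starts by aggregating the 15-minute readings into coarser periods and looking for patterns such as the hour of peak usage. The sketch below does just that; it assumes readings arrive as (minutes since midnight, kWh) pairs for a single day, and the function names and values are hypothetical.

```python
from collections import defaultdict


def hourly_totals(readings):
    """Sum 15-minute kWh readings into hourly totals.

    `readings` is a list of (minutes_since_midnight, kwh) pairs for one day.
    """
    totals = defaultdict(float)
    for minute, kwh in readings:
        if minute % 15 != 0:
            raise ValueError(f"reading at minute {minute} is not on a 15-minute boundary")
        totals[minute // 60] += kwh
    return dict(totals)


def peak_hour(readings):
    """Return the hour (0-23) with the highest total consumption."""
    totals = hourly_totals(readings)
    return max(totals, key=totals.get)


# Hypothetical household: two hours of 15-minute readings (kWh).
readings = [(0, 0.10), (15, 0.12), (30, 0.08), (45, 0.10),
            (60, 0.30), (75, 0.35), (90, 0.40), (105, 0.25)]
print(peak_hour(readings))  # → 1
```

Repeating this over many days and households gives daily load profiles, which are the kind of usage patterns such research looks for.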
Future research
• In the future, it is highly likely that data mining will
evolve into predictive analysis. Data mining applications that
could enrich human life in fields such as business,
education, medicine, science and politics include:
• Security and privacy-preserving data mining. For
example, recordings of electronic communication such as
email logs and web logs capture human activity.
• Challenges in mining financial data. For example,
investors use models of asset prices to gain bigger
profits.
• Detecting ecosystem disturbances.
• Distributed data mining. Distributed algorithms
have been developed for tasks such as association
analysis and parallel decision tree construction.
• Text mining: an example is opinion or questionnaire
mining, where the objective is to obtain useful
information.
• Image mining: An example is the classification
of retinal image data and magnetic resonance
imaging scan data to identify disorders.
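The idea behind distributed association analysis can be sketched as follows: each node counts item pairs in its own partition of the transactions, and the partial counts are then merged into global counts. This sketch simulates the nodes sequentially in one process (in a real deployment each local count would run on a separate machine); the partitions and item names are hypothetical, and it covers only pair counting, not parallel decision tree construction.

```python
from collections import Counter
from itertools import combinations


def local_pair_counts(partition):
    """Count item pairs in one partition (the work a single node would do)."""
    counts = Counter()
    for basket in partition:
        counts.update(combinations(sorted(set(basket)), 2))
    return counts


def distributed_pair_counts(partitions):
    """Merge the per-partition counts into global counts."""
    total = Counter()
    for partial in map(local_pair_counts, partitions):
        total.update(partial)  # Counter.update adds counts together
    return total


partitions = [
    [["bread", "milk"], ["bread", "milk", "eggs"]],  # hypothetical node A
    [["milk", "eggs"], ["bread", "milk"]],           # hypothetical node B
]
counts = distributed_pair_counts(partitions)
print(counts[("bread", "milk")])  # → 3
```

Because counting is additive, merging the per-partition counts gives exactly the same totals as counting all baskets in one place, which is what makes this step easy to distribute.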
Conclusion
Information extracted through data mining is
valuable to organizations in many industries,
such as health, logistics, marketing, finance and
engineering. Through it, businesses become
information brokers: they can weed out fraud and
bad customers while targeting good customers,
promising markets and cross-selling opportunities.
Data set
• A data set can often be viewed as a collection of
data objects. Other names for a data object are
record, point, vector, pattern, event, case,
sample, observation, or entity. In turn, data
objects are described by a number of attributes
that capture the basic characteristics of an
object, such as the mass of a physical object or
the time at which an event occurred. Other
names for an attribute are variable,
characteristic, field, feature, or dimension.
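The terminology can be made concrete with a small record type in which each instance is a data object and each field is an attribute. The class name, field names and values are hypothetical, chosen to mirror the mass and event-time examples in the text.

```python
from dataclasses import dataclass, fields


@dataclass
class Observation:
    """One data object (record); each field is an attribute."""
    object_id: str
    mass_kg: float      # mass of a physical object
    observed_at: str    # time at which the event occurred (ISO 8601 string)


# A data set is a collection of data objects.
data_set = [
    Observation("rock-1", 2.5, "2024-05-01T10:00:00"),
    Observation("rock-2", 0.8, "2024-05-01T10:05:00"),
]

attribute_names = [f.name for f in fields(Observation)]
print(attribute_names)  # → ['object_id', 'mass_kg', 'observed_at']
print(len(data_set))    # → 2
```

Viewed as a table, each object is a row and each attribute is a column, which is why "record" and "field" appear among the alternative names above.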
