DATA MINING TECHNIQUES AND APPLICATIONS

1. Abstract

Data mining is the process of finding useful patterns in large amounts of
data. This paper discusses a few data mining techniques and algorithms,
together with some of the organizations that have adopted data mining
technology to improve their businesses and reported excellent results. Data
mining is also known as Knowledge Discovery in Databases (KDD). It can also
be defined as the process of extracting interesting, interpretable and useful
information from raw data. Many different sources generate raw data in very
large amounts, which is the main reason the applications of data mining are
increasing rapidly. This paper reviews data mining techniques and their
applications in fields such as educational data mining (EDM), finance,
commerce, life sciences and medicine. We group existing approaches to
determine how data mining can be used in different fields. Our
categorization focuses specifically on research published over the period
2007-2017. With this categorization, we present an easy and concise view of
the different models adopted in data mining.
2. Introduction to data mining techniques

Data mining techniques are sets of algorithms intended to find hidden
knowledge in data. The choice of technique depends entirely on the problem we
are going to solve. Some of the popular data mining techniques are
classification algorithms, prediction analysis algorithms and clustering
techniques. In this introduction, we address the basic meaning of the term
data mining through a toy example.

2.1 Overview of Data Mining

The development of information technology has generated a large number of
databases and huge volumes of data in various areas. Research in databases and
information technology has given rise to approaches for storing and
manipulating this valuable data for further decision making. Data mining is
the process of extracting useful information and patterns from huge volumes of
data. It is also called the knowledge discovery process, knowledge mining from
data, knowledge extraction or data/pattern analysis.

Figure 1. Knowledge discovery process

Data mining is a logical process used to search through large amounts of data
in order to find useful information. The goal of this technique is to find
patterns that were previously unknown. Once these patterns are found,
organizations can use them to make decisions that develop their businesses.
The three steps involved are:
• Exploration
• Pattern identification
• Deployment

Exploration
In the first step, data exploration, the data is cleaned and transformed into
another form, and the important variables and the nature of the data are
determined based on the problem.

Pattern Identification
Once the data has been explored, refined and defined for the specific
variables, the second step is pattern identification: identify and choose the
patterns that make the best prediction.

Deployment
The patterns are deployed to achieve the desired outcome.

3. Data Mining Algorithms and Techniques

Various algorithms and techniques, such as classification, clustering,
regression, artificial intelligence, neural networks, association rules,
decision trees, genetic algorithms and the nearest neighbor method, are used
for knowledge discovery from databases.

3.1 Classification
Classification is the most commonly applied data mining technique. It employs
a set of pre-classified examples to develop a model that can classify the
population of records at large. Fraud detection and credit-risk applications
are particularly well suited to this type of analysis. This approach
frequently employs decision tree or neural network-based classification
algorithms. The data classification process involves two phases: learning and
classification. In the learning phase, the training data are analyzed by a
classification algorithm. In the classification phase, test data are used to
estimate the accuracy of the classification rules. If the accuracy is
acceptable, the rules can be applied to new data tuples. For a fraud detection
application, the training data would include complete records of both
fraudulent and valid activities, determined on a record-by-record basis. The
classifier-training algorithm uses these pre-classified examples to determine
the set of parameters required for proper discrimination. The algorithm then
encodes these parameters into a model called a classifier.
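
The two phases described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the single "amount" attribute, the one-level decision tree (a decision stump) and the 0.75 acceptance level are all made up for the example and do not come from any of the case studies in this paper.

```python
# Toy, made-up records: (transaction_amount, known_label). Not from the paper.
training_records = [(20, "valid"), (35, "valid"), (50, "valid"), (80, "valid"),
                    (300, "fraud"), (400, "fraud"), (650, "fraud"), (900, "fraud")]
test_records = [(25, "valid"), (60, "valid"), (350, "valid"),
                (500, "fraud"), (700, "fraud")]


def predict(threshold, amount):
    """Classify one record with a one-level decision tree (a stump)."""
    return "fraud" if amount > threshold else "valid"


def accuracy(threshold, records):
    """Fraction of records whose predicted label matches the known label."""
    hits = sum(predict(threshold, amount) == label for amount, label in records)
    return hits / len(records)


def learn_threshold(records):
    """Learning phase: choose the split that best separates the training labels."""
    amounts = sorted(amount for amount, _ in records)
    # Candidate thresholds are midpoints between consecutive sorted amounts.
    candidates = [(a + b) / 2 for a, b in zip(amounts, amounts[1:])]
    return max(candidates, key=lambda t: accuracy(t, records))


# Learning phase: the single parameter (the threshold) is the "classifier".
threshold = learn_threshold(training_records)

# Classification phase: estimate accuracy on records not used for learning.
test_accuracy = accuracy(threshold, test_records)

# Apply the rule to a new tuple only if the estimated accuracy is acceptable.
ACCEPTABLE_ACCURACY = 0.75  # hypothetical acceptance level
new_label = predict(threshold, 1200) if test_accuracy >= ACCEPTABLE_ACCURACY else None
print(threshold, test_accuracy, new_label)
```

On this toy data the stump separates the training set perfectly but misclassifies one of the five test records, which is why accuracy is estimated on held-out data rather than on the training data itself.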

Types of classification models
• Classification by decision tree induction
• Bayesian Classification
• Neural Networks
• Support Vector Machines (SVM)
• Classification Based on Associations

3.2 Prediction
Regression techniques can be adapted for prediction. Regression analysis can
be used to model the relationship between one or more independent variables
and a dependent variable. In data mining, the independent variables are
attributes that are already known, and the dependent (response) variable is
what we want to predict. Unfortunately, many real-world problems are not
simple prediction problems. For instance, sales volumes, stock prices and
product failure rates are all very difficult to predict because they may
depend on complex interactions of multiple predictor variables. Therefore,
more complex techniques (e.g., logistic regression, decision trees or neural
nets) may be necessary to forecast future values. The same model types can
often be used for both regression and classification. For example, the CART
(Classification and Regression Trees) decision tree algorithm can be used to
build both classification trees (to classify categorical response variables)
and regression trees (to forecast continuous response variables). Neural
networks can also create both classification and regression models.
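
As a minimal sketch of the simplest case, the following Python fits a straight line with one independent variable by ordinary least squares and uses it to predict a new value. The advertising and sales figures, and the names `fit_line`, `ad_spend` and `sales`, are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: known attribute (ad spend) and response (sales).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)

# Predict the response for an attribute value that was not in the data.
predicted_sales = intercept + slope * 6.0
print(round(slope, 2), round(intercept, 2), round(predicted_sales, 2))
```

A single straight line like this is exactly the kind of model that breaks down when the response depends on interactions between several predictors, which is where the more complex techniques mentioned above come in.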

Types of regression methods
• Linear Regression
• Multivariate Linear Regression
• Nonlinear Regression
• Multivariate Nonlinear Regression

3.3 Association Rule

Association and correlation analysis is usually used to find frequent itemsets
in large data sets. Findings of this type help businesses to make certain
decisions, such as catalogue design, cross marketing and customer shopping
behavior analysis. Association rule algorithms need to be able to generate
rules with confidence values less than one. However, the number of possible
association rules for a given dataset is generally very large, and a high
proportion of the rules are usually of little (if any) value.
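
To make the ideas of confidence and rule filtering concrete, here is a small Python sketch that computes support and confidence for rules of the form A -> B and keeps only those above chosen thresholds. The shopping baskets and the thresholds (0.4 for support, 0.7 for confidence) are assumptions made for the example; practical systems use algorithms such as Apriori rather than this brute-force enumeration.

```python
from itertools import permutations

# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


MIN_SUPPORT = 0.4      # hypothetical thresholds
MIN_CONFIDENCE = 0.7

items = sorted(set().union(*transactions))
# Enumerate every single-item rule a -> b and keep only the useful ones.
rules = [
    (a, b, support({a, b}, transactions), confidence({a}, {b}, transactions))
    for a, b in permutations(items, 2)
    if support({a, b}, transactions) >= MIN_SUPPORT
    and confidence({a}, {b}, transactions) >= MIN_CONFIDENCE
]
for a, b, s, c in rules:
    print(f"{a} -> {b}: support={s}, confidence={c}")
```

Of the twelve candidate single-item rules in this toy data, only two survive the thresholds, and one of them (bread -> butter) has a confidence below one, which illustrates both points made in the paragraph above.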
Types of association rule
• Multilevel association rule
• Multidimensional association rule
• Quantitative association rule

3.4 Neural networks

A neural network is a set of connected input/output units in which each
connection has an associated weight. During the learning phase, the network
learns by adjusting the weights so as to be able to predict the correct class
labels of the input tuples. Neural networks have a remarkable ability to
derive meaning from complicated or imprecise data and can be used to extract
patterns and detect trends that are too complex to be noticed by either humans
or other computer techniques. They are well suited to continuous-valued inputs
and outputs. Examples include handwritten character recognition and training a
computer to pronounce English text. Neural networks have already been applied
successfully to real-world business problems in many industries. They are best
at identifying patterns or trends in data and are well suited to prediction
and forecasting needs.
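
The idea of learning by adjusting connection weights can be illustrated with the smallest possible network: a single unit trained with the classic perceptron learning rule. Note that this is a deliberately simplified stand-in, not back propagation, which extends the same idea to networks with hidden layers. The training data (the logical AND function), the learning rate and the number of epochs are assumptions chosen for the example.

```python
def classify(weights, bias, inputs):
    """Output 1 if the weighted sum of the inputs exceeds zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Learn weights by nudging them whenever a prediction is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - classify(weights, bias, inputs)
            # Each weight moves in proportion to the error and its input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Input tuples with their correct class labels (logical AND).
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(samples)
predictions = [classify(weights, bias, inputs) for inputs, _ in samples]
print(weights, bias, predictions)
```

After training, the unit reproduces the class label of every input tuple. A single unit can only learn linearly separable patterns; multi-layer networks trained with back propagation remove that restriction.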

Types of neural networks
• Back Propagation

4. Data Mining Applications

Data mining is a relatively new technology that has not fully matured. Despite
this, a number of industries already use it on a regular basis, including
retail stores, hospitals, banks and insurance companies. Many of these
organizations combine data mining with statistics, pattern recognition and
other important tools. Data mining can be used to find patterns and
connections that would otherwise be difficult to find. The technology is
popular with many businesses because it allows them to learn more about their
customers and make smart marketing decisions. The following is an overview of
business problems and the solutions found using data mining technology.

4.1 FBTO Dutch Insurance Company

Challenges
• Reduce direct mail costs.
• Increase efficiency of marketing campaigns.
• Increase cross-selling to existing customers, using inbound channels such as
the company’s call center and the internet, over a one-year test of the
solution’s effectiveness.

Results
• Provided the marketing team with the ability to predict the effectiveness of
its campaigns.
• Increased the efficiency of marketing campaign creation, optimization, and
execution.
• Decreased mailing costs by 35 percent.
• Increased conversion rates by 40 percent.

4.2 ECtel Ltd., Israel

Challenges
• Fraudulent activity in telecommunication services.

Results
• Significantly reduced telecommunications fraud for more than 150
telecommunication companies worldwide.
• Saved money by enabling real-time fraud detection.

4.3 Provident Financial’s Home Credit Division, United Kingdom

Challenges
• No system to detect and prevent fraud.

Results
• Reduced frequency and magnitude of agent and customer fraud.
• Saved money through early fraud detection.
• Saved investigators’ time and increased prosecution rate.

4.4 Standard Life Mutual Financial Services Companies

Challenges
• Identify the key attributes of clients attracted to their mortgage offer.
• Cross-sell Standard Life Bank products to the clients of other Standard Life
companies.
• Develop a remortgage model which could be deployed on the group Web site to
examine the profitability of the mortgage business being accepted by Standard
Life Bank.

Results
• Built a propensity model for the Standard Life Bank mortgage offer,
identifying key customer types, that can be applied across the whole group
prospect pool.
• Discovered the key drivers for purchasing a remortgage product.
• Achieved, with the model, a nine times greater response than that achieved
by the control group.
• Secured £33 million (approx. $47 million) worth of mortgage application
revenue.

4.5 Shenandoah Life Insurance Company, United States

Challenges
• Policy approval process was paper based and cumbersome.
• Routing paper copies to various departments caused delays in approval.

Results
• Empowered management with current information on pending policies.
• Reduced the time required to issue certain policies by 20 percent.
• Improved underwriting and employee performance review processes.

4.6 Soft map Company Ltd., Tokyo

Challenges
• Customers had difficulty making hardware and software purchasing decisions,
which was hindering online sales.

Results
• Page views increased 67 percent per month after the recommendation engine
went live.
• Profits tripled in 2001, as sales increased 18 percent versus the same
period in the previous year.

4.7 Basic Facts in KDD


Data mining has attracted a great deal of attention in the information
industry and in society as a whole in recent years, due to the wide
availability of huge amounts of data and the imminent need for turning such
data into useful information and knowledge. The information and knowledge
gained can be used for applications ranging from market analysis and fraud
detection to production control, disaster management and science exploration.
Data mining can be viewed as a result of the natural evolution of information
technology. The database system industry has witnessed an evolutionary path in
the development of various functionalities: data collection and database
creation, database management (including data storage and retrieval, and
database transaction processing), and advanced data analysis. Knowledge
discovery as a process consists of an iterative sequence of the following
steps:
1. Data cleaning:
Noise and inconsistent data are removed.
2. Data integration:
Multiple data sources are combined.
3. Data selection:
Data relevant to the analysis task are retrieved from the database.
4. Data transformation:
Data are transformed or consolidated into forms appropriate for mining by
performing summary or aggregation operations.
5. Data mining:
An essential process in which intelligent methods are applied in order to
extract data patterns.
6. Knowledge presentation:
Visualization and knowledge representation techniques are used to present the
mined knowledge to the user.
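
The six steps can be read as a pipeline in which each stage feeds the next. The Python sketch below walks a handful of made-up sales records through all six stages. Every record, field name, function name and the 900 threshold is hypothetical, and the "mining" step is intentionally trivial so that the overall flow stays visible.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
store_sales = [
    {"customer": "c1", "item": "laptop", "amount": 900.0},
    {"customer": "c2", "item": "mouse", "amount": None},
    {"customer": "c1", "item": "mouse", "amount": 25.0},
]
web_sales = [
    {"customer": "c3", "item": "laptop", "amount": 950.0},
    {"customer": "c2", "item": "laptop", "amount": 880.0},
    {"customer": "c3", "item": "mouse", "amount": 20.0},
]


def clean(records):
    """1. Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """2. Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, item):
    """3. Data selection: keep only the records relevant to the task."""
    return [r for r in records if r["item"] == item]


def transform(records):
    """4. Data transformation: aggregate the amount spent per customer."""
    totals = Counter()
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def mine(totals, threshold):
    """5. Data mining: a trivial pattern, customers at or above a threshold."""
    return sorted(c for c, total in totals.items() if total >= threshold)


def present(customers):
    """6. Knowledge presentation: format the result for the user."""
    return "High-value laptop buyers: " + ", ".join(customers)


records = integrate(clean(store_sales), clean(web_sales))
totals = transform(select(records, "laptop"))
report = present(mine(totals, threshold=900.0))
print(report)
```

In practice the sequence is iterative, as noted above: what the mining step reveals often sends the analyst back to clean, select or transform the data differently.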
Some of the applications of data mining are described below.
Data mining for financial data analysis:
In the banking industry, data mining is used for:
1. Predicting credit fraud
2. Evaluating risk
3. Performing trend analysis
4. Analyzing profitability
5. Helping with direct marketing campaigns
In financial markets, data mining with neural networks is used for:
1. Forecasting stock prices
2. Forecasting commodity prices
3. Forecasting financial disasters
Data mining for the telecommunications industry:
Data mining helps answer questions such as:
1. How does one retain customers and keep them loyal as competitors offer
special offers and reduced rates?
2. When is a high-risk investment, such as new fiber optic lines, acceptable?
3. How does one predict whether customers will buy additional products like
cellular services, call waiting, or basic services?
4. What characteristics differentiate our products from those of our
competitors?
Data mining for the retail industry:
The retail industry is a major application area for data mining, since it
collects huge amounts of data on sales, customer shopping history, goods
transportation, consumption patterns, and service records. Data mining helps
answer questions such as:
1. What are the best types of advertisements to reach certain segments of
customers?
2. What is the optimal timing at which to send mailers?
3. What types of products can be sold together?
4. How does one retain profitable customers?

5. Conclusion
Data mining is important for finding patterns, forecasting and discovering
knowledge in different business domains. Data mining techniques and
algorithms such as classification and clustering help to find the patterns
that determine future trends and allow businesses to grow. In this study, the
basic concepts of these techniques have been presented. Data mining has a
wide application domain, in almost every industry where data is generated.
That is why data mining is considered one of the most important frontiers in
database and information systems and one of the most promising
interdisciplinary developments in information technology.
