Data Mining Techniques and Applications
1. Abstract
Data mining techniques are a set of algorithms intended to find hidden
knowledge in data. Which technique to use depends entirely on the problem being
solved. Some popular data mining techniques are classification algorithms, prediction
algorithms and clustering techniques. In this introduction, we explain the basic meaning of
the term data mining using a simple example.
Data mining is a logical process used to search through large amounts of
data in order to find useful information. The goal of this technique is to find previously
unknown patterns. Once these patterns are found, they can be used to make
decisions that help businesses develop.
The process involves three steps:
Exploration
Pattern identification
Deployment
Exploration
In the first step, data exploration, the data is cleaned and transformed into
another form, and the important variables and the nature of the data are determined
based on the problem.
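The cleaning and transformation work of the exploration step can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the customer records and the field names `age`, `income` and `responded` are hypothetical, missing values are filled with the mean of the observed values (one of several common cleaning choices), and numeric variables are then rescaled into the range [0, 1] (min-max normalization) so that variables measured in large units do not dominate later steps.

```python
from statistics import mean
from typing import Any, Dict, List

Record = Dict[str, Any]


def impute_mean(records: List[Record], field: str) -> List[Record]:
    """Cleaning: replace missing (None) values of `field` with the mean of the observed values."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = mean(observed)
    # Build new dicts so the raw records are left untouched.
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def min_max_scale(records: List[Record], field: str) -> List[Record]:
    """Transformation: rescale `field` linearly into the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against division by zero if all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]


# Hypothetical raw customer records; None marks a missing value.
raw = [
    {"age": 34, "income": 52000.0, "responded": "no"},
    {"age": None, "income": 61000.0, "responded": "no"},
    {"age": 45, "income": 87000.0, "responded": "yes"},
    {"age": 29, "income": None, "responded": "no"},
    {"age": 52, "income": 43000.0, "responded": "yes"},
]

# The numeric variables judged important for this (hypothetical) problem.
cleaned = raw
for field in ("age", "income"):
    cleaned = impute_mean(cleaned, field)
    cleaned = min_max_scale(cleaned, field)
```

After this step every `age` and `income` value is present and lies between 0 and 1; the missing age, for example, is filled with the mean age 40 and then scaled to (40 - 29) / (52 - 29).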
Pattern Identification
Once the data has been explored, refined and defined for the specific variables, the
second step is pattern identification: identifying and choosing the patterns that make the
best predictions.
Deployment
The chosen patterns are deployed to achieve the desired outcome.
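Pattern identification and deployment can be illustrated together with a small Python sketch. It assumes, hypothetically, that the candidate patterns are simple rules predicting whether a customer will respond to an offer, that each rule is scored by its accuracy on labeled validation records, and that "deployment" means applying the best-scoring rule to new, unlabeled records. The rule names, feature names and data are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Rule = Callable[[Record], bool]

# Hypothetical candidate patterns: each predicts "customer will respond" (True) or not.
candidates: Dict[str, Rule] = {
    "high_income": lambda r: r["income"] > 0.5,
    "older_customer": lambda r: r["age"] > 0.5,
    "recent_buyer": lambda r: r["months_since_purchase"] < 0.3,
}

# Hypothetical labeled validation data: (record, did the customer actually respond?)
validation: List[Tuple[Record, bool]] = [
    ({"income": 0.8, "age": 0.2, "months_since_purchase": 0.1}, True),
    ({"income": 0.3, "age": 0.7, "months_since_purchase": 0.2}, True),
    ({"income": 0.6, "age": 0.9, "months_since_purchase": 0.9}, False),
    ({"income": 0.2, "age": 0.4, "months_since_purchase": 0.8}, False),
]


def accuracy(rule: Rule, data: List[Tuple[Record, bool]]) -> float:
    """Fraction of labeled records the rule predicts correctly."""
    return sum(rule(r) == label for r, label in data) / len(data)


# Pattern identification: keep the candidate with the best predictive accuracy.
best_name = max(candidates, key=lambda name: accuracy(candidates[name], validation))
best_rule = candidates[best_name]

# Deployment: apply the chosen pattern to new, unlabeled records.
new_customers: List[Record] = [
    {"income": 0.5, "age": 0.3, "months_since_purchase": 0.05},
    {"income": 0.9, "age": 0.6, "months_since_purchase": 0.7},
]
to_contact = [r for r in new_customers if best_rule(r)]
```

On this toy validation set `recent_buyer` classifies all four records correctly while the other two rules score 0.5, so only the first new customer is selected for contact.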
3.1 Classification
Classification is the most commonly applied data mining technique. It uses a set of
pre-classified examples to develop a model that can classify the population of records
at large. Fraud detection and credit-risk applications are particularly well suited to this
type of analysis. This approach frequently employs decision tree or neural network-based
classification algorithms. The data classification process involves two steps: learning and
classification. In the learning step, a classification algorithm analyzes the training data. In the
classification step, test data are used to estimate the accuracy of the classification rules. If the
accuracy is acceptable, the rules can be applied to new data tuples. For a fraud
detection application, the training data would include complete records of both fraudulent and valid
activities, labeled on a record-by-record basis. The classifier-training algorithm uses
these pre-classified examples to determine the set of parameters required for proper
discrimination, and then encodes these parameters into a model called a
classifier.
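The two-step learning/classification process can be sketched in Python. This is a minimal illustration, not one of the production decision tree or neural network algorithms the text mentions: it learns a one-level decision tree (a "decision stump") from pre-classified records, estimates its accuracy on held-out test records, and classifies new records only if that accuracy clears a threshold. The features (transaction amount, transactions in the last hour), all records, and the 0.9 acceptance threshold are hypothetical.

```python
from typing import Callable, List, Tuple

# A labeled record: (feature vector, class label)
Record = Tuple[List[float], str]


def train_stump(training: List[Record]) -> Callable[[List[float]], str]:
    """Learning step: fit a one-level decision tree (decision stump).

    Tries every feature and every observed value as a split threshold and
    keeps the split whose left/right labels misclassify the fewest training
    records. Assumes at least two distinct class labels.
    """
    labels = sorted({label for _, label in training})
    best = None  # (errors, feature, threshold, label_if_le, label_if_gt)
    for f in range(len(training[0][0])):
        for threshold in sorted({x[f] for x, _ in training}):
            for left in labels:
                for right in labels:
                    if left == right:
                        continue
                    errors = sum(
                        1 for x, y in training
                        if (left if x[f] <= threshold else right) != y
                    )
                    if best is None or errors < best[0]:
                        best = (errors, f, threshold, left, right)
    _, f, threshold, left, right = best
    return lambda x: left if x[f] <= threshold else right


def accuracy(classifier: Callable[[List[float]], str], test: List[Record]) -> float:
    """Classification step: fraction of test records labeled correctly."""
    return sum(classifier(x) == y for x, y in test) / len(test)


# Hypothetical features: [transaction amount, transactions in the last hour]
training = [
    ([20.0, 1], "valid"), ([35.0, 2], "valid"), ([50.0, 1], "valid"),
    ([900.0, 7], "fraud"), ([1200.0, 9], "fraud"), ([750.0, 6], "fraud"),
]
test = [([40.0, 1], "valid"), ([1000.0, 8], "fraud"), ([45.0, 2], "valid")]
new_records = [[30.0, 1], [1100.0, 8]]

ACCEPTABLE_ACCURACY = 0.9  # hypothetical acceptance threshold

classifier = train_stump(training)
test_accuracy = accuracy(classifier, test)
if test_accuracy >= ACCEPTABLE_ACCURACY:
    predictions = [classifier(x) for x in new_records]
else:
    predictions = []  # accuracy not acceptable: do not apply the rules to new data
```

On this toy data the stump splits on transaction amount at 50, classifies every test record correctly, and therefore labels the two new records "valid" and "fraud".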
3.2 Prediction
Regression techniques can be adapted for prediction. Regression analysis can be used to
model the relationship between one or more independent variables and a dependent
variable. In data mining, the independent variables are attributes that are already known, and the
response variables are what we want to predict. Unfortunately, many real-world problems are not
simple linear predictions. For instance, sales volumes, stock prices and product failure rates are
all very difficult to predict because they may depend on complex interactions of multiple
predictor variables. Therefore, more complex techniques (e.g., logistic regression,
decision trees or neural nets) may be necessary to forecast future values. The same
model types can often be used for both regression and classification. For example, the
CART (Classification and Regression Trees) decision tree algorithm can be used to build
both classification trees (to classify categorical response variables) and regression trees
(to forecast continuous response variables). Neural networks can also create both
classification and regression models.
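For the simplest case of a single independent variable, regression reduces to fitting a straight line by ordinary least squares: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope · x̄. The Python sketch below implements exactly that; the advertising-spend and sales figures are invented for illustration, and the more complex techniques mentioned above (logistic regression, CART, neural nets) are not shown.

```python
from typing import List, Tuple


def fit_least_squares(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend is the known (independent) attribute,
# sales volume is the response variable we want to predict.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_least_squares(ad_spend, sales)
forecast = intercept + slope * 6.0  # predicted sales for an unseen spend level
```

Here the fitted line is roughly sales = 1.09 + 1.97 × spend, giving a forecast of about 12.91 at a spend of 6.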
A neural network is a set of connected input/output units in which each connection has an
associated weight. During the learning phase, the network learns by adjusting these weights so that
it can predict the correct class labels of the input tuples. Neural networks can derive meaning
from complicated or imprecise data and can be used to extract patterns and detect trends that are
too complex to be noticed by either humans or other computer techniques. They are well suited to
continuous-valued inputs and outputs. Examples include handwritten character recognition and
training a computer to pronounce English text, and neural networks have already been applied
successfully to many real-world business problems across many industries. Neural networks are best at
identifying patterns or trends in data and are well suited to prediction or forecasting needs.
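To make "learning by adjusting weights" concrete, here is a minimal Python sketch of a single artificial neuron (a perceptron) trained with the classic perceptron learning rule. It is far simpler than the multilayer networks used for tasks such as handwriting recognition, since it can only learn linearly separable patterns. The training set (the logical AND function), the learning rate and the number of epochs are illustrative choices, not taken from the text.

```python
from typing import List, Tuple

Sample = Tuple[List[int], int]  # (inputs, target class label 0 or 1)


def predict(weights: List[float], bias: float, x: List[int]) -> int:
    """Fire (output 1) when the weighted sum of the inputs plus bias is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples: List[Sample], epochs: int = 20, lr: float = 1.0):
    """Learning phase: nudge each weight in proportion to the prediction error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Illustrative training data: the logical AND of two binary inputs.
AND_SAMPLES: List[Sample] = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, x) for x, _ in AND_SAMPLES]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the weights stop changing after finitely many updates; here that happens within a few epochs and the trained neuron reproduces all four target labels.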
4. Applications
4.1 FBTO Dutch Insurance Company
Challenges
Reduce direct mail costs.
Increase the efficiency of marketing campaigns.
Increase cross-selling to existing customers, using inbound channels such as the company’s call center
and the internet.
Run a one-year test of the solution’s effectiveness.
Results
Provided the marketing team with the ability to predict the effectiveness of its campaigns.
Increased the efficiency of marketing campaign creation, optimization, and execution.
Decreased mailing costs by 35 percent.
Increased conversion rates by 40 percent.
4.2 ECtel Ltd., Israel
Challenges
Fraudulent activity in telecommunication services.
Results
Significantly reduced telecommunications fraud for more than 150 telecommunication companies
worldwide.
Saved money by enabling real-time fraud detection.
4.3 Provident Financial’s Home Credit Division, United Kingdom
Challenges
No system to detect and prevent fraud.
Results
Reduced frequency and magnitude of agent and customer fraud.
Saved money through early fraud detection.
Saved investigators’ time and increased the prosecution rate.
4.4 Standard Life Mutual Financial Services Companies
Challenges
Identify the key attributes of clients attracted to their mortgage offer.
Cross-sell Standard Life Bank products to the clients of other Standard Life companies.
Develop a remortgage model which could be deployed on the group Web site to examine the
profitability of the mortgage business being accepted by Standard Life Bank.
Results
Built a propensity model for the Standard Life Bank mortgage offer identifying key customer
types that can be applied across the whole group prospect pool.
Discovered the key drivers for purchasing a remortgage product.
Achieved, with the model, a nine times greater response than that achieved by the control group.
Secured £33 million (approx. $47 million) worth of mortgage application revenue.
4.5 Shenandoah Life Insurance Company, United States
Challenges
The policy approval process was paper-based and cumbersome.
Routing paper copies to various departments delayed approvals.
Results
Empowered management with current information on pending policies.
Reduced the time required to issue certain policies by 20 percent.
Improved underwriting and employee performance review processes.
4.6 Soft map Company Ltd., Tokyo
Challenges
Customers had difficulty making hardware and software purchasing decisions, which was
hindering online sales.
Results
Page views increased 67 percent per month after the recommendation engine went live.
Profits tripled in 2001, as sales increased 18 percent versus the same period in the previous year.
5. Conclusion
Data mining is important for finding patterns, forecasting and discovering knowledge in different
business domains. Data mining techniques and algorithms such as classification and clustering help find
the patterns that indicate future business trends, helping businesses grow. Data mining applies to almost
every industry where data is generated, which is why it is considered one of the most important frontiers
in database and information systems and one of the most promising interdisciplinary developments in
information technology.