DataMining

The document discusses data mining as a crucial analytical tool for decision-making, emphasizing its role in extracting useful information from large datasets. It outlines the data mining life cycle, including phases such as business understanding, data preparation, modeling, evaluation, and deployment. The paper also categorizes data mining systems and highlights various tasks and techniques involved in the data mining process.

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/340540711

DATA MINING - A DOMAIN SPECIFIC ANALYTICAL TOOL FOR DECISION MAKING

Keywords: Data mining life cycle, Data mining Methods, KDD, Visualization of the data mining model

Article in International Journal of Advance Research in Computer Science and Management · January 2015

3 authors:

Prakash Chandra Behera, St. Claret College, Bangalore
Chinmaya Dash, St. Claret College, Bangalore
Somanjoli Mohapatra, St. Claret College, Bangalore

All content following this page was uploaded by Prakash Chandra Behera on 10 April 2020.

ISSN 2347 - 3983

DATA MINING - A DOMAIN SPECIFIC ANALYTICAL TOOL FOR DECISION MAKING

Ms. Somanjoli Mohapatra1, Dr. D Ramesh2, Mr. Chinmaya Dash3, Mr. Prakash Chandra Behera4
1 Assistant Professor, Department of Computer Science, St. Claret College, Bangalore, Karnataka, India
2 Professor & HOD, Department of MCA, SSIT, Tumkur, Karnataka, India
3 Assistant Professor, Department of Computer Science, St. Claret College, Bangalore, Karnataka, India
4 Assistant Professor, Department of Computer Science, St. Claret College, Bangalore, Karnataka, India

ABSTRACT

In the 21st century, people generate huge volumes of data in day-to-day transactions in various fields. These data may take the form of documents, graphics, video or records. To analyse such large amounts of data, manage them and take effective managerial decisions on that basis, two techniques are used: data warehousing and data mining. Data mining can be defined as the process of extracting interesting, interpretable, useful and novel information from data. It has been used by businesses, scientists and governments to sift through volumes of data such as airline passenger records, census data and supermarket scanner data. Data mining is also known as Knowledge Discovery in Databases (KDD), which is the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. This paper focuses on applications of data mining.

Keywords: Data mining life cycle, Data mining Methods, KDD, Visualization of the data mining model

1. INTRODUCTION

In information technology, the industry's move towards the field of "data mining", the discovery of useful information from large collections of data, is due to the perception that "we are data rich but information poor". There is a huge volume of data, but we are hardly able to turn it into useful information and knowledge for managerial decision making in business. Generating information requires a massive collection of data, which may be in different formats such as audio/video, numbers, text, figures and hypertext. To take complete advantage of data, data retrieval alone is not enough; it requires a tool for automatic summarization of data, extraction of the essence of the information stored, and the discovery of patterns in raw data.

With the enormous amount of data stored in files, databases and other repositories, it is increasingly important to develop powerful tools for the analysis and interpretation of such data and for the extraction of interesting knowledge that could help in decision-making. The answer to all of the above is data mining. Data mining is the extraction of hidden predictive information from large databases; it is a powerful technology with great potential to help organizations focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, helping organizations to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools find and produce hidden patterns, information that experts may miss because it lies outside their expectations.

2. DATA MINING TASK

The data mining tasks are classified as:

2.1 Exploratory Data Analysis:

This data mining task is interactive and visual for the customer. It serves two purposes:

• it proceeds without knowledge of what the customer is searching for
• it analyses the data

2.2 Descriptive Modelling:

It describes the overall probability distribution of the data, the partitioning of the p-dimensional space into groups, and models describing the relationships between the variables.
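As an illustration of the "partitioning of the p-dimensional space into groups" mentioned under Descriptive Modelling (Section 2.2), the following is a minimal sketch of k-means clustering. The paper does not name a specific algorithm; the function name `kmeans`, the evenly spaced choice of starting centroids and the sample points are all assumptions made for this example.

```python
def kmeans(points, k, iterations=20):
    """Partition p-dimensional points into k groups (Lloyd's algorithm).

    Assumption: starting centroids are taken at evenly spaced positions
    in the input list, which keeps the sketch deterministic.
    """
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical two-dimensional data with two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Each point receives the index of the group it falls into, so the list of labels is the partition itself. On real data the result depends on the starting centroids, which is why practical implementations usually try several initialisations.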

2.4 Discovering Patterns and Rules:

This task is primarily used to find hidden patterns, including the patterns within a cluster. A cluster may contain a number of patterns of different sizes. This can be accomplished by using rule induction and clustering algorithms.

2.5 Retrieval by Content:

The primary objective of this task is to retrieve the data sets that match a given content pattern; it is frequently used for audio/video data as well as images.

3. TYPES OF DATA MINING SYSTEM

Data mining systems can be categorized as follows:

3.1 Classification of data mining systems according to the type of data source mined:

An organization holds a huge amount of data, and these data need to be classified according to their type (for example audio/video or text format).

3.2 Classification of data mining systems according to the data model:

A number of data models are available (relational data model, object model, object-oriented data model, hierarchical data model). Data mining systems are classified according to which of these data models they work with.

3.3 Classification of data mining systems according to the kind of knowledge discovered:

This classification is based on the knowledge discovered or on data mining functionalities, such as characterization, discrimination, association, classification, clustering, etc. Some systems tend to be comprehensive systems offering several data mining functionalities together.

3.4 Classification of data mining systems according to mining techniques used:

This classification is according to the data analysis approach used, such as machine learning, neural networks, genetic algorithms, statistics, visualization, database-oriented or data warehouse-oriented, etc. The classification can also take into account the degree of user interaction involved in the data mining process.

4. DATA MINING LIFE CYCLE

The life cycle of a data mining project consists of six phases.

4.1 Business Understanding:

This phase focuses on understanding the objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.

4.1.1 Determine business objectives
The first objective of the data analyst is to understand thoroughly, from a business perspective, what the client really wants to accomplish. The analyst's goal is to uncover important factors, at the beginning, that can influence the outcome of the project.
4.1.2 Assess situation
This task involves more detailed fact-finding about all of the resources, constraints, assumptions and other factors that should be considered in determining the data analysis goal and project plan.
4.1.3 Determine data mining goals
A business goal states objectives in business terms. A data mining goal states project objectives in technical terms.
4.1.4 Produce project plan
Describe the intended plan for achieving the data mining goals and thereby achieving the business goals. The plan should specify the anticipated set of steps to be performed during the rest of the project, including an initial selection of tools and techniques.


4.4 Modeling:

In this phase, various modelling techniques are selected and applied, and their parameters are calibrated to optimal values.
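To make the idea of calibrating parameters in the modeling phase concrete, here is a minimal sketch that holds out data for testing and chooses a parameter on a separate validation split. The k-nearest-neighbour classifier, the candidate values of `k`, the split fractions and the synthetic data are all assumptions made for illustration; the paper does not prescribe a technique.

```python
import random
from collections import Counter


def holdout_split(rows, test_fraction, rng):
    """Shuffle a copy of rows and cut it into (train, test)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k):
    """Majority label among the k training rows nearest to x."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out rows whose label is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in held_out) / len(held_out)


# Hypothetical data: two well-separated classes in two dimensions.
rng = random.Random(0)
data = [((rng.uniform(-1, 1), rng.uniform(-1, 1)), "low") for _ in range(20)]
data += [((rng.uniform(9, 11), rng.uniform(9, 11)), "high") for _ in range(20)]

train_val, test = holdout_split(data, 0.25, rng)   # 30 rows / 10 rows
train, val = holdout_split(train_val, 0.2, rng)    # 24 rows / 6 rows

# Calibrate k on the validation split only, then estimate quality on the test set.
scores = {k: accuracy(train, val, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
test_accuracy = accuracy(train, test, best_k)
print(best_k, test_accuracy)
```

Keeping the test set out of the calibration loop is the design choice that matters here: the test accuracy then estimates how the chosen model behaves on data it has not seen.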

FIGURE 2: Details of Data Mining Process Model


International Journal of Emerging Trends in Engineering Research (IJETER), Vol. 3 No.6, Pages : 157- 167 (2015)
Special Issue of NCTET 2K15 - Held on June 13, 2015 in SV College of Engineering, Tirupati
http://warse.org/IJETER/static/pdf/Issue/NCTET2015sp32.pdf
4.4.1 Select modeling technique
As the first step in modeling, select the actual modeling technique to be used. If a tool was selected in business understanding (Phase 1), this task refers to selecting the specific modeling technique, e.g., building decision trees or generating a neural network.
4.4.2 Generate test design
Prior to building a model, a procedure needs to be defined to test the model's quality and validity. If the test design specifies that the dataset should be separated into training and test sets, the model is built on the training set and its quality estimated on the test set.
4.4.3 Build model
The purpose of building models is to use the predictions to make more informed business decisions. The most important goal when building a model is stability, which means that the model should make predictions that will hold true when it is applied to yet unseen data.
4.4.4 Assess model
The model should now be assessed to ensure that it meets the data mining success criteria and passes the desired test criteria. This step is a purely technical assessment based on the outcome of the modeling tasks.

4.5 Evaluation:

In this stage the model is thoroughly evaluated, and the steps executed to construct it are reviewed, to be certain that it properly achieves the business objectives. At the end of this phase, a decision on the use of the data mining results should be reached.

4.5.1 Evaluate results
Previous evaluation steps dealt with factors such as the accuracy and generality of the model. This step assesses the degree to which the model meets the business objectives and seeks to determine if there is some business reason why this chosen model is deficient.
4.5.2 Review process
At this point the resultant model appears to be satisfactory and appears to satisfy business needs. At this stage of data mining, the review process takes on the form of quality assurance.
4.5.3 Determine next steps
According to the assessment results and the process review, the analyst decides how to proceed at this stage. The analyst needs to decide whether
• to finish the project and move on to deployment (Phase 6)
• to initiate further iterations or
• to set up new data mining projects.

4.6 Deployment:

This phase is used to increase knowledge; the knowledge is then organized and presented in a way that the customer can use. The deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise.

4.6.1 Plan deployment
To deploy the data mining result(s) into the business, this task takes the evaluation results and develops a strategy for deployment. If a general procedure was identified to create the relevant model(s), this procedure is documented here for later deployment.
4.6.2 Plan monitoring and maintenance
Monitoring and maintenance are important issues if the data mining result becomes part of the day-to-day business and its environment. To monitor the deployment of the data mining result(s), the project needs a detailed plan for the monitoring process.
4.6.3 Produce final report
At the end of the project, the project leader and the team write up a final report. Depending on the deployment plan, this report may be only a summary of the project and its experiences, or it may be a final and comprehensive presentation of the data mining result(s).
4.6.4 Review project
Assess what went right and what went wrong, what was done well and what needs to be improved.

5. VISUALIZING DATA MINING MODEL

The main objective of data visualization is to give an overall idea of the data mining model. In data mining we mostly retrieve data from repositories in which the information is hidden, so visualization of the data mining model helps to provide levels of understanding and trust. The data mining models are of two types: Predictive and Descriptive.

5.1 Predictive Model:

It makes predictions about unknown data values by using the known values, e.g. classification, regression, time series analysis, prediction, etc. Many data mining applications aim to predict the future state of the data. Prediction is the process of analysing the current and past states of an attribute and predicting its future state.

Classification is a technique of mapping the target data to predefined groups or classes. It is supervised learning because the classes are predefined before the target data are examined. Regression involves learning a function that maps a data item to a real-valued prediction variable. In time series analysis the value of an attribute is examined as it varies over time; many statistical techniques, such as autoregression methods, are used to analyse time-series data.
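The regression and time series ideas in the Predictive Model section (5.1) can be sketched together with a first-order autoregression: a function is learned that maps the previous value of an attribute to its next value. The ordinary least squares fit, the function name `fit_ar1` and the synthetic series below are assumptions chosen for illustration, not a method taken from the paper.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares."""
    xs, ys = series[:-1], series[1:]          # (previous value, next value) pairs
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


# Hypothetical series generated by x[t] = 2 + 0.5 * x[t-1], starting at 10.
series = [10.0]
for _ in range(5):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
forecast = c + phi * series[-1]               # one-step-ahead prediction
print(c, phi, forecast)
```

Because the series was generated without noise, the fit recovers the generating coefficients up to rounding; on real data the same fit would give the least squares estimate and the forecast would carry an error.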

5.2 Descriptive Model:

It identifies the patterns or relationships in data and explores the properties of the data examined, e.g. clustering, summarization, association rules, sequence discovery, etc.

Clustering means analysing the data objects without consulting known class labels. It is also referred to as unsupervised learning or segmentation. It is the partitioning or segmentation of the data into groups or clusters. The clusters are defined by domain experts who study the behaviour of the data. Segmentation is the process of partitioning a database into disjoint groupings of similar tuples. Summarization is the technique of presenting summarized information from the data. An association rule captures the association between different attributes. Association rule mining is a two-step process: finding all frequent item sets, then generating strong association rules from the frequent item sets. Sequence discovery is the process of finding sequential patterns in data. Such sequences can be used to understand trends.

6. METHODS OF DATA MINING

Data mining methods are broadly classified as: On-Line Analytical Processing (OLAP), Classification, Clustering, Association Rule Mining, Temporal Data Mining, Time Series Analysis, Spatial Mining, Web Mining, etc. These methods use different types of algorithms and data. The data source can be a data warehouse, database, flat file or text file. The algorithms may be statistical, decision tree based, nearest neighbour, neural network based, genetic algorithm based, rule based, support vector machines, etc.

6.1 Classification:

It is learning a function that maps (classifies) a data item into one of several predefined classes.

6.2 Regression:

It is learning a function that maps a data item to a real-valued prediction variable.

6.3 Clustering:

It is a common descriptive task where one seeks to identify a finite set of categories or clusters to describe the data.

6.4 Summarization:

It involves methods for finding a compact description for a subset of data.

6.5 Dependency modeling:

It consists of finding a model that describes significant dependencies between variables. Dependency models exist at two levels: (1) the structural level of the model specifies which variables are locally dependent on each other and (2) the quantitative level of the model specifies the strengths of the dependencies using some numeric scale.

6.6 Change and deviation detection:

It focuses on discovering the most significant changes in the data from previously measured or normative values.

6.7 Decision trees and rules:

Decision trees and rules that use univariate splits have a simple representational form, making the inferred model relatively easy for the user to comprehend. However, the restriction to a particular tree or rule representation can significantly restrict the functional form (and, thus, the approximation power) of the model.

6.8 Nonlinear Regression and Classification Methods:

These methods consist of a family of techniques for prediction that fit linear and nonlinear combinations of functions to combinations of the input variables.

6.9 Probabilistic Graphic Dependency Models:

Graphic models specify probabilistic dependencies using a graph structure. The model specifies which variables are directly dependent on each other. Typically, these models are used with categorical or discrete-valued variables, but extensions to special cases, such as Gaussian densities, for real-valued variables are also possible.

6.10 Relational Learning Models:

Relational learning (also known as inductive logic programming) uses the more flexible pattern language of first-order logic. A relational learner can easily find formulas such as X = Y. Most research to date on model-evaluation methods for relational learning is logical in nature.

Generally, data mining algorithms depend on two factors:
(i) which type of data sets are being used
(ii) what the user's requirements are

The knowledge discovery (KD) process involves pre-processing data, choosing a data-mining algorithm, and post-processing the mining results. Intelligent Discovery Assistants (IDAs) help users in applying valid knowledge discovery processes.
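The two-step association rule process described for the descriptive model (find all frequent item sets, then generate strong rules from them) can be sketched as follows. This is a brute-force illustration under stated assumptions: the support and confidence thresholds, the function names and the shopping transactions are all hypothetical, and a production miner would use candidate pruning as in Apriori rather than enumerate every combination.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Step 1: item sets contained in at least min_count transactions."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            count = sum(set(candidate) <= t for t in transactions)
            if count >= min_count:
                frequent[frozenset(candidate)] = count
                found = True
        if not found:   # no frequent set of this size, so none of a larger size
            break
    return frequent


def strong_rules(frequent, min_confidence):
    """Step 2: rules lhs -> rhs with confidence = count(lhs and rhs) / count(lhs)."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical shopping transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = frequent_itemsets(transactions, min_count=2)
rules = strong_rules(frequent, min_confidence=0.7)
for lhs, rhs, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```

The lookup `frequent[lhs]` is safe because every subset of a frequent item set is itself frequent, so it was recorded in step 1.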


FIGURE 3: Data Mining Model

8.1 Model representation:

It is the language used to describe discoverable patterns. If the representation is too limited, then no amount of training time or examples can produce an accurate model for the data. It is important that a data analyst fully comprehend the representational assumptions that might be inherent in a particular method, and that an algorithm designer clearly state which representational assumptions are being made by a particular algorithm.

8.2 Model evaluation:

Model-evaluation criteria are quantitative statements of how well a particular pattern (a model and its parameters) meets the goals of the KDD process. For example, predictive models are often judged by the empirical prediction accuracy on some test set, whereas descriptive models can be evaluated along the

So, KDD is a process of mapping low-level data into other forms that might be more compact, more abstract, or more useful. KDD refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data. The additional steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, are essential to ensure that useful knowledge is derived from the data. The unifying goal is extracting high-level knowledge from low-level data in the context of large data sets. KDD focuses on the overall process of knowledge discovery from data, including how the data are stored and accessed, how algorithms can be scaled to massive data sets and still run efficiently, how results can be interpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported.

KDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. Here, data are a set of facts, and a pattern is an expression in some language describing a subset of the data or a model applicable to the subset. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple iterations. The process consists of nine steps:


Developing an understanding of the application domain and the relevant prior knowledge, and identifying the goal of the KDD process from the customer's viewpoint.

9.2 Create a target data set:

Selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.

9.3 Data cleaning and pre-processing:

Basic operations include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.

9.4 Data reduction and projection:

Finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.

9.5 Goal of KDD process:

Matching the goals of the KDD process (step 1) to a particular data-mining method, for example summarization, classification, regression, clustering, and so on.

9.6 Exploratory analysis and model and hypothesis selection:

Choosing the data mining algorithm(s) and selecting the method(s) to be used for searching for data patterns. This process includes deciding which models and parameters might be appropriate and matching a particular data-mining method with the overall criteria of the KDD process.

9.7 Data mining:

Searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly aid the data-mining method by correctly performing the preceding steps.

9.8 Pattern interpretation:

Interpreting mined patterns, possibly returning to any of steps 1 through 7 for further iteration. This step can also involve

9.9 Working on the discovered knowledge:

Using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

FIGURE 4: KDD Process

10. DATA MINING AND KDD

The KDD process is one of mapping low-level data into other forms that might be more compact, more abstract, or more useful. Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that produce a particular enumeration of patterns (or models) over the data.

Online Analytical Processing (OLAP) tools are used to apply data analysis in the KDD process. OLAP tools focus on providing multidimensional data analysis, which is superior to SQL in computing summaries and breakdowns along many dimensions. OLAP tools are targeted toward simplifying and supporting interactive data analysis, but the goal of KDD tools is to automate as much of the process as possible. Thus, KDD is a step beyond what is currently supported by most standard database systems.

11. RULES OF DATA MINING

• Business objectives are the origin of every data mining solution (Business Goals Law)
• Business knowledge is central to every step of the data mining process (Business Knowledge Law)
• Data preparation is more than half of every data mining process (Data Preparation Law)


• The right model for a given application can only be discovered by experiment, or There is No Free Lunch for the Data Miner (NFL-DM)
• There are always patterns (Watkins' Law)
• Data mining amplifies perception in the business domain (Insight Law)
• Prediction increases information locally by generalisation (Prediction Law)
• The value of data mining results is not determined by the accuracy or stability of predictive models (Value Law)
• All patterns are subject to change (Law of Change)

12. DATA MINING APPLICATIONS

12.1 Data Mining Applications in Healthcare

The success of healthcare data mining hinges on the availability of clean healthcare data. In this respect, it is critical that the healthcare industry look into how data can be better captured, stored, prepared and mined. Possible directions include the standardization of clinical vocabulary and the sharing of data across organizations to enhance the benefits of healthcare data mining applications. As healthcare data are not only quantitative, it is also necessary to explore the use of text mining to expand the scope and nature of what healthcare data mining can currently do. Text mining is used in particular to combine all the data and then mine the text.

12.2 Data mining for market basket analysis

Data mining techniques are used in market basket analysis (MBA). When customers buy products, this technique helps to find the associations between the different items that they put in their shopping baskets. The discovery of such associations helps to promote the business. Retailers use the technique to identify customers' intentions. In this way it increases the profits of the business and also encourages the purchase of related items.

12.3 The data mining in education system

With the huge number of higher education aspirants, we believe that data mining technology can help bridge the knowledge gap in higher educational systems. The hidden patterns, associations, and anomalies that are discovered by data mining techniques from educational data can improve decision making processes in higher educational systems. This improvement can bring advantages such as maximizing educational system efficiency, decreasing students' drop-out rate, increasing students' promotion rate, increasing students' retention rate, increasing students' transition rate, increasing the educational improvement ratio, increasing students' success, increasing students' learning outcomes, and reducing the cost of system processes. In the current era we are using KDD and data mining tools for extracting knowledge. This knowledge can be used for improving the quality of education.

12.4 Data mining in manufacturing engineering

When data are retrieved from a manufacturing system, the customer uses them for different purposes: to find errors in the data, to enhance the design methodology, to improve the quality of the data, and to determine how best the data can support decision making. Most of the time, however, the data are analysed first; once the hidden patterns have been found, the manufacturing process can be controlled to enhance the quality of the products.

12.5 Data Mining Applications can be generic or domain specific.

A data mining system can be generic or domain specific. A multi-agent based data mining application is capable of automatically selecting the data mining technique to be applied. The multi-agent system is used at different levels: first at the level of concept hierarchy definition, then at the result level, to present the best adapted decision to the user. This decision is stored in a knowledge base for use in later decision-making. A multi-agent system tool used for generic data mining system development uses different agents to perform different tasks.

12.6 A multi-tier data mining system

It consists of basic components such as the user interface, data mining services, data access services and the data. Three different architectures have been presented for data mining systems, namely one-tier, two-tier and three-tier architecture. A generic system is required to integrate as many learning algorithms as possible and to decide the most appropriate algorithm to use. CORBA (Common Object Request Broker Architecture) allows reusability in a feasible way and makes it possible to build large and scalable systems.

12.7 Data mining technique in CRM

Work on data mining techniques in CRM aims to give a research summary on the application of data mining in the CRM domain and on the techniques that are most often used.

12.8 The Domain Specific Applications

Domain specific applications focus on using domain specific data and data mining algorithms targeted at a specific objective. The applications aim to generate specific knowledge. In different domains the data generating sources generate different types of data. Data can range from simple text and numbers to more complex audio-video

12.9 In Medical Science

Health care is one of the most widely used applications of data mining. Medical data are complex and difficult to analyse. The REMIND (Reliable Extraction and Meaningful Inference from Non-structured Data) system integrates the structured and unstructured clinical data in patient records to automatically create high quality structured clinical data.

12.10 Data Mining in the Web Education

Data mining methods are used in web education to improve courseware. Relationships are discovered among the usage data picked up during students' sessions. This knowledge is very useful for the teacher or the author of the course, who can decide what modifications will be the most appropriate to improve the effectiveness of the course.

12.11 The Intrusion Detection in the Network

Data mining methods are used to classify network traffic as normal or abnormal. If a TCP header does not belong to any of the existing TCP header clusters, it can be considered an anomaly. Data mining methods are also used to accurately detect malicious executables before they run.

12.12 Sports data mining

A huge number of sports exist worldwide, national and international games are scheduled every day, and a huge amount of data has to be maintained. Data mining tools are applied to provide information as and when it is required. Data mining tools such as WEKA and RAPID MINER are frequently used for sport. In sports the data are available in statistical form, so data mining can be used to discover patterns, and these patterns are often used to make forecasts. Data mining can be used for scouting, prediction of performance, selection of players, coaching and training, and strategy planning.

12.13 The Intelligence Agencies

Intelligence agencies collect and analyse information to investigate terrorist activities. One challenge for law enforcement and intelligence agencies is the difficulty of analysing the large volume of data involved in criminal and terrorist activities. Nowadays intelligence agencies use sophisticated data mining algorithms, which make it easier to handle very large databases for organizations. Different data mining techniques are used in crime data mining. In data mining, clustering techniques are used for the different

12.14 The data mining system in Internal Revenue Service

The data mining system implemented at the Internal Revenue Service to identify high-income individuals engaged in abusive tax shelters shows significantly good results. The major lines of investigation included visualization of the relationships and data mining to identify and rank possibly abusive tax avoidance transactions. Data mining techniques can also be used effectively to enhance the quality of products.

12.15 E-commerce

E-commerce is also one of the most promising domains for data mining because data records are plentiful, electronic collection provides reliable data, insight can easily be turned into action, and return on investment can be measured. The integration of e-commerce and data mining significantly improves the results and guides users in generating knowledge and making correct business decisions. This integration effectively solves several major problems associated with horizontal data mining tools, including the enormous effort required to pre-process the data before they can be used for mining, and making the results of mining actionable.

12.16 The Digital Library Retrieves

Data mining can be applied in the field of the digital library, where the user finds or collects, stores and preserves data that are in digital form. The advent of electronic resources and their increased use in libraries has brought about significant changes in libraries. The data and information are available in different formats, including text, images, video, audio, pictures, maps, etc. The digital library is therefore a suitable domain for the application of data mining.

12.17 The prediction in engineering applications

Prediction in engineering applications has been treated effectively by a data mining approach. Prediction problems include the cost estimation problem in engineering and the problem of engineering design, which involves decisions where parameters, actions, components, and so on are selected. Data mining techniques are applied to the variety of parameters in engineering applications, such as prior data. Once the data have been gathered, we can generate different models and algorithms that predict different characteristics.

13. CONCLUSION

In this paper we briefly reviewed the various data mining
objects in crime records. Data mining detects and analyses the applications. This review would be helpful to researchers to
focus on the various issues of data mining. Most data mining applications in various fields use a variety of data types, ranging from text to images, stored in a variety of databases and data structures. Different data mining methods are used to extract patterns, and thus knowledge, from these varied databases. Selecting the data and methods for data mining is an important task in this process and requires knowledge of the domain. Several attempts have been made to design and develop a generic data mining system, but no system has been found to be completely generic.

Thus, for every domain, the assistance of a domain expert is mandatory. The domain experts shall be guided by the system to apply their knowledge effectively when using data mining systems to generate the required knowledge. The domain experts are required to determine the variety of data that should be collected in the specific problem domain, to select the specific data for data mining, to clean and transform the data, to extract patterns for knowledge generation, and finally to interpret the patterns and generate knowledge. Most of the domain-specific data mining applications show accuracy above 90%. Generic data mining applications have limitations. From the study of various data mining applications, it is observed that no application called generic is 100% generic. Intelligent interfaces and intelligent agents make an application generic to some extent, but they have limitations.

The domain experts play an important role in the different stages of data mining. The decisions at different stages are influenced by factors such as domain and data details, the aim of the data mining, and the context parameters. Domain-specific applications are aimed at extracting specific knowledge. The domain experts guide the system by considering the user’s requirements and other context parameters. Therefore, it is concluded that domain-specific applications are better suited to data mining than generic ones.

14. REFERENCES

[1] Introduction to Data Mining and Knowledge Discovery, Third Edition, ISBN: 1-892095-02-5, Two Crows Corporation, 10500 Falls Road, Potomac, MD 20854 (U.S.A.), 1999.
[2] Larose, D. T., “Discovering Knowledge in Data: An Introduction to Data Mining”, ISBN 0-471-66657-2, John Wiley & Sons, Inc., 2005.
[3] Dunham, M. H., Sridhar, S., “Data Mining: Introductory and Advanced Topics”, Pearson Education, New Delhi, ISBN: 81-7758-785-4, 1st Edition, 2006.
[4] Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C. and Wirth, R., “CRISP-DM 1.0: Step-by-step data mining guide”, NCR Systems Engineering Copenhagen (USA and Denmark), DaimlerChrysler AG (Germany), SPSS Inc. (USA) and OHRA Verzekeringen en Bank Group B.V. (The Netherlands), 2000.
[5] Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., “From Data Mining to Knowledge Discovery in Databases”, AI Magazine, American Association for Artificial Intelligence, 1996.
[6] Tan, Pang-Ning, Steinbach, M., Kumar, Vipin, “Introduction to Data Mining”, Pearson Education, New Delhi, ISBN: 978-81-317-1472-0, 3rd Edition, 2009.
[7] Bernstein, A. and Provost, F., “An Intelligent Assistant for the Knowledge Discovery Process”, Working Paper of the Center for Digital Economy Research, New York University; also presented at the IJCAI 2001 Workshop on Wrappers for Performance Enhancement in Knowledge Discovery in Databases.
[8] Baazaoui, Z., H., Faiz, S., and Ben Ghezala, H., “A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data”, Proceedings of World Academy of Science, Engineering and Technology, volume 5, ISSN 1307-6884, April 2005.
[9] Rantzau, R. and Schwarz, H., “A Multi-Tier Architecture for High-Performance Data Mining”, A Technical Project Report of ESPRIT project, The consortium of CRITIKAL project, Attar Software Ltd. (UK), Gehe AG (Germany), Lloyds TSB Group (UK), Parallel Applications Centre, University of Southampton (UK), BWI, University of Stuttgart (Germany), IPVR, University of Stuttgart (Germany).
[10] Botia, J. A., Garijo, M. y Velasco, J. R., Skarmeta, A. F., “A Generic Data mining System basic design and implementation guidelines”, A Technical Project Report of CYCYT project of Spanish Government, 1998. Website: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.1935
[11] Agrawal, R., and Psaila, G. 1995. Active Data Mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), 3–8. Menlo Park, Calif.: American Association for Artificial Intelligence.
[12] Agrawal, R.; Mannila, H.; Srikant, R.; Toivonen, H.; and Verkamo, I. 1996. Fast Discovery of Association Rules. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 307–328. Menlo Park, Calif.: AAAI Press.
[13] Brachman, R., and Anand, T. 1996. The Process of Knowledge Discovery in Databases: A Human-Centered Approach. In Advances in Knowledge Discovery and Data Mining, 37–58, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
[14] Berry, M. J., Linoff, G. S. (2000), “Mastering Data Mining: The Art and Science of Customer Relationship Management”. Wiley Computer Publishing, New York.
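
The cluster-based anomaly check described in Section 12.11 (a TCP header that belongs to none of the existing TCP header clusters is considered an anomaly) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method of any system surveyed in this paper: the three-value feature vectors, the centroids, the RADIUS threshold, and the function names are all hypothetical, and a real system would learn its clusters from recorded traffic.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_cluster(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [euclidean(point, c) for c in centroids]
    best = min(range(len(centroids)), key=lambda i: distances[i])
    return best, distances[best]

def is_anomalous(point, centroids, radius):
    """A header is anomalous when it lies outside every cluster radius."""
    _, distance = nearest_cluster(point, centroids)
    return distance > radius

# Hypothetical, pre-scaled header features (assumed for illustration):
# (window size, header length, number of flags set), each in [0, 1].
CENTROIDS = [(0.2, 0.5, 0.1), (0.8, 0.5, 0.3)]
RADIUS = 0.25  # assumed membership threshold, shared by all clusters

normal_header = (0.25, 0.5, 0.15)  # close to the first centroid
odd_header = (0.5, 0.95, 0.9)      # far from both centroids

if __name__ == "__main__":
    for name, header in [("normal", normal_header), ("odd", odd_header)]:
        label = "anomaly" if is_anomalous(header, CENTROIDS, RADIUS) else "normal traffic"
        print(name, "->", label)
```

A single shared radius keeps the sketch short; a per-cluster radius or a density-based rule would be equally consistent with the description in Section 12.11.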