Data Mining
activities on real evidence can only increase the likelihood that we will correctly identify the bad guys while helping to protect the innocent by casting a more targeted net. Like the difference between a shotgun and a laser-sighted 9mm, there is always the possibility of an error, but there is much less collateral damage with the more accurate weapon.

Again, the real issue in the debate comes back to privacy concerns. People do not like law enforcement knowing their business, which is a very reasonable concern, particularly when viewed in light of past abuses. Unfortunately, this attitude confuses process with input issues and places the blame on the tool rather than on the data resources tapped. Data mining can only be used on the data that are made available to it. Data mining is not a vast repository designed to maintain extensive files containing both public and private records on each and every American, as has been suggested by some. It is an analytical tool. If people are concerned about privacy issues, then they should focus on the availability of and access to sensitive data resources, not the analytical tools. Banning an analytical tool because of fear that it will be misused is similar to banning pocket calculators because some people use them to cheat on their taxes.

As with any powerful weapon used in the war on terrorism, the war on drugs, or the war on crime, safety starts with informed public safety consumers and well-trained personnel. As is emphasized throughout this text, domain expertise frequently is the most important component of a well-informed, professional program of data mining and predictive analytics. As such, it should be seen as an essential responsibility of each agency to ensure active participation on the part of those in the know: those professionals from within each organization who know where the data came from and how it will be used. To relinquish the responsibility for analysis to outside organizations or consultants should be viewed in the same way as a suggestion to entirely contract patrol services to a private security corporation: an unacceptable abdication of an essential responsibility.

Unfortunately, serious misinformation regarding this very important tool might limit or somehow curtail its future use when we most need it in our fight against terrorism. As such, it is incumbent upon each organization to ensure absolute integrity and an informed decision-making process regarding the use of these tools and their output in an effort to ensure their ongoing availability and access for public safety applications.

Multivariate Analysis: Overview
I. Olkin, A.R. Sampson, in International Encyclopedia of the Social & Behavioral Sciences, 2001

6.7 Data Mining
Data mining refers to a set of approaches and techniques that permit ‘nuggets’ of valuable information to be extracted from vast and loosely

In the context of predictive analytics, data mining is the process of building a representative model that fits the observational data. This model serves two purposes: on the one hand, it predicts the output (interest rate) based on the input variables (credit score, income level, and loan amount); on the other hand, we can use it to understand the relationship between the output variable and all the input variables. For example, does income level really matter in determining the loan interest rate? Does income level matter more than credit score? What happens when income levels double or if credit score drops by 10 points? Model building in the context of data mining can be used in both predictive and explanatory applications.

1.1.3 Combination of Statistics, Machine Learning, and Computing
In the pursuit of extracting useful and relevant information from large data sets, data mining derives computational techniques from the disciplines of statistics, artificial intelligence, machine learning, database theories, and pattern recognition.

Algorithms used in data mining originated from these disciplines, but have since evolved to adopt more diverse techniques such as parallel computing, evolutionary computing, linguistics, and behavioral studies. One of the key ingredients of successful data mining is substantial prior knowledge about the data and the business processes that generate the data, known as subject matter expertise. Like many quantitative frameworks, data mining is an iterative process in which the practitioner gains more information about the patterns and relationships from data in each cycle. The art of data mining combines the knowledge of statistics, subject matter expertise, database technologies, and machine learning techniques to extract meaningful and useful information from the data. Data mining also typically operates on large data sets that need to be stored, processed, and computed. This is where database techniques, along with parallel and distributed computing techniques, play an important role in data mining.

1.1.4 Algorithms
We can also define data mining as a process of discovering previously unknown patterns in the data using automatic iterative methods. An algorithm is an iterative, step-by-step procedure that transforms inputs into outputs. The application of sophisticated algorithms for extracting useful patterns from the data differentiates data mining from traditional data analysis techniques. Most of these algorithms were developed in recent decades and have been borrowed from the fields of machine learning and artificial intelligence. However, some of the algorithms are based on the foundations of Bayesian probabilistic theories and regression analysis, which originated hundreds of years ago. These iterative algorithms automate the process of searching for an optimal solution for a given data problem. Based on the data problem, data mining is classified into tasks such as classification, association analysis, clustering, and regression. Each data mining task uses specific algorithms such as decision trees, neural networks, k-nearest neighbors, and k-means clustering, among others. With increased research on data mining, the number of such algorithms is increasing, but a few classic algorithms remain foundational to many data mining applications.
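As an illustration of the "iterative algorithms" that search for a solution to a clustering problem, the following is a minimal sketch of k-means in one common formulation (Lloyd's algorithm), using only the Python standard library. The function name `k_means`, the sample points, and the starting centroids are hypothetical choices made for this example; they do not come from the excerpt.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Alternate an assignment step and an update step until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:  # no centroid moved: the iteration has converged
            break
        centroids = updated
    # clusters holds the result of the most recent assignment step
    return centroids, clusters

# Hypothetical two-dimensional data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the starting centroids, which is one reason practical implementations usually run the procedure several times from different starting points.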
Jiawei Han, ... Jian Pei, in Data Mining (Third Edition), 2012

1.7.5 Data Mining and Society
How does data mining impact society? What steps can data mining take to preserve the privacy of individuals? Do we use data mining in our daily lives without even knowing that we do? These questions raise the following issues:

■ Social impacts of data mining: With data mining penetrating our everyday lives, it is important to study the impact of data mining on society. How can we use data mining technology to benefit society? How can we guard against its misuse? The improper disclosure or use of data and the potential violation of individual privacy and data protection rights are areas of concern that need to be addressed.

■ Privacy-preserving data mining: Data mining will help scientific discovery, business management, economic recovery, and security protection (e.g., the real-time discovery of intruders and cyberattacks). However, it poses the risk of disclosing an individual's personal information. Studies on privacy-preserving data publishing and data mining are ongoing. The philosophy is to observe data sensitivity and preserve people's privacy while performing successful data mining.

■ Invisible data mining: We cannot expect everyone in society to learn and master data mining techniques. More and more systems should have data mining functions built within so that people can perform data mining or use data mining results simply by mouse clicking, without any knowledge of data mining algorithms. Intelligent search engines and Internet-based stores perform such invisible data mining by incorporating data mining into their components to improve their functionality and performance. This is often done unbeknownst to the user. For example, when purchasing items online, users may be unaware that the store is likely collecting data on the buying patterns of its customers, which may be used to recommend other items for purchase in the future.

These issues and many additional ones relating to the research, development, and application of data mining are discussed throughout the book.

Process Models for Data Mining and Predictive Analysis
Colleen McCue, in Data Mining and Predictive Analysis (Second Edition), 2015

Abstract
Data mining and predictive analytics can best be understood as a process, rather than a specific technology, tool, or tradecraft. Chapter 4 includes an overview of four complementary approaches to analysis: the Central Intelligence Agency (CIA) Intelligence Process, the Cross-Industry Standard Process for Data Mining (CRISP-DM), SEMMA, and the Actionable Mining and Predictive Analysis process developed specifically for the operational public safety and security environment. The Actionable Mining and Predictive Analysis process addresses unique requirements and constraints associated with the applied setting, including data access and availability, public safety-specific evaluation, and the requirement for operationally relevant and actionable output. Data privacy and security also are addressed.

The non-trivial extraction of implicit, previously unknown, and potentially useful information from data (Frawley et al., 1991).

As data mining developed as a professional activity, it was necessary to distinguish it from the previous activity of statistical modeling and the broader activity of knowledge discovery. For the purposes of this handbook, we will use the following working definitions:

• Statistical modeling: The use of parametric statistical algorithms to group or predict an outcome or event, based on predictor variables.

• Data mining: The use of machine learning algorithms to find faint patterns of relationship between data elements in large, noisy, and messy data sets, which can lead to actions to increase benefit in some form (diagnosis, profit, detection, etc.).

• Knowledge discovery: The entire process of data access, data exploration, data preparation, modeling, model deployment, and model monitoring. This broad process includes data mining activities, as shown in Figure 2.1.

Figure 2.1. The relationship between data mining and knowledge discovery.

As the practice of data mining developed further, the focus of the definitions shifted to specific aspects of the information and its sources. In 1996, Fayyad et al. proposed the following:

Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

The second definition focuses on the patterns in the data rather than just information in a generic sense. These patterns are faint and hard to distinguish, and they can only be sensed by analysis algorithms that can evaluate nonlinear relationships between predictor variables and their targets, and among the predictors themselves. This form of the definition of data mining developed along with the rise of machine learning tools for use in data mining. Tools like decision trees and neural nets permit the analysis of nonlinear patterns in data more easily than is possible with parametric statistical algorithms. The reason is that machine learning algorithms learn the way humans do: by example, not by calculation of metrics based on averages and data distributions.

The definition of data mining was confined originally to just the process of model building. But as the practice matured, data mining tool packages (e.g., SPSS-Clementine) included other necessary tools to facilitate the building of models and for evaluating and displaying models. Soon, the definition of data mining expanded to include those operations in Figure 2.1 (and some definitions also include model visualization).

The modern Knowledge Discovery in Databases (KDD) process combines the mathematics used to discover interesting patterns in data with the entire process of extracting data and applying the resulting models to other data sets to leverage the information for some purpose. This process blends business systems engineering, elegant statistical methods, and industrial-strength computing power to find structure (connections, patterns, associations, and basis functions) rather than statistical parameters (means, weights, thresholds, knots). In Chapter 3, we will expand this rather linear organization of data mining processes to describe the iterative, closed-loop system with feedback that makes up the modern approach to the practice of data mining.
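The knowledge discovery stages listed above (data preparation, modeling, model deployment, and model monitoring) can be sketched as a chain of small functions. This is only a toy illustration under stated assumptions: the function names, the one-variable least-squares model standing in for the modeling step, and all data values are hypothetical and are not taken from the handbook.

```python
from statistics import mean

def prepare(rows):
    """Data preparation: drop records with a missing predictor or target."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def fit_line(rows):
    """Modeling: ordinary least squares for y = intercept + slope * x."""
    xs, ys = zip(*rows)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def predict(model, x):
    """Deployment: apply the fitted model to a new record."""
    intercept, slope = model
    return intercept + slope * x

def mean_abs_error(model, rows):
    """Monitoring: measure prediction error on records the model has not seen."""
    return mean(abs(predict(model, x) - y) for x, y in rows)

# Hypothetical raw records as (predictor, target) pairs, some incomplete.
raw = [(1.0, 3.0), (2.0, 5.0), (None, 4.0), (3.0, 7.0), (4.0, None)]
model = fit_line(prepare(raw))           # fitted on the three complete records
new_data = [(5.0, 11.0), (6.0, 13.5)]    # hypothetical later observations
error = mean_abs_error(model, new_data)  # a rising error would prompt refitting
```

Feeding the monitored error back into a decision to prepare new data and refit is what turns this linear chain into the closed-loop process the text describes.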
Copyright © 2020 Elsevier B.V. or its licensors or contributors. ScienceDirect ® is a registered trademark of Elsevier B.V.