Data Mining for Business & Web

1. Data mining is the process of discovering interesting patterns from massive amounts of data. It typically involves data cleaning, integration, selection, transformation, pattern discovery, pattern evaluation, and knowledge presentation.
2. Patterns are considered interesting if they are valid, novel, potentially useful, and easily understood by humans. These interesting patterns represent knowledge.
3. Data mining can be conducted on any kind of data as long as the data are meaningful for a target application, such as database data, data warehouse data, transactional data, and advanced data types including text, multimedia, spatial, temporal, graph, and web data.

Uploaded by

Raj Endran


Invisible data mining: We cannot expect everyone in society to learn and master
data mining techniques. More and more systems should have data mining functions built within so that people can perform data mining or use data mining results
simply by mouse clicking, without any knowledge of data mining algorithms. Intelligent search engines and Internet-based stores perform such invisible data mining by
incorporating data mining into their components to improve their functionality and
performance. This is done often unbeknownst to the user. For example, when purchasing items online, users may be unaware that the store is likely collecting data on
the buying patterns of its customers, which may be used to recommend other items
for purchase in the future.
These issues and many additional ones relating to the research, development, and
application of data mining are discussed throughout the book.

1.8 Summary

Necessity is the mother of invention. With the mounting growth of data in every application, data mining meets the imminent need for effective, scalable, and flexible data
analysis in our society. Data mining can be considered as a natural evolution of information technology and a confluence of several related disciplines and application
domains.
Data mining is the process of discovering interesting patterns from massive amounts
of data. As a knowledge discovery process, it typically involves data cleaning, data integration, data selection, data transformation, pattern discovery, pattern evaluation,
and knowledge presentation.
A pattern is interesting if it is valid on test data with some degree of certainty, novel,
potentially useful (e.g., can be acted on or validates a hunch about which the user was
curious), and easily understood by humans. Interesting patterns represent knowledge. Measures of pattern interestingness, either objective or subjective, can be used
to guide the discovery process.
We present a multidimensional view of data mining. The major dimensions are
data, knowledge, technologies, and applications.
Data mining can be conducted on any kind of data as long as the data are meaningful
for a target application, such as database data, data warehouse data, transactional
data, and advanced data types. Advanced data types include time-related or sequence
data, data streams, spatial and spatiotemporal data, text and multimedia data, graph
and networked data, and Web data.
A data warehouse is a repository for long-term storage of data from multiple sources,
organized so as to facilitate management decision making. The data are stored
under a unified schema and are typically summarized. Data warehouse systems provide multidimensional data analysis capabilities, collectively referred to as online
analytical processing.
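
As a rough illustration of the multidimensional analysis that a data warehouse supports, the following sketch rolls a tiny fact table up along chosen dimensions. The table contents, the dimension names, and the `roll_up` helper are assumptions made for this example; they do not come from the text or from any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical fact table: (city, quarter, item, units_sold).
SALES = [
    ("Chicago", "Q1", "laptop", 10),
    ("Chicago", "Q2", "laptop", 15),
    ("Chicago", "Q1", "phone", 20),
    ("Toronto", "Q1", "laptop", 5),
    ("Toronto", "Q2", "phone", 30),
]
DIMENSIONS = ("city", "quarter", "item")


def roll_up(rows, keep):
    """Sum units_sold over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


by_city = roll_up(SALES, ["city"])               # one cell per city
by_city_item = roll_up(SALES, ["city", "item"])  # a finer level of the cube
grand_total = roll_up(SALES, [])                 # the fully aggregated cell
```

Each call corresponds to viewing the same data at a different level of summarization, which is the kind of operation that online analytical processing makes interactive.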

the major topics in a collection of documents and, for each document in the collection,
the major topics involved.
Increasingly large amounts of text and multimedia data have been accumulated and
made available online due to the fast growth of the Web and applications such as digital libraries, digital governments, and health care information systems. Their effective
search and analysis have raised many challenging issues in data mining. Therefore, text
mining and multimedia data mining, integrated with information retrieval methods,
have become increasingly important.

1.6 Which Kinds of Applications Are Targeted?

Where there are data, there are data mining applications.
As a highly application-driven discipline, data mining has seen great successes in many
applications. It is impossible to enumerate all applications where data mining plays a
critical role. Presentations of data mining in knowledge-intensive application domains,
such as bioinformatics and software engineering, require more in-depth treatment and
are beyond the scope of this book. To demonstrate the importance of applications as
a major dimension in data mining research and development, we briefly discuss two
highly successful and popular application examples of data mining: business intelligence
and search engines.

1.6.1 Business Intelligence

It is critical for businesses to acquire a better understanding of the commercial context
of their organization, such as their customers, the market, supply and resources, and
competitors. Business intelligence (BI) technologies provide historical, current, and
predictive views of business operations. Examples include reporting, online analytical
processing, business performance management, competitive intelligence, benchmarking, and predictive analytics.
How important is business intelligence? Without data mining, many businesses may
not be able to perform effective market analysis, compare customer feedback on similar products, discover the strengths and weaknesses of their competitors, retain highly
valuable customers, and make smart business decisions.
Clearly, data mining is the core of business intelligence. Online analytical processing tools in business intelligence rely on data warehousing and multidimensional data
mining. Classification and prediction techniques are the core of predictive analytics
in business intelligence, for which there are many applications in analyzing markets,
supplies, and sales. Moreover, clustering plays a central role in customer relationship
management, which groups customers based on their similarities. Using characterization mining techniques, we can better understand features of each customer group and
develop customized customer reward programs.
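
To make the customer-grouping idea concrete, here is a minimal sketch of clustering followed by a simple characterization of each group. It uses plain k-means as a stand-in; the text does not name a specific clustering algorithm. The customer features, the `kmeans` function, and the choice to seed centroids with the first k points are all assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical customers: (orders per year, average order value).
customers = [(2, 20.0), (40, 25.0), (3, 22.0), (38, 30.0), (1, 18.0), (42, 28.0)]
labels, centroids = kmeans(customers, k=2)

# Characterization: summarize each group by its centroid.
for c, (orders, value) in enumerate(centroids):
    print(f"group {c}: ~{orders:.1f} orders/year, ~{value:.2f} per order")
```

In practice the features would usually be rescaled first so that no single attribute dominates the distance, and the number of groups would be chosen with the application in mind.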

1.6.2 Web Search Engines

A Web search engine is a specialized computer server that searches for information
on the Web. The search results of a user query are often returned as a list (sometimes
called hits). The hits may consist of web pages, images, and other types of files. Some
search engines also search and return data available in public databases or open directories. Search engines differ from web directories in that web directories are maintained
by human editors whereas search engines operate algorithmically or by a mixture of
algorithmic and human input.
Web search engines are essentially very large data mining applications. Various data
mining techniques are used in all aspects of search engines, from crawling [5]
(e.g., deciding which pages should be crawled and the crawling frequencies) and indexing
(e.g., selecting pages to be indexed and deciding to what extent the index should be
constructed) to searching (e.g., deciding how pages should be ranked, which advertisements should be added, and how the search results can be personalized or made
context aware).
Search engines pose grand challenges to data mining. First, they have to handle a
huge and ever-growing amount of data. Typically, such data cannot be processed using
one or a few machines. Instead, search engines often need to use computer clouds, which
consist of thousands or even hundreds of thousands of computers that collaboratively
mine the huge amount of data. Scaling up data mining methods over computer clouds
and large distributed data sets is an area for further research.
Second, Web search engines often have to deal with online data. A search engine
may be able to afford constructing a model offline on huge data sets. To do this, it may
construct a query classifier that assigns a search query to predefined categories based on
the query topic (e.g., whether the search query "apple" is meant to retrieve information
about a fruit or a brand of computers). Even if a model is constructed offline, the
application of the model online must be fast enough to answer user queries in real time.
Another challenge is maintaining and incrementally updating a model on fast-growing data streams. For example, a query classifier may need to be incrementally
maintained continuously since new queries keep emerging and predefined categories
and the data distribution may change. Most of the existing model training methods are
offline and static and thus cannot be used in such a scenario.
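
The query-classifier scenario above can be sketched as follows. This is a minimal illustration that assumes a multinomial naive Bayes model over query words, chosen here only because its counts can be updated one query at a time; the text does not say which model a search engine would use. The class name, the categories, and the training queries are hypothetical.

```python
import math
from collections import Counter, defaultdict


class QueryClassifier:
    """Naive Bayes over query words, updatable one labeled query at a time."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.category_counts = Counter()         # category -> number of queries
        self.vocabulary = set()

    def update(self, query, category):
        """Fold one labeled query into the counts without retraining."""
        words = query.lower().split()
        self.category_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocabulary.update(words)

    def classify(self, query):
        """Return the most probable category, using add-one smoothing."""
        words = query.lower().split()
        total = sum(self.category_counts.values())
        best, best_score = None, -math.inf
        for category, n in self.category_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n / total)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


clf = QueryClassifier()
# "Offline" phase: build the model from hypothetical labeled queries.
for q, c in [("apple pie recipe", "fruit"), ("apple tree care", "fruit"),
             ("apple laptop price", "computers"), ("apple phone review", "computers")]:
    clf.update(q, c)

print(clf.classify("apple laptop"))  # online use: a few dictionary lookups

# Incremental phase: a new kind of query appears after deployment.
clf.update("apple watch battery", "computers")
print(clf.classify("watch battery"))
```

Classification touches only the words of the incoming query, which keeps the online step cheap, and `update` changes a handful of counters instead of rebuilding the model. It does not address the harder case mentioned above in which the predefined categories themselves change.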
Third, Web search engines often have to deal with queries that are asked only a very
small number of times. Suppose a search engine wants to provide context-aware query
recommendations. That is, when a user poses a query, the search engine tries to infer
the context of the query using the user's profile and query history in order to return
more customized answers within a small fraction of a second. However, although the
total number of queries asked can be huge, most of the queries may be asked only once
or a few times. Such severely skewed data are challenging for many data mining and
machine learning methods.
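
A quick way to see this skew is to count how many distinct queries occur only once. The sketch below does so on a made-up query log; the log contents and variable names are assumptions for illustration, and a real log would be many orders of magnitude larger.

```python
from collections import Counter

# Hypothetical query log.
query_log = [
    "weather", "weather", "weather", "news", "news",
    "cheap flights to oslo in march", "how to fix a squeaky door",
    "apple pie recipe without sugar", "weather", "news",
]

frequency = Counter(query_log)
distinct = len(frequency)

# Distinct queries that were asked exactly once (the "long tail").
singletons = sum(1 for n in frequency.values() if n == 1)
# Total volume contributed by queries that were asked more than once.
head_volume = sum(n for n in frequency.values() if n > 1)

singleton_share = singletons / distinct      # share of distinct queries seen once
head_share = head_volume / len(query_log)    # share of traffic from repeated queries
```

In this toy log most distinct queries are singletons even though most of the traffic comes from two popular queries, so a method that needs many examples per query has little to learn from for the majority of distinct queries.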
[5] A Web crawler is a computer program that browses the Web in a methodical, automated manner.

1.7 Major Issues in Data Mining

Life is short but art is long. (Hippocrates)
Data mining is a dynamic and fast-expanding field with great strengths. In this section,
we briefly outline the major issues in data mining research, partitioning them into
five groups: mining methodology, user interaction, efficiency and scalability, diversity of
data types, and data mining and society. Many of these issues have been addressed in
recent data mining research and development to a certain extent and are now considered data mining requirements; others are still at the research stage. The issues continue
to stimulate further investigation and improvement in data mining.

1.7.1 Mining Methodology

Researchers have been vigorously developing new data mining methodologies. This
involves the investigation of new kinds of knowledge, mining in multidimensional
space, integrating methods from other disciplines, and the consideration of semantic ties
among data objects. In addition, mining methodologies should consider issues such as
data uncertainty, noise, and incompleteness. Some mining methods explore how user-specified measures can be used to assess the interestingness of discovered patterns as
well as guide the discovery process. Let's have a look at these various aspects of mining
methodology.
Mining various and new kinds of knowledge: Data mining covers a wide spectrum of
data analysis and knowledge discovery tasks, from data characterization and discrimination to association and correlation analysis, classification, regression, clustering,
outlier analysis, sequence analysis, and trend and evolution analysis. These tasks may
use the same database in different ways and require the development of numerous
data mining techniques. Due to the diversity of applications, new mining tasks continue to emerge, making data mining a dynamic and fast-growing field. For example,
for effective knowledge discovery in information networks, integrated clustering and
ranking may lead to the discovery of high-quality clusters and object ranks in large
networks.
Mining knowledge in multidimensional space: When searching for knowledge in large
data sets, we can explore the data in multidimensional space. That is, we can search
for interesting patterns among combinations of dimensions (attributes) at varying
levels of abstraction. Such mining is known as (exploratory) multidimensional data
mining. In many cases, data can be aggregated or viewed as a multidimensional data
cube. Mining knowledge in cube space can substantially enhance the power and
flexibility of data mining.
Data mining, an interdisciplinary effort: The power of data mining can be substantially enhanced by integrating new methods from multiple disciplines. For example,
to mine data with natural language text, it makes sense to fuse data mining methods
with methods of information retrieval and natural language processing. As another
example, consider the mining of software bugs in large programs. This form of mining, known as bug mining, benefits from the incorporation of software engineering
knowledge into the data mining process.
Boosting the power of discovery in a networked environment: Most data objects reside
in a linked or interconnected environment, whether it be the Web, database relations, files, or documents. Semantic links across multiple data objects can be used
to advantage in data mining. Knowledge derived in one set of objects can be used
to boost the discovery of knowledge in a related or semantically linked set of
objects.
Handling uncertainty, noise, or incompleteness of data: Data often contain noise,
errors, exceptions, or uncertainty, or are incomplete. Errors and noise may confuse
the data mining process, leading to the derivation of erroneous patterns. Data cleaning, data preprocessing, outlier detection and removal, and uncertainty reasoning are
examples of techniques that need to be integrated with the data mining process.
Pattern evaluation and pattern- or constraint-guided mining: Not all the patterns generated by data mining processes are interesting. What makes a pattern interesting
may vary from user to user. Therefore, techniques are needed to assess the interestingness of discovered patterns based on subjective measures. These estimate the
value of patterns with respect to a given user class, based on user beliefs or expectations. Moreover, by using interestingness measures or user-specified constraints to
guide the discovery process, we may generate more interesting patterns and reduce
the search space.
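
The last point, using an interestingness measure to both rank patterns and shrink the search space, can be sketched with frequent itemsets. The example below assumes support and confidence as the objective measures and a level-wise search in which only itemsets that already meet the support threshold are extended. The transactions, function names, and thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Level-wise search: only itemsets that are already frequent get extended."""
    items = sorted(set().union(*transactions))
    frequent = {}
    current = [frozenset([i]) for i in items]
    for size in range(1, max_size + 1):
        level = {s: support(s, transactions) for s in current}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        # Candidates for the next level come only from surviving itemsets.
        current = list({a | b for a, b in combinations(level, 2)
                        if len(a | b) == size + 1})
    return frequent


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


patterns = frequent_itemsets(TRANSACTIONS, min_support=0.6)
# A user-specified constraint, applied here as a simple post-filter.
about_milk = {s: v for s, v in patterns.items() if "milk" in s}
```

The support threshold prunes during the search, whereas the "must mention milk" constraint in this sketch is only applied afterward; pushing such constraints into the search itself is the more ambitious goal the text describes.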

1.7.2 User Interaction

The user plays an important role in the data mining process. Interesting areas of research
include how to interact with a data mining system, how to incorporate a user's background knowledge in mining, and how to visualize and comprehend data mining results.
We introduce each of these here.
Interactive mining: The data mining process should be highly interactive. Thus, it is
important to build flexible user interfaces and an exploratory mining environment,
facilitating the user's interaction with the system. A user may like to first sample a
set of data, explore general characteristics of the data, and estimate potential mining results. Interactive mining should allow users to dynamically change the focus
of a search, to refine mining requests based on returned results, and to drill, dice,
and pivot through the data and knowledge space interactively, dynamically exploring
cube space while mining.
Incorporation of background knowledge: Background knowledge, constraints, rules,
and other information regarding the domain under study should be incorporated
into the knowledge discovery process. Such knowledge can be used for pattern
evaluation as well as to guide the search toward interesting patterns.
Ad hoc data mining and data mining query languages: Query languages (e.g., SQL)
have played an important role in flexible searching because they allow users to pose
ad hoc queries. Similarly, high-level data mining query languages or other high-level
flexible user interfaces will give users the freedom to define ad hoc data mining tasks.
This should facilitate specification of the relevant sets of data for analysis, the domain
knowledge, the kinds of knowledge to be mined, and the conditions and constraints
to be enforced on the discovered patterns. Optimization of the processing of such
flexible mining requests is another promising area of study.
Presentation and visualization of data mining results: How can a data mining system
present data mining results, vividly and flexibly, so that the discovered knowledge
can be easily understood and directly usable by humans? This is especially crucial
if the data mining process is interactive. It requires the system to adopt expressive
knowledge representations, user-friendly interfaces, and visualization techniques.
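
As a small illustration of the ad hoc querying that SQL already offers, the sketch below uses Python's standard `sqlite3` module to select and summarize task-relevant data with a user-supplied condition. It shows an ordinary SQL query, not a data mining query language; the table, its rows, and the `relevant_data` helper are assumptions made for this example.

```python
import sqlite3

# In-memory database holding a hypothetical purchases table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("ann", "laptop", 900.0), ("ann", "mouse", 20.0),
     ("bob", "phone", 500.0), ("cat", "laptop", 950.0)],
)


def relevant_data(min_amount):
    """Ad hoc selection of task-relevant data, parameterized by the user."""
    rows = conn.execute(
        "SELECT item, COUNT(*), AVG(amount) FROM purchases "
        "WHERE amount >= ? GROUP BY item ORDER BY item",
        (min_amount,),
    )
    return rows.fetchall()


big_ticket = relevant_data(100.0)
```

A data mining query language would extend this style of request so that the user could also state the kind of knowledge to mine and the constraints that discovered patterns must satisfy.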

1.7.3 Efficiency and Scalability

Efficiency and scalability are always considered when comparing data mining algorithms. As data amounts continue to multiply, these two factors are especially critical.
Efficiency and scalability of data mining algorithms: Data mining algorithms must be
efficient and scalable in order to effectively extract information from huge amounts
of data in many data repositories or in dynamic data streams. In other words, the
running time of a data mining algorithm must be predictable, short, and acceptable
by applications. Efficiency, scalability, performance, optimization, and the ability to
execute in real time are key criteria that drive the development of many new data
mining algorithms.
Parallel, distributed, and incremental mining algorithms: The humongous size of many
data sets, the wide distribution of data, and the computational complexity of some
data mining methods are factors that motivate the development of parallel and distributed data-intensive mining algorithms. Such algorithms first partition the data
into pieces. Each piece is processed, in parallel, by searching for patterns. The parallel processes may interact with one another. The patterns from each partition are
eventually merged.
Cloud computing and cluster computing, which use computers in a distributed
and collaborative way to tackle very large-scale computational tasks, are also active
research themes in parallel data mining. In addition, the high cost of some data mining processes and the incremental nature of input promote incremental data mining,
which incorporates new data updates without having to mine the entire data from
scratch. Such methods perform knowledge modification incrementally to amend
and strengthen what was previously discovered.
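
The partition, mine, and merge scheme and the incremental update described above can be sketched as follows. Threads stand in for the machines of a cluster, and the local mining task is deliberately simple (counting in how many transactions each item occurs) so that merging is just addition. The function names and data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def mine_partition(transactions):
    """Local pattern search: count item occurrences within one partition."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def parallel_counts(transactions, n_partitions=2):
    """Partition the data, mine each piece concurrently, then merge."""
    parts = [transactions[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(mine_partition, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


# Hypothetical transactions.
data = [["bread", "milk"], ["bread", "beer"], ["milk"], ["bread", "milk", "beer"]]
counts = parallel_counts(data)

# Incremental step: fold in a new batch without rescanning `data`.
new_batch = [["milk", "beer"], ["bread"]]
counts.update(mine_partition(new_batch))
```

Counts merge exactly because they are additive. Richer patterns usually need more care at the merge step, since a pattern that looks unimportant in every partition may still matter globally, which is one reason the parallel processes may need to interact.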
