(eBook PDF) Introduction to Data Mining, Global Edition 2nd Edition pdf download
Preface to the Second Edition 7
are time consuming, such hands-on assignments greatly enhance the value of
the course.
Joonsoo Lee, Yue Luo, Anuj Nanavati, Tyler Olsen, Sunyoung Park, Aashish
Phansalkar, Geoff Prewett, Michael Ryoo, Daryl Shannon, and Mei Yang.
Ronald Kostoff (ONR) read an early version of the clustering chapter
and offered numerous suggestions. George Karypis provided invaluable LaTeX
assistance in creating an author index. Irene Moulitsas also provided assistance
with LaTeX and reviewed some of the appendices. Musetta Steinbach was very
helpful in finding errors in the figures.
We would like to acknowledge our colleagues at the University of Minnesota
and Michigan State who have helped create a positive environment for data
mining research. They include Arindam Banerjee, Dan Boley, Joyce Chai, Anil
Jain, Ravi Janardan, Rong Jin, George Karypis, Claudia Neuhauser, Haesun
Park, William F. Punch, György Simon, Shashi Shekhar, and Jaideep
Srivastava. The collaborators on our many data mining projects, who also have our
gratitude, include Ramesh Agrawal, Maneesh Bhargava, Steve Cannon, Alok
Choudhary, Imme Ebert-Uphoff, Auroop Ganguly, Piet C. de Groen, Fran
Hill, Yongdae Kim, Steve Klooster, Kerry Long, Nihar Mahapatra, Rama
Nemani, Nikunj Oza, Chris Potter, Lisiane Pruinelli, Nagiza Samatova, Jonathan
Shapiro, Kevin Silverstein, Brian Van Ness, Bonnie Westra, Nevin Young, and
Zhi-Li Zhang.
The departments of Computer Science and Engineering at the University of
Minnesota and Michigan State University provided computing resources and
a supportive environment for this project. ARDA, ARL, ARO, DOE, NASA,
NOAA, and NSF provided research support for Pang-Ning Tan, Michael
Steinbach, Anuj Karpatne, and Vipin Kumar. In particular, Kamal Abdali, Mitra
Basu, Dick Brackney, Jagdish Chandra, Joe Coughlan, Michael Coyle, Stephen
Davis, Frederica Darema, Richard Hirsch, Chandrika Kamath, Tsengdar Lee,
Raju Namburu, N. Radhakrishnan, James Sidoran, Sylvia Spengler,
Bhavani Thuraisingham, Walt Tiernin, Maria Zemankova, Aidong Zhang, and
Xiaodong Zhang have been supportive of our research in data mining and
high-performance computing.
It was a pleasure working with the helpful staff at Pearson Education.
In particular, we would like to thank Matt Goldstein, Kathy Smith, Carole
Snyder, and Joyce Wells. We would also like to thank George Nichols, who
helped with the artwork, and Paul Anagnostopoulos, who provided LaTeX
support.
We are grateful to the following Pearson reviewers: Leman Akoglu (Carnegie
Mellon University), Chien-Chung Chan (University of Akron), Zhengxin Chen
(University of Nebraska at Omaha), Chris Clifton (Purdue University),
Joydeep Ghosh (University of Texas, Austin), Nazli Goharian (Illinois Institute of
Technology), J. Michael Hardin (University of Alabama), Jingrui He (Arizona
1 Introduction 21
1.1 What Is Data Mining? . . . . . . . . . . . . . . . . . . . . . . . 24
1.2 Motivating Challenges . . . . . . . . . . . . . . . . . . . . . . . 25
1.3 The Origins of Data Mining . . . . . . . . . . . . . . . . . . . . 27
1.4 Data Mining Tasks . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.5 Scope and Organization of the Book . . . . . . . . . . . . . . . 33
1.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2 Data 43
2.1 Types of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.1.1 Attributes and Measurement . . . . . . . . . . . . . . . 47
2.1.2 Types of Data Sets . . . . . . . . . . . . . . . . . . . . . 54
2.2 Data Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.2.1 Measurement and Data Collection Issues . . . . . . . . . 62
2.2.2 Issues Related to Applications . . . . . . . . . . . . . . 69
2.3 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.3.1 Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.3.2 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.3.3 Dimensionality Reduction . . . . . . . . . . . . . . . . . 76
2.3.4 Feature Subset Selection . . . . . . . . . . . . . . . . . . 78
2.3.5 Feature Creation . . . . . . . . . . . . . . . . . . . . . . 81
2.3.6 Discretization and Binarization . . . . . . . . . . . . . . 83
2.3.7 Variable Transformation . . . . . . . . . . . . . . . . . . 89
2.4 Measures of Similarity and Dissimilarity . . . . . . . . . . . . . 91
2.4.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.4.2 Similarity and Dissimilarity between Simple Attributes . 94
2.4.3 Dissimilarities between Data Objects . . . . . . . . . . . 96
2.4.4 Similarities between Data Objects . . . . . . . . . . . . 98
12 Contents
Introduction
Rapid advances in data collection and storage technology, coupled with the
ease with which data can be generated and disseminated, have triggered the
explosive growth of data, leading to the current age of big data. Deriving
actionable insights from these large data sets is increasingly important in
decision making across almost all areas of society, including business and
industry; science and engineering; medicine and biotechnology; and government
and individuals. However, the amount of data (volume), its complexity
(variety), and the rate at which it is being collected and processed (velocity)
have simply become too great for humans to analyze unaided. Thus, there is
a great need for automated tools for extracting useful information from big
data despite the challenges posed by its enormity and diversity.
Data mining blends traditional data analysis methods with sophisticated
algorithms for processing this abundance of data. In this introductory chapter,
we present an overview of data mining and outline the key topics to be covered
in this book. We start with a description of some applications that require
more advanced techniques for data analysis.
surface, oceans, and atmosphere. However, because of the size and spatio-
temporal nature of the data, traditional methods are often not suitable for
analyzing these data sets. Techniques developed in data mining can aid Earth
scientists in answering questions such as the following: “What is the
relationship between global warming and the frequency and intensity of
ecosystem disturbances such as droughts and hurricanes?”; “How are land surface
precipitation and temperature affected by ocean surface temperature?”; and “How
well can we predict the beginning and end of the growing season for a region?”
As another example, researchers in molecular biology hope to use the large
amounts of genomic data to better understand the structure and function of
genes. In the past, traditional methods in molecular biology allowed scientists
to study only a few genes at a time in a given experiment. Recent
breakthroughs in microarray technology have enabled scientists to compare the
behavior of thousands of genes under various situations. Such comparisons
can help determine the function of each gene, and perhaps isolate the genes
responsible for certain diseases. However, the noisy, high-dimensional nature
of such data requires new data analysis methods. In addition to analyzing gene
expression data, data mining can also be used to address other important
biological challenges such as protein structure prediction, multiple sequence
alignment, the modeling of biochemical pathways, and phylogenetics.
Another example is the use of data mining techniques to analyze electronic
health record (EHR) data, which has become increasingly available. Not very
long ago, studies of patients required manually examining the physical records
of individual patients and extracting very specific pieces of information
pertinent to the particular question being investigated. EHRs allow for a faster
and broader exploration of such data. However, there are significant challenges
since the observations on any one patient typically occur during their visits
to a doctor or hospital and only a small number of details about the health
of the patient are measured during any particular visit.
Currently, EHR analysis focuses on simple types of data, e.g., a patient’s
blood pressure or the diagnosis code of a disease. However, large amounts of
more complex types of medical data are also being collected, such as
electrocardiograms (ECGs) and neuroimages from magnetic resonance imaging (MRI)
or functional magnetic resonance imaging (fMRI). Although challenging to
analyze, this data also provides vital information about patients. Integrating
and analyzing such data with traditional EHR and genomic data is one of the
capabilities needed to enable precision medicine, which aims to provide more
personalized patient care.
24 Chapter 1 Introduction
[Figure labels: Feature Selection; Filtering Patterns; Dimensionality Reduction; Visualization; Normalization; Pattern Interpretation; Data Subsetting]
Those who have no ulcers in the mouth suck out the poison and spit it out.
The powder of the fruit of the Nol Vel is also administered with water.
The patient is made to wear a cotton thread in the name of Charmālia Nāg,
Sharmalia Nāg, or Vasangi Nāg, and certain observances, as stated above,
are promised to the snake deity.67
The ends of peacock feathers are pounded and smoked in a chilum (clay
pipe) by the patient.68
Some people believe that snakes, like evil spirits, can enter the bodies of
human beings. Such persons, when possessed, are supposed to have the
power of curing snake-bites.73
Sometimes the exorcist fans the patient with branches of the Nim tree,
reciting mantras, and thereupon the patient becomes possessed by the snake
and declares the cause of his offence.
Some exorcists present a magic epistle or charm asking the snake that bit
the patient to be present. The snake obeys the call, and appears before the
exorcist. The latter then asks the snake to suck the poison from the wound
of the patient, which is done by the snake, and the patient is then cured.75
In some places, the exorcist ties up the patient when the snake tells the
cause of the bite. Next the exorcist calls on the snake to leave the body of
the patient, who then begins to crawl about like a snake and is cured.
On some occasions, the exorcist slaps the cheek of the person who calls him
to attend the patient. It is said that the poison disappears as soon as the slap
is given.76
Some exorcists take a stick having seven joints and break them one by one.
As the stick is broken, the patient recovers, his recovery being complete
when the seventh joint is broken.77
It is believed that the Dhedas are the oldest worshippers of Nāgs or snakes.
When a person is bitten by a snake, he is seated near a Dheda, who prays
the snake to leave the body of the patient. It is said that in some cases this
method proves efficacious in curing the patient.78
It is stated that exorcists who know the mantra (incantation) for the cure of
snake-bites must lead a strictly moral life. If they touch a woman in
childbed or during her period, the mantra loses its power. This can be regained
through purification, bathing, and by reciting the mantra while inhaling the
smoke of burning frankincense. Some exorcists abstain from certain kinds
of vegetables and sweets, e.g., the Mogri (rat-tailed radish), Julebi (a kind
of sweet), etc. They must also abstain from articles of a colour like that of
a snake.79
A belief prevails that there is a precious stone in the head of the snake. Such
stones are called mohors. They are occasionally shown to the people by
snake-charmers, who declare that it is very difficult to procure them.
It is stated that on dark nights snakes take these mohors out of their head
and place them on prominent spots in order to be able to move about in the
dark by their light.80
It is believed that snakes give these mohors to those who please them. If
one tries to take a mohor by force, the snake swallows it and dissolves it
into water.81
As stated above, the mohor has the property of absorbing the poison from
snake-bites.
It is believed by some people that the mohor shines the most when a
rainbow appears in the sky.83
At times snakes are seen in houses. They are believed to be the guardians of
the houses, and worshipped with offerings of lamps fed with ghi. After
worship, the members of the family pray to the snake, “Oh snake! Thou art
our guardian. Protect our health and wealth. We are thy children and live in
thy garden.”88
The Kāli Parāj or aboriginal tribes in Gujarāt give such names as Kāgdo
(crow), Kolo (jackal), Bilādo (cat), Kutro (dog) to their children according
to which of these animals is heard crying at the time of birth.5
Instances of family or clan names derived from trees and animals are as
follows:—
The cow, the she-goat, the horse, the deer, peacock, the Tilad or singing
sparrow, the goose, the Nāg or snake, the eagle, the elephant and the male
monkey are held sacred by all Hindus. Of these, the greatest
sanctity attaches to the cow. Her urine is sipped for the atonement of sins.
The cow is also revered by the Pārsis.12
The mouth of the she-goat and the smell of the horse are considered
sacred.12
Brāhmans, Baniās, Bhātiās, Kunbis, Sutārs and Darjis abstain from flesh
and liquor.16
Some Brāhmans and Baniās do not eat tādiās (fruit of the palm tree) as they
look like human eyes.17
Some Brāhmans abstain from garlic and onions. Some do not eat Kodra
(punctured millet).18
The masur (Lentil) pulse is not eaten by Brāhmans and Baniās, because,
when cooked, it looks red like blood.19
The Humbad Baniās do not eat whey, milk, curdled milk and clarified
butter.20
The Shrāvaks abstain from the suran (Elephant foot), potatoes and roots
that grow underground.21
Mahomedans abstain from the suran because “su”, the first letter of the
word suran, is also the first letter of the name of their tabooed animal, the pig.22
There are some deities associated with the worship of animals. These
animals, with the deities with whom they are connected, are given below.
9. Ghodo, the horse, is the conveyance of the Sun. The horse is also
associated with the planet Guru or Jupiter and Shukra or Venus.
10. Mrig, the deer, is supposed to be the conveyance of the Moon as well as
of Mangal or Mars.
The animals mentioned above are worshipped along with the deities and
planets with which they are associated.23
In the temples of the Mātās cocks and hens, and in the temple of Kāl
Bhairav, dogs, are worshipped.25
For the propitiation of goddesses and evil spirits, male goats, he-buffaloes
and cocks are sacrificed.26
In his first incarnation, the god Vishnu was born as a fish, in the second as
a tortoise, and in the third as a boar. For this reason, the images of these
animals are worshipped.27
All the gods, goddesses and spirits mentioned in the preceding pages are
represented by idols made of stone, metal or wood. In addition to stone
idols of gods there are certain stones which are considered to represent gods
and worshipped as such. Some of these stones are described below.
All the stones found in the river Narbada are believed to represent the god
Shiva and are worshipped as such.
There is a kind of stone found in the river Gandaki which is smooth on one
side and porous on the other. It is either round or square and about five
inches in length. This stone is called Shāligrām and is believed to represent
the god Vishnu. It is kept among the household gods and worshipped daily.
There is another kind of hard, white, porous stone found near Dwārka. It is
also worshipped along with the idol of Vishnu.
There is a tank near the Pir in Kutiāna in which bored stones are found
floating on the surface of the water. These stones are considered sacred.29
There is also a kind of red stone which is supposed to cure skin diseases.31
This sacrifice is generally made on the eighth or tenth day of the bright half
of Ashvin.
In place of human blood, milk mixed with gulāl (red powder) and molasses
is offered.35
In ancient times, when a well was dug, a human sacrifice was made to it if
it did not yield water, with the belief that this would bring water into the
well. Now-a-days, instead of this sacrifice, blood from the fourth finger of a
man is sprinkled over the spot.36
It is also related that in ancient times, when a king was crowned, a human
sacrifice was offered. Now-a-days, instead of this sacrifice, the king’s
forehead is marked with the blood from the fourth finger of a low caste
Hindu at the time of the coronation ceremony.37
There are a few stones which are supposed to have the power of curing
certain diseases. One such stone is known as Ratvano Pāro. It is found
at a distance of about two miles from Kolki. It is marked with red lines. It is
bored and worn round the neck by persons suffering from ratawa38 (a
disease in which red spots or pimples are seen on the skin).
There is another stone called Suleimani Pāro which is supposed to have the
power of curing many diseases.39
Sieves for flour and corn, brooms, sambelus or corn pounders, and ploughs
are regarded as sacred.
1. Because articles of food such as flour, grain, etc., are sifted through
them.41
3. Because the fire used for igniting the sacrificial fuel is taken in a sieve, or
is covered with a sieve while it is being carried to the sacrificial altar.42
5. Because, in some communities like the Bhātiās, the bride’s mother, when
receiving the bridegroom in the marriage booth, carries in a dish a lamp
covered with a sieve.44
The sambelu is considered so sacred that it is not touched with the foot. If a
woman lies down during the daytime, she will not touch it with either her head
or her foot.
Among Shrigaud Brāhmans, on the marriage day, one of the men of the
bridegroom’s party wears a wreath made of a sambelu, a broom and other
articles. Some special marks are also made on his forehead. Thus adorned,
he goes with the bridegroom’s procession and plays jokes with the parents
of both the bride and bridegroom. His doing so is supposed to bless the
bridal pair with a long life and a large family.48
On the marriage day, after the ceremony of propitiating the nine planets has
been performed in the bride’s house, some castes keep three sambelus, and
others one, near the spot where the planets are worshipped. Next, five
unwidowed women of the family hold the sambelus and thrash them five or
seven times on the floor repeating the words “On the chest of the ill-wisher
of the host.” The sambelus are bound together by a thread.49
Some people consider the plough sacred because Sita, the consort of Rām,
was born of the earth by the touch of a plough.53 Others hold it sacred as it
was used as a weapon by Baldev, the brother of the god Krishna.
On account of the sanctity that attaches to the plough, it forms part of the
articles with which a bridegroom is received in the marriage pandal by the
bride’s mother.54
It is related that king Janak ploughed the soil on which he had to perform a
sacrifice. Hence it has become a practice to purify with a plough the spot on
which a sacrifice is to be performed.55
In some places, on the Balev day, a number of persons gather together near
a pond, and each of them fills an earthen jar with the water of the pond.
Next, one of the party is made to stand at a long distance from the others
with a small plough in his hands. The others then run a race towards the
latter. He who wins the race is presented with molasses and a cocoanut.56
When a newly-born infant does not cry, the leaves of a broom are thrown
into the fire and their smoke is passed over the child. It is said that this
makes the child cry.58
Some people consider brooms sacred, because they are used in sweeping
the ground58 (that is the earth, which is a goddess).
In some places, children suffering from cough are fanned with a broom.59
Some believe that if a broom be kept erect in the house, a quarrel between
the husband and wife is sure to follow. There is also a belief that if a person
thrashes another with a broom, the former is liable to suffer from a gland
under the arm.62
The Agnihotris keep a constant fire burning in their houses and worship it
thrice a day: morning, noon and evening.65
The Pārsis consider fire so sacred that they do not smoke. Neither do they
cross fire. In their temples called Agiāris a fire of sandal wood is kept
constantly burning. It is considered a great mishap if this fire is
extinguished.65
Fire is specially worshipped on the Holi day, that is the full-moon day of the
month of Fālgun.66
The fire to be used for sacrifices and agnihotras is produced by the friction
of two pieces of the wood of the Arani,71 the Pipal, the Shami72 or the
bamboo while mantras or incantations are being recited by Brāhmans.73