Pattern Recognition
Algorithms for
Data Mining
Scalability, Knowledge Discovery and Soft
Granular Computing

Sankar K. Pal and Pabitra Mitra


Machine Intelligence Unit
Indian Statistical Institute
Calcutta, India

CHAPMAN & HALL/CRC


A CRC Press Company
Boca Raton London New York Washington, D.C.

Cover art provided by Laura Bright (http://laurabright.com).
http://www.ciaadvertising.org/SA/sping_03/391K/lbright/paper/site/report/introduction.html

© 2004 by Taylor & Francis Group, LLC


Library of Congress Cataloging-in-Publication Data

Pal, Sankar K.
Pattern recognition algorithms for data mining : scalability, knowledge discovery, and
soft granular computing / Sankar K. Pal and Pabitra Mitra.
p. cm.
Includes bibliographical references and index.
ISBN 1-58488-457-6 (alk. paper)
1. Data mining. 2. Pattern recognition systems. 3. Computer algorithms. 4. Granular
computing / Sankar K. Pal and Pabitra Mitra.

QA76.9.D343P38 2004
006.3'12—dc22 2004043539

This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable
efforts have been made to publish reliable data and information, but the author and the publisher cannot
assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, microfilming, and recording, or by any information storage or
retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for
creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC
for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2004 by CRC Press LLC

No claim to original U.S. Government works


International Standard Book Number 1-58488-457-6
Library of Congress Card Number 2004043539
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper


To our parents


Contents

Foreword

Preface

List of Tables

List of Figures

1 Introduction
1.1 Introduction
1.2 Pattern Recognition in Brief
1.2.1 Data acquisition
1.2.2 Feature selection/extraction
1.2.3 Classification
1.3 Knowledge Discovery in Databases (KDD)
1.4 Data Mining
1.4.1 Data mining tasks
1.4.2 Data mining tools
1.4.3 Applications of data mining
1.5 Different Perspectives of Data Mining
1.5.1 Database perspective
1.5.2 Statistical perspective
1.5.3 Pattern recognition perspective
1.5.4 Research issues and challenges
1.6 Scaling Pattern Recognition Algorithms to Large Data Sets
1.6.1 Data reduction
1.6.2 Dimensionality reduction
1.6.3 Active learning
1.6.4 Data partitioning
1.6.5 Granular computing
1.6.6 Efficient search algorithms
1.7 Significance of Soft Computing in KDD
1.8 Scope of the Book

2 Multiscale Data Condensation
2.1 Introduction
2.2 Data Condensation Algorithms
2.2.1 Condensed nearest neighbor rule
2.2.2 Learning vector quantization
2.2.3 Astrahan’s density-based method
2.3 Multiscale Representation of Data
2.4 Nearest Neighbor Density Estimate
2.5 Multiscale Data Condensation Algorithm
2.6 Experimental Results and Comparisons
2.6.1 Density estimation
2.6.2 Test of statistical significance
2.6.3 Classification: Forest cover data
2.6.4 Clustering: Satellite image data
2.6.5 Rule generation: Census data
2.6.6 Study on scalability
2.6.7 Choice of scale parameter
2.7 Summary

3 Unsupervised Feature Selection
3.1 Introduction
3.2 Feature Extraction
3.3 Feature Selection
3.3.1 Filter approach
3.3.2 Wrapper approach
3.4 Feature Selection Using Feature Similarity (FSFS)
3.4.1 Feature similarity measures
3.4.2 Feature selection through clustering
3.5 Feature Evaluation Indices
3.5.1 Supervised indices
3.5.2 Unsupervised indices
3.5.3 Representation entropy
3.6 Experimental Results and Comparisons
3.6.1 Comparison: Classification and clustering performance
3.6.2 Redundancy reduction: Quantitative study
3.6.3 Effect of cluster size
3.7 Summary

4 Active Learning Using Support Vector Machine
4.1 Introduction
4.2 Support Vector Machine
4.3 Incremental Support Vector Learning with Multiple Points
4.4 Statistical Query Model of Learning
4.4.1 Query strategy
4.4.2 Confidence factor of support vector set
4.5 Learning Support Vectors with Statistical Queries
4.6 Experimental Results and Comparison
4.6.1 Classification accuracy and training time
4.6.2 Effectiveness of the confidence factor
4.6.3 Margin distribution
4.7 Summary

5 Rough-fuzzy Case Generation
5.1 Introduction
5.2 Soft Granular Computing
5.3 Rough Sets
5.3.1 Information systems
5.3.2 Indiscernibility and set approximation
5.3.3 Reducts
5.3.4 Dependency rule generation
5.4 Linguistic Representation of Patterns and Fuzzy Granulation
5.5 Rough-fuzzy Case Generation Methodology
5.5.1 Thresholding and rule generation
5.5.2 Mapping dependency rules to cases
5.5.3 Case retrieval
5.6 Experimental Results and Comparison
5.7 Summary

6 Rough-fuzzy Clustering
6.1 Introduction
6.2 Clustering Methodologies
6.3 Algorithms for Clustering Large Data Sets
6.3.1 CLARANS: Clustering large applications based upon randomized search
6.3.2 BIRCH: Balanced iterative reducing and clustering using hierarchies
6.3.3 DBSCAN: Density-based spatial clustering of applications with noise
6.3.4 STING: Statistical information grid
6.4 CEMMiSTRI: Clustering using EM, Minimal Spanning Tree and Rough-fuzzy Initialization
6.4.1 Mixture model estimation via EM algorithm
6.4.2 Rough set initialization of mixture parameters
6.4.3 Mapping reducts to mixture parameters
6.4.4 Graph-theoretic clustering of Gaussian components
6.5 Experimental Results and Comparison
6.6 Multispectral Image Segmentation
6.6.1 Discretization of image bands
6.6.2 Integration of EM, MST and rough sets
6.6.3 Index for segmentation quality
6.6.4 Experimental results and comparison
6.7 Summary

7 Rough Self-Organizing Map
7.1 Introduction
7.2 Self-Organizing Maps (SOM)
7.2.1 Learning
7.2.2 Effect of neighborhood
7.3 Incorporation of Rough Sets in SOM (RSOM)
7.3.1 Unsupervised rough set rule generation
7.3.2 Mapping rough set rules to network weights
7.4 Rule Generation and Evaluation
7.4.1 Extraction methodology
7.4.2 Evaluation indices
7.5 Experimental Results and Comparison
7.5.1 Clustering and quantization error
7.5.2 Performance of rules
7.6 Summary

8 Classification, Rule Generation and Evaluation using Modular Rough-fuzzy MLP
8.1 Introduction
8.2 Ensemble Classifiers
8.3 Association Rules
8.3.1 Rule generation algorithms
8.3.2 Rule interestingness
8.4 Classification Rules
8.5 Rough-fuzzy MLP
8.5.1 Fuzzy MLP
8.5.2 Rough set knowledge encoding
8.6 Modular Evolution of Rough-fuzzy MLP
8.6.1 Algorithm
8.6.2 Evolutionary design
8.7 Rule Extraction and Quantitative Evaluation
8.7.1 Rule extraction methodology
8.7.2 Quantitative measures
8.8 Experimental Results and Comparison
8.8.1 Classification
8.8.2 Rule extraction
8.9 Summary

A Role of Soft-Computing Tools in KDD
A.1 Fuzzy Sets
A.1.1 Clustering
A.1.2 Association rules
A.1.3 Functional dependencies
A.1.4 Data summarization
A.1.5 Web application
A.1.6 Image retrieval
A.2 Neural Networks
A.2.1 Rule extraction
A.2.2 Clustering and self organization
A.2.3 Regression
A.3 Neuro-fuzzy Computing
A.4 Genetic Algorithms
A.5 Rough Sets
A.6 Other Hybridizations

B Data Sets Used in Experiments

References

Index

About the Authors

Foreword

Indian Statistical Institute (ISI), the home base of Professors S.K. Pal and P.
Mitra, has long been recognized as the world’s premier center of fundamental
research in probability, statistics and, more recently, pattern recognition and
machine intelligence. The halls of ISI are adorned with the names of P.C. Ma-
halanobis, C.R. Rao, R.C. Bose, D. Basu, J.K. Ghosh, D. Dutta Majumder,
K.R. Parthasarathi and other great intellects of the past century–great intel-
lects who have contributed so much and in so many ways to the advancement
of science and technology. The work of Professors Pal and Mitra, “Pattern
Recognition Algorithms for Data Mining,” or PRDM for short, reflects this
illustrious legacy. The importance of PRDM is hard to exaggerate. It is a
treatise that is an exemplar of authority, deep insights, encyclopedic coverage
and high expository skill.
The primary objective of PRDM, as stated by the authors, is to provide
a unified framework for addressing pattern recognition tasks which are es-
sential for data mining. In reality, the book accomplishes much more; it
develops a unified framework and presents detailed analyses of a wide spec-
trum of methodologies for dealing with problems in which recognition, in one
form or another, plays an important role. Thus, the concepts and techniques
described in PRDM are of relevance not only to problems in pattern recog-
nition, but, more generally, to classification, analysis of dependencies, system
identification, authentication, and ultimately, to data mining. In this broad
perspective, conventional pattern recognition becomes a specialty–a specialty
with deep roots and a large store of working concepts and techniques.
Traditional pattern recognition is subsumed by what may be called recognition
technology. I take some credit for arguing, some time ago, that development
of recognition technology should be accorded a high priority. My
arguments may be found in the foreword, “Recognition Technology and Fuzzy
Logic,” Special Issue on Recognition Technology, IEEE Transactions on Fuzzy
Systems, 2001. A visible consequence of my arguments was an addition of
the subtitle “Soft Computing in Recognition and Search” to the title of the
journal “Approximate Reasoning.” What is important to note is that recognition
technology is based on soft computing–a coalition of methodologies which
collectively provide a platform for the conception, design and utilization of
intelligent systems. The principal constituents of soft computing are fuzzy logic,
neurocomputing, evolutionary computing, probabilistic computing, rough set
theory and machine learning. These are the methodologies which are described
and applied in PRDM with a high level of authority and expository
skill. Particularly worthy of note is the exposition of methods in which rough
set theory and fuzzy logic are used in combination.
Much of the material in PRDM is new and reflects the authors’ extensive
experience in dealing with a wide variety of problems in which recognition and
analysis of dependencies play essential roles. Such is the case in data mining
and, in particular, in the analysis of both causal and non-causal dependencies.
A pivotal issue–which subsumes feature selection and feature extraction–
and which receives a great deal of attention in PRDM, is that of feature
analysis. Feature analysis has a position of centrality in recognition, and
its discussion in PRDM is an order of magnitude more advanced and more
insightful than what can be found in the existing literature. And yet, it
cannot be claimed that the basic problem of feature selection–especially in
the context of data mining–has been solved or is even close to solution. Why?
The reason, in my view, is the following. To define what is meant by a feature
it is necessary to define what is meant by relevance. Conventionally, relevance
is defined as a bivalent concept, that is, if q is a query and p is a proposition or
a collection of propositions, then either p is relevant to q or p is not relevant
to q, with no shades of gray allowed. But it is quite obvious that relevance is a
matter of degree, which is consistent with the fact that in a natural language
we allow expressions such as quite relevant, not very relevant, highly relevant,
etc. In the existing literature, there is no definition of relevance which makes
it possible to answer the question: To what degree is p relevant to q? For
example, if q is: How old is Carol? and p is: Carol has a middle-aged mother,
then to what degree is the knowledge that Carol has a middle-aged mother,
relevant to the query: How old is Carol? As stated earlier, the problem is that
relevance is not a bivalent concept, as it is frequently assumed to be; rather,
relevance is a fuzzy concept which does not lend itself to definition within the
conceptual structure of bivalent logic. However, what can be found in PRDM
is a very thorough discussion of a related issue, namely, methods of assessment
of relative importance of features in the context of pattern recognition and
data mining.
A difficult problem which arises both in assessment of the degree of relevance
of a proposition, p, and in assessment of the degree of importance of a feature,
f, relates to combination of such degrees. More concretely, if we have two
propositions p1 and p2 with respective degrees of relevance r1 and r2, then
all that can be said about the relevance of (p1, p2) is that it is bounded
from below by max(r1, r2). This makes it possible for both p1 and p2 to be
irrelevant (r1 = r2 = 0), and yet the degree of relevance of (p1, p2) may be
close to 1.
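
The combination effect described above can be illustrated with a small numerical sketch. The foreword gives no formula for degree of relevance, so the example below assumes, purely for illustration, that relevance is measured by normalized mutual information between a proposition and the query outcome; the helper names entropy and relevance are hypothetical. The query outcome is taken to be the exclusive-or of two binary propositions, a standard case in which each proposition alone carries no information while the pair determines the outcome completely.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable outcomes."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def relevance(features, target):
    """Illustrative relevance score: I(features; target) / H(target), in [0, 1].

    This is an assumed stand-in for 'degree of relevance', not a definition
    taken from the foreword or the book.
    """
    joint = list(zip(features, target))
    mutual_info = entropy(features) + entropy(target) - entropy(joint)
    return mutual_info / entropy(target)


# The four equally likely truth assignments of two binary propositions.
rows = list(product([0, 1], repeat=2))
p1 = [a for a, _ in rows]
p2 = [b for _, b in rows]
q = [a ^ b for a, b in rows]  # query outcome: XOR of p1 and p2

r1 = relevance(p1, q)                   # relevance of p1 alone
r2 = relevance(p2, q)                   # relevance of p2 alone
r12 = relevance(list(zip(p1, p2)), q)   # relevance of the pair (p1, p2)

print(r1, r2, r12)
```

Under this assumed measure, each proposition alone scores 0 and the pair scores 1, which is consistent with the lower bound max(r1, r2) and shows why the individual degrees do not determine the combined degree.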
The point I am trying to make is that there are many basic issues in pattern
recognition–and especially in relation to its role in data mining–whose resolution
lies beyond the reach of methods based on bivalent logic and
bivalent-logic-based probability theory. The issue of relevance is a case in point.
Another basic issue is that of causality. But what is widely unrecognized is that
even such familiar concepts as cluster and edge are undefinable within the
conceptual structure of bivalent logic. This assertion is not contradicted by
the fact that there is an enormous literature on cluster analysis and edge
detection. What cannot be found in this literature are formalized definitions of
cluster and edge.
How can relevance, causality, cluster, edge and many other familiar concepts
be defined? In my view, what is needed for this purpose is the methodology
of computing with words. In this methodology, the objects of computation
are words and propositions drawn from a natural language. I cannot be more
detailed in a foreword.
Although PRDM does not venture into computing with words directly, it
does lay the groundwork for it, especially through extensive exposition of
granular computing and related methods of computation. It does so through
an exceptionally insightful discussion of advanced methods drawn from fuzzy
logic, neurocomputing, probabilistic computing, rough set theory and machine
learning.
In summary, “Pattern Recognition Algorithms for Data Mining” is a book
that commands admiration. Its authors, Professors S.K. Pal and P. Mitra, are
foremost authorities in pattern recognition, data mining and related fields.
Within its covers, the reader finds an exceptionally well-organized exposition
of every concept and every method that is of relevance to the theme of the
book. There is much that is original and much that cannot be found in the
literature. The authors and the publisher deserve our thanks and congratulations
for producing a definitive work that contributes so much and in so
many important ways to the advancement of both the theory and practice of
recognition technology, data mining and related fields. The magnum opus of
Professors Pal and Mitra is must reading for anyone who is interested in the
conception, design and utilization of intelligent systems.

March 2004 Lotfi A. Zadeh
University of California
Berkeley, CA, USA

Foreword

Data mining offers techniques for discovering patterns in voluminous databases.
In other words, data mining is a technique for discovering knowledge from
large data sets (KDD). Knowledge is usually presented in the form of decision
rules that are easy for humans to understand and use. Therefore, methods for rule
generation and evaluation are of utmost importance in this context.
Many approaches to accomplish this have been developed and explored in
recent years. The prominent scientist Prof. Sankar K. Pal and his student
Dr. Pabitra Mitra present in this valuable volume, in addition to classical
methods, various recently emerged methodologies for data mining,
such as rough sets, rough fuzzy hybridization, granular computing, artificial
neural networks, genetic algorithms, and others. In addition to theoretical
foundations, the book also includes experimental results. Many real-life and
nontrivial examples given in the book show how the new techniques work, how
they can be used in practice, and what advantages they offer compared with
classical methods (e.g., statistics).
This book covers a wide spectrum of problems related to data mining, data
analysis, and knowledge discovery in large databases. It should be recommended
reading for any researcher or practitioner working in these areas.
Graduate students in AI will also find it a very well-organized book presenting
modern concepts and tools used in this domain.
In the appendix, various basic computing tools and data sets used in experiments
are supplied. A complete bibliography on the subject is also included.
The book presents an unbeatable combination of theory and practice and
gives a comprehensive view of methods and tools in modern KDD.
The authors deserve the highest appreciation for this excellent monograph.

January 2004 Zdzislaw Pawlak
Polish Academy of Sciences
Warsaw, Poland

Foreword

This is the latest in a series of volumes by Professor Sankar Pal and his col-
laborators on pattern recognition methodologies and applications. Knowledge
discovery and data mining, the recognition of patterns that may be present in
very large data sets and across distributed heterogeneous databases, is an ap-
plication of current prominence. This volume provides a very useful, thorough
exposition of the many facets of this application from several perspectives.
The chapters provide overviews of pattern recognition and data mining, outline
some of the research issues, and carefully take the reader through the many
steps that are involved in reaching the desired goal of exposing the patterns
that may be embedded in voluminous data sets. These steps include preprocessing
operations for reducing the volume of the data and the dimensionality
of the feature space, clustering, segmentation, and classification. Search
algorithms and statistical and database operations are examined. Attention is
devoted to soft computing algorithms derived from the theories of rough sets,
fuzzy sets, genetic algorithms, multilayer perceptrons (MLP), and various hy-
brid combinations of these methodologies.
A valuable expository appendix describes various soft computing method-
ologies and their role in knowledge discovery and data mining (KDD). A sec-
ond appendix provides the reader with several data sets for experimentation
with the procedures described in this volume.
As has been the case with previous volumes by Professor Pal and his col-
laborators, this volume will be very useful to both researchers and students
interested in the latest advances in pattern recognition and its applications in
KDD.
I congratulate the authors of this volume and I am pleased to recommend
it as a valuable addition to the books in this field.

February 2004 Laveen N. Kanal
University of Maryland
College Park, MD, USA

Preface

In recent years, government agencies and scientific, business and commercial
organizations have been routinely using computers not just for computational
purposes but also for storage, in massive databases, of the immense volumes of
data that they routinely generate or require from other sources. We are in the
midst of an information explosion, and there is an urgent need for methodologies
that will help us bring some semblance of order into the phenomenal
volumes of data. Traditional statistical data summarization and database
management techniques are just not adequate for handling data on this scale,
or for intelligently extracting information or knowledge that may be useful
for exploring the domain in question or the phenomena responsible for the data
and for providing support to decision-making processes. This quest has thrown
up some new phrases, for example, data mining and knowledge discovery in
databases (KDD).
Data mining deals with the process of identifying valid, novel, potentially
useful, and ultimately understandable patterns in data. It may be viewed as
applying pattern recognition (PR) and machine learning principles in the con-
text of voluminous, possibly heterogeneous data sets. Two major challenges
in applying PR algorithms to data mining problems are those of “scalability”
to large/huge data sets and of “discovering knowledge” which is valid and
comprehensible to humans. Research is ongoing along these lines to develop
efficient PR methodologies and algorithms, in different classical and modern
computing frameworks, as applicable to various data mining tasks with
real-life applications.
The present book aims to provide a treatise in a unified framework,
with both theoretical and experimental results, addressing certain pattern
recognition tasks essential for data mining. Tasks considered include data
condensation, feature selection, case generation, clustering/classification, rule
generation and rule evaluation. Various theories, methodologies and algorithms
using both a classical approach and a hybrid paradigm (e.g., integrating
fuzzy logic, artificial neural networks, rough sets, genetic algorithms) are
presented. Emphasis is placed on (a) handling data sets that are
large (both in size and dimension) and involve classes that are overlapping,
intractable and/or have nonlinear boundaries, and (b) demonstrating the
significance of granular computing in soft computing frameworks for generating
linguistic rules and dealing with the knowledge discovery aspect, besides
reducing the computation time.
© 2004 by Taylor & Francis Group, LLC

It is shown how several novel strategies based on multi-scale data
condensation, dimensionality reduction, active support vector learning, granular
computing and efficient search heuristics can be employed for dealing with
the issue of scaling up in large-scale learning problems. The tasks of encoding,
extraction and evaluation of knowledge in the form of human comprehensible
linguistic rules are addressed in a soft computing framework by different
integrations of its constituent tools. Various real-life data sets, mainly large in
dimension and/or size, taken from varied domains, e.g., geographical informa-
tion systems, remote sensing imagery, population census, speech recognition
and cancer management, are considered to demonstrate the superiority of
these methodologies with statistical significance.
Examples are provided, wherever necessary, to make the concepts
clearer. A comprehensive bibliography on the subject is appended. Major
portions of the text presented in the book are from the published work of the
authors. Some references in the related areas might have been inadvertently
omitted because of oversight or ignorance.
This volume, which is unique in its character, will be useful to graduate
students and researchers in computer science, electrical engineering, system
science, and information technology both as a text and a reference book for
some parts of the curriculum. The researchers and practitioners in industry
and research and development laboratories working in fields such as system
design, pattern recognition, data mining, image processing, machine learning
and soft computing will also benefit. For convenience, brief descriptions of
the data sets used in the experiments are provided in the Appendix.
The text is organized in eight chapters. Chapter 1 briefly describes the basic
concepts, features and techniques of PR and introduces data mining and
knowledge discovery in light of PR, different research issues and challenges,
the problem of scaling PR algorithms to large data sets, and the significance
of soft computing in knowledge discovery.
Chapters 2 and 3 deal with the (pre-processing) tasks of multi-scale data
condensation and unsupervised feature selection or dimensionality reduction.
After providing a review in the respective fields, a methodology based on a
statistical approach is described in detail in each chapter along with experi-
mental results. The method of k-NN density estimation and the concept of
representation entropy, used therein, are explained in their respective chap-
ters. The data condensation strategy preserves the salient characteristics of
the original data at different scales by representing the underlying probability
density. The unsupervised feature selection algorithm is based on computing
the similarity between features and then removing the redundancy therein
without requiring any search. These methods are scalable.
Chapter 4 concerns the problem of learning with support vector machines
(SVMs). After describing the design procedure of an SVM, two active learning
strategies for handling the large quadratic problem in an SVM framework are
presented. In order to reduce the sample complexity, a statistical query model
is employed incorporating a trade-off between the efficiency and robustness in
performance.

Chapters 5 to 8 highlight the significance of granular computing for
different mining tasks in a soft paradigm. While the rough-fuzzy framework is
used for case generation in Chapter 5, the same is integrated with the expectation
maximization algorithm and minimal spanning trees in Chapter 6 for clustering
large data sets. The role of rough sets is to use information granules for
extracting the domain knowledge which is encoded in different ways. Since
computation is made using the granules (clumps of objects), not the individual
points, the methods are fast. The cluster quality, demonstrated on a multi-spectral
image segmentation problem, is also improved owing to the said integration.
In Chapter 7, the design procedure of a rough self-organizing map (RSOM) is
described for clustering and unsupervised linguistic rule generation with a
structured network.
The problems of classification, and rule generation and evaluation in a su-
pervised mode are addressed in Chapter 8 with a modular approach through a
synergistic integration of four soft computing tools, namely, fuzzy sets, rough
sets, neural nets and genetic algorithms. A modular evolutionary rough-fuzzy
multi-layered perceptron is described which results in accelerated training,
a compact network, unambiguous linguistic rules and improved accuracy.
Different rule evaluation indices are used to reflect the knowledge discovery
aspect.
Finally, we take this opportunity to thank Mr. Robert B. Stern of Chapman
& Hall/CRC Press, Florida, for his initiative and encouragement. Financial
support to Dr. Pabitra Mitra from the Council of Scientific and Industrial
Research (CSIR), New Delhi in the form of Research Associateship (through
Grant # 22/346/02-EMR II) is also gratefully acknowledged.

Sankar K. Pal
September 13, 2003 Pabitra Mitra

List of Tables

2.1 Comparison of k-NN density estimation error of condensation algorithms (lower CR)
2.2 Comparison of k-NN density estimation error of condensation algorithms (higher CR)
2.3 Comparison of kernel (Gaussian) density estimation error of condensation algorithms (lower CR, same condensed set as Table 2.1)
2.4 Comparison of kernel (Gaussian) density estimation error of condensation algorithms (higher CR, same condensed set as Table 2.2)
2.5 Classification performance for Forest cover type data
2.6 β value and CPU time of different clustering methods
2.7 Rule generation performance for the Census data

3.1 Comparison of feature selection algorithms for large dimensional data sets
3.2 Comparison of feature selection algorithms for medium dimensional data sets
3.3 Comparison of feature selection algorithms for low dimensional data sets
3.4 Comparison of feature selection algorithms for large data sets when search algorithms use FFEI as the selection criterion
3.5 Representation entropy $H_R^s$ of subsets selected using some algorithms
3.6 Redundancy reduction using different feature similarity measures

4.1 Comparison of performance of SVM design algorithms

5.1 Hiring: An example of a decision table
5.2 Two decision tables obtained by splitting the Hiring table S (Table 5.1)
5.3 Discernibility matrix $M_{Accept}$ for the split Hiring decision table $S_{Accept}$ (Table 5.2(a))
5.4 Rough dependency rules for the Iris data
5.5 Cases generated for the Iris data
5.6 Comparison of case selection algorithms for Iris data
5.7 Comparison of case selection algorithms for Forest cover type data
5.8 Comparison of case selection algorithms for Multiple features data

6.1 Comparative performance of clustering algorithms
6.2 Comparative performance of different clustering methods for the Calcutta image
6.3 Comparative performance of different clustering methods for the Bombay image

7.1 Comparison of RSOM with randomly and linearly initialized SOM
7.2 Comparison of rules extracted from RSOM and FSOM

8.1 Rough set dependency rules for Vowel data along with the input fuzzification parameter values
8.2 Comparative performance of different models
8.3 Comparison of the performance of the rules extracted by various methods for Vowel, Pat and Hepatobiliary data
8.4 Rules extracted from trained networks (Model S) for Vowel data along with the input fuzzification parameter values
8.5 Rules extracted from trained networks (Model S) for Pat data along with the input fuzzification parameter values
8.6 Rules extracted from trained networks (Model S) for Hepatobiliary data along with the input fuzzification parameter values
8.7 Crude rules obtained via rough set theory for staging of cervical cancer
8.8 Rules extracted from the modular rough MLP for staging of cervical cancer

List of Figures

1.1 The KDD process [189].
1.2 Application areas of data mining.

2.1 Multiresolution data reduction.
2.2 Representation of data set at different levels of detail by the condensed sets. ‘.’ is a point belonging to the condensed set; the circles about the points denote the discs covered by that point. The two bold circles denote the boundaries of the data set.
2.3 Plot of the condensed points (of the Norm data) for the multiscale algorithm and Astrahan’s method, for different sizes of the condensed set. Bold dots represent a selected point and the discs represent the area of $F_1 - F_2$ plane covered by a selected point at their center.
2.4 IRS images of Calcutta: (a) original Band 4 image, and segmented images using (b) k-means algorithm, (c) Astrahan’s method, (d) multiscale algorithm.
2.5 Variation in error in density estimate (log-likelihood measure) with the size of the Condensed Set (expressed as percentage of the original set) with the corresponding, for (a) the Norm data, (b) Vowel data, (c) Wisconsin Cancer data.
2.6 Variation of condensation ratio CR (%) with k.

3.1 Nature of errors in linear regression, (a) Least square fit (e), (b) Least square projection fit ($\lambda_2$).
3.2 Feature clusters.
3.3 Variation in classification accuracy with size of the reduced subset for (a) Multiple features, (b) Ionosphere, and (c) Cancer data sets. The vertical dotted line marks the point for which results are reported in Tables 3.1−3.3.
3.4 Variation in size of the reduced subset with parameter k for (a) multiple features, (b) ionosphere, and (c) cancer data.

4.1 SVM as maximum margin classifier (linearly separable case).
4.2 Incremental support vector learning with multiple points (Algorithm 1).
4.3 Active support vector learning with statistical queries (Algorithm 2).
4.4 Variation of $a_{test}$ with CPU time for (a) cancer, (b) ionosphere, (c) heart, (d) twonorm, and (e) forest cover type data.
4.5 Variation of confidence factor c and distance D for (a) cancer, (b) ionosphere, (c) heart, and (d) twonorm data.
4.6 Variation of confidence factor c with iterations of StatQSVM algorithm for (a) cancer, (b) ionosphere, (c) heart, and (d) twonorm data.
4.7 Margin distribution obtained at each iteration by the StatQSVM algorithm for the Twonorm data. The bold line denotes the final distribution obtained.
4.8 Margin distribution obtained by some SVM design algorithms for the Twonorm data set.

5.1 Rough representation of a set with upper and lower approximations.
5.2 π-Membership functions for linguistic fuzzy sets low (L), medium (M) and high (H) for each feature axis.
5.3 Generation of crisp granules from linguistic (fuzzy) representation of the features $F_1$ and $F_2$. Dark region ($M_1$, $M_2$) indicates a crisp granule obtained by 0.5-cuts on the $\mu^1_{medium}$ and $\mu^2_{medium}$ functions.
5.4 Schematic diagram of rough-fuzzy case generation.
5.5 Rough-fuzzy case generation for a two-dimensional data.

6.1 Rough-fuzzy generation of crude clusters for two-dimensional data (a) data distribution and rough set rules, (b) probability density function for the initial mixture model.
6.2 Using minimal spanning tree to form clusters.
6.3 Scatter plot of the artificial data Pat.
6.4 Scatter plot of points belonging to four different component Gaussians for the Pat data. Each Gaussian is represented by a separate symbol (+, o, , and ).
6.5 Variation of log-likelihood with EM iterations for the Pat data.
6.6 Final clusters obtained using (a) hybrid algorithm (b) k-means algorithm for the Pat data (clusters are marked by ‘+’ and ‘o’).
6.7 Block diagram of the image segmentation algorithm.
6.8 Segmented IRS image of Calcutta using (a) CEMMiSTRI, (b) EM with MST (EMMST), (c) fuzzy k-means algorithm (FKM), (d) rough set initialized EM (REM), (e) EM with k-means initialization (KMEM), (f) rough set initialized k-means (RKM), (g) EM with random initialization (EM), (h) k-means with random initialization (KM).
6.9 Segmented IRS image of Bombay using (a) CEMMiSTRI, (b) k-means with random initialization (KM).
6.10 Zoomed images of a bridge on the river Ganges in Calcutta for (a) CEMMiSTRI, (b) k-means with random initialization (KM).
6.11 Zoomed images of two parallel airstrips of Calcutta airport for (a) CEMMiSTRI, (b) k-means with random initialization (KM).

7.1 The basic network structure for the Kohonen feature map.
7.2 Neighborhood $N_c$, centered on unit c $(x_c, y_c)$. Three different neighborhoods are shown at distance d = 1, 2, and 3.
7.3 Mapping of reducts in the competitive layer of RSOM.
7.4 Variation of quantization error with iteration for Pat data.
7.5 Variation of quantization error with iteration for vowel data.
7.6 Plot showing the frequency of winning nodes using random weights for the Pat data.
7.7 Plot showing the frequency of winning nodes using rough set knowledge for the Pat data.

8.1 Illustration of adaptive thresholding of membership functions.
8.2 Intra- and inter-module links.
8.3 Steps for designing a sample modular rough-fuzzy MLP.
8.4 Chromosome representation.
8.5 Variation of mutation probability with iteration.
8.6 Variation of mutation probability along the encoded string (chromosome).
8.7 (a) Input π-functions and (b) data distribution along $F_1$ axis for the Vowel data. Solid lines represent the initial functions and dashed lines represent the functions obtained finally after tuning with GAs. The horizontal dotted lines represent the threshold level.
8.8 Histogram plot of the distribution of weight values with (a) Model S and (b) Model F for Vowel data.
8.9 Positive connectivity of the network obtained for the Vowel data, using Model S. (Bold lines indicate weights greater than $PThres_2$, while others indicate values between $PThres_1$ and $PThres_2$.)

Chapter 1
Introduction

1.1 Introduction
Pattern recognition (PR) is an activity that we humans normally excel
in. We do it almost all the time, and without conscious effort. We receive
information via our various sensory organs, which is processed instantaneously
by our brain so that, almost immediately, we are able to identify the source
of the information, without having made any perceptible effort. What is
even more impressive is the accuracy with which we can perform recognition
tasks even under non-ideal conditions, for instance, when the information that
needs to be processed is vague, imprecise or even incomplete. In fact, most
of our day-to-day activities are based on our success in performing various
pattern recognition tasks. For example, when we read a book, we recognize the
letters, words and, ultimately, concepts and notions, from the visual signals
received by our brain, which processes them speedily and probably does a
neurobiological implementation of template-matching! [189]
The discipline of pattern recognition (or pattern recognition by machine)
essentially deals with the problem of developing algorithms and methodolo-
gies/devices that can enable the computer-implementation of many of the
recognition tasks that humans normally perform. The motivation is to per-
form these tasks more accurately, or faster, and perhaps more economically
than humans and, in many cases, to release them from drudgery resulting from
performing routine recognition tasks repetitively and mechanically. The scope
of PR also encompasses tasks humans are not good at, such as reading bar
codes. The goal of pattern recognition research is to devise ways and means of
automating certain decision-making processes that lead to classification and
recognition.
Machine recognition of patterns can be viewed as a two-fold task, consisting
of learning the invariant and common properties of a set of samples charac-
terizing a class, and of deciding that a new sample is a possible member of
the class by noting that it has properties common to those of the set of sam-
ples. The task of pattern recognition by a computer can be described as a
transformation from the measurement space M to the feature space F and
finally to the decision space D; i.e.,
M → F → D.

Here the mapping δ : F → D is the decision function, and the elements
d ∈ D are termed decisions.
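The chain M → F → D can be made concrete with a small sketch. It is illustrative only: the measurements (width and height of an object), the feature map (an aspect ratio), the threshold and the decision labels are hypothetical choices introduced here, not taken from the text.

```python
from typing import Callable, Sequence

Measurement = Sequence[float]   # a point in the measurement space M
Feature = Sequence[float]       # a point in the feature space F
Decision = str                  # an element d of the decision space D


def feature_map(m: Measurement) -> Feature:
    """Hypothetical map M -> F: reduce (width, height) to one aspect ratio."""
    width, height = m
    return [width / height]


def decision_function(x: Feature) -> Decision:
    """Hypothetical decision function delta: F -> D using a fixed threshold."""
    return "wide" if x[0] > 1.0 else "tall"


def recognize(m: Measurement,
              f: Callable[[Measurement], Feature] = feature_map,
              delta: Callable[[Feature], Decision] = decision_function) -> Decision:
    """Compose the two maps, M -> F -> D."""
    return delta(f(m))


print(recognize([4.0, 2.0]))
print(recognize([1.0, 3.0]))
```

Keeping the two maps as separate functions mirrors the two-fold view above: the feature map can be redesigned without touching the decision function, and vice versa.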
PR has been a thriving field of research for the past few decades, as is amply
borne out by the numerous books [55, 59, 72, 200, 204, 206] devoted to it.
In this regard, mention must be made of the seminal article by Kanal [104],
which gives a comprehensive review of the advances made in the field until
the early 1970s. More recently, a review article by Jain et al. [101] provides
an engrossing survey of the advances made in statistical pattern recognition
till the end of the twentieth century. Though the subject has attained a
very mature level during the past four decades or so, it remains fresh for
researchers due to continuous cross-fertilization of ideas from disciplines such
as computer science, physics, neurobiology, psychology, engineering, statistics,
mathematics and cognitive science. Depending on the practical need and
demand, various modern methodologies have come into being, which often
supplement the classical techniques [189].
In recent years, the rapid advances made in computer technology have en-
sured that large sections of the world population have been able to gain easy
access to computers on account of falling costs worldwide, and their use is now
commonplace in all walks of life. Government agencies and scientific, busi-
ness and commercial organizations are routinely using computers, not just
for computational purposes but also for storage, in massive databases, of the
immense volumes of data that they routinely generate or require from other
sources. Large-scale computer networking has ensured that such data has
become accessible to more and more people. In other words, we are in the
midst of an information explosion, and there is an urgent need for methodologies
that will help us bring some semblance of order into the phenomenal volumes
of data that can readily be accessed with a few keystrokes.
Traditional statistical data summarization and database
management techniques are just not adequate for handling data on this scale
and for intelligently extracting information, or rather, knowledge that may
be useful for exploring the domain in question or the phenomena responsible
for the data, and providing support to decision-making processes. This quest
has thrown up some new phrases, for example, data mining and knowledge
discovery in databases (KDD) [43, 65, 66, 88, 89, 92].
The massive databases that we are talking about are generally character-
ized by the presence of not just numeric, but also textual, symbolic, pictorial
and aural data. They may contain redundancy, errors, imprecision, and so on.
KDD is aimed at discovering natural structures within such massive and often
heterogeneous data. Therefore, PR plays a significant role in the KDD process.
However, KDD is visualized as being capable not only of knowledge discovery
using generalizations and magnifications of existing and new pattern recogni-
tion algorithms, but also of the adaptation of these algorithms to enable them
to process such data, the storage and accessing of the data, its preprocessing
and cleaning, interpretation, visualization and application of the results, and
the modeling and support of the overall human-machine interaction.

Data mining is that part of knowledge discovery which deals with the pro-
cess of identifying valid, novel, potentially useful, and ultimately understand-
able patterns in data, and excludes the knowledge interpretation part of KDD.
Therefore, as it stands now, data mining can be viewed as applying PR and
machine learning principles in the context of voluminous, possibly heteroge-
neous data sets [189].
The objective of this book is to provide some results of investigations,
both theoretical and experimental, addressing certain pattern recognition
tasks essential for data mining. Tasks considered include data condensation,
feature selection, case generation, clustering, classification and rule
generation/evaluation. Various methodologies based on both classical and soft
computing approaches (integrating fuzzy logic, artificial neural networks, rough
sets, genetic algorithms) are presented. The emphasis of these methodologies
is on (a) handling data sets which are large (both in size and
dimension) and involve classes that are overlapping, intractable and/or have
nonlinear boundaries, and (b) demonstrating the significance of granular
computing in the soft computing paradigm for generating linguistic rules and dealing
with the knowledge discovery aspect. Before we describe the scope of the
book, we provide a brief review of pattern recognition, knowledge discovery
in databases, data mining, challenges in applying pattern recognition
algorithms to data mining problems, and some of the possible solutions.
Section 1.2 briefly describes the basic concepts, features and techniques
of pattern recognition. Next, we define the KDD process and
describe its various components. In Section 1.4 we elaborate upon the data
mining aspects of KDD, discussing its components, tasks involved, approaches
and application areas. The pattern recognition perspective of data mining is
introduced next and related research challenges are mentioned. The problem
of scaling pattern recognition algorithms to large data sets is discussed in Sec-
tion 1.6. Some broad approaches to achieving scalability are listed. The role
of soft computing in knowledge discovery is described in Section 1.7. Finally,
Section 1.8 discusses the plan of the book.

1.2 Pattern Recognition in Brief


A typical pattern recognition system consists of three phases, namely, data
acquisition, feature selection/extraction and classification/clustering. In the
data acquisition phase, depending on the environment within which the ob-
jects are to be classified/clustered, data are gathered using a set of sensors.
These are then passed on to the feature selection/extraction phase, where
the dimensionality of the data is reduced by retaining/measuring only some
characteristic features or properties. In a broader perspective, this stage
significantly influences the entire recognition process. Finally, in the
classification/clustering phase, the selected/extracted features are passed on to
the classifying/clustering system that evaluates the incoming information and
makes a final decision. This phase basically establishes a transformation between
the features and the classes/clusters. Such a transformation can take different
forms, e.g., the Bayesian rule of computing a posteriori class probabilities, the
nearest neighbor rule, linear discriminant functions, the perceptron rule, the
nearest prototype rule, etc. [55, 59].
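As an illustration of one such transformation, the nearest prototype rule can be sketched as follows. This is a minimal sketch under stated assumptions: each class is represented by the mean of its training samples and distances are Euclidean. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(samples, labels):
    """Return one prototype per class: the mean vector of its samples."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)]
            for y, xs in groups.items()}


def nearest_prototype(x, prototypes):
    """Assign x to the class whose prototype is closest (Euclidean distance)."""
    return min(prototypes, key=lambda y: math.dist(x, prototypes[y]))


# Toy training set (assumed for illustration only).
samples = [[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["a", "a", "b", "b"]

prototypes = class_prototypes(samples, labels)
print(nearest_prototype([2.0, 2.0], prototypes))
print(nearest_prototype([5.0, 4.0], prototypes))
```

Only the prototypes need to be stored after training, which is why this rule is cheaper at decision time than rules that compare against every training sample.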

1.2.1 Data acquisition


Pattern recognition techniques are applicable in a wide domain, where the
data may be qualitative, quantitative, or both; they may be numerical, linguis-
tic, pictorial, or any combination thereof. The collection of data constitutes
the data acquisition phase. Generally, the data structures that are used in
pattern recognition systems are of two types: object data vectors and relational
data. Object data, a set of numerical vectors, are represented in the sequel
as $Y = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n\}$, a set of $n$ feature vectors in the $p$-dimensional
measurement space $\Omega_Y$. An $s$th object, $s = 1, 2, \ldots, n$, observed in the process
has vector $\mathbf{y}_s$ as its numerical representation; $y_{si}$ is the $i$th ($i = 1, 2, \ldots, p$)
feature value associated with the $s$th object. Relational data is a set of $n^2$
numerical relationships, say $\{r_{sq}\}$, between pairs of objects. In other words,
$r_{sq}$ represents the extent to which $s$th and $q$th objects are related in the sense
of some binary relationship $\rho$. If the objects that are pairwise related by $\rho$
are called $O = \{o_1, o_2, \ldots, o_n\}$, then $\rho : O \times O \rightarrow \mathbb{R}$.
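The two data structures can be related by a short sketch: given object data Y and some relationship ρ, the n × n relational data {r_sq} is obtained by evaluating ρ on every pair of objects. Euclidean distance is used for ρ here purely as an assumption for illustration; the text does not prescribe a particular relationship.

```python
import math


def relational_data(objects, rho):
    """Build the n x n matrix r[s][q] = rho(objects[s], objects[q])."""
    n = len(objects)
    return [[rho(objects[s], objects[q]) for q in range(n)] for s in range(n)]


# Object data: n = 3 vectors in a p = 2 dimensional measurement space.
Y = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

# Relational data under the assumed relationship rho = Euclidean distance.
R = relational_data(Y, math.dist)
for row in R:
    print(row)
```

With n = 3 objects the result holds n² = 9 values, matching the count stated above.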

1.2.2 Feature selection/extraction


Feature selection/extraction is a process of selecting a map of the form
$X = f(Y)$, by which a sample $\mathbf{y}$ ($= [y_1, y_2, \ldots, y_p]$) in a $p$-dimensional
measurement space $\Omega_Y$ is transformed into a point $\mathbf{x}$ ($= [x_1, x_2, \ldots, x_{p'}]$) in a
$p'$-dimensional feature space $\Omega_X$, where $p' < p$. The main objective of this task
[55] is to retain/generate the optimum salient characteristics necessary for
the recognition process and to reduce the dimensionality of the measurement
space $\Omega_Y$ so that effective and easily computable algorithms can be devised
for efficient classification. The problem of feature selection/extraction has
two aspects – formulation of a suitable criterion to evaluate the goodness
of a feature set and searching for the optimal set in terms of the criterion. In
general, those features are considered to have optimal saliencies for which
interclass/intraclass distances are maximized/minimized. The criterion of a
good feature is that it should be unchanging with any other possible variation
within a class, while emphasizing differences that are important in discrimi-
nating between patterns of different types.
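One simple way to score a feature along these lines is the ratio of interclass distance to intraclass scatter. The sketch below uses a Fisher-like ratio for two classes, which is an assumed choice made here for illustration and not necessarily the criterion used later in the book. The data are hypothetical.

```python
from statistics import mean, pvariance


def separability(values_a, values_b):
    """Squared distance between class means over the summed class variances.

    Assumes the summed variance is non-zero; a constant feature would need
    separate handling.
    """
    between = (mean(values_a) - mean(values_b)) ** 2
    within = pvariance(values_a) + pvariance(values_b)
    return between / within


# Two classes, two features per sample (toy data, assumed for illustration).
class_a = [[1.0, 5.0], [1.2, 9.0], [0.8, 1.0]]
class_b = [[3.0, 6.0], [3.2, 2.0], [2.8, 8.0]]

scores = [
    separability([x[i] for x in class_a], [x[i] for x in class_b])
    for i in range(2)
]
best = max(range(2), key=scores.__getitem__)
print(scores, "best feature index:", best)
```

The first feature varies little within each class and differs strongly between classes, so it receives the higher score. The second feature varies widely within both classes and scores close to zero.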
The major mathematical measures so far devised for the estimation of feature
quality are mostly statistical in nature, and can be broadly classified into
two categories – feature selection in the measurement space and feature
selection in a transformed space. The techniques in the first category generally
reduce the dimensionality of the measurement space by discarding redundant
or least information carrying features. On the other hand, those in the sec-
ond category utilize all the information contained in the measurement space
to obtain a new transformed space, thereby mapping a higher dimensional
pattern to a lower dimensional one. This is referred to as feature extraction.
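The difference between the two categories can be seen in a minimal sketch. Selection keeps a subset of the original measurements unchanged, whereas extraction builds each new feature from all of them. A fixed linear map W is assumed here for the extraction step; the indices, weights and data are hypothetical.

```python
def select_features(y, keep):
    """Feature selection: retain the measurements at the given indices."""
    return [y[i] for i in keep]


def extract_features(y, W):
    """Feature extraction: each new feature is a weighted sum of all measurements."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


y = [2.0, 4.0, 6.0]                  # a sample in a p = 3 dimensional space

# Selection: discard the second measurement, leaving p' = 2 features.
print(select_features(y, keep=[0, 2]))

# Extraction: map to p' = 2 features using an assumed 2 x 3 matrix.
W = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
print(extract_features(y, W))
```

Both calls reduce the dimensionality from 3 to 2, but only the selected features remain directly interpretable as original measurements.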

1.2.3 Classification
The problem of classification is basically one of partitioning the feature
space into regions, one region for each category of input. Thus it attempts to
assign every data point in the entire feature space to one of the possible classes
(say, M). In real life, the complete description of the classes is not known.
We have instead a finite and usually smaller number of samples which often
provide partial information for optimal design of feature selector/extractor
or classifying/clustering system. Under such circumstances, it is assumed that
these samples are representative of the classes. Such a set of typical patterns
is called a training set. On the basis of the information gathered from the
samples in the training set, the pattern recognition systems are designed; i.e.,
we decide the values of the parameters of various pattern recognition methods.
Design of a classification or clustering scheme can be made with labeled or
unlabeled data. When the computer is given a set of objects with known
classifications (i.e., labels) and is asked to classify an unknown object based
on the information acquired by it during training, we call the design scheme
supervised learning; otherwise we call it unsupervised learning. Supervised
learning is used for classifying different objects, while clustering is performed
through unsupervised learning.
Pattern classification, by its nature, admits many approaches, sometimes
complementary, sometimes competing, to solving a given problem.
These include the decision theoretic approach (both deterministic and
probabilistic), syntactic approach, connectionist approach, fuzzy and rough set theoretic
approach and hybrid or soft computing approach.
In the decision theoretic approach, once a pattern is transformed, through
feature evaluation, to a vector in the feature space, its characteristics are ex-
pressed only by a set of numerical values. Classification can be done by using
deterministic or probabilistic techniques [55, 59]. In the deterministic
classification approach, it is assumed that there exists only one unambiguous pattern
class corresponding to each of the unknown pattern vectors. The nearest neighbor
classifier (NN rule) [59] is an example of this category.
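The NN rule can be sketched in a few lines: an unknown pattern vector receives the label of its single closest training sample. Euclidean distance and the toy training set below are assumptions made for illustration.

```python
import math


def nn_classify(x, training_set):
    """NN rule: return the label of the training sample closest to x."""
    _, label = min(training_set, key=lambda pair: math.dist(x, pair[0]))
    return label


# Labeled training samples as (vector, class) pairs (hypothetical data).
training_set = [
    ([0.0, 0.0], "A"),
    ([1.0, 0.0], "A"),
    ([5.0, 5.0], "B"),
]

print(nn_classify([0.9, 0.2], training_set))
print(nn_classify([4.0, 4.0], training_set))
```

The rule is deterministic in the sense described above: every input vector is mapped to exactly one class, with no probability attached to the decision.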
In most practical problems, the features are usually noisy and the
classes in the feature space are overlapping. In order to model such systems,
the features x_1, x_2, ..., x_i, ..., x_p are considered as random variables in the
probabilistic approach. The most commonly used classifier in such probabilistic
systems is the Bayes maximum likelihood classifier [59].
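The probabilistic approach can be illustrated with a small Python sketch that assigns a one-dimensional pattern to the class with the largest posterior score. It assumes Gaussian class-conditional densities and priors estimated from class frequencies; these assumptions, the function names and the toy samples are illustrative and are not taken from [59].

```python
import math
from statistics import mean, pvariance

def fit_gaussian_bayes(samples):
    """samples: dict mapping class label -> list of 1-D feature values."""
    total = sum(len(values) for values in samples.values())
    return {
        label: {
            "prior": len(values) / total,   # class frequency as prior
            "mean": mean(values),
            "var": pvariance(values),       # population variance
        }
        for label, values in samples.items()
    }

def log_score(x, params):
    """Log prior plus log Gaussian density (posterior up to a constant)."""
    var = params["var"]
    return (math.log(params["prior"])
            - 0.5 * math.log(2 * math.pi * var)
            - (x - params["mean"]) ** 2 / (2 * var))

def classify(model, x):
    """Pick the class whose posterior score is largest."""
    return max(model, key=lambda label: log_score(x, model[label]))

# Hypothetical training samples for two overlapping classes.
samples = {"low": [1.0, 1.5, 2.0, 2.5], "high": [4.0, 4.5, 5.0, 5.5]}
model = fit_gaussian_bayes(samples)
```

With equal priors and equal variances, as in this toy data, the decision boundary lies midway between the two class means.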

© 2004 by Taylor & Francis Group, LLC



When a pattern is rich in structural information (e.g., picture recognition,
character recognition, scene analysis), i.e., the structural information plays an
important role in describing and recognizing the patterns, it is convenient to
use syntactic approaches [72] which deal with the representation of structures
via sentences, grammars and automata. In the syntactic method [72], the
ability of selecting and classifying the simple pattern primitives and their
relationships represented by the composition operations is the vital criterion
of making a system effective. Since the techniques of composition of primitives
into patterns are usually governed by the formal language theory, the approach
is often referred to as a linguistic approach. An introduction to a variety of
approaches based on this idea can be found in [72].
A good pattern recognition system should possess several characteristics.
These are on-line adaptation (to cope with the changes in the environment),
handling nonlinear class separability (to tackle real life problems), handling
of overlapping classes/clusters (for discriminating almost similar but different
objects), real-time processing (for making a decision in a reasonable time),
generation of soft and hard decisions (to make the system flexible), verification
and validation mechanisms (for evaluating its performance), and minimizing
the number of parameters in the system that have to be tuned (for reducing
the cost and complexity). Moreover, the system should be made artificially
intelligent in order to emulate some aspects of the human processing system.
Connectionist approaches (or artificial neural network based approaches) to
pattern recognition are attempts to achieve these goals and have drawn the
attention of researchers because of their major characteristics such as adap-
tivity, robustness/ruggedness, speed and optimality.
All these approaches to pattern recognition can again be fuzzy set theo-
retic [24, 105, 200, 285] in order to handle uncertainties, arising from vague,
incomplete, linguistic, overlapping patterns, etc., at various stages of pattern
recognition systems. The fuzzy set theoretic classification approach was developed
based on the realization that a pattern may belong to more than one class,
with varying degrees of class membership. Accordingly, fuzzy decision theoretic,
fuzzy syntactic and fuzzy neural approaches have been developed [24, 34, 200, 204].
More recently, the theory of rough sets [209, 214, 215, 261] has emerged as
another major mathematical approach for managing uncertainty that arises
from inexact, noisy, or incomplete information. It is turning out to be method-
ologically significant to the domains of artificial intelligence and cognitive sci-
ences, especially in the representation of and reasoning with vague and/or
imprecise knowledge, data classification, data analysis, machine learning, and
knowledge discovery [227, 261].
Investigations have also been made in the area of pattern recognition using
genetic algorithms [211]. Like neural networks, genetic algorithms (GAs) [80]
are also based on powerful metaphors from the natural world. They mimic
some of the processes observed in natural evolution, which include cross-over,
selection and mutation, leading to a stepwise optimization of organisms.
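The three operators named above (selection, cross-over and mutation) can be sketched in a few lines of Python. The bit-counting fitness function, binary tournament selection, one-point cross-over, elitism and all parameter values are assumptions chosen only to keep the example small; they are not prescribed by [80] or [211].

```python
import random

def fitness(bits):
    """Toy objective: number of ones in the bit string."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        # Selection: binary tournament between two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        new_pop = [best[:]]                      # elitism: keep the best so far
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randint(1, n_bits - 1)     # one-point cross-over
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

best, history = evolve()
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness never decreases, which mirrors the stepwise optimization described in the text.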
There have been several attempts over the last decade to evolve new approaches
to pattern recognition and to derive their hybrids by judiciously combining
the merits of several techniques [190, 204]. Recently, a consolidated
effort is being made in this regard to integrate mainly fuzzy logic, artificial
neural networks, genetic algorithms and rough set theory, for developing an
efficient new paradigm called soft computing [287]. Here integration is done in
a cooperative, rather than a competitive, manner. The result is a more intelligent
and robust system providing a human-interpretable, low cost, approximate
solution, as compared to traditional techniques. The neuro-fuzzy approach
is perhaps the most visible hybrid paradigm [197, 204, 287] in the soft computing
framework. Rough-fuzzy [209, 265] and neuro-rough [264, 207] hybridizations
are also proving to be fruitful frameworks for modeling human perceptions
and providing means for computing with words. Significance of the recently
proposed computational theory of perceptions (CTP) [191, 289] may also be
mentioned in this regard.

1.3 Knowledge Discovery in Databases (KDD)


Knowledge discovery in databases (KDD) is defined as [65]:
The nontrivial process of identifying valid, novel, potentially use-
ful, and ultimately understandable patterns in data.
In this definition, the term pattern goes beyond its traditional sense to include
models or structure in data. Data is a set of facts F (e.g., cases in a
database), and a pattern is an expression E in a language L describing the
facts in a subset F_E (or a model applicable to that subset) of F. E is called a
pattern if it is simpler than the enumeration of all facts in F_E. A measure of
certainty, measuring the validity of discovered patterns, is a function C mapping
expressions in L to a partially or totally ordered measure space M_C. An
expression E in L about a subset F_E ⊂ F can be assigned a certainty measure
c = C(E, F). Novelty of patterns can be measured by a function N(E, F) with
respect to changes in data or knowledge. Patterns should potentially lead to
some useful actions, as measured by some utility function u = U(E, F) mapping
expressions in L to a partially or totally ordered measure space M_U. The
goal of KDD is to make patterns understandable to humans. This is measured
by a function s = S(E, F) mapping expressions E in L to a partially or totally
ordered measure space M_S.
Interestingness of a pattern combines validity, novelty, usefulness, and
understandability and can be expressed as i = I(E, F, C, N, U, S) which maps
expressions in L to a measure space M_I. A pattern E ∈ L is called knowledge
if for some user-specified threshold i ∈ M_I, I(E, F, C, N, U, S) > i [65]. One
can select some thresholds c ∈ M_C, s ∈ M_S, and u ∈ M_U and term a pattern
E knowledge

    iff C(E, F) > c, and S(E, F) > s, and U(E, F) > u.    (1.1)
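The thresholding in (1.1) can be sketched as a simple filter over scored patterns. In this Python fragment the measure spaces are assumed to be real numbers in [0, 1], and the pattern names, scores and thresholds are hypothetical values invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class PatternScores:
    name: str
    certainty: float    # c = C(E, F)
    simplicity: float   # s = S(E, F)
    utility: float      # u = U(E, F)

def is_knowledge(p, c, s, u):
    """Condition (1.1): every measure must exceed its threshold."""
    return p.certainty > c and p.simplicity > s and p.utility > u

# Hypothetical scored patterns.
patterns = [
    PatternScores("rule-1", 0.92, 0.80, 0.70),
    PatternScores("rule-2", 0.95, 0.30, 0.90),   # fails the simplicity threshold
    PatternScores("rule-3", 0.60, 0.90, 0.85),   # fails the certainty threshold
]
knowledge = [p.name for p in patterns if is_knowledge(p, c=0.9, s=0.5, u=0.6)]
```

Only patterns that clear all three thresholds are retained, so a highly certain but hard-to-understand pattern is still discarded.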


The role of interestingness is to threshold the huge number of discovered
patterns and report only those that may be of some use. There are two ap-
proaches to designing a measure of interestingness of a pattern, viz., objective
and subjective. The former uses the structure of the pattern and is generally
used for computing rule interestingness. However, often it fails to capture all
the complexities of the pattern discovery process. The subjective approach,
on the other hand, depends additionally on the user who examines the pattern.
Two major reasons why a pattern is interesting from the subjective
(user-oriented) point of view are as follows [257]:

• Unexpectedness: when it is “surprising” to the user.

• Actionability: when the user can act on it to her/his advantage.

Although both these concepts are important, it has often been observed that
actionability and unexpectedness are correlated. In literature, unexpectedness
is often defined in terms of the dissimilarity of a discovered pattern from a
vocabulary provided by the user.
As an example, consider a database of student evaluations of different
courses offered at some university. This can be defined as EVALUATE (TERM,
YEAR, COURSE, SECTION, INSTRUCTOR, INSTRUCT RATING, COURSE RATING). We
describe two patterns that are interesting in terms of actionability and unex-
pectedness respectively. The pattern that “Professor X is consistently getting
the overall INSTRUCT RATING below the overall COURSE RATING” can be of in-
terest to the chairperson because this shows that Professor X has room for
improvement. If, on the other hand, in most of the course evaluations the
overall INSTRUCT RATING is higher than the COURSE RATING and it turns out
that in most of Professor X’s ratings overall the INSTRUCT RATING is lower
than the COURSE RATING, then such a pattern is unexpected and hence inter-
esting. ✸
Data mining is a step in the KDD process that consists of applying data
analysis and discovery algorithms which, under acceptable computational lim-
itations, produce a particular enumeration of patterns (or generate a model)
over the data. It uses historical information to discover regularities and im-
prove future decisions [161].
The overall KDD process is outlined in Figure 1.1. It is interactive and
iterative involving, more or less, the following steps [65, 66]:

1. Data cleaning and preprocessing: includes basic operations, such as noise
removal and handling of missing data. Data from real-world sources are
often erroneous, incomplete, and inconsistent, perhaps due to operation
error or system implementation flaws. Such low quality data needs to
be cleaned prior to data mining.

[Figure 1.1 (block diagram). Labels recoverable from the figure: Huge
Heterogeneous Raw Data; Data Cleaning, Data Condensation, Dimensionality
Reduction, Data Wrapping; Preprocessed Data; Machine Learning (Classification,
Clustering, Rule Generation); Mathematical Model of Data (Patterns); Knowledge
Interpretation (Knowledge Extraction, Knowledge Evaluation); Useful Knowledge;
Data Mining (DM); Knowledge Discovery in Database (KDD).]

FIGURE 1.1: The KDD process [189].

2. Data condensation and projection: includes finding useful features and
samples to represent the data (depending on the goal of the task) and
using dimensionality reduction or transformation methods.

3. Data integration and wrapping: includes integrating multiple, heterogeneous
data sources and providing their descriptions (wrappings) for ease
of future use.

4. Choosing the data mining function(s) and algorithm(s): includes deciding
the purpose (e.g., classification, regression, summarization, clustering,
discovering association rules and functional dependencies, or a
combination of these) of the model to be derived by the data mining
algorithm and selecting methods (e.g., neural networks, decision trees,
statistical models, fuzzy models) to be used for searching patterns in
data.

5. Data mining: includes searching for patterns of interest in a particular
representational form or a set of such representations.

6. Interpretation and visualization: includes interpreting the discovered
patterns, as well as the possible visualization of the extracted patterns.
One can analyze the patterns automatically or semiautomatically to
identify the truly interesting/useful patterns for the user.

7. Using discovered knowledge: includes incorporating this knowledge into
the performance system, taking actions based on knowledge.

Thus, KDD refers to the overall process of turning low-level data into high-
level knowledge. Perhaps the most important step in the KDD process is data
mining. However, the other steps are also important for the successful application
of KDD in practice. For example, steps 1, 2 and 3, mentioned above,
have been the subject of widespread research in the area of data warehousing.
We now focus on the data mining component of KDD.

1.4 Data Mining


Data mining involves fitting models to or determining patterns from ob-
served data. The fitted models play the role of inferred knowledge. Deciding
whether the model reflects useful knowledge or not is a part of the overall
KDD process for which subjective human judgment is usually required. Typ-
ically, a data mining algorithm constitutes some combination of the following
three components [65].

• The model: The function of the model (e.g., classification, clustering)
and its representational form (e.g., linear discriminants, neural networks).
A model contains parameters that are to be determined from
the data.

• The preference criterion: A basis for preference of one model or set
of parameters over another, depending on the given data. The criterion
is usually some form of goodness-of-fit function of the model to the
data, perhaps tempered by a smoothing term to avoid overfitting, i.e.,
generating a model with too many degrees of freedom to be constrained
by the given data.

• The search algorithm: The specification of an algorithm for finding
particular models and parameters, given the data, model(s), and a
preference criterion.

A particular data mining algorithm is usually an instantiation of the model/
preference/search components.
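One possible instantiation of the three components is sketched below in Python. The model is a straight line with two parameters, the preference criterion is mean squared error plus a small smoothing (ridge) penalty, and the search algorithm is plain gradient descent. All three choices, the learning rate and the toy data are assumptions made for illustration only.

```python
def predict(params, x):
    """The model: a line y = w * x + b with parameters (w, b)."""
    w, b = params
    return w * x + b

def criterion(params, data, lam=0.01):
    """The preference criterion: goodness of fit plus a smoothing term."""
    w, _ = params
    mse = sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)
    return mse + lam * w * w

def search(data, lam=0.01, lr=0.05, steps=2000):
    """The search algorithm: gradient descent on the criterion."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n + 2 * lam * w
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
params = search(data)
```

Swapping any one component (for example, replacing gradient descent with an exhaustive grid search) yields a different algorithm while the other two stay unchanged.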

1.4.1 Data mining tasks


The more common model tasks/functions in current data mining practice
include:

1. Association rule discovery: describes association relationships among
different attributes. The origin of association rules is in market basket
analysis. A market basket is a collection of items purchased by a customer
in an individual customer transaction. One common analysis task
in a transaction database is to find sets of items, or itemsets, that
frequently appear together. Each pattern extracted through the analysis
consists of an itemset and its support, i.e., the number of transactions
that contain it. Businesses can use knowledge of these patterns to improve
placement of items in a store or for mail-order marketing. The
huge size of transaction databases and the exponential increase in the
number of potential frequent itemsets with increase in the number of
attributes (items) make the above problem a challenging one. The Apriori
algorithm [3] provided one early solution which was improved by subsequent
algorithms using partitioning, hashing, sampling and dynamic
itemset counting.

2. Clustering: maps a data item into one of several clusters, where clusters
are natural groupings of data items based on similarity metrics or prob-
ability density models. Clustering is used in several exploratory data
analysis tasks, customer retention and management, and web mining.
The clustering problem has been studied in many fields, including statis-
tics, machine learning and pattern recognition. However, large data
considerations were absent in these approaches. Recently, several new
algorithms with greater emphasis on scalability have been developed, in-
cluding those based on summarized cluster representation called cluster
feature (Birch [291], ScaleKM [29]), sampling (CURE [84]) and density
joins (DBSCAN [61]).

3. Classification: classifies a data item into one of several predefined
categorical classes. It is used for the purpose of predictive data mining in
several fields, e.g., in scientific discovery, fraud detection, atmospheric
data mining and financial engineering. Several classification methodologies
have already been discussed in Section 1.2.3. Some typical
algorithms suitable for large databases are based on Bayesian techniques
(AutoClass [40]), and decision trees (Sprint [254], RainForest [75]).
4. Sequence analysis [85]: models sequential patterns, like time-series data
[130]. The goal is to model the process of generating the sequence or
to extract and report deviation and trends over time. The framework
is increasingly gaining importance because of its application in bioinfor-
matics and streaming data analysis.
5. Regression [65]: maps a data item to a real-valued prediction variable.
It is used in different prediction and modeling applications.
6. Summarization [65]: provides a compact description for a subset of data.
A simple example would be mean and standard deviation for all fields.
More sophisticated functions involve summary rules, multivariate visu-
alization techniques and functional relationship between variables. Sum-
marization functions are often used in interactive data analysis, auto-
mated report generation and text mining.

7. Dependency modeling [28, 86]: describes significant dependencies among
variables.
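To make the notions of itemset and support from task 1 concrete, the following Python sketch performs a level-wise search for frequent itemsets. It follows the general candidate-generation idea associated with the algorithm of [3] but is a simplified illustration, not a faithful reimplementation; the basket data and the minimum support value are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]     # candidate 1-itemsets
    result = {}
    k = 1
    while current:
        # Support = number of transactions containing the candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Build (k+1)-candidates whose every k-subset is frequent.
        k += 1
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return result

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
supports = frequent_itemsets(baskets, min_support=3)
```

Pruning candidates that have an infrequent subset is what keeps the search from enumerating all possible itemsets, whose number grows exponentially with the number of items.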


Some other tasks required in some data mining applications are outlier/
anomaly detection, link analysis, optimization and planning.

1.4.2 Data mining tools


A wide variety and number of data mining algorithms are described in the
literature – from the fields of statistics, pattern recognition, machine learning
and databases. They represent a long list of seemingly unrelated and often
highly specific algorithms. Some representative groups are mentioned below:

1. Statistical models (e.g., linear discriminants [59, 92])

2. Probabilistic graphical dependency models (e.g., Bayesian networks [102])

3. Decision trees and rules (e.g., CART [32])

4. Inductive logic programming based models (e.g., PROGOL [180] and
FOIL [233])

5. Example based methods (e.g., nearest neighbor [7], lazy learning [5] and
case based reasoning [122, 208] methods)

6. Neural network based models [44, 46, 148, 266]

7. Fuzzy set theoretic models [16, 23, 43, 217]

8. Rough set theory based models [137, 123, 227, 176]

9. Genetic algorithm based models [68, 106]

10. Hybrid and soft computing models [175]

The data mining algorithms determine both the flexibility of the model in
representing the data and the interpretability of the model in human terms.
Typically, the more complex models may fit the data better but may also
be more difficult to understand and to fit reliably. Also, each representation
suits some problems better than others. For example, decision tree classifiers
can be very useful for finding structure in high dimensional spaces and are
also useful in problems with mixed continuous and categorical data. However,
they may not be suitable for problems where the true decision boundaries are
nonlinear multivariate functions.

1.4.3 Applications of data mining


A wide range of organizations including business companies, scientific labo-
ratories and governmental departments have deployed successful applications
of data mining. While early adopters of this technology have tended to be
in information-intensive industries such as financial services and direct mail
marketing, the technology is applicable to any company looking to leverage a
large data warehouse to better manage their operations.

[Figure 1.2 (pie chart): Banking (17%), eCommerce/Web (15%), Telecom (11%),
Other (11%), Biology/Genetics (8%), Science Data (8%), Fraud Detection (8%),
Retail (6%), Insurance (6%), Pharmaceuticals (5%), Investment/Stocks (4%).]

FIGURE 1.2: Application areas of data mining.

Two critical factors for success with data mining are: a large, well-integrated
data warehouse and a well-defined understanding of the process within which
data mining is to be applied. Several domains where large volumes of data are
stored in centralized or distributed databases include the following.

• Financial Investment: Stock indices and prices, interest rates, credit
card data, fraud detection [151].

• Health Care: Diagnostic information stored by hospital management
systems [27].

• Manufacturing and Production: Process optimization and troubleshooting [94].

• Telecommunication network: Calling patterns and fault management
systems [246].

• Scientific Domain: Astronomical object detection [64], genomic and
biological data mining [15].

• The World Wide Web: Information retrieval, resource location [62, 210].

The results of a recent poll conducted at the www.kdnuggets.com web site
regarding the usage of data mining algorithms in different domains are
presented in Figure 1.2.


1.5 Different Perspectives of Data Mining


In the previous section we discussed the generic components of a data min-
ing system, common data mining tasks/tools and related principles and issues
that appear in designing a data mining system. At present, the goal of the
KDD community is to develop a unified framework of data mining which
should be able to model typical data mining tasks, be able to discuss the
probabilistic nature of the discovered patterns and models, be able to talk
about data and inductive generalizations of the data, and accept the presence
of different forms of data (relational data, sequences, text, web). Also, the
framework should recognize that data mining is an interactive and iterative
process, where comprehensibility of the discovered knowledge is important
and where the user has to be in the loop [153, 234].
Pattern recognition and machine learning algorithms seem to be the most
suitable candidates for addressing the above tasks. It may be mentioned in this
context that historically the subject of knowledge discovery in databases has
evolved, and continues to evolve, from the intersection of research from such
fields as machine learning, pattern recognition, statistics, databases, artificial
intelligence, reasoning with uncertainties, expert systems, data visualization,
and high-performance computing. KDD systems incorporate theories, algo-
rithms, and methods from all these fields. Therefore, before elaborating the
pattern recognition perspective of data mining, we describe briefly two other
prominent frameworks, namely, the database perspective and the statistical
perspective of data mining.

1.5.1 Database perspective


Since most business data resides in industrial databases and warehouses,
commercial companies view mining as a sophisticated form of database query-
ing [88, 99]. Research based on this perspective seeks to enhance the ex-
pressiveness of query languages (rule query languages, meta queries, query
optimizations), enhance the underlying model of data and DBMSs (the log-
ical model of data, deductive databases, inductive databases, rules, active
databases, semistructured data, etc.) and improve integration with data
warehousing systems (online analytical processing (OLAP), historical data,
meta-data, interactive exploring). The approach also has close links with
the search-based perspective of data mining, exemplified by the popular work on
association rules [3] at IBM Almaden.
The database perspective has several advantages including scalability to
large databases present in secondary and tertiary storage, generic nature of
the algorithms (applicability to a wide range of tasks and domains), capability
to handle heterogeneous data, and easy user interaction and visualization of
mined patterns. However, it is still ill-equipped to address the full range of
knowledge discovery tasks because of its inability to mine complex patterns
and model non-linear relationships (the database models being of limited
richness), unsuitability for exploratory analysis, lack of induction capability, and
restricted scope for evaluating the significance of mined patterns [234].

1.5.2 Statistical perspective


The statistical perspective views data mining as computer automated ex-
ploratory data analysis of (usually) large complex data sets [79, 92]. The
term data mining existed in statistical data analysis literature long before its
current definition in the computer science community. However, the abundance
and massiveness of data have provided impetus to development of algorithms
which, though rooted in statistics, lay more emphasis on computational
efficiency. Presently, statistical tools are used in all the KDD tasks
like preprocessing (sampling, outlier detection, experimental design), data
modeling (clustering, expectation maximization, decision trees, regression,
canonical correlation, etc.), model selection, evaluation and averaging (robust
statistics, hypothesis testing) and visualization (principal component analysis,
Sammon’s mapping).
The advantages of the statistical approach are its solid theoretical back-
ground, and ease of posing formal questions. Tasks such as classification and
clustering fit easily into this approach. What seems to be lacking are ways
for taking into account the iterative and interactive nature of the data mining
process. Also, scalability of the methods to very large data sets, especially
those residing in tertiary memory, is still not fully achieved.

1.5.3 Pattern recognition perspective


At present, pattern recognition and machine learning provide the most
fruitful framework for data mining [109, 161]. Not only do they provide
a wide range of models (linear/non-linear, comprehensible/complex,
predictive/descriptive, instance/rule based) for data mining tasks (clustering,
classification, rule discovery), but methods for modeling uncertainties
(probabilistic, fuzzy) in the discovered patterns also form part of PR research. Another
aspect that makes pattern recognition algorithms attractive for data mining
is their capability of learning or induction. As opposed to many statisti-
cal techniques that require the user to have a hypothesis in mind first, PR
algorithms automatically analyze data and identify relationships among at-
tributes and entities in the data to build models that allow domain experts
to understand the relationship between the attributes and the class. Data
preprocessing tasks like instance selection, data cleaning, dimensionality
reduction and handling missing data are also extensively studied in the pattern
recognition framework. Besides these, other data mining issues addressed by PR
methodologies include handling of relational, sequential and symbolic data
(syntactic PR, PR in arbitrary metric spaces), human interaction (knowledge
encoding and extraction), knowledge evaluation (description length principle)
and visualization.
Pattern recognition is at the core of data mining systems. However, pat-
tern recognition and data mining are not equivalent considering their original
definitions. There exists a gap between the requirements of a data mining
system and the goals achieved by present day pattern recognition algorithms.
Development of new generation PR algorithms is expected to encompass more
massive data sets involving diverse sources and types of data that will sup-
port mixed-initiative data mining, where human experts collaborate with the
computer to form hypotheses and test them. The main challenges to PR as a
unified framework for data mining are mentioned below.

1.5.4 Research issues and challenges


1. Massive data sets and high dimensionality. Huge data sets create combi-
natorially explosive search spaces for model induction which may make
the process of extracting patterns infeasible owing to space and time
constraints. They also increase the chances that a data mining algo-
rithm will find spurious patterns that are not generally valid.
2. Overfitting and assessing the statistical significance. Data sets used for
mining are usually huge and available from distributed sources. As a
result, often the presence of spurious data points leads to overfitting of
the models. Regularization and resampling methodologies need to be
emphasized for model design.
3. Management of changing data and knowledge. Rapidly changing data,
in a database that is modified/deleted/augmented, may make the previ-
ously discovered patterns invalid. Possible solutions include incremental
methods for updating the patterns.
4. User interaction and prior knowledge. Data mining is inherently an
interactive and iterative process. Users may interact at various stages,
and domain knowledge may be used either in the form of a high level
specification of the model, or at a more detailed level. Visualization of
the extracted model is also desirable.
5. Understandability of patterns. It is necessary to make the discoveries
more understandable to humans. Possible solutions include rule struc-
turing, natural language representation, and the visualization of data
and knowledge.

6. Nonstandard and incomplete data. The data can be missing and/or
noisy.

7. Mixed media data. Learning from data that is represented by a combination
of various media, like (say) numeric, symbolic, images and text.


8. Integration. Data mining tools are often only a part of the entire decision
making system. It is desirable that they integrate smoothly, both with
the database and the final decision-making procedure.

In the next section we discuss the issues related to the large size of the data
sets in more detail.

1.6 Scaling Pattern Recognition Algorithms to Large Data Sets

Organizations are amassing very large repositories of customer, operations,
scientific and other sorts of data of gigabyte or even terabyte size. KDD
practitioners would like to be able to apply pattern recognition and machine
learning algorithms to these large data sets in order to discover useful knowl-
edge. The question of scalability asks whether the algorithm can process large
data sets efficiently, while building from them the best possible models.
From the point of view of complexity analysis, for most scaling problems
the limiting factor of the data set has been the number of examples and
their dimension. A large number of examples introduces potential problems
with both time and space complexity. For time complexity, the appropriate
algorithmic question is what is the growth rate of the algorithm’s run time as
the number of examples and their dimensions increase? As may be expected,
time-complexity analysis does not tell the whole story. As the number of
instances grows, space constraints become critical, since almost all existing
implementations of learning algorithms operate with the training set entirely in
main memory. Finally, the goal of a learning algorithm must be considered.
Evaluating the effectiveness of a scaling technique becomes complicated if
degradation in the quality of the learning is permitted. Effectiveness of a
technique for scaling pattern recognition/learning algorithms is measured in
terms of the above three factors, namely, time complexity, space complexity
and quality of learning.
Many diverse techniques, both general and task specific, have been proposed
and implemented for scaling up learning algorithms. An excellent survey of
these methods is provided in [230]. We discuss here some of the broad cat-
egories relevant to the book. Besides these, other hardware-driven (parallel
processing, distributed computing) and database-driven (relational represen-
tation) methodologies are equally effective.

1.6.1 Data reduction


The simplest approach for coping with the infeasibility of learning from
a very large data set is to learn from a reduced/condensed representation
of the original massive data set [18]. The reduced representation should be
as faithful to the original data as possible, for its effective use in different
mining tasks. At present the following categories of reduced representations
are mainly used:
• Sampling/instance selection: Various random, deterministic and den-
sity biased sampling strategies exist in statistics literature. Their use
in machine learning and data mining tasks has also been widely stud-
ied [37, 114, 142]. Note that merely generating a random sample from
a large database stored on disk may itself be a non-trivial task from
a computational viewpoint. Several aspects of instance selection, e.g.,
instance representation, selection of interior/boundary points, and in-
stance pruning strategies, have also been investigated in instance-based
and nearest neighbor classification frameworks [279]. Challenges in de-
signing an instance selection algorithm include accurate representation
of the original data distribution, making fine distinctions at different
scales and noticing rare events and anomalies.
• Data squashing: It is a form of lossy compression where a large data
set is replaced by a small data set and some accompanying quantities,
while attempting to preserve its statistical information [60].
• Indexing data structures: Systems such as kd-trees [22], R-trees, hash
tables, AD-trees, multiresolution kd-trees [54] and cluster feature (CF)-
trees [29] partition the data (or feature space) into buckets recursively,
and store enough information regarding the data in the bucket so that
many mining queries and learning tasks can be achieved in constant or
linear time.
• Frequent itemsets: They are often applied in supermarket data analysis
and require that the attributes are sparsely valued [3].
• DataCubes: Use a relational aggregation database operator to represent
chunks of data [82].
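The remark under sampling, that drawing a random sample from a large disk-resident database is itself non-trivial, can be made concrete with reservoir sampling, which needs a single sequential pass and memory proportional only to the sample size. The following is a minimal sketch, not a method taken from the references cited above; the function name `reservoir_sample` and the use of a Python iterable as a stand-in for a table scan are assumptions made for illustration.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Return k items drawn uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)        # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item      # item i is kept with probability k / (i + 1)
    return reservoir

# Stand-in for a sequential scan over a large table (hypothetical data).
sample = reservoir_sample(range(100_000), k=5, seed=0)
short = reservoir_sample(range(3), k=5, seed=0)
```

When the stream is shorter than k, the whole stream is returned, so callers should not assume the result always has length k.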
The last four techniques fall into the general class of representation called
cached sufficient statistics [177]. These are summary data structures that lie
between the statistical algorithms and the database, intercepting the kinds of
operations that have the potential to consume large time if they were answered
by direct reading of the data set. Case-based reasoning [122] also involves a
related approach where salient instances (or descriptions) are either selected
or constructed and stored in the case base for later use.
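As an illustration of cached sufficient statistics, the sketch below keeps, for each bucket of data, only a count, a linear sum and a sum of squares, in the spirit of the triples stored in cluster feature trees. The class name `Summary` and the restriction to one-dimensional values are assumptions made for brevity. The point is that mean and variance queries, and the merging of buckets, can be answered from the summaries without rereading the data.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    """Sufficient statistics of one bucket of scalar values."""
    n: int = 0        # number of values
    s: float = 0.0    # linear sum
    ss: float = 0.0   # sum of squares

    def add(self, x):
        self.n += 1
        self.s += x
        self.ss += x * x

    def merge(self, other):
        # Summaries are additive, so two buckets combine in constant time.
        return Summary(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def mean(self):
        return self.s / self.n

    def variance(self):
        # Population variance computed from the cached sums only.
        return self.ss / self.n - self.mean() ** 2

left, right = Summary(), Summary()
for x in [1, 2, 3, 4]:
    left.add(x)
for x in [5, 6]:
    right.add(x)
merged = left.merge(right)
```

The variance formula used here can lose precision when the mean is large relative to the spread; a production implementation would likely use a numerically stabler update.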

1.6.2 Dimensionality reduction


An important problem related to mining large data sets, both in dimension
and size, is that of selecting a subset of the original features [141]. Preprocessing
the data to obtain a smaller set of representative features, retaining the
optimal/salient characteristics of the data, not only decreases the processing
time but also leads to more compact learned models and better
generalization.
Dimensionality reduction can be done in two ways, namely, feature selec-
tion and feature extraction. As mentioned in Section 1.2.2 feature selection
refers to reducing the dimensionality of the measurement space by discarding
redundant or least information carrying features. Different methods based
on indices like divergence, Mahalanobis distance and the Bhattacharyya coefficient are
available in [30]. On the other hand, feature extraction methods utilize all the
information contained in the measurement space to obtain a new transformed
space, thereby mapping a higher dimensional pattern to a lower dimensional
one. The transformation may be either linear, e.g., principal component anal-
ysis (PCA) or nonlinear, e.g., Sammon’s mapping, multidimensional scaling.
Methods in soft computing using neural networks, fuzzy sets, rough sets and
evolutionary algorithms have also been reported for both feature selection and
extraction in supervised and unsupervised frameworks. Some other methods
including those based on Markov blankets [121], wrapper approach [117], and
Relief [113], which are applicable to data sets with large size and dimension,
have been explained in Section 3.3.

1.6.3 Active learning


Traditional machine learning algorithms deal with input data consisting
of independent and identically distributed (iid) samples. In this framework,
the number of samples required (sample complexity) by a class of learning
algorithms to achieve a specified accuracy can be theoretically determined [19,
275]. In practice, as the amount of data grows, the increase in accuracy slows,
forming the learning curve. One can hope to avoid this slow-down in learning
by employing selection methods for sifting through the additional examples
and filtering out a small non-iid set of relevant examples that contain essential
information. Formally, active learning studies the closed-loop phenomenon of
a learner selecting actions or making queries that influence what data are
added to its training set. When actions/queries are selected properly, the
sample complexity for some problems decreases drastically, and some NP-
hard learning problems become polynomial in computation time [10, 45].
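A standard textbook illustration of how queries can reduce sample complexity is the learning of a threshold on a line. A passive learner must label the whole pool to locate the boundary, whereas a learner that chooses its own queries can locate it by bisection with a logarithmic number of labels. The sketch below is an illustration under these assumptions and is not drawn from [10, 45]; `active_threshold` and the noise-free `oracle` are hypothetical names.

```python
def active_threshold(pool, oracle):
    """Locate the first positive point in a sorted pool with O(log n) label queries."""
    lo, hi = 0, len(pool)       # invariant: index of the first positive lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(pool[mid]):   # positive label: boundary is at mid or to its left
            hi = mid
        else:                   # negative label: boundary is strictly to the right
            lo = mid + 1
    return lo, queries

pool = [i / 1000 for i in range(1000)]        # sorted unlabeled pool
true_threshold = 0.6365                       # unknown to the learner
idx, queries = active_threshold(pool, lambda x: x >= true_threshold)
```

Here the boundary among 1000 pool points is found with at most 10 label queries. The sketch assumes noise-free labels; with label noise, plain bisection is no longer reliable.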

1.6.4 Data partitioning


Another approach to scaling up is to partition the data, avoiding the need
to run algorithms on very large data sets. The models learned from individ-
ual partitions are then combined to obtain the final ensemble model. Data
partitioning techniques can be categorized based on whether they process sub-
sets sequentially or concurrently. Several model combination strategies also
exist in literature [77], including boosting, bagging, ARCing classifiers, committee
machines, voting classifiers, mixture of experts, stacked generalization,
Bayesian sampling, statistical techniques and soft computing methods. The
problems of feature partitioning and modular task decomposition for achiev-
ing computational efficiency have also been studied.
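The partition-and-combine idea can be sketched as follows: a simple model is learned on each partition independently, and the models are combined by majority vote. The nearest-centroid base learner, the voting rule and all function names are choices made for this illustration only; any of the combination strategies listed above could take their place.

```python
from collections import Counter

def train_centroids(part):
    """Fit a nearest-centroid model (per-class mean vector) on one partition."""
    sums, counts = {}, Counter()
    for x, y in part:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, x):
    """Return the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist2(model[y]))

def ensemble_predict(models, x):
    """Combine the per-partition models by majority vote."""
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

data = [((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"),
        ((1.0, 0.9), "b"), ((0.8, 1.1), "b"), ((1.2, 1.0), "b")]
partitions = [data[i::3] for i in range(3)]      # three disjoint partitions
models = [train_centroids(p) for p in partitions]
```

Each call to `train_centroids` touches only its own partition, so the models can be built sequentially under a memory limit or concurrently on separate machines.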

1.6.5 Granular computing


Granular computing (GrC) may be regarded as a unified framework for the-
ories, methodologies and techniques that make use of granules (i.e., groups,
classes or clusters of objects in a universe) in the process of problem solv-
ing. In many situations, when a problem involves incomplete, uncertain and
vague information, it may be difficult to differentiate distinct elements and
one is forced to consider granules. On the other hand, in some situations
though detailed information is available, it may be sufficient to use granules
in order to have an efficient and practical solution. Granulation is an impor-
tant step in the human cognition process. From a more practical point of
view, the simplicity derived from granular computing is useful for designing
scalable data mining algorithms [138, 209, 219]. There are two aspects of
granular computing, one deals with formation, representation and interpreta-
tion of granules (algorithmic aspect) while the other deals with utilization of
granules for problem solving (semantic aspect). Several approaches for gran-
ular computing have been suggested in literature including fuzzy set theory
[288], rough set theory [214], power algebras and interval analysis. The rough
set theoretic approach is based on the principles of set approximation and
provides an attractive framework for data mining and knowledge discovery.
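The principle of set approximation mentioned above can be sketched in a few lines. Objects that are indiscernible under the available attributes form granules; a target concept is then bracketed by a lower approximation (the union of granules wholly inside it) and an upper approximation (the union of granules that intersect it). The helper names and the toy universe are assumptions made for illustration.

```python
def granules(universe, key):
    """Group objects into equivalence classes (granules) that share the same key."""
    classes = {}
    for obj in universe:
        classes.setdefault(key(obj), set()).add(obj)
    return list(classes.values())

def approximations(grans, target):
    """Return the rough-set lower and upper approximations of target."""
    lower = set().union(*(g for g in grans if g <= target))   # granules fully inside
    upper = set().union(*(g for g in grans if g & target))    # granules that overlap
    return lower, upper

# Toy universe: integers 0..9, indiscernible when they fall in the same block of three.
grans = granules(range(10), key=lambda x: x // 3)
lower, upper = approximations(grans, target={0, 1, 2, 3, 9})
```

In this toy case the object 3 belongs to the target but its granule {3, 4, 5} does not lie wholly inside it, so 3 appears only in the upper approximation. The difference between the two approximations is the boundary region in which the concept cannot be decided from the granules alone.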

1.6.6 Efficient search algorithms


The most straightforward approach to scaling up machine learning is to
produce more efficient algorithms or to increase the efficiency of existing al-
gorithms. As mentioned earlier the data mining problem may be framed as
a search through a space of models based on some fitness criteria. This view
allows for three possible ways of achieving scalability.

• Restricted model space: Simple learning algorithms (e.g., two-level trees,
decision stump) and constrained search involve a “smaller” model space
and decrease the complexity of the search process.
• Knowledge encoding: Domain knowledge encoding, providing an initial
solution close to the optimal one, results in fast convergence and avoid-
ance of local minima. Domain knowledge may also be used to guide the
search process for faster convergence.
• Powerful algorithms and heuristics: Strategies like greedy search, divide
and conquer, and modular computation are often found to provide
considerable speed-ups. Programming optimization (efficient data structures,
dynamic search space restructuring) and the use of genetic algorithms,
randomized algorithms and parallel algorithms may also obtain
approximate solutions much faster compared to conventional algorithms.
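To illustrate the first item above, a decision stump restricts the model space to a single threshold on a single feature, so the search can be exhaustive and still cheap. The sketch below assumes one numeric feature and binary labels in {0, 1}; the function name and the returned tuple layout are choices made for this illustration.

```python
def fit_stump(xs, ys):
    """Exhaustively pick the (errors, threshold, polarity) with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            # The stump predicts 1 when polarity * (x - t) >= 0, and 0 otherwise.
            errors = sum(
                (1 if polarity * (x - t) >= 0 else 0) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    return best

stump = fit_stump([1, 2, 3, 6, 7, 8], [0, 0, 0, 1, 1, 1])   # → (0, 6, 1)
```

The model space here contains at most two candidate models per distinct feature value, which is what makes exhaustive search feasible. The evaluation as written costs quadratic time in the number of examples; sorting once and updating the error count incrementally would reduce this.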

1.7 Significance of Soft Computing in KDD


Soft computing [287] is a consortium of methodologies which work synergistically
and provide, in one form or another, flexible information processing
capabilities for handling real life ambiguous situations. Its aim is to exploit
the tolerance for imprecision, uncertainty, approximate reasoning and partial
truth in order to achieve tractability, robustness, low cost solutions, and close
resemblance to human-like decision making. In other words, it provides the
foundation for the conception and design of high MIQ (Machine IQ) systems
and therefore forms the basis of future generation computing systems.
In the last section we discussed various strategies for handling the
scalability issue in data mining. Besides scalability, other challenges include
modeling user interaction and prior knowledge, handling nonstandard, mixed
media and incomplete data, and evaluating and visualizing the discovered
knowledge. While the scalability property is important for data mining tasks,
the significance of the above issues is more with respect to the knowledge
discovery aspect of KDD. Soft computing methodologies, having flexible
information processing capability for handling real life ambiguous situations,
provide a suitable framework for addressing the latter issues [263, 175].
The main constituents of soft computing, at this juncture, as mentioned
in Section 1.2.3, include fuzzy logic, neural networks, genetic algorithms, and
rough sets. Each of them contributes a distinct methodology, as stated below,
for addressing different problems in its domain.
Fuzzy sets, which constitute the oldest component of soft computing, are
suitable for handling the issues related to understandability of patterns, in-
complete/noisy data, mixed media information and human interaction and
can provide approximate solutions faster. They have been mainly used in
clustering, discovering association rules and functional dependencies, summa-
rization, time series analysis, web applications and image retrieval.
Neural networks are suitable in data-rich environments and are typically
used for extracting embedded knowledge in the form of rules, quantitative
evaluation of these rules, clustering, self-organization, classification and re-
gression. They have an advantage, over other types of machine learning algo-
rithms, for scaling [21].
Neuro-fuzzy hybridization exploits the characteristics of both neural net-
works and fuzzy sets in generating natural/linguistic rules, handling imprecise
and mixed mode data, and modeling highly nonlinear decision boundaries.
Domain knowledge, in natural form, can be encoded in the network for improved
performance.
Genetic algorithms provide efficient search algorithms to select a model,
from mixed media data, based on some preference criterion/objective function.
They have been employed in regression and in discovering association rules.
Rough sets are suitable for handling different types of uncertainty in data and
have been mainly utilized for extracting knowledge in the form of rules.
Other hybridizations typically enjoy the generic and application-specific
merits of the individual soft computing tools that they integrate. Data mining
functions modeled by such systems include rule extraction, data summariza-
tion, clustering, incorporation of domain knowledge, and partitioning. Case-
based reasoning (CBR), a novel AI problem-solving paradigm, has recently
drawn the attention of both soft computing and data mining communities.
A profile of its theory, algorithms, and potential applications is available in
[262, 195, 208].
A review on the role of different soft computing tools in data mining prob-
lems is provided in Appendix A.

1.8 Scope of the Book


This book has eight chapters describing various theories, methodologies,
and algorithms along with extensive experimental results, addressing certain
pattern recognition tasks essential for data mining. Tasks considered include
data condensation, feature selection, case generation, clustering, classification,
and rule generation/evaluation. Various methodologies have been described
using both classical and soft computing approaches (integrating fuzzy logic,
artificial neural networks, rough sets, genetic algorithms). The emphasis of
the methodologies is on handling data sets that are large (both in size and
dimension) and involve classes that are overlapping, intractable and/or have
nonlinear boundaries. Several strategies based on data reduction, dimensionality
reduction, active learning, granular computing and efficient search
heuristics are employed for dealing with the issue of ‘scaling-up’ in learning
problems. The problems of handling linguistic input and ambiguous output
decision, learning of overlapping/intractable class structures, selection of op-
timal parameters, and discovering human comprehensible knowledge (in the
form of linguistic rules) are addressed in a soft computing framework.
The effectiveness of the algorithms is demonstrated on different real life data
sets, mainly large in dimension and/or size, taken from varied domains, e.g.,
geographical information systems, remote sensing imagery, population census,
speech recognition, and cancer management. Superiority of the models over
several related ones is found to be statistically significant.
In Chapter 2, the problem of data condensation is addressed. After providing
a brief review of different data condensation algorithms, such as condensed
nearest neighbor rule, learning vector quantization and Astrahan’s method, a
generic multiscale data reduction methodology is described. It preserves the
salient characteristics of the original data set by representing the probability
density underlying it. The representative points are selected in a multires-
olution fashion, which is novel with respect to the existing density based
approaches. A scale parameter (k) is used in non-parametric density estima-
tion so that the data can be viewed at varying degrees of detail depending on
the value of k. This type of multiscale representation is desirable in various
data mining applications. At each scale the representation gives adequate
importance to different regions of the feature space based on the underlying
probability density.
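The overview above does not spell out the steps of the multiscale method, so the following is only a sketch of one plausible reading. It assumes that the density at a point is judged by the distance to its k-th nearest neighbor (a small distance meaning high density), that the densest remaining point is selected, and that points within a multiple of that distance are then pruned. The pruning factor of 2, the one-off computation of the neighbor distances and the name `condense` are all assumptions of this sketch.

```python
import math

def condense(points, k, prune_factor=2.0):
    """Select representatives densest-first, pruning within a k-NN-scaled radius."""
    def kth_nn_dist(i):
        d = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
        return d[k - 1]

    # Distance to the k-th nearest neighbor: small value = dense region.
    radius = {i: kth_nn_dist(i) for i in range(len(points))}
    remaining = set(radius)
    selected = []
    while remaining:
        i = min(remaining, key=lambda j: (radius[j], j))    # densest remaining point
        selected.append(points[i])
        remaining = {j for j in remaining
                     if math.dist(points[i], points[j]) > prune_factor * radius[i]}
    return selected

dense = [(float(v),) for v in range(8)]          # eight closely spaced points
sparse = [(50.0,), (70.0,), (90.0,)]             # three widely spaced points
condensed = condense(dense + sparse, k=2)        # → [(1.0,), (4.0,), (7.0,), (70.0,)]
```

Because the pruning radius shrinks where the data are dense, the dense region keeps three representatives while the sparse region keeps one, which is the behavior the density-preserving description suggests. Larger values of k widen the radii and give a coarser condensed set.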
It is observed experimentally that the multiresolution approach helps to
achieve lower error with similar condensation ratio compared to several related
schemes. The reduced set obtained is found to be effective for a number
of mining tasks such as classification, clustering and rule generation. The
algorithm is also found to be efficient in terms of sample complexity, in the
sense that the error level decreases rapidly with the increase in size of the
condensed set.
Chapter 3 deals with the task of feature selection. First a brief review on
feature selection and extraction methods, including the filter and wrapper
approaches, is provided. Then it describes, in detail, an unsupervised feature
selection algorithm suitable for data sets, large in both dimension and size.
Conventional methods of feature selection involve evaluating different feature
subsets using some index and then selecting the best among them. The index
usually measures the capability of the respective subsets in classification or
clustering depending on whether the selection process is supervised or unsu-
pervised. A problem of these methods, when applied to large data sets, is the
high computational complexity involved in searching.
The unsupervised algorithm described in Chapter 3 digresses from the afore-
said conventional view and is based on measuring similarity between features
and then removing the redundancy therein. This does not need any search
and, therefore, is fast. Since the method achieves dimensionality reduction
through removal of redundant features, it is more related to feature selection
for compression rather than for classification.
The method involves partitioning of the original feature set into some dis-
tinct subsets or clusters so that the features within a cluster are highly similar
while those in different clusters are dissimilar. A single feature from each such
cluster is then selected to constitute the resulting reduced subset. The algo-
rithm is generic in nature and has the capability of multiscale representation
of data sets.
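The idea of grouping similar features and keeping one representative per group can be sketched as below. Note the substitutions: the absolute Pearson correlation is used as the similarity measure in place of the index developed in Chapter 3, and a simple greedy pass stands in for the clustering step. The function names, the threshold value and the toy columns are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences (non-constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def select_features(columns, threshold=0.95):
    """Keep a feature only if it is not highly similar to one already kept."""
    kept = []
    for name, col in columns.items():
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

columns = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],   # nearly a rescaling of height_cm
    "weight":    [70, 52, 80, 61],
}
kept = select_features(columns)
```

No feature subsets are searched and no class labels are consulted, which is why this style of selection is fast and unsupervised. Constant columns would cause a division by zero in `pearson` and would need to be filtered out beforehand.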
Superiority of the algorithm, over related methods, is demonstrated exten-
sively on different real life data with dimension ranging from 4 to 649. Com-
parison is made on the basis of both clustering/classification performance and
redundancy reduction. Studies on effectiveness of the maximal information
compression index and the effect of scale parameter are also presented.
While Chapters 2 and 3 deal with some preprocessing tasks of data mining,
Chapter 4 is concerned with its classification/learning aspect. Here we present
two active learning strategies for handling the large quadratic programming
(QP) problem of support vector machine (SVM) classifier design. The first
one is an error-driven incremental method for active support vector learning.
The method involves selecting a chunk of q new points, having equal number of
correctly classified and misclassified points, at each iteration by resampling the
data set, and using it to update the current SV set. The resampling strategy
is computationally superior to random chunk selection, while achieving higher
classification accuracy. Since it allows for querying multiple instances at each
iteration, it is computationally more efficient than those that are querying for
a single example at a time.
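The chunk selection step described above can be sketched as follows. The current classifier is represented by a hypothetical `predict` callable, and the handling of the case where fewer than q/2 misclassified points exist (the remainder is filled with correctly classified points) is an assumption of this sketch, since the overview does not specify it.

```python
import random

def select_chunk(points, labels, predict, q, seed=None):
    """Draw q indices: half misclassified and half correctly classified by the current model."""
    rng = random.Random(seed)
    wrong = [i for i, (x, y) in enumerate(zip(points, labels)) if predict(x) != y]
    right = [i for i, (x, y) in enumerate(zip(points, labels)) if predict(x) == y]
    half = q // 2
    chunk = rng.sample(wrong, min(half, len(wrong)))
    # Fill the rest of the chunk from the correctly classified points.
    chunk += rng.sample(right, min(q - len(chunk), len(right)))
    return chunk

points = list(range(10))
labels = [1 if x >= 5 else 0 for x in points]
current_model = lambda x: 1 if x >= 7 else 0      # misclassifies points 5 and 6
chunk = select_chunk(points, labels, current_model, q=4, seed=1)
```

The selected chunk would then be combined with the current support vectors and the classifier retrained, a step that is outside the scope of this sketch.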
The second algorithm deals with active support vector learning in a statis-
tical query framework. Like the previous algorithm, it also involves queries
for multiple instances at each iteration. The intermediate statistical query
oracle, involved in the learning process, returns the value of the probability
that a new example belongs to the actual support vector set. A set of q new
points is selected according to the above probability and is used along with
the current SVs to obtain the new SVs. The probability is estimated using a
combination of two factors: the margin of the particular example with respect
to the current hyperplane, and the degree of confidence that the current set
of SVs provides the actual SVs. The degree of confidence is quantified by a
measure which is based on the local properties of each of the current support
vectors and is computed using the nearest neighbor estimates.
The methodology in the second part has some more advantages. It not only
queries for the error points (or points having low margin) but also a number of
other points far from the separating hyperplane (interior points). Thus, even if
a current hypothesis is erroneous there is a scope for its being corrected owing
to the interior points. If only error points were selected the hypothesis might
have actually been worse. The ratio of selected points having low margin and
those far from the hyperplane is decided by the confidence factor, which varies
adaptively with iteration. If the current SV set is close to the optimal one, the
algorithm focuses only on the low margin points and ignores the redundant
points that lie far from the hyperplane. On the other hand, if the confidence
factor is low (say, in the initial learning phase) it explores a higher number
of interior points. Thus, the trade-off between efficiency and robustness of
performance is adequately handled in this framework. Also, the efficiency of
most of the existing active SV learning algorithms depends on the sparsity
ratio (i.e., the ratio of the number of support vectors to the total number
of data points) of the data set. Due to the adaptive nature of the query in
the proposed algorithm, it is likely to be efficient for a wide range of sparsity
ratio.
Experimental results have been presented for five real life classification problems.
The number of patterns ranges from 351 to 495141, dimension from
9 to 34, and the sparsity ratio from 0.01 to 0.51. The algorithms, particularly
the second one, are found to provide superior performance in terms of classi-
fication accuracy, closeness to the optimal SV set, training time and margin
distribution, as compared to several related algorithms for incremental and
active SV learning. Studies on effectiveness of the confidence factor, used in
statistical queries, are also presented.
In the previous three chapters all the methodologies described for data
condensation, feature selection and active learning are based on the classical
approach. The next three chapters (Chapters 5 to 7) emphasize demonstrating
the effectiveness of integrating different soft computing tools, e.g., fuzzy logic,
artificial neural networks, rough sets and genetic algorithms for performing
certain tasks in data mining.
In Chapter 5 methods based on the principle of granular computing in
rough fuzzy framework are described for efficient case (representative class
prototypes) generation of large data sets. Here, fuzzy set theory is used for
linguistic representation of patterns, thereby producing a fuzzy granulation of
the feature space. Rough set theory is used to obtain the dependency rules
which model different informative regions in the granulated feature space.
The fuzzy membership functions corresponding to the informative regions are
stored as cases along with the strength values. Case retrieval is made using a
similarity measure based on these membership functions. Unlike the existing
case selection methods, the cases here are cluster granules, and not the sam-
ple points. Also, each case involves a reduced number of relevant (variable)
features. Because of this twofold information compression the algorithm has
a low time requirement in generation as well as retrieval of cases. Superior-
ity of the algorithm in terms of classification accuracy, and case generation
and retrieval time is demonstrated experimentally on data sets having large
dimension and size.
In Chapter 6 we first describe, in brief, some clustering algorithms suit-
able for large data sets. Then an integration of a minimal spanning tree
(MST) based graph-theoretic technique and expectation maximization (EM)
algorithm with rough set initialization is described for non-convex clustering.
Here, rough set initialization is performed using dependency rules generated
on a fuzzy granulated feature space. EM provides the statistical model of the
data and handles the associated uncertainties. Rough set theory helps in faster
convergence and avoidance of the local minima problem, thereby enhancing
the performance of EM. MST helps in determining non-convex clusters. Since
it is applied on Gaussians rather than the original data points, the time re-
quirement is very low. Comparison with related methods is made in terms
of a cluster quality measure and computation time. Its effectiveness is also
demonstrated for segmentation of multispectral satellite images into different
landcover types.
A rough self-organizing map (RSOM) with fuzzy discretization of feature
space is described in Chapter 7. Discernibility reducts obtained using rough
set theory are used to extract domain knowledge in an unsupervised framework.
Reducts are then used to determine the initial weights of the network,
which are further refined using competitive learning. Superiority of this net-
work in terms of quality of clusters, learning time and representation of data is
demonstrated quantitatively through experiments over the conventional SOM
with both random and linear initializations. A linguistic rule generation algorithm
is also described. The extracted rules are likewise found to be superior
in terms of coverage, reachability and fidelity. This methodology is unique in
demonstrating how rough sets could be integrated with SOM, and it provides
a fast and robust solution to the initialization problem of SOM learning.
While granular computing is performed in rough-fuzzy and neuro-rough
frameworks in Chapters 5 and 6 and Chapter 7, respectively, the same is done
in Chapter 8 in an evolutionary rough-neuro-fuzzy framework by a synergis-
tic integration of all the four soft computing components. After explaining
different ensemble learning techniques, a modular rough-fuzzy multilayer per-
ceptron (MLP) is described in detail. Here fuzzy sets, rough sets, neural
networks and genetic algorithms are combined with modular decomposition
strategy. The resulting connectionist system achieves gain in terms of perfor-
mance, learning time and network compactness for classification and linguistic
rule generation.
Here, the role of the individual components is as follows. Fuzzy sets han-
dle uncertainties in the input data and output decision of the neural network
and provide linguistic representation (fuzzy granulation) of the feature space.
Multilayer perceptron is well known for providing a connectionist paradigm for
learning and adaptation. Rough set theory is used to extract domain knowl-
edge in the form of linguistic rules, which are then encoded into a number of
fuzzy MLP modules or subnetworks. Genetic algorithms (GAs) are used to
integrate and evolve the population of subnetworks as well as the fuzzifica-
tion parameters through efficient searching. A concept of variable mutation
operator is introduced for preserving the localized structure of the consti-
tuting knowledge-based subnetworks, while they are integrated and evolved.
The nature of the mutation operator is determined by the domain knowledge
extracted by rough sets.
The modular concept, based on a “divide and conquer” strategy, provides
accelerated training, preserves the identity of individual clusters, reduces the
catastrophic interference due to overlapping regions, and generates a com-
pact network suitable for extracting a minimum number of rules with high
certainty values. A quantitative study of the knowledge discovery aspect is
made through different rule evaluation indices, such as interestingness, cer-
tainty, confusion, coverage, accuracy and fidelity. Different well-established
algorithms for generating classification and association rules are described in
this regard for convenience. These include Apriori, subset, MofN and dynamic
itemset counting methods.
The effectiveness of the modular rough-fuzzy MLP and its rule extraction
algorithm is extensively demonstrated through experiments along with comparisons.
In some cases the rules generated are also validated by domain
experts. The network model, besides having significance in soft computing
research, has potential for application to large-scale problems involving knowledge
discovery tasks, particularly related to mining of linguistic classification
rules.
Two appendices are included for the convenience of readers. Appendix A
provides a review on the role of different soft computing tools in KDD. Ap-
pendix B describes the different data sets used in the experiments.



Chapter 2
Multiscale Data Condensation

2.1 Introduction
The current popularity of data mining and data warehousing, as well as the
decline in the cost of disk storage, has led to a proliferation of terabyte data
warehouses [66]. Mining a database of even a few gigabytes is an arduous
task for machine learning techniques and requires advanced parallel hardware
and algorithms. An approach for dealing with the intractable problem of
learning from huge databases is to select a small subset of data for learning
[230]. Databases often contain redundant data. It would be convenient if
large databases could be replaced by a small subset of representative patterns
so that the accuracy of estimates (e.g., of probability density, dependencies,
class boundaries) obtained from such a reduced set is comparable to
that obtained using the entire data set.
The simplest approach for data reduction is to draw the desired number
of random samples from the entire data set. Various statistical sampling
methods such as random sampling, stratified sampling, and peepholing [37]
have been in existence. However, naive sampling methods are not suitable
for real world problems with noisy data, since the performance of the algo-
rithms may change unpredictably and significantly [37]. Better performance
is obtained using uncertainty sampling [136] and active learning [241], where a
simple classifier queries for informative examples. The random sampling ap-
proach effectively ignores all the information present in the samples not chosen
for membership in the reduced subset. An advanced condensation algorithm
should include information from all samples in the reduction process.
Some widely studied schemes for data condensation are built upon classi-
fication-based approaches, in general, and the k-NN rule, in particular [48].
The effectiveness of the condensed set is measured in terms of the classification
accuracy. These methods attempt to derive a minimal consistent set, i.e.,
a minimal set which correctly classifies all the original samples. The very
first development of this kind is the condensed nearest neighbor rule (CNN)
of Hart [91]. Other algorithms in this category including the popular IB3,
IB4 [4], reduced nearest neighbor and iterative condensation algorithms are
summarized in [279]. Recently a local asymmetrically weighted similarity
metric (LASM) approach for data compression [239] has been shown to have superior
performance compared to conventional k-NN classification-based methods.
Similar concepts of data reduction and locally varying models based on neural
networks and Bayes classifier are discussed in [226] and [144] respectively.
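For reference, the condensed nearest neighbor rule mentioned above can be sketched as follows: a store is grown by absorbing every training sample that the 1-NN rule on the current store misclassifies, and passes are repeated until none remain. The function names and the toy data are assumptions of this sketch, and the resulting set is consistent but not guaranteed to be minimal.

```python
import math

def nearest_label(store, x):
    """Label of the stored sample closest to x (1-NN rule)."""
    return min(store, key=lambda s: math.dist(s[0], x))[1]

def condensed_nn(samples):
    """Grow a store until 1-NN on it classifies every training sample correctly."""
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for x, y in samples:
            if nearest_label(store, x) != y:
                store.append((x, y))     # absorb each misclassified sample
                changed = True
    return store

samples = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
           ((8.0,), "b"), ((9.0,), "b"), ((10.0,), "b")]
store = condensed_nn(samples)
```

On this well-separated toy set two stored samples suffice to classify all six. The result depends on the order in which samples are presented, which is one reason such methods are tied to the classifier and task at hand.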
The classification-based condensation methods are, however, specific to (i.e.,
dependent on) the classification tasks and the models (e.g., k-NN, perceptron)
used. Data condensation of more generic nature is performed by classical vec-
tor quantization methods [83] using a set of codebook vectors which minimize
the quantization error. An effective and popular method of learning the vec-
tors is to use the self-organizing map [118]. However, if the self-organizing
map is to be used as a pattern classifier, the codebook vectors may be fur-
ther refined using the learning vector quantization algorithms [118]. These
methods are seen to approximate the density underlying the data [118]. Since
learning is inherent in the methodologies, the final solution is dependent on
initialization, choice of learning parameters, and the nature of local minima.
Another group of generic data condensation methods is density-based. These
methods consider the density function of the data for the purpose of
condensation rather than minimizing the quantization error.
They do not involve any learning process and therefore are
deterministic (i.e., for a given input data set the output condensed set is fixed).
Here one estimates the density at a point and selects the points having ‘higher’
densities, while ensuring a minimum separation between the selected points.
These methods bear resemblance to density-based clustering techniques like
the DBSCAN algorithm [61], popular for spatial data mining. DBSCAN is
based on the principle that a cluster point contains in its neighborhood a
minimum number of samples; i.e., the cluster point has density above a
certain threshold. The neighborhood radius and the density threshold are user
specified. Astrahan [13] proposed a classical data reduction algorithm of this
type in 1971, in which he used a hypersphere (disc) of radius d1 about a point
to obtain an estimate of density at that point. The points are sorted based
on these estimated densities, and the densest point is selected, while rejecting
all points that lie within another disc of radius d2 about the selected point.
The process is repeated until all the samples are covered. However, selecting
the values of d1 and d2 is a non-trivial problem. A partial solution using a
minimal spanning tree-based method is described in [39]. Though the above
approaches select the points based on the density criterion, they do not
directly attempt to represent the original distribution. The selected points are
distributed evenly over the entire feature space irrespective of the distribution.
A constant separation is used for instance pruning. Interestingly, Fukunaga
[74] suggested a non-parametric algorithm for selecting a condensed set based
on the criterion that density estimates obtained with the original set and the
reduced set are close. The algorithm is, however, search-based and requires
a large computation time.
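The selection procedure attributed to Astrahan [13] above can be sketched as follows. This is a minimal reading of the prose description, not the published algorithm: the density estimate is assumed to be a simple count of points within distance d1 (the point itself included), boundary points at exactly d1 or d2 are assumed to be inside the disc, and ties in density are broken by input order. The function name and the sample data are hypothetical.

```python
import math


def astrahan_condense(points, d1, d2):
    """Greedy density-based selection following the description in the text."""
    # Assumed density estimate: number of points inside the disc of radius d1.
    density = [sum(1 for q in points if math.dist(p, q) <= d1) for p in points]
    # Visit points from densest to sparsest (stable sort keeps input order on ties).
    order = sorted(range(len(points)), key=lambda i: -density[i])
    covered = [False] * len(points)
    selected = []
    for i in order:
        if covered[i]:
            continue  # already rejected by an earlier, denser selection
        selected.append(points[i])
        # Reject (cover) every point inside the disc of radius d2.
        for j, q in enumerate(points):
            if math.dist(points[i], q) <= d2:
                covered[j] = True
    return selected


sample = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (10, 10)]
condensed = astrahan_condense(sample, d1=0.5, d2=1.0)
# condensed == [(0, 0), (5, 5), (10, 10)]
```

Because no learning step is involved, the same input always gives the same condensed set. The isolated point at (10, 10) is retained alongside the dense groups, which reflects the remark that a constant separation spreads the selected points over the feature space regardless of the underlying distribution.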
Efficiency of condensation algorithms may be improved by adopting a
multiresolution representation approach. A multiresolution framework for
instance-based learning and regression has been studied in [54] and [178] respectively.

© 2004 by Taylor & Francis Group, LLC


authority whatever in his own country; for as serki-n-turáwa he had
to levy the tax of ten mithkáls on every camel-load of merchandise,
and this he is said to have done with some degree of severity. After
a long conversation on the steps of the terrace, we parted, the best
possible friends.
Not so pleasant to me, though not without interest, was the visit
of another great man—Belróji, the támberi or war-chieftain of the
Ighólar Im-esághlar. He was still in his prime, but my Kél-owí (who
were always wrangling like children) got up a desperate fight with
him in my very room, which was soon filled with clouds of dust; and
the young Slimán entered during the row, and joining in it, it
became really frightful. The Kél-owí were just like children; when
they went out they never failed to put on all their finery, which they
threw off as soon as they came within doors, resuming their old dirty
clothes.
It was my custom in the afternoon, when the sun had set behind
the opposite buildings, to walk up and down in front of our house;
and while so doing to-day I had a long conversation with two chiefs
of the Itísan on horseback, who came to see me, and avowed their
sincere friendship and regard. They were fine, tall men, but rather
slim, with a noble expression of countenance and of light colour.
Their dress was simple but handsome, and arranged with great care.
All the Tuarek, from Ghát as far as Háusa, and from Alákkos to
Timbúktu, are passionately fond of the tobes and trousers called
“tailelt” (the Guinea-fowl), or “filfil” (the pepper), on account of their
speckled colour. They are made of silk and cotton interwoven, and
look very neat. The lowest part of the trousers, which forms a
narrow band about two inches broad, closing rather tightly, is
embroidered in different colours. None of the Tuarek of pure blood
would, I think, degrade themselves by wearing on their head the red
cap.
Monday, October 21.—Early in the morning I went with Hámma to
take leave of the Sultan, who had been too busy for some days to
favour me with an audience; and I urged my friend to speak of the
treaty, though I was myself fully aware of the great difficulty which
so complicated a paper, written in a form entirely unknown to the
natives, and which must naturally be expected to awaken their
suspicion, would create, and of the great improbability of its being
signed while the Sultan was pressed with a variety of business. On
the way to the fáda we met Áshu, the present serki-n-turáwa, a
large-sized man, clad in an entirely white dress, which may not
improbably be a sign of his authority over the white men (Turáwa).
He is said to be a very wealthy man. He replied to my compliments
with much kindness, entered into conversation with me about the
difference of our country and theirs, and ordered one of his
companions to take me to a small garden which he had planted near
his house in the midst of the town, in order to see what plants we
had in common with them. Of course there was nothing like our
plants; and my cicerone conceived rather a poor idea of our country
when he heard that all the things which they had we had not—
neither senna, nor bamia, nor indigo, nor cotton, nor Guinea-corn,
nor, in short, the most beautiful of all trees of the creation, as he
thought—the talha, or Mimosa ferruginea; and he seemed rather
incredulous when told that we had much finer plants than they.
We then went to the fáda. The Sultan seemed quite ready for
starting. He was sitting in the courtyard of his palace, surrounded by
a multitude of people and camels, while the loud murmuring noise of
a number of schoolboys who were learning the Kurán proceeded
from the opposite corner, and prevented my hearing the
conversation of the people. The crowd and the open locality were, of
course, not very favourable to my last audience, and it was
necessarily a cold one. Supported by Hámma, I informed the Sultan
that I expected still to receive a letter from him to the Government
under whose auspices I was travelling, expressive of the pleasure
and satisfaction he had felt in being honoured with a visit from one
of the mission, and that he would gladly grant protection to any
future traveller who should happen to visit his country. The Sultan
promised that such a letter should be written; however, the result
proved that either he had not quite understood what I meant, or,
what is more probable, that in his precarious situation he felt himself
not justified in writing to a Christian government, especially as he
had received no letter from it.
When I had returned to my quarters, Hámma brought me three
letters, in which ʿAbd el Káder recommended my person and my
luggage to the care of the Governors of Kanó, Kátsena, and Dáura,
and which were written in rather incorrect Arabic, and in nearly the
same terms. They were as follows:—

“In the name of God, etc.


“From the Emír of Ahír, ʿAbd el Káder, son of the Sultan
Mohammed el Bákeri, to the Emír of Dáura, son of the late
Emír of Dáura, Is-hhák. The mercy of God upon the eldest
companions of the Prophet, and His blessing upon the
Khalífa; ‘Amín.’ The most lasting blessing and the highest
wellbeing to you without end. I send this message to you
with regard to a stranger, my guest, of the name of ʿAbd el
Kerím, who came to me, and is going to the Emír el Mumenín
[the Sultan of Sókoto], in order that, when he proceeds to
you, you may protect him and treat him well, so that none of
the freebooters and evil-doers may hurt him or his property,
but that he may reach the Emír el Mumenín. Indeed, we
wrote this on account of the freebooters, in order that you
may protect him against them in the most efficacious manner.
Farewell.”

These letters were all sealed with the seal of the Sultan.
Hámma showed me also another letter which he had received
from the Sultan, and which I think interesting enough to be here
inserted, as it is a faithful image of the turbulent state of the country
at that time, and as it contains the simple expression of the sincere
and just proceedings of the new Sultan. Its purport was as follows,
though the language in which it is written is so incorrect that several
passages admit of different interpretations:—
“In the name of God, etc.
“From the Commander, the faithful Minister of Justice, the
Sultan ʿAbd el Káder, son of the Sultan Mohammed el Bákeri,
to the chiefs of all the tribe of Eʾ Núr, and Hámed, and Sëis,
and all those among you who have large possessions, perfect
peace to you.
“Your eloquence, compliments, and information are
deserving of praise. We have seen the auxiliaries sent to us
by your tribe, and we have taken energetic measures with
them against the marauders, who obstruct the way of the
caravans of devout people, and the intercourse of those who
travel, as well as those who remain at home. On this account
we desire to receive aid from you against their incursions.
The people of the Kél-fadaye, they are the marauders. We
should not have prohibited their chiefs to exercise rule over
them, except for three things: first, because I am afraid they
will betake themselves from the Aníkel [the community of the
people of Aír] to the Awelímmiden; secondly, in order that
they may not make an alliance with them against us, for they
are all marauders; and thirdly, in order that you may approve
of their paying us the tribute. Come, then, to us quickly. You
know that what the hand holds it holds only with the aid of
the fingers; for without the fingers the hand can seize
nothing.
“We therefore will expect your determination, that is to say
your coming, after the departure of the salt-caravan of the
Itísan, fixed among you for the fifteenth of the month. God!
God is merciful and answereth prayer! Come therefore to us,
and we will tuck up our sleeves, and drive away the
marauders, and fight valiantly against them as God (be He
glorified!) hath commanded.
“Lo, corruption hath multiplied on the face of the earth!
May the Lord not question us on account of the poor and
needy, orphans and widows, according to His word: ‘You are
all herdsmen, and ye shall all be questioned respecting your
herds, whether ye have indeed taken good care of them or
dried them up.’
“Delay not, therefore, but hasten to our residence, where
we are all assembled; for ‘zeal in the cause of religion is the
duty of all;’ or send thy messenger to us quickly with a
positive answer; send thy messenger as soon as possible.
Farewell!”

The whole population was in alarm, and everybody who was able
to bear arms prepared for the expedition. About sunset the “égehen”
left the town, numbering about four hundred men, partly on camels,
partly on horseback, besides the people on foot. Bóro as well as
Áshu accompanied the Sultan, who this time was himself mounted
on a camel. They went to take their encampment near that of
Astáfidet, in Tagúrast, ʿAbd el Káder pitching a tent of grey colour,
and in size like that of a Turkish aghá, in the midst of the Kél-gerés,
the Kél-ferwán, and the Emgedesíye; while Astáfidet, who had no
tent, was surrounded by the Kél-owí. The Sultan was kind and
attentive enough not to forget me even now; and having heard that
I had not yet departed, Hámma not having finished his business in
the town, he sent me some wheat, a large botta with butter and
vegetables (chiefly melons and cucumbers), and the promise of
another sheep.
In the evening the drummer again went his rounds through the
town, proclaiming the strict order of the Sultan that everybody
should lay in a large supply of provisions. Although the town in
general had become very silent when deserted by so many people,
our house was kept in constant bustle, and in the course of the night
three mehára came from the camp, with people who could get no
supper there, and sought it with us. Bóro sent a messenger to me
early the next morning, urgently begging for a little powder, as the
“Mehárebín” of the Imghád had sent off their camels and other
property, and were determined to resist the army of the Sultan.
However, I could send him but very little. My amusing friend
Mohammed spent the whole day with us, when he went to join the
ghazzia. I afterwards learnt that he obtained four head of cattle as
his share. There must be considerable herds of cattle in the more
favoured valleys of Asben; for the expedition had nothing else to live
upon, as Mohammed afterwards informed me, and slaughtered an
immense quantity of them. Altogether, the expedition was
successful, and the Fádë-ang and many tribes of the Imghád lost
almost all their property. Even the influential Háj Beshír was
punished, on account of his son having taken part in the expedition
against us. I received also the satisfactory information that ʿAbd el
Káder had taken nine camels from the man who retained my méheri;
but I gained nothing thereby, neither my own camel being returned
nor another given me in its stead. The case was the same with all
our things; but nevertheless the proceeding had a good effect,
seeing that people were punished expressly for having robbed
Christians, and thus the principle was established that it was not less
illegal to rob Christians than it was to rob Mohammedans, both
creeds being placed, as far as regards the obligations of peace and
honesty, on equally favourable terms.
Tuesday, October 22.—I spent the whole of Tuesday in my house,
principally in taking down information which I received from the
intelligent Ghadámsi merchant Mohammed, who, having left his
native town from fear of the Turks, had resided six years in Ágades,
and was a well-informed man.
Wednesday, October 23.—My old friend the blacksmith Hámmeda,
and the tall Elíyas, went off this morning with several camels laden
with provisions, while Hámma still stayed behind to finish the
purchases; for on account of the expedition, and the insecure state
of the road to Damerghú, it had been difficult to procure provisions
in sufficient quantity. Our house therefore became almost as silent
and desolate as the rest of the town; but I found a great advantage
in remaining a few days longer, for my chivalrous friend and
protector, who, as long as the Sultan and the great men were
present, had been very reserved and cautious, had now no further
scruple about taking me everywhere, and showing me the town
“within and without.”
We first visited the house of Ídder, a broker, who lived at a short distance
to the south from our house, and had also lodged Háj ʿAbdúwa during his
stay here. It was a large, spacious dwelling, well arranged with a view to
comfort and privacy, according to the conception and customs of the
inhabitants, while our house (being a mere temporary residence for Ánnur’s
people occasionally visiting the town) was a dirty, comfortless abode. We
entered first a vestibule, about twenty-five feet long and nine broad,
having on each side a separate space marked off by that low kind of
balustrade mentioned in my description of the Sultan’s house. This
vestibule or ante-room was followed by a second room of larger size
and irregular arrangement; opposite the entrance it opened into
another apartment, which, with two doors, led into a spacious inner
courtyard, which was very irregularly circumscribed by several rooms
projecting into it, while to the left it was occupied by an enormous
bedstead (1). These bedsteads are a most characteristic article of
furniture in all the dwellings of the Sónghay. In Ágades they are
generally very solidly built of thick boards, and furnished with a
strong canopy resting upon four posts, covered with mats on the top
and on three sides, the remaining side being shut in with boards.
Such a canopied bed looks like a little house by itself. On the wall of
the first chamber, which on the right projected into the courtyard,
several lines of large pots had been arranged, one above the other
(2), forming so many warm nests for a number of turtle-doves which
were playing all about the courtyard; while on the left, in the half-
decayed walls of two other rooms (3), about a dozen goats were
fastened each to a separate pole. The background of the courtyard
contained several rooms, and in front of it a large shade (4) had
been built of mats, forming a rather pleasant and cool resting-place.
Numbers of children were gambolling about, who gave to the whole
a very cheerful appearance. There is something very peculiar in
these houses, which are constructed evidently with a view to
comfort and quiet enjoyment.
We then went to visit a female friend of Hámma, who lived in the
south quarter of the town, in a house which likewise bespoke much
comfort; but here, on account of the number of inmates, the
arrangement was different, the second vestibule being furnished on
each side with a large bedstead instead of mats, though here also
there was in the courtyard an immense bedstead. The courtyard was
comparatively small, and a long corridor on the left of it led to an
inner courtyard or “tsakangída,” which I was not allowed to see. The
mistress of the house was still a very comely person, although she
had borne several children. She had a fine figure, though rather
under the middle size, and a fair complexion. I may here remark that
many of the women of Ágades are not a shade darker than Arab
women in general. She wore a great quantity of silver ornaments,
and was well dressed in a gown of coloured cotton and silk. Hámma
was very intimate with her, and introduced me to her as his friend
and protégé, whom she ought to value as highly as himself. She was
married, but her husband was residing in Kátsena, and she did not
seem to await his return in the Penelopean style. The house had as
many as twenty inmates, there being no less than six children, I
think, under five years of age, and among them a very handsome
little girl, the mother’s favourite; besides, there were six or seven
full-grown slaves. The children were all naked, but wore ornaments
of beads and silver.
After we had taken leave of this Emgedesíye lady, we followed the
street towards the south, where there were some very good houses,
although the quarter in general was in ruins; and here I saw the
very best and most comfortable-looking dwelling in the town. All the
pinnacles were ornamented with ostrich eggs. One will often find in
an eastern town, after the first impression of its desolate appearance
is gone by, many proofs that the period of its utter prostration is not
yet come, but that even in the midst of the ruins there is still a good
deal of ease and comfort. Among the ruins of the southern quarter
are to be seen the pinnacled walls of a building of immense
circumference and considerable elevation; but unfortunately I could
not learn from Hámma for what purpose it had been used; however,
it was certainly a public building, and probably a large khán rather
than the residence of the chief. With its high, towering walls, it still
forms a sort of outwork on the south side of the town, where in
general the wall is entirely destroyed, and the way is everywhere
open. Hámma had a great prejudice against this desolate quarter.
Even the more intelligent Mohammedans are often afraid to enter
former dwelling-places of men, believing them to be haunted by
spirits; but he took me to some inhabited houses, which were all
built on the same principle as that described, but varying greatly in
depth and in the size of the courtyard; the staircases (abi-n-háwa)
leading to the upper story are in the courtyard, and are rather
irregularly built of stones and clay. In some of them young ostriches
were running about. The inhabitants of all the houses seemed to
have the same cheerful disposition, and I was glad to find scarcely a
single instance of misery. I give here the ground-plan of another
house.
The artisans who work in leather (an occupation left entirely to females)
seem to live in a quarter by themselves, which originally was quite separated
from the rest of the town by a sort of gate; but I did not make a sufficient
survey of this quarter to mark it distinctly on the ground-plan of the town.
We also visited some of the mat-makers.
Our maimólo of the other day, who had discovered that we had slaughtered
our sheep,
paid us a visit in the evening, and for a piece of meat entertained
me with a clever performance on his instrument, accompanied with
a song. Hámma spent his evening with our friend the Emgedesíye
lady, and was kind enough to beg me to accompany him. This I
declined, but gave him a small present to take to her.
I had a fair sample of the state of morals in Ágades the following
day, when five or six girls and women came to pay me a visit in our
house, and with much simplicity invited me to make merry with
them, there being now, as they said, no longer reason for reserve,
“as the Sultan was gone.” It was indeed rather amusing to see what
conclusions they drew from the motto “Serki yátafi.” Two of them
were tolerably pretty and well-formed, with fine black hair hanging
down in plaits or tresses, lively eyes, and very fair complexion. Their
dress was decent, and that of one of them even elegant, consisting
of an under-gown reaching from the neck to the ankles, and an
upper one drawn over the head, both of white colour; but their
demeanour was very free, and I too clearly understood the caution
requisite in a European who would pass through these countries
unharmed and respected by the natives, to allow myself to be
tempted by these wantons. It would be better for a traveller in these
regions, both for his own comfort and for the respect felt for him by
the natives, if he could take his wife with him; for these simple
people do not understand how a man can live without a partner. The
Western Tuarek, who in general are very rigorous in their manners,
and quite unlike the Kél-owí, had nothing to object against me
except my being a bachelor. But as it is difficult to find a female
companion for such journeys, and as by marrying a native he would
expose himself to much trouble and inconvenience on the score of
religion, he will do best to maintain the greatest austerity of
manners with regard to the other sex, though he may thereby
expose himself to a good deal of derision from some of the lighter-
hearted natives. The ladies, however, became so troublesome that I
thought it best to remain at home for a few days, and was thus
enabled at the same time to note down the information which I had
been able to pick up. During these occupations I was greatly pleased
with the companionship of a diminutive species of finches which
frequent all the rooms in Ágades, and, as I may add from later
experience, in Timbúktu also; the male, with its red neck, in
particular looks extremely pretty. The poults were just about to
fledge.
Sunday, October 27.—There was one very characteristic building in
the town, which, though a most conspicuous object from the terrace
of our house, I had never yet investigated with sufficient accuracy.
This was the mesállaje, or high tower rising over the roof of the
mosque. The reason why this building in particular (the most famous
and remarkable one in the town) had been hitherto observed by me
only from a distance, and in passing by, must be obvious. Difference
of religious creed repelled me from it; and so long as the town was
full of strangers, some of them very fanatical, it was dangerous for
me to approach it too closely. I had often inquired whether it would
not be possible to ascend the tower without entering the mosque;
but I had always received for answer that the entrance was locked
up. As soon, however, as the Sultan was gone, and when the town
became rather quiet, I urged Hámma to do his best that I might
ascend to the top of this curious building, which I represented to
him as a matter of the utmost importance to me, since it would
enable me not only to control my route by taking a few angles of the
principal elevations round the valley Aúderas, but also to obtain a
distant view over the country towards the west and south, which it
was not my good luck to visit myself. To-day Hámma promised me
that he would try what could be done.
Having once more visited the lively house of Ídder, we took our
way over the market-places, which were now rather dull. The
vultures looked out with visible greediness and eagerness from the
pinnacles of the ruined walls around for their wonted food—their
share of offal during these days, when so many people were absent,
being of course much reduced, though some of them probably had
followed their fellow-citizens on the expedition. So few people being
in the streets, the town had a more ruined look than ever, and the
large heap of rubbish accumulated on the south side of the butchers’
market seemed to me more disgusting than before. We kept along
the principal street between Dígi and Arrafíya, passing the deep well
Shedwánka on our right, and on the other side a school, which
resounded with the shrill voices of about fifty little boys repeating
with energy and enthusiasm the verses of the Kurán, which their
master had written for them upon their little wooden tablets. Having
reached the open space in front of the mosque, and there being
nobody to disturb me, I could view at my leisure this simple but
curious building, which in the subsequent course of my journey
became still more interesting to me, as I saw plainly that it was built
on exactly the same principle as the tower which rises over the
sepulchre of the famed conqueror Háj Mohammed Áskiá (the
“Ischia” of Leo).
The mesállaje starts up from the platform or terrace formed by the
roof of the mosque, which is extremely low, resting apparently, as
we shall see, in its interior, upon four massive pillars. It is square,
and measures at its base about thirty feet, having a small lean-to, on
its east side, on the terrace of the mosque, where most probably
there was formerly the entrance. From this the tower rises
(decreasing in width, and with a sort of swelling or entasis in the
middle of its elevation, something like the beautiful model adopted
by nature in the deléb palm, and imitated by architects in the
columns of the Ionic and Corinthian orders) to a height of from
ninety to ninety-five feet. It measures at its summit not more than
about eight feet in width. The interior is lighted by seven openings
on each side. Like most of the houses in Ágades, it is built entirely of
clay; and in order to strengthen a building so lofty and of so soft a
material, its four walls are united by thirteen layers of boards of the
dúm-tree, crossing the whole tower in its entire breadth and width,
and coming out on each side from three to four feet, while at the
same time they afford the only means of getting to the top. Its
purpose is to serve as a watch-tower, or at least was so at a former
time, when the town, surrounded by a strong wall and supplied with
water, was well capable of making resistance, if warned in due time
of an approaching danger. But at present it seems rather to be kept
in repair only as a decoration of the town.
The mesállaje in its present state was only six years old at the
time of my visit (in 1850), and perhaps was not even quite finished
in the interior, as I was told that the layers of boards were originally
intended to support a staircase of clay. About fifty paces from the
south-western corner of the mosque, the ruins of an older tower are
seen still rising to a considerable height, though leaning much to one
side, more so than the celebrated Tower of Pisa, and most probably
in a few years it will give way to an attack of storm and rain. This
more ancient tower seems to have stood quite detached from the
mosque.
Having sufficiently surveyed the exterior of the tower, and made a
sketch of it, I accompanied my impatient companion into the interior
of the mosque, into which he felt no scruple in conducting me. The
lowness of the structure had already surprised me from without; but
I was still more astonished when I entered the interior, and saw that
it consisted of low, narrow naves, divided by pillars of immense
thickness, the reason of which it is not possible at present to
understand, as they have nothing to support but a roof of dúm-tree
boards, mats, and a layer of clay; but I think it scarcely doubtful that
originally these naves were but the vaults or cellars of a grand
superstructure, designed but not executed; and this conjecture
seems to be confirmed by all that at present remains of the mosque.
The gloomy halls were buried in a mournful silence, interrupted only
by the voice of a solitary man, seated on a dirty mat at the western
wall of the tower, and reading diligently the torn leaves of a
manuscript. Seeing that it was the kádhi, we went up to him and
saluted him most respectfully; but it was not in the most cheerful
and amiable way that he received our compliments—mine in
particular—continuing to read, and scarcely raising his eyes from the
sheets before him. Hámma then asked for permission to ascend the
tower, but received a plain and unmistakable refusal, the thing being
impossible, there being no entrance to the tower at present. It was
shut up, he said, on account of the Kél-gerés, who used to ascend
the tower in great numbers. Displeased with his uncourteous
behaviour, and seeing that he was determined not to permit me to
climb the tower, were it ever so feasible, we withdrew and called
upon the imám, who lives in a house attached to these vaults, and
which looked a little neater from having been whitewashed;
however, he had no power to aid us in our purpose, but rather
confirmed the statement of the kádhi. This is the principal mosque
of the town, and seems to have been always so, although there are
said to have been formerly as many as seventy mosques, of which
ten are still in use. They deserve no mention, however, with the
exception of three, the Msíd Míli, Msíd Éheni, and Msíd el Mékki. I
will only add here that the Emgedesíye, so far as their very slender
stock of theological learning and doctrine entitles them to rank with
any sect, are Malekíye, as well as the Kél-owí.
Resigning myself to the disappointment of not being able to
ascend the tower, I persuaded my friend to take a longer walk with
me round the northern quarter of the town. But I forgot to mention
that besides Hámma, I had another companion of a very different
character. This was Zúmmuzuk, a reprobate of the worst description,
and whose features bore distinct impress of the vile and brutal
passions which actuated him; yet being a clever fellow, and (as the
illegitimate son, or “dan néma,” of an Emgédesi woman) fully master
of the peculiar idiom of Ágades, he was tolerated not only by the old
chief Ánnur, who employed him as interpreter, but even by me. How
insolent the knave could be I shall soon have occasion to mention.
With this fellow, therefore, and with Hámma, I continued my walk,
passing the kófa-n-alkáli, and then, from the ruins of the quarter
Ben-Gottára, turning to the north. Here the wall of the town is in a
tolerable state of preservation, but very weak and insufficient,
though it is kept in repair, even to the pinnacles, on account of its
surrounding the palace of the Sultan. Not far from this is an open
space called Azarmádarangh, “the place of execution,” where
occasionally the head of a rebellious chieftain or a murderer is cut
off by the “dóka;” but as far as I could learn, such things happen
very seldom. Even on the north side, two gates are in a tolerable
state of preservation.
Having entered the town from this side, we went to visit the
quarter of the leather-workers, which, as I stated before, seems to
have formed originally a regular ward; all this handicraft, with the
exception of saddle-work, is carried on by women, who work with
great neatness. Very beautiful provision-bags are made here,
although those which I brought back from Timbúktu are much
handsomer. We saw also some fine specimens of mats, woven of a
very soft kind of grass, and dyed of various colours. Unfortunately, I
had but little with me wherewith to buy; and even if I had been able
to make purchases, the destination of our journey being so distant,
there was not much hope of carrying the things safely to Europe.
The blacksmiths’ work of Ágades is also interesting, although showy
and barbarous, and not unlike the work with which the Spaniards
used to adorn their long daggers.
Monday, October 28.—During all this time I prosecuted inquiries
with regard to several subjects connected with the geography and
ethnography of this quarter of the world. I received several visits
from Emgédesi tradesmen, many of whom are established in the
northern provinces of Háusa, chiefly in Kátsena and Tasáwa, where
living is infinitely cheaper than in Ágades. All these I found to be
intelligent men, having been brought up in the centre of intercourse
between a variety of tribes and nations of the most different
organization, and, through the web of routes which join here,
receiving information of distant regions. Several of them had even
made the pilgrimage, and thus come in contact with the relatively
high state of civilization in Egypt and near the coast; and I shall not
easily forget the enlightened view which the mʿallem Háj
Mohammed ʿOmár, who visited me several times, took of Islamism
and Christianity. The last day of my stay in Ágades, he reverted to
the subject of religion, and asked me, in a manner fully expressive of
his astonishment, how it came to pass that the Christians and
Moslemín were so fiercely opposed to one another, although their
creeds, in essential principles, approximated so closely. To this I
replied by saying that I thought the reason was that the great
majority both of Christians and Moslemín paid less regard to the
dogmas of their creeds than to external matters, which have very
little or no reference to religion itself. I also tried to explain to him
that in the time of Mohammed Christianity had entirely lost that
purity which was its original character, and that it had been mixed up
with many idolatrous elements, from which it was not entirely
disengaged till a few centuries ago, while the Mohammedans had
scarcely any acquaintance with Christians except those of the old
sects of the Jacobites and Nestorians. Mutually pleased with our
conversation, we parted from each other with regret.
In the afternoon I was agreeably surprised by the arrival of the
Tinýlkum Ibrahim, for the purpose of supplying his brother’s house
with what was wanted; and being determined to make only one
day’s stay in the town, he had learned with pleasure that we were
about to return by way of Áfasás, the village whither he himself was
going. I myself had cherished this hope, as all the people had
represented that place as one of the largest in the country, and as
pleasantly situated. Hámma had promised to take me this way on
our return to Tin-téllust; but having stayed so much longer in the
town than he had intended, and being afraid of arriving too late for
the salt-caravan of the Kél-owí on their way to Bilma, which he was
to supply with provisions, he changed his plan, and determined to
return by the shortest road. Meanwhile he informed me that the old
chief would certainly not go with us to Zínder till the salt-caravan
had returned from Bilma.
Fortunately, in the course of the 29th a small caravan with corn
arrived from Damerghú, and Hámma completed his purchases. He
had, however, first to settle a disagreeable affair; for our friend
Zúmmuzuk had bought, in Hámma’s name, several things for which
payment was now demanded. Hámma flew into a terrible rage, and
nearly finished the rogue. My Arab and Tawáti friends, who heard
that we were to start the following day, though they were rather
busy buying corn, came to take leave of me, and I was glad to part
from all of them in friendship. But before bidding farewell to this
interesting place, I shall make a few general observations on its
history.
CHAPTER XVIII.
HISTORY OF ÁGADES

Previously to Mr. Cooley’s perspicuous inquiries into the Negroland
of the Arabs, this place was identified with Aúdaghost, merely on
account of a supposed similarity of name. But Ágades, or rather
Égedesh, is itself a pure Berber word, in no way connected with
Aúdaghost. It is of very frequent occurrence, particularly among the
Awelímmiden, and means “family,” and the name was well chosen
for a town consisting of mixed elements. Moreover, while we find
Aúdaghost in the far west in the twelfth century, we have the
distinct statement of Marmol that Ágades was founded a hundred
and sixty years before the time when he wrote (that is to say, in
1460), the truth of which statement, harmonizing as it does with
Leo’s more general account, that it was a modern town, we have no
reason to doubt. Neither of these authors tells us who built it; but as
we know that the great Sónghay conqueror Háj Mohammed Áskiá,
who conquered the town of Ágades in the year of the Hejra 921, or
1515 of our era, expelled from it the five Berber tribes who,
according to the information collected by me during my stay in
Ágades, and which I shall soon lay before my readers, must have
been long resident in the town, it appears highly probable that these
Berbers were its founders. And if this be assumed, there will be no
difficulty in explaining why the language of the natives of the place
at present is a dialect of the Sónghay language, as it is most
probable that this great and enlightened conqueror, after he had
driven out the old inhabitants, established in this important place a
new colony of his own people. In a similar way we find the Sónghay
nation, which seems not to have originally extended to a great
distance eastward of Gágho or Gógo, now extending into the very
heart of Kébbi, although we shall find other people speaking the
same language in the neighbourhood of Ágades, and perhaps may
be able in the course of our researches to trace some connection
between the Sónghay and ancient Egypt.
It is therefore highly probable that those five Berber tribes formed
the settlement in question as an entrepôt for their commerce with
Negroland, though the foundation of such a grand settlement on the
border of the desert presumes that they had at that time a
preponderating influence in all these regions; and the whole affair is
so peculiar that its history could not fail to gratify curiosity if more
could be known of it. From Bello’s account, it would appear that
they, or at least one of these tribes (the Aújila), conquered the
whole of Aïr.
It is certainly remarkable to see people from five places, separated
from each other by immense tracts, and united only by the bond of
commerce and interest, founding a large colony far away from their
homes and on the very border of the desert. For, according to all
that I could learn by the most sedulous inquiries in Ágades, those
tribes belonged to the Gurára of Tawat, to the Tafimáta, to the Beni
Wazít and the Tésko of Ghadámes, to the once powerful and
numerous tribe of the Masráta, and finally to the Aújila; and as the
names of almost all these different tribes, and of their divisions, are
still attached to localities of the town, we can scarcely doubt the
correctness of this information, and must suppose that Sultan Bello
was mistaken in referring the five tribes (settled in Ágades) to Aújila
alone.
Though nothing is related about the manner in which Háj
Mohammed Áskiá took possession of the town, except that it is
stated distinctly that he drove out the five tribes, it seems, from the
traditions current in Ágades, that a considerable number of the
Berbers, with five hundred “jákhfa” (cages mounted on camels, such
as only wealthy people can afford to keep for carrying their wives),
left the town, but were all massacred. But no one who regards with
the least attention the character of the present population of the
town can doubt for a moment that a considerable number of the
Berber population remained behind, and in course of time mixed
with the Sónghay colonists; for, even if we set aside the
consideration of the language (which is greatly intermixed with
Berber words), there is evidently much Berber blood in the
population even at the present day, a fact which is more evident in
the females than in the males.
It is a pity that Leo says nothing about the language spoken in
Ágades; for he lived just at the very period during which the town,
from a Berber settlement, became a Negro town. His expression
certainly implies that he regarded it as a Negro town. But, while
well-informed in general respecting the great conquests of
Mohammed Áskiá (or, as he calls him, Ischia, whom he erroneously
styles King of Timbúktu), he does not once mention his expedition
against Ágades, of which he might have heard as easily as of those
against Kátsena and Kanó, which preceded the former only by two
years. From his account it would seem that the town was then in a
very flourishing state, full of foreign merchants and slaves, and that
the king, though he paid a tribute of one hundred and fifty thousand
ducats to the King of Timbúktu (Gágho), enjoyed a great degree of
independence, at least from that quarter, and had even a military
force of his own. Besides, it is stated expressly that he belonged to
the Berber race. But it would almost seem as if Leo, in this passage,
represented the state of things as it was when he visited the town,
before Áskiá’s time, and not at the date when he wrote, though the
circumstance of the tribute payable to that king may have been
learnt from later information. In general, the great defect in Leo’s
description is that the reader has no exact dates to which to refer
the several statements, and that he cannot be sure how far the
author speaks as an eye-witness, and how far from information.
Of course it is possible that the Berbers found a Sónghay
population, if not in the place itself, which most probably did not
exist before the time of their arrival, yet in the district around it; and
it would seem that there existed in ancient times, in the celebrated
Valley of Ír-n-allem, a small town of which some vestiges are said to
remain at the present day, as well as two or three date-trees, the
solitary remains of a large plantation. From this town, tradition says,
the present inhabitants of Ágades were transplanted. But be this as
it may, it is certain that the same dialect of the Sónghay language
which is spoken in Ágades is also still spoken in a few places in the
neighbourhood, by the tribe of the Íghdalén, or Ighedálen, whose
whole appearance, especially their long hair, shows them to be a
mixed race of Sónghay and Berbers, and there is some reason to
suppose that they belonged originally to the Zenága or Senhája.
These people live in and around Íngal, a small town four days’
journey from Ágades, on the road to Sókoto, and in and around
Tegídda, a place three days’ journey from Íngal, and about five from
Ágades west-south-west. This latter place is of considerable interest,
being evidently identical with the town of the same name mentioned
by Ebn Khaldún and by Ebn Batúta as a wealthy place, lying
eastward from Gógo, on the road to Egypt, and in intimate
connection and friendly intercourse with the Mzáb and Wárgela. It
was governed by a Berber chief, with the title of Sultan. This place,
too, was for some time subject to Gógo, or rather to the empire of
Méle or Málli, which then comprised Sónghay, in the latter part of
the fourteenth century; and the circumstance that here too the
Sónghay language is still spoken may be best explained by referring
it to colonization, since it is evident that Áskiá, when he took
possession of Ágades, must have occupied Tegídda also, which lay
on the road from Gógo to that place. However, I will not indulge in
conjectures, and will merely enter into historical questions so far as
they contribute to furnish a vivid and coherent picture of the tribes
and countries with which my journey brought me into contact. I will
therefore only add that this place, Tegídda or Tekádda, was famous,
in the time of Ebn Batúta, for its copper mines, the ores of which
were exported as far as Bórnu and Góber, while at present nothing is
known of the existence of copper hereabouts; but a very good
species of salt of red colour (já-n-gísherí), which is far superior to
that of Bilma, is obtained here, as well as in Íngal. But I recommend
this point to the inquiry of future travellers. I have mentioned above
the presence of loadstone on the border of Aír.
Having thus attempted to elucidate and illustrate the remarkable
fact that the language of Ágades is derived from and akin to the
Sónghay—a fact which of course appeared to me more surprising
before I discovered, in the course of 1853, that this language
extends eastward far beyond the so-called Niger—I return once
more to the settlement of the Berbers in Ágades. It is evident that
this settlement, if it was of the nature described above, was made
for the purpose of serving as a great commercial entrepôt for the
commerce with another country; and if we duly consider the
statements made by el Bekri, Ebn Batúta, Leo, Ca da Mosto, and by
the author of the “History of Sónghay,” with regard to the
importance of the market of Gógo, and if we pay due attention to
that circuitous route which led from Gógo by way of Tegídda, not
only to Egypt, but even to Tawát, there cannot be the least doubt
that Ágades was founded by those Berber tribes with the distinct
purpose that it might serve them as a secure abode and fortified
magazine in their commercial intercourse with that splendid capital
of the Sónghay empire, the principal article of which was gold, which
formed also the chief article in the former commerce of Ágades. For
Ágades had its own standard weight of this precious metal, the
mithkál, which even at the present day regulates the circulating
medium. And this mithkál of Ágades is totally different from the
standard of the same name which is in use in Timbúktu, the latter
being, in regard to the value of the Spanish dollar, as 1⅓ to 1, and
the former only as ⅖ to 1. But for wholesale business a greater
weight was in use, called “kárruwe,” the smaller kárruwe containing
thirty-three mithákel, or mithkáls, and a third, equal to two rottls and
a sixth, while the larger kárruwe contained a hundred mithkáls, and
was equal to six rottls and a half.
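The figures just given lend themselves to a short arithmetical check. The following Python sketch is only an illustration under stated assumptions: the names `MITHKAL_IN_DOLLARS`, `SMALL_KARRUWE`, `LARGE_KARRUWE`, `mithkals_to_dollars` and `mithkals_per_rottl` are hypothetical labels introduced here, and the only numbers used are those stated in the text (the mithkál of Ágades at ⅖ of a Spanish dollar, that of Timbúktu at 1⅓, and the two kárruwe weights).

```python
from fractions import Fraction

# Value of one mithkál in Spanish dollars, as stated in the text.
# (Dictionary keys are plain-ASCII labels chosen for this sketch.)
MITHKAL_IN_DOLLARS = {
    "Agades": Fraction(2, 5),    # "as 2/5 to 1"
    "Timbuktu": Fraction(4, 3),  # "as 1 1/3 to 1"
}

# Wholesale weights as (mithkáls, rottls), as stated in the text.
SMALL_KARRUWE = (Fraction(100, 3), Fraction(13, 6))  # 33 1/3 mithkáls = 2 1/6 rottls
LARGE_KARRUWE = (Fraction(100), Fraction(13, 2))     # 100 mithkáls = 6 1/2 rottls


def mithkals_to_dollars(mithkals, town="Agades"):
    """Convert a number of mithkáls to Spanish dollars for the given town."""
    return Fraction(mithkals) * MITHKAL_IN_DOLLARS[town]


def mithkals_per_rottl(karruwe):
    """Return how many mithkáls go to one rottl for a given kárruwe."""
    mithkals, rottls = karruwe
    return mithkals / rottls


# The larger kárruwe is exactly three times the smaller one.
print(LARGE_KARRUWE[0] / SMALL_KARRUWE[0])   # 3

# Both kárruwe imply the same ratio of mithkáls to the rottl.
print(mithkals_per_rottl(SMALL_KARRUWE))     # 200/13
print(mithkals_per_rottl(LARGE_KARRUWE))     # 200/13

# Ten mithkáls of Ágades come to four Spanish dollars.
print(mithkals_to_dollars(10))               # 4
```

Exact fractions are used rather than decimals so that the thirds and sixths in the account are not rounded. On these figures the two kárruwe agree with each other, and ten mithkáls of Ágades make four Spanish dollars, which is the rate at which the Sultan's tax on each camel-load of foreign merchandise is reckoned elsewhere in this chapter.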
The importance of the trade of Ágades, and the wealth of the
place in general, appear very clearly from the large tribute, of a
hundred and fifty thousand ducats, which the King of Ágades was
able to pay to that of Sónghay, especially if we bear in mind that
Leo, in order to give an idea of the great expense which this same
King of Sónghay had incurred on his pilgrimage to Mekka, states in
another passage that having spent all he took with him, he
contracted a debt amounting to that very sum. As for the King of
Ágades, his situation was at that time just what it is now; and we
cannot better describe his precarious position, entirely dependent on
the caprice and intrigues of the influential chiefs of the Tuarek, than
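by using the very words of Leo, “Alle volte scacciano il re e pongono
qualche suo parente in luogo di lui, nè usano ammazzar alcuno; e
quel che più contenta gli abitatori del diserto è fatto re in Agadez”
(that is, “At times they drive out the king and put some kinsman of
his in his place, nor is it their custom to kill anyone; and he who
best pleases the inhabitants of the desert is made king in Agadez”).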
Unfortunately, we are not able to fix a date for that very peculiar
covenant between the different tribes with regard to the installation
of the Sultan of Ágades, and the establishing of the principle that he
must belong to a certain family, which is regarded as of sheríf
nobility, and lives not in Ágades, nor even in the country of Aír, but
in a town of Góber. I was once inclined to think that this was an
arrangement made in consequence of the power and influence
which the Emír of Sókoto had arrogated to himself; but I have now
reason to doubt this, for even the grandfather of ʿAbd el Káder was
Sultan. Certainly even now, when the power of the Fulfúlde or
Féllani empire is fast crumbling to pieces, the Emír of Sókoto has a
certain influence upon the choice of the Sultan of Ágades. Of this
fact I myself became witness during my stay in Sókoto in April,
1853, when Hámed eʾ Rufäy was once more sent out to succeed
ʿAbd el Káder. Indeed, Ittegáma, ʿAbd el Káder’s brother, who
thought that I enjoyed the favour and confidence of the Emír, called
upon me (as I shall relate in due time) expressly to entreat me most
urgently to exert my influence in order to restore my former host to
his authority.
I have described already in what way the union of the tribes of the
Itísan, the Kél-gerés, and the Kél-owí is expressed in installing the
Sultan; but though without the presence and assent of the former
the new prince could never arrive at his place of residence, the final
decision seems to rest with the chief Ánnur, the inhabitants of the
town having no voice in the matter. The Sultan is rather a chief of
the Tuarek tribes residing in Ágades than the ruler of Ágades. How
difficult and precarious his position must be may be easily conceived
if it be considered that these tribes are generally at war with one
another; the father of Hámed eʾ Rufäy was even killed by the Kél-
gerés. Nevertheless, if he be an intelligent and energetic man, his
influence in the midst of this wild conflict and struggle of clashing
interests and inclinations must be very beneficial.
What the revenue of the Sultan may at present amount to it is
difficult to say. His means and income consist chiefly in the presents
which he receives on his accession to authority, in a contribution of
one bullock’s hide or kulábu (being about the value of half a Spanish
dollar) from each family, in a more considerable but rather uncertain
tribute levied upon the Imghád, in the tax of ten mithkáls or four
Spanish dollars which he levies on each camel-load of foreign
merchandise which enters the town of Ágades (articles of food being
exempt from charge), in a small tribute derived from the salt
brought from Bilma, and in the fines levied on lawless people and
marauders, and often on whole tribes. Thus it is very probable that
the expedition which ʿAbd el Káder undertook immediately after his
accession, against the tribes who had plundered us, enriched him
considerably. As for the inhabitants of Ágades themselves, I was
assured that they do not pay him any tribute at all, but are only
obliged to accompany him on his expeditions. Of course in earlier
times, when the commerce of the town was far greater than at
present, and when the Imghád (who had to provide him with cattle,
corn, fruit, and vegetables) were strictly obedient, his income far
exceeded that of the present day. When taken altogether it is
certainly considerably under twenty thousand dollars. His title is
Amanókal, or Amanókal Imakóren, in Temáshight, Kókoy bére in the
Emgédesi, and Babá-n-Serkí in the Háusa language.
The person second in authority in the town, and in certain
respects the Vizier, is now, and apparently was also in ancient times,
the “kókoy gerégeré” (i.e. master of the courtyard or the interior of
the palace). This is his real indigenous character, while the
foreigners, who regarded him only in his relation to themselves,
called him Sheikh el ʿArab, or, in the Háusa language, Serkí-n-
turáwa (the Chief of the Whites), and this is the title by which he is
generally known. For it was he who had to levy the tax on the
merchandise imported into the town, an office which in former
times, when a considerable trade was carried on, was of great
importance. But the chief duty of the “serkí-n-turáwa,” at the present
time, is to accompany annually the salt-caravan of the Kél-gerés,
which supplies the western part of Middle Sudán with the salt of
Bilma, from Ágades to Sókoto, and to protect it on the road as well
as to secure it against exorbitant exactions on the part of the Fúlbe
of Sókoto. For this trouble he receives one “kántu,” that is to say the
eighth part (eight kántu weighing three Turkish kantars or quintals)
of a middle-sized camel-load, a contribution which forms a
considerable income in this country, probably of from eight to ten
thousand Spanish dollars, the caravan consisting generally of some
thousand camels, not all equally laden, and the kántu of salt fetching
in Sudán from five thousand to seven and eight thousand kurdí or
shells, which are worth from two to three dollars. Under such
circumstances those officers, who at the same time trade on their
own account, cannot but amass considerable wealth. Mohammed
Bóro as well as Áshu are very rich, considering the circumstances of
the country.
After having escorted the salt-caravan to Sókoto, and settled the
business with the Emír of this place, the serkí-n-turáwa in former
times had to go to Kanó, where he received a small portion of the
six hundred kurdí, the duty levied on each slave brought to the
slave-market, after which he returned to Ágades with the Kél-gerés
that had frequented the market of Kanó. I had full opportunity, in
the further course of my journey, to convince myself that such is not
now the case; but I cannot say what is the reason of this custom
having been discontinued, though it may be the dangerous state of
the road between Sókoto and Kanó. Mohammed Bóro, the former
serkí-n-turáwa, has still residences as well in Kanó and Zínder as in
Sókoto and Ágades. From what I have said it is clear that at present
the serkí-n-turáwa has much more to do with the Tuarek and Fúlbe
than with the Arabs, and at the same time is a sort of mediator
between Ágades and Sókoto. Of the other persons in connection
with the Sultan, the “kókoy kaina” or “bába-n-serkí” (the chief
eunuch), at present Ámagay, the fádawa-n-serkí (the aides-de-camp
of the Sultan), as well as the kádhi or alkáli, and the war-chief Sídi
Ghalli, I have spoken in the diary of my residence in the place.
I have already stated above that the southern part of the town,
which at present is almost entirely deserted, formed the oldest
quarter, while katánga, or “báki-n-bírni,” seems to have been its
northern limit. Within these limits the town was about two miles in
circuit, and when thickly peopled may have contained about thirty
thousand inhabitants; but after the northern quarter was added the
whole town had a circuit of about three miles and a half, and may
easily have mustered as many as fifty thousand inhabitants, or even
more. The highest degree of power seems to have been attained
before the conquest of the town by Mohammed Áskiá in the year
1515, though it is said to have been a considerable and wealthy
place till about sixty years ago (reckoned from 1850), when the
greatest part of the inhabitants emigrated to the neighbouring towns
of Háusa, chiefly Kátsena, Tasáwa, Marádi, and Kanó. The exact
circumstances which brought about this deplorable desertion and
desolation of the place I was not able to learn; and the date of the
event cannot be made to coincide with the period of the great
revolution effected in Middle Sudán by the rising of the Jihádi, “the
Reformer,” ʿOthmán da-n-Fódiye, which it preceded by more than
fifteen years; but it coincides with or closely follows upon an event
which I shall have to dwell upon in the further course of my
proceedings. This is the conquest of Gáo, or Gógo (the former
capital of the Sónghay empire, and which since 1591 had become a
province of the empire of Morocco), by the Tuarek. As we have seen
above that Ágades had evidently been founded as an entrepôt for
the great trade with this most flourishing commercial place on the
Ísa, or Niger, at that time the centre of the gold trade, of course the
ransacking and wholesale destruction of this town could not but
affect in the most serious manner the wellbeing of Ágades, cutting
away the very roots through which it received life.
[Key to the plan of the town:] 1, House where I lodged; 2, Great Mosque, or
Mesállaje; 3, Palace, or Fáda; 4, Káswa-n-delélti, or Tama-n-lókoy; 5,
Káswa-n-rákoma; 6, Katánga; 7, Erárar-n-zákan; 8, Mohammed Bóro’s house; 9,
House of the Kádhi; 10, Well Shedwánka; 11, Pools of Stagnant Water; 12,
Kófa-n-Alkáli; 13, Masráta Hogúme; 14, Suburb of Ben Gottára.

At present I still think that I was not far wrong in estimating the
number of the inhabited houses at from six hundred to seven
hundred, and the population at about seven thousand, though it
must be borne in mind that, as the inhabitants have still preserved
their trading character, a great many of the male inhabitants are
always absent from home, a circumstance which reduces the armed
force of the place to about six hundred. A numerical element,
capable of controlling the estimated amount of the population, is
offered by the number of from two hundred and fifty to three
hundred well-bred boys, who at the time of my visit were learning a
little reading and writing, in five or six schools scattered over the
town; for it is not every boy who is sent to school, but only those
belonging to families in easy circumstances, and they are all about
the same age, from eight to ten years old.
With regard to the names of the quarters of the town, which are
interesting from an historical point of view, I was not able to learn
exactly the application of each of the names; and I am sure very few
even of the inhabitants themselves can now tell the limits of the
quarters, on account of the desolate state of many of them. The
principal names which can be laid down with certainty in the plan
are Masráta, Gobetáren, Gáwa-Ngírsu, Dígi or Dégi, Katánga,
Terjemán, and Arrafía, which comprise the south-western quarter of
the town. The names of the other quarters, which I attempted to lay
down on the plan sent to Government together with my report, I
now deem it prudent to withdraw, as I afterwards found that there
was some uncertainty about them. I therefore collect here, for the
information of future travellers, the names of the other quarters of
the place, besides those mentioned above and marked in the plan—
Lárelóg, Churúd, Hásena, Amaréwuël, Imurdán (which name, I was
assured afterwards, has nothing in common with the name of the
tribe of the Imghád), Tafimáta (the quarter where the tribe of the
same name lived), Yobímme (“yobu-mé” meaning the mouth of the
market), Dégi-n-béne, or the Upper Dégi, and Bosenrára. Kachíyu
(not Kachín) seems to have been originally the name of a pool, as I
was assured that, besides the three ponds still visible, there were
formerly seven others, namely Kudúru, Kachíyu, Chikinéwan,
Lángusúgázará, Kurungúsu, and Rabafáda, this latter in the square
of the palace.
The whole ground upon which the town is built (being the edge of
a tableland which coincides with the transition from granite to
sandstone) seems to be greatly impregnated with salt at a certain
depth, of which not only the ponds, but even the wells bear
evidence, two of the three wells still in use having saltish water, and
only that of Shedwánka being, as to taste, free from salt, though it
is still regarded as unwholesome, and all the water used for drinking
is brought from the wells outside the walls. Formerly, it is said, there
were nine wells inside the town.
From what I have said above, it may be concluded that the
commerce of Ágades is now inconsiderable. Its characteristic feature
is that no kind of money whatever is current in the market—neither
gold, nor silver, nor kurdí, nor shells; while strips of cotton, or
gábagá (the Kanúri, and not the Háusa term being employed in this
case, because the small quantity of this stuff which is current is
imported from the north-western province of Bórnu), are very rare,
and indeed form almost as merely nominal a standard as the
mithkál. Nevertheless the value of the mithkál is divided into ten
rijáls, or érjel, which measure means eight drʿa, or cubits, of
gábagá. The real standard of the market, I must repeat, is millet or
dukhn (“géro” in Háusa, “éneli” in Temáshight, Pennisetum
typhoïdeum), durra, or Holcus sorghum, being scarcely ever brought
to market. And it is very remarkable, that with this article a man may
buy everything at a much cheaper rate than with merchandise,
which in general fetches a low price in the place; at least it did so
during my stay, when the market had been well stocked with
everything in demand, by the people who had come along with us.
English calico of very good quality was sold by me at 20 per cent.
less than it had been bought for at Múrzuk. Senna in former times
formed an article of export of some importance; but the price which
it fetches on the coast has so decreased that it scarcely pays the
carriage, the distance from the coast being so very great; and it
scarcely formed at all an article in request here, nor did we meet on
our whole journey a single camel laden with it, though it grows in
considerable quantities in the valleys hereabouts.
Ágades is in no respect a place of resort for wealthy merchants,
not even Arabs, while with regard to Europe its importance at
present consists in its lying on the most direct road to Sókoto and
that part of Sudán. In my opinion it would form for a European
agent a very good and comparatively healthy place from which to
open relations with Central Africa. The native merchants seem only
to visit the markets of Kátsena, Tasáwa, Marádi, Kanó, and Sókoto,
and, as far as I was able to learn, never go to the northern markets
of Ghát or Múrzuk, unless on a journey to Mekka, which several of
them have made. Neither does there seem to exist any intercourse
at present with Gágho, or Gógo, or with Timbúktu; but the Arabs of
Azawád and those parts, when undertaking a pilgrimage, generally
go by way of Ágades.
I must here add, that I did not observe that the people of Ágades
use manna in their food, nor that it is collected in the neighbourhood
of the town; but I did not inquire about it on the spot, not having
taken notice of the passage of Leo relating to it.
My stay in Ágades was too short to justify my entering into detail
about the private life of the people, but all that I saw convinced me
that, although open to most serious censure on the part of the
moralist, it presented many striking features of cheerfulness and
happiness, and nothing like the misery which is often met with in
towns which have declined from their former glory. It still contains
many active germs of national life, which are most gratifying to the
philosophic traveller. The situation, on an elevated plateau, cannot
but be healthy, as the few waterpools, of small dimensions, are
incapable of infecting the air. The disease which I have mentioned in
my diary as prevalent at the time of my sojourn was epidemic.
Besides, it must be borne in mind that the end of the rainy season
everywhere in the tropical regions is the most unhealthy period of
the year.
