
INTRODUCTION TO

MACHINE
LEARNING with
APPLICATIONS
in INFORMATION
SECURITY
Chapman & Hall/CRC
Machine Learning & Pattern Recognition Series

SERIES EDITORS

Ralf Herbrich
Amazon Development Center
Berlin, Germany

Thore Graepel
Microsoft Research Ltd.
Cambridge, UK

AIMS AND SCOPE

This series reflects the latest advances and applications in machine learning and pattern rec-
ognition through the publication of a broad range of reference works, textbooks, and hand-
books. The inclusion of concrete examples, applications, and methods is highly encouraged.
The scope of the series includes, but is not limited to, titles in the areas of machine learning,
pattern recognition, computational intelligence, robotics, computational/statistical learning
theory, natural language processing, computer vision, game AI, game theory, neural networks,
computational neuroscience, and other relevant topics, such as machine learning applied to
bioinformatics or cognitive science, which might be proposed by potential contributors.

PUBLISHED TITLES

BAYESIAN PROGRAMMING
Pierre Bessière, Emmanuel Mazer, Juan-Manuel Ahuactzin, and Kamel Mekhnacha
UTILITY-BASED LEARNING FROM DATA
Craig Friedman and Sven Sandow
HANDBOOK OF NATURAL LANGUAGE PROCESSING, SECOND EDITION
Nitin Indurkhya and Fred J. Damerau
COST-SENSITIVE MACHINE LEARNING
Balaji Krishnapuram, Shipeng Yu, and Bharat Rao
COMPUTATIONAL TRUST MODELS AND MACHINE LEARNING
Xin Liu, Anwitaman Datta, and Ee-Peng Lim
MULTILINEAR SUBSPACE LEARNING: DIMENSIONALITY REDUCTION OF
MULTIDIMENSIONAL DATA
Haiping Lu, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos
MACHINE LEARNING: An Algorithmic Perspective, Second Edition
Stephen Marsland
SPARSE MODELING: THEORY, ALGORITHMS, AND APPLICATIONS
Irina Rish and Genady Ya. Grabarnik
A FIRST COURSE IN MACHINE LEARNING, SECOND EDITION
Simon Rogers and Mark Girolami
INTRODUCTION TO MACHINE LEARNING WITH APPLICATIONS IN
INFORMATION SECURITY
Mark Stamp
Chapman & Hall/CRC
Machine Learning & Pattern Recognition Series

INTRODUCTION TO

MACHINE
LEARNING with
APPLICATIONS
in INFORMATION
SECURITY

Mark Stamp
San Jose State University
California
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-138-62678-2 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize
to copyright holders if permission to publish in this form has not been obtained. If any copyright material
has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information storage
or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
To Melody, Austin, and Miles.
Contents

Preface xiii

About the Author xv

Acknowledgments xvii

1 Introduction 1
1.1 What Is Machine Learning? . . . . . . . . . . . . . . . . . . . 1
1.2 About This Book . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Necessary Background . . . . . . . . . . . . . . . . . . . . . . 4
1.4 A Few Too Many Notes . . . . . . . . . . . . . . . . . . . . . 4

I Tools of the Trade 5

2 A Revealing Introduction to Hidden Markov Models 7


2.1 Introduction and Background . . . . . . . . . . . . . . . . . . 7
2.2 A Simple Example . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 The Three Problems . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.1 HMM Problem 1 . . . . . . . . . . . . . . . . . . . . . 14
2.4.2 HMM Problem 2 . . . . . . . . . . . . . . . . . . . . . 14
2.4.3 HMM Problem 3 . . . . . . . . . . . . . . . . . . . . . 14
2.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 The Three Solutions . . . . . . . . . . . . . . . . . . . . . . . 15
2.5.1 Solution to HMM Problem 1 . . . . . . . . . . . . . . 15
2.5.2 Solution to HMM Problem 2 . . . . . . . . . . . . . . 16
2.5.3 Solution to HMM Problem 3 . . . . . . . . . . . . . . 17
2.6 Dynamic Programming . . . . . . . . . . . . . . . . . . . . . 20
2.7 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.8 All Together Now . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.9 The Bottom Line . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.10 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28


3 A Full Frontal View of Profile Hidden Markov Models 37


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 Overview and Notation . . . . . . . . . . . . . . . . . . . . . 39
3.3 Pairwise Alignment . . . . . . . . . . . . . . . . . . . . . . . . 42
3.4 Multiple Sequence Alignment . . . . . . . . . . . . . . . . . . 46
3.5 PHMM from MSA . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6 Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.7 The Bottom Line . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4 Principal Components of Principal Component Analysis 63


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2.1 A Brief Review of Linear Algebra . . . . . . . . . . . . 64
4.2.2 Geometric View of Eigenvectors . . . . . . . . . . . . 68
4.2.3 Covariance Matrix . . . . . . . . . . . . . . . . . . . . 70
4.3 Principal Component Analysis . . . . . . . . . . . . . . . . . 73
4.4 SVD Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.5 All Together Now . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.5.1 Training Phase . . . . . . . . . . . . . . . . . . . . . . 80
4.5.2 Scoring Phase . . . . . . . . . . . . . . . . . . . . . . . 82
4.6 A Numerical Example . . . . . . . . . . . . . . . . . . . . . . 83
4.7 The Bottom Line . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5 A Reassuring Introduction to Support Vector Machines 95


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 Constrained Optimization . . . . . . . . . . . . . . . . . . . . 102
5.2.1 Lagrange Multipliers . . . . . . . . . . . . . . . . . . . 104
5.2.2 Lagrangian Duality . . . . . . . . . . . . . . . . . . . . 108
5.3 A Closer Look at SVM . . . . . . . . . . . . . . . . . . . . . . 110
5.3.1 Training and Scoring . . . . . . . . . . . . . . . . . . . 112
5.3.2 Scoring Revisited . . . . . . . . . . . . . . . . . . . . . 114
5.3.3 Support Vectors . . . . . . . . . . . . . . . . . . . . . 115
5.3.4 Training and Scoring Re-revisited . . . . . . . . . . . . 116
5.3.5 The Kernel Trick . . . . . . . . . . . . . . . . . . . . . 117
5.4 All Together Now . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.5 A Note on Quadratic Programming . . . . . . . . . . . . . . . 121
5.6 The Bottom Line . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

6 A Comprehensible Collection of Clustering Concepts 133


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.2 Overview and Background . . . . . . . . . . . . . . . . . . . . 133
6.3 K-Means . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.4 Measuring Cluster Quality . . . . . . . . . . . . . . . . . . . . 141
6.4.1 Internal Validation . . . . . . . . . . . . . . . . . . . . 143
6.4.2 External Validation . . . . . . . . . . . . . . . . . . . 148
6.4.3 Visualizing Clusters . . . . . . . . . . . . . . . . . . . 150
6.5 EM Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.5.1 Maximum Likelihood Estimator . . . . . . . . . . . . 154
6.5.2 An Easy EM Example . . . . . . . . . . . . . . . . . . 155
6.5.3 EM Algorithm . . . . . . . . . . . . . . . . . . . . . . 159
6.5.4 Gaussian Mixture Example . . . . . . . . . . . . . . . 163
6.6 The Bottom Line . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

7 Many Mini Topics 177


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
7.2 K-Nearest Neighbors . . . . . . . . . . . . . . . . . . . . . 177
7.3 Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.4 Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
7.4.1 Football Analogy . . . . . . . . . . . . . . . . . . . . . 182
7.4.2 AdaBoost . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.5 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.6 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . 192
7.7 Vector Quantization . . . . . . . . . . . . . . . . . . . . . . . 202
7.8 Naïve Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
7.9 Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . 205
7.10 Conditional Random Fields . . . . . . . . . . . . . . . . . . . 208
7.10.1 Linear Chain CRF . . . . . . . . . . . . . . . . . . . . 209
7.10.2 Generative vs Discriminative Models . . . . . . . . . . 210
7.10.3 The Bottom Line on CRFs . . . . . . . . . . . . . . . 213
7.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

8 Data Analysis 219


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
8.2 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . 220
8.3 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
8.4 ROC Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8.5 Imbalance Problem . . . . . . . . . . . . . . . . . . . . . . . . 228
8.6 PR Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8.7 The Bottom Line . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

II Applications 235

9 HMM Applications 237


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.2 English Text Analysis . . . . . . . . . . . . . . . . . . . . . . 237
9.3 Detecting Undetectable Malware . . . . . . . . . . . . . . . . 240
9.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . 240
9.3.2 Signature-Proof Metamorphic Generator . . . . . . . . 242
9.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . 243
9.4 Classic Cryptanalysis . . . . . . . . . . . . . . . . . . . . . . . 245
9.4.1 Jakobsen’s Algorithm . . . . . . . . . . . . . . . . . . 245
9.4.2 HMM with Random Restarts . . . . . . . . . . . . . . 251

10 PHMM Applications 261


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
10.2 Masquerade Detection . . . . . . . . . . . . . . . . . . . . . . 261
10.2.1 Experiments with Schonlau Dataset . . . . . . . . . . 262
10.2.2 Simulated Data with Positional Information . . . . . . 265
10.3 Malware Detection . . . . . . . . . . . . . . . . . . . . . . . . 269
10.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . 270
10.3.2 Datasets and Results . . . . . . . . . . . . . . . . . . . 271

11 PCA Applications 277


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
11.2 Eigenfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
11.3 Eigenviruses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
11.3.1 Malware Detection Results . . . . . . . . . . . . . . . 280
11.3.2 Compiler Experiments . . . . . . . . . . . . . . . . . . 282
11.4 Eigenspam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
11.4.1 PCA for Image Spam Detection . . . . . . . . . . . . . 285
11.4.2 Detection Results . . . . . . . . . . . . . . . . . . . . . 285

12 SVM Applications 289


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
12.2 Malware Detection . . . . . . . . . . . . . . . . . . . . . . . . 289
12.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . 290
12.2.2 Experimental Results . . . . . . . . . . . . . . . . . . 293
12.3 Image Spam Revisited . . . . . . . . . . . . . . . . . . . . . . 296
12.3.1 SVM for Image Spam Detection . . . . . . . . . . . . 298
12.3.2 SVM Experiments . . . . . . . . . . . . . . . . . . . . 300
12.3.3 Improved Dataset . . . . . . . . . . . . . . . . . . . . 304

13 Clustering Applications 307


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
13.2 K-Means for Malware Classification . . . . . . . . . . . . 307
13.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . 308
13.2.2 Experiments and Results . . . . . . . . . . . . . . . . 309
13.2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 313
13.3 EM vs K-Means for Malware Analysis . . . . . . . . . . . 314
13.3.1 Experiments and Results . . . . . . . . . . . . . . . . 314
13.3.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 317

Annotated Bibliography 319

Index 338
Preface

“Perhaps it hasn’t one,” Alice ventured to remark.


“Tut, tut, child!” said the Duchess.
“Everything’s got a moral, if only you can find it.”
— Lewis Carroll, Alice in Wonderland

For the past several years, I’ve been teaching a class on “Topics in Information
Security.” Each time I taught this course, I’d sneak in a few more machine
learning topics. For the past couple of years, the class has been turned on
its head, with machine learning being the focus, and information security
only making its appearance in the applications. Unable to find a suitable
textbook, I wrote a manuscript, which slowly evolved into this book.
In my machine learning class, we spend about two weeks on each of the
major topics in this book (HMM, PHMM, PCA, SVM, and clustering). For
each of these topics, about one week is devoted to the technical details in
Part I, and another lecture or two is spent on the corresponding applica-
tions in Part II. The material in Part I is not easy—by including relevant
applications, the material is reinforced, and the pace is more reasonable.
I also spend a week covering the data analysis topics in Chapter 8, and
several of the mini topics in Chapter 7 are covered, based on time constraints
and student interest.1
Machine learning is an ideal subject for substantive projects. In topics
classes, I always require projects, which are usually completed by pairs of stu-
dents, although individual projects are allowed. At least one week is allocated
to student presentations of their project results.
A suggested syllabus is given in Table 1. This syllabus should leave time
for tests, project presentations, and selected special topics. Note that the
applications material in Part II is intermixed with the material in Part I.
Also note that the data analysis chapter is covered early, since it’s relevant
to all of the applications in Part II.
1
Who am I kidding? Topics are selected based on my interests, not student interest.


Table 1: Suggested syllabus

Chapter Hours Coverage


1. Introduction 1 All
2. Hidden Markov Models 3 All
9. HMM Applications 2 All
8. Data Analysis 3 All
3. Profile Hidden Markov Models 3 All
10. PHMM Applications 2 All
4. Principal Component Analysis 3 All
11. PCA Applications 2 All
5. Support Vector Machines 3 All
12. SVM Applications 3 All
6. Clustering 3 All
13. Clustering Applications 2 All
7. Mini-topics 6 LDA and selected topics
Total 36

My machine learning class is taught at the beginning graduate level. For
an undergraduate class, it might be advisable to slow the pace slightly. Re-
gardless of the level, labs would likely be helpful. However, it’s important to
treat labs as supplemental to—as opposed to a substitute for—lectures.
Learning challenging technical material requires studying it multiple times
in multiple different ways, and I’d say that the magic number is three. It’s no
accident that students who read the book, attend the lectures, and conscien-
tiously work on homework problems learn this material well. If you are trying
to learn this subject on your own, the author has posted his lecture videos
online, and these might serve as a (very poor) substitute for live lectures.2
I’m also a big believer in learning by programming—the more code that you
write, the better you will learn machine learning.

Mark Stamp
Los Gatos, California
April, 2017

2
In my experience, in-person lectures are infinitely more valuable than any recorded or
online format. Something happens in live classes that will never be fully duplicated in any
dead (or even semi-dead) format.
About the Author

My work experience includes more than seven years at the National Security
Agency (NSA), which was followed by two years at a small Silicon Valley
startup company. Since 2002, I have been a card-carrying member of the
Computer Science faculty at San Jose State University (SJSU).
My love affair with machine learning began during the early 1990s, when
I was working at the NSA. In my current job at SJSU, I’ve supervised vast
numbers of master’s student projects, most of which involve some combination
of information security and machine learning. In recent years, students have
become even more eager to work on machine learning projects, which I would
like to ascribe to the quality of the book that you have before you and my
magnetic personality, but instead, it’s almost certainly a reflection of trends
in the job market.
I do have a life outside of work.3 Recently, kayak fishing and sailing my
Hobie kayak in the Monterey Bay have occupied most of my free time. I also
ride my mountain bike through the local hills and forests whenever possible.
In case you are a masochist, a more complete autobiography can be found at

http://www.sjsu.edu/people/mark.stamp/

If you have any comments or questions about this book (or anything else)
you can contact me via email at [email protected]. And if you happen
to be local, don’t hesitate to stop by my office to chat.

3
Of course, here I am assuming that what I do for a living could reasonably be classified
as work. My wife (among others) has been known to dispute that assumption.

Acknowledgments

The first draft of this book was written while I was on sabbatical during the
spring 2014 semester. I first taught most of this material in the fall semester
of 2014, then again in fall 2015, and yet again in fall 2016. After the third
iteration, I was finally satisfied that the manuscript had the potential to be
book-worthy.
All of the students in these three classes deserve credit for helping to
improve the book to the point where it can now be displayed in public without
excessive fear of ridicule. Here, I’d like to single out the following students
for their contributions to the applications in Part II.

Topic Students
HMM Sujan Venkatachalam, Rohit Vobbilisetty
PHMM Lin Huang, Swapna Vemparala
PCA Ranjith Jidigam, Sayali Deshpande, Annapurna Annadatha
SVM Tanuvir Singh, Annapurna Annadatha
Clustering Chinmayee Annachhatre, Swathi Pai, Usha Narra

Extra special thanks go to Annapurna Annadatha and Fabio Di Troia.
In addition to her major contributions to two of the applications chapters,
Annapurna helped to improve the end-of-chapter exercises. Fabio assisted
with most of my recent students’ projects and he is a co-author on almost
all of my recent papers. I also want to thank Eric Filiol, who suggested
broadening the range of applications. This was excellent advice that greatly
improved the book.
Finally, I want to thank Randi Cohen and Veronica Rodriguez at the
Taylor & Francis Group. Without their help, encouragement, and patience,
this book would never have been published.
A textbook is like a large software project, in that it must contain bugs.
All errors in this book are solely the responsibility of your humble scribe.
Please send me any errors that you find, and I will keep an updated errata
list on the textbook website.

Chapter 1

Introduction

I took a speed reading course and read War and Peace in twenty minutes.
It involves Russia.
— Woody Allen

1.1 What Is Machine Learning?


For our purposes, we’ll view machine learning as a form of statistical discrim-
ination, where the “machine” does the heavy lifting. That is, the computer
“learns” important information, saving us humans from the hard work of
trying to extract useful information from seemingly inscrutable data.
For the applications considered in this book, we typically train a model,
then use the resulting model to score samples. If the score is sufficiently high,
we classify the sample as being of the same type as was used to train the
model. And thanks to the miracle of machine learning, we don’t have to
work too hard to perform such classification. Since the model parameters are
(more-or-less) automatically extracted from training data, machine learning
algorithms are sometimes said to be data driven.
Machine learning techniques can be successfully applied to a wide range
of important problems, including speech recognition, natural language pro-
cessing, bioinformatics, stock market analysis, information security, and the
homework problems in this book. Additional useful applications of machine
learning seem to be found on a daily basis—the set of potential applications
is virtually unlimited.
It’s possible to treat any machine learning algorithm as a black box and, in
fact, this is a major selling point of the field. Many successful machine
learners simply feed data into their favorite machine learning black box, which,
surprisingly often, spits out useful results. While such an approach can work,
the primary goal of this book is to provide the reader with a deeper un-
derstanding of what is actually happening inside those mysterious machine
learning black boxes.
Why should anyone care about the inner workings of machine learning al-
gorithms when a simple black box approach can—and often does—suffice? If
you are like your curious author, you hate black boxes, and you want to know
how and why things work as they do. But there are also practical reasons
for exploring the inner sanctum of machine learning. As with any technical
field, the cookbook approach to machine learning is inherently limited. When
applying machine learning to new and novel problems, it is often essential to
have an understanding of what is actually happening “under the covers.” In
addition to being the most interesting cases, such applications are also likely
to be the most lucrative.
By way of analogy, consider a medical doctor (MD) in comparison to a
nurse practitioner (NP).1 It is often claimed that an NP can do about 80%
to 90% of the work that an MD typically does. And the NP requires less
training, so when possible, it is cheaper to have NPs treat people. But, for
challenging or unusual or non-standard cases, the higher level of training of
an MD may be essential. So, the MD deals with the most challenging and
interesting cases, and earns significantly more for doing so. The aim of this
book is to enable the reader to earn the equivalent of an MD in machine
learning.
The bottom line is that the reader who masters the material in this book
will be well positioned to apply machine learning techniques to challenging
and cutting-edge applications. Most such applications would likely be beyond
the reach of anyone with a mere black box level of understanding.

1.2 About This Book


The focus of this book is on providing a reasonable level of detail for a reason-
ably wide variety of machine learning algorithms, while constantly reinforcing
the material with realistic applications. But, what constitutes a reasonable
level of detail? I’m glad you asked.
While the goal here is for the reader to obtain a deep understanding of
the inner workings of the algorithms, there are limits.2 This is not a math
book, so we don’t prove theorems or otherwise dwell on mathematical theory.
Although much of the underlying math is elegant and interesting, we don’t
spend any more time on the math than is absolutely necessary. And, we’ll
1
A physician assistant (PA) is another medical professional that is roughly comparable
to a nurse practitioner.
2
However, these limits are definitely not of the kind that one typically finds in a calculus
book.

sometimes skip a few details, and on occasion, we might even be a little bit
sloppy with respect to mathematical niceties. The goal here is to present
topics at a fairly intuitive level, with (hopefully) just enough detail to clarify
the underlying concepts, but not so much detail as to become overwhelming
and bog down the presentation.3
In this book, the following machine learning topics are covered in chapter-
length detail.

Topic Where
Hidden Markov Models (HMM) Chapter 2
Profile Hidden Markov Models (PHMM) Chapter 3
Principal Component Analysis (PCA) Chapter 4
Support Vector Machines (SVM) Chapter 5
Clustering (K-Means and EM) Chapter 6

Several additional topics are discussed in a more abbreviated (section-length)
format. These mini-topics include the following.

Topic Where
K-Nearest Neighbors (K-NN) Section 7.2
Neural Networks Section 7.3
Boosting and AdaBoost Section 7.4
Random Forest Section 7.5
Linear Discriminant Analysis (LDA) Section 7.6
Vector Quantization (VQ) Section 7.7
Naïve Bayes Section 7.8
Regression Analysis Section 7.9
Conditional Random Fields (CRF) Section 7.10

Data analysis is critically important when evaluating machine learning
applications, yet this topic is often relegated to an afterthought. But that’s
not the case here, as we have an entire chapter devoted to data analysis and
related issues.
To access the textbook website, point your browser to

http://www.cs.sjsu.edu/~stamp/ML/

where you’ll find links to PowerPoint slides, lecture videos, and other relevant
material. An updated errata list is also available. And for the reader’s benefit,
all of the figures in this book are available in electronic form, and in color.
3
Admittedly, this is a delicate balance, and your unbalanced author is sure that he didn’t
always achieve an ideal compromise. But you can rest assured that it was not for lack of
trying.

In addition, extensive malware and image spam datasets can be found on
the textbook website. These or similar datasets were used in many of the
applications discussed in Part II of this book.

1.3 Necessary Background


Given the title of this weighty tome, it should be no surprise that most of
the examples are drawn from the field of information security. For a solid
introduction to information security, your humble author is partial to the
book [137]. Many of the machine learning applications in this book are
specifically focused on malware. For a thorough—and thoroughly enjoyable—
introduction to malware, Aycock’s book [12] is the clear choice. However,
enough background is provided so that no outside resources should be neces-
sary to understand the applications considered here.
Many of the exercises in this book require some programming, and basic
computing concepts are assumed in a few of the application sections. But
anyone with a modest amount of programming experience should have no
trouble with this aspect of the book.
Most machine learning techniques do ultimately rest on some fancy math.
For example, hidden Markov models (HMM) build on a foundation of dis-
crete probability, principal component analysis (PCA) is based on sophisti-
cated linear algebra, Lagrange multipliers (and calculus) are used to show
how and why a support vector machine (SVM) really works, and statistical
concepts abound. We’ll review the necessary linear algebra, and generally
cover relevant math and statistics topics as needed. However, we do assume
some knowledge of differential calculus—specifically, finding the maximum
and minimum of “nice” functions.
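As a quick self-test of that calculus prerequisite, here is the kind of max/min computation that appears throughout (the function below is a made-up example, not one from the book):

```latex
\[
f(x) = x^2 - 4x + 7, \qquad f'(x) = 2x - 4 = 0 \;\implies\; x = 2,
\]
\[
f''(2) = 2 > 0, \quad \text{so } f \text{ has a minimum at } x = 2,
\text{ where } f(2) = 3.
\]
```

If setting a derivative to zero and checking the sign of the second derivative feels routine, you have all the calculus this book assumes.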

1.4 A Few Too Many Notes


Note that the applications presented in this book are largely drawn from your
author’s industrious students’ research projects. Note also that the applica-
tions considered here were selected because they illustrate various machine
learning techniques in relatively straightforward scenarios. In particular, it is
important to note that applications were not selected because they necessarily
represent the greatest academic research in the history of academic research.
It’s a noteworthy (and unfortunate) fact of life that the primary function of
much academic research is to impress the researcher’s (few) friends with his
or her extreme cleverness, while eschewing practicality, utility, and clarity.
In contrast, the applications presented here are supposed to help demystify
machine learning techniques.
References
Y. Altun , I. Tsochantaridis , and T. Hofmann , Hidden Markov support vector machines, Proceedings of the Twentieth International
Conference on Machine Learning (ICML-2003), Washington DC, 2003, http://cs.brown.edu/~th/papers/AltTsoHof-ICML2003.pdf
C. Annachhatre , Hidden Markov models for malware classification, Master’s Report, Department of Computer Science, San Jose State
University, 2013, http://scholarworks.sjsu.edu/etd_projects/328/
C. Annachhatre , T. H. Austin , and M. Stamp , Hidden Markov models for malware classification, Journal of Computer Virology and
Hacking Techniques, 11(2):59–73, 2015, http://link.springer.com/article/10.1007/s11416-014-0215-x
A. S. Annadatha , Image spam analysis, Master’s Report, Department of Computer Science, San Jose State University, 2016,
http://scholarworks.sjsu.edu/etd_projects/486/
A. S. Annadatha , Improved spam image dataset, https://www.dropbox.com/s/7zh7r9dopuh554e/New_Spam.zip?dl=0
A. S. Annadatha and M. Stamp , Image spam analysis and detection, to appear in Journal of Computer Virology and Hacking Techniques
G. Arfken , Diagonalization of matrices, in Mathematical Methods for Physicists, 3rd edition, Academic Press, pp. 217–229, 1985
P. Asokarathinam , 2D shear, http://cs.fit.edu/~wds/classes/cse5255/thesis/shear/shear.html
S. Attaluri , S. McGhee , and M. Stamp , Profile hidden Markov models and metamorphic virus detection, Journal in Computer Virology
5(2):151–169, 2009, http://www.springerlink.com/content/3153113q2667q36w/
D. Austin , We recommend a singular value decomposition, http://www.ams.org/samplings/feature-column/fcarc-svd
T. H. Austin , E. Filiol , S. Josse , and M. Stamp , Exploring hidden Markov models for virus analysis: A semantic approach, Proceedings of
46th Hawaii International Conference on System Sciences (HICSS 46), January 7–10, 2013
J. Aycock , Computer Viruses and Malware, Advances in Information Security, Vol. 22, Springer-Verlag, 2006
S. Balakrishnama and A. Ganapathiraju , Linear discriminant analysis — A brief tutorial, 2007,
http://www.music.mcgill.ca/~ich/classes/mumt611_07/classifiers/lda_theory.pdf
D. Baysa , R. M. Low , and M. Stamp , Structural entropy and metamorphic malware, Journal of Computer Virology and Hacking
Techniques, 9(4):179–192, 2013, http://link.springer.com/article/10.1007/s11416-013-0185-4
K. P. Bennett and C. Campbell , Support vector machines: Hype or hallelujah?, SIGKDD Explorations, 2(2):1–13, December 2000
T. Berg-Kirkpatrick and D. Klein , Decipherment with a million random restarts, http://www.cs.berkeley.edu/~tberg/papers/emnlp2013.pdf
R. Berwick , An idiot’s guide to support vector machines (SVMs), 2003, http://www.svms.org/tutorials/Berwick2003.pdf
J. Borello and L. Me , Code obfuscation techniques for metamorphic viruses, Journal in Computer Virology 4(3): 211–220, 2008,
http://www.springerlink.com/content/233883w3r2652537
A. P. Bradley , The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition,
30:1145–1159, 1997
L. Breiman and A. Cutler , Random Forests™, https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
The Brown corpus of standard American English, available for download at http://www.cs.toronto.edu/~gpenn/csc401/a1res.html
Buster sandbox analyzer, http://bsa.isoftware.nl/
J. Canny , A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):679–698,
1986
R. Canzanese , M. Kam , and S. Mancoridis , Toward an automatic, online behavioral malware classification system,
https://www.cs.drexel.edu/~spiros/papers/saso2013.pdf
R. L. Cave and L. P. Neuwirth , Hidden Markov models for English, in J. D. Ferguson , editor, Hidden Markov Models for Speech, IDA-
CRD, Princeton, NJ, October 1980, http://cs.sjsu.edu/~stamp/RUA/CaveNeuwirth/
S. Cesare and Y. Xiang , Classification of malware using structured control flow, 8th Australasian Symposium on Parallel and Distributed
Computing, pp. 61–70, 2010
E. Chen , How does randomization in a random forest work?, Quora, https://www.quora.com/How-does-randomization-in-a-random-forest-
work
E. Chen , Introduction to conditional random fields, http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/
E. Chen , What are the advantages of different classification algorithms?, Quora, https://www.quora.com/What-are-the-advantages-of-
different-classification-algorithms/answer/Edwin-Chen-1
C. Collberg , C. Thomborson , and D. Low , Manufacturing cheap, resilient, and stealthy opaque constructs, Principles of Programming
Languages, POPL98, San Diego, California, January 1998
R. Collobert and S. Bengio , Links between perceptrons, MLPs and SVMs, Proceedings of the 21st International Conference on Machine
Learning, Banff, Canada, 2004, http://ronan.collobert.com/pub/matos/2004_links_icml.pdf
N. Cristianini and J. Shawe-Taylor , An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge
University Press, 2000
The Curse of Frankenstein, IMDb, http://www.imdb.com/title/tt0050280/
Cygwin, Cygwin utility files, 2015, http://www.cygwin.com/
J. Davis and M. Goadrich , The relationship between precision-recall and ROC curves, http://www.autonlab.org/icml_documents/camera-
ready/030_The_Relationship_Bet.pdf
R. DeCook , The bivariate normal, http://homepage.stat.uiowa.edu/~rdecook/stat2020/notes/ch5_pt3_2013.pdf
J. De Doná , Lagrangian duality, 2004, http://www.eng.newcastle.edu.au/eecs/cdsc/books/cce/Slides/Duality.pdf
W. Deng , Q. Liu , H. Cheng , and Z. Qin , A malware detection framework based on Kolmogorov complexity, Journal of Computational
Information Systems 7(8):2687–2694, 2011, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.469.1934&rep=rep1&type=pdf
P. Desai , Towards an undetectable computer virus, Master’s Report, Department of Computer Science, San Jose State University, 2008,
http://scholarworks.sjsu.edu/etd_projects/90/
P. Deshpande , Metamorphic detection using function call graph analysis. Master’s Report, San Jose State University, Department of
Computer Science, 2013, http://scholarworks.sjsu.edu/etd_projects/336
S. Deshpande , Eigenvalue analysis for metamorphic detection, Master’s Report, San Jose State University, Department of Computer
Science, 2012, http://scholarworks.sjsu.edu/etd_projects/279/
S. Deshpande , Y. Park , and M. Stamp , Eigenvalue analysis for metamorphic detection, Journal of Computer Virology and Hacking
Techniques, 10(1):53–65, 2014, http://link.springer.com/article/10.1007/s11416-013-0193-4
A. Dhavare , R. M. Low and M. Stamp , Efficient cryptanalysis of homophonic substitution ciphers, Cryptologia, 37(3):250–281, 2013,
http://www.tandfonline.com/doi/abs/10.1080/01611194.2013.797041
T. G. Dietterich , Machine learning for sequential data: A review, in T. Caelli (ed.), Structural, Syntactic, and Statistical Pattern Recognition,
Lecture Notes in Computer Science 2396, pp. 15–30, Springer, http://web.engr.oregonstate.edu/~tgd/publications/mlsd-ssspr.pdf
C. B. Do and S. Batzoglou , What is the expectation maximization algorithm?, Nature Biotechnology, 26(8):897–899, August 2008,
http://ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf
M. Dredze , R. Gevaryahu , and A. Elias-Bachrach , Learning fast classifiers for image spam, CEAS 2007
R. Durbin , S. Eddy , A. Krogh , and G. Mitchison , Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids,
Cambridge University Press, 1998
C. Elkan , Log-linear models and conditional random fields, 2012, http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf
Expectation-maximization algorithm, Wikipedia, http://en.wikipedia.org/wiki/Expectation-maximization_algorithm
A. A. Farag and S. Y. Elhabian , A tutorial on data reduction: linear discriminant analysis (LDA),
http://www.di.univr.it/documenti/OccorrenzaIns/matdid/matdid437773.pdf
D. Feng and R. F. Doolittle , Progressive sequence alignment as a prerequisite to correct phylogenetic trees, Journal of Molecular
Evolution, 25(4):351–360, 1987
E. Filiol , Metamorphism, formal grammars and undecidable code mutation, International Journal of Computer Science, 2:70–75, 2007
K. Fukuda and H. Tamada , A dynamic birthmark from analyzing operand stack runtime behavior to detect copied software, in Proceedings
of SNPD’13, pp. 505–510, July 2013, Honolulu, Hawaii
Y. Gao , Image spam hunter dataset, 2008, http://www.cs.northwestern.edu/~yga751/ML/ISH.htm
Y. Gao , M. Yang , X. Zhao , B. Pardo , Y. Wu , T. N. Pappas , and A. Choudhary , Image spam hunter, Acoustics, speech and signal
processing (ICASSP 2008), pp. 1765–1768
A. Gersho and R. M. Gray , Vector Quantization and Signal Compression, Springer, 1992
GNU accounting utilities, http://www.gnu.org/software/acct/
I. Guyon , J. Weston , S. Barnhill , and V. Vapnik , Gene selection for cancer classification using support vector machines, Machine
Learning, 46(1–3):389–422, 2002
Harebot.M , Panda Security, http://www.pandasecurity.com/usa/homeusers/security-info/220319/Harebot.M/
N. Harris , Visualizing DBSCAN clustering, https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/
L. Huang , A study on masquerade detection, Master’s Report, Department of Computer Science, San Jose State University, 2010,
http://scholarworks.sjsu.edu/etd_projects/9/
L. Huang and M. Stamp , Masquerade detection using profile hidden Markov models, Computers & Security, 30(8):732–747, November
2011, http://www.sciencedirect.com/science/article/pii/S0167404811001003
IDA Pro, http://www.hex-rays.com/idapro/
Introduction to support vector machines, OpenCV Tutorials, 2014,
http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
A. K. Jain and R. C. Dubes , Algorithms for Clustering Data, Prentice-Hall, 1988, http://www.cse.msu.edu/~jain/Clustering_Jain_Dubes.pdf
T. Jakobsen , A fast method for the cryptanalysis of substitution ciphers, Cryptologia, 19(3):265–274, 1995
R. K. Jidigam , Metamorphic detection using singular value decomposition, Master’s Report, Department of Computer Science, San Jose
State University, 2013, http://scholarworks.sjsu.edu/etd_projects/330/
R. K. Jidigam , T. H. Austin , and M. Stamp , Singular value decomposition and metamorphic detection, Journal of Computer Virology and
Hacking Techniques, 11(4):203–216, 2015, http://link.springer.com/article/10.1007/s11416-014-0220-0
R. Jin , Cluster validation, 2008, http://www.cs.kent.edu/~jin/DM08/ClusterValidation.pdf
K. Jones , A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, 28(1):11–21, 1972
A. Kalbhor , T. H. Austin , E. Filiol , S. Josse , and M. Stamp , Dueling hidden Markov models for virus analysis, Journal of Computer
Virology and Hacking Techniques, 11(2):103–118, May 2015, http://link.springer.com/article/10.1007/s11416-014-0232-9
Kaspersky Lab, http://support.kaspersky.com/viruses/rogue?qid=208286454
S. Kazi and M. Stamp , Hidden Markov models for software piracy detection, Information Security Journal: A Global Perspective,
22(3):140–149, 2013, http://www.tandfonline.com/doi/abs/10.1080/19393555.2013.787474
E. Kim , Everything you wanted to know about the kernel trick (but were too afraid to ask), http://www.eric-kim.net/eric-
kimnet/posts/1/kernel_trick.html
D. Klein , Lagrange multipliers without permanent scarring, http://nlp.cs.berkeley.edu/tutorials/lagrange-multipliers.pdf
D. Knowles , Lagrangian duality for dummies, http://cs.stanford.edu/people/davidknowles//lagrangian_duality.pdf
D. Knuth , The Art of Computer Programming, Volume 3: Sorting and Searching, Addison-Wesley
Y. Ko , Maximum entropy Markov models and CRFs, http://web.donga.ac.kr/yjko/usefulthings/MEMM&CRF.pdf
S. Kolter and M. Maloof , Learning to detect and classify malicious executables in the wild, Journal of Machine Learning Research,
7:2721–2744, 2006
D. Kriesel , A brief introduction to neural networks, http://www.dkriesel.com/_media/science/neuronalenetze-enzeta2-2col-dkrieselcom.pdf
A. Lad , EM algorithm for estimating a Gaussian mixture model, http://www.cs.cmu.edu/~alad/em/
J. Lafferty , A. McCallum , and F. Pereira , Conditional random fields: Probabilistic models for segmenting and labeling sequence data,
http://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers
A. Lakhotia , A. Walenstein , C. Miles , and A. Singh , VILO: A rapid learning nearest-neighbor classifier for malware triage, Journal in
Computer Virology, 9(3):109–123, 2013
M. Law , A simple introduction to support vector machines, 2011, http://www.cise.ufl.edu/class/cis4930sp11dtm/notes/intro_svm_new.pdf
J. Lee , T. H. Austin , and M. Stamp , Compression-based analysis of metamorphic malware, International Journal of Security and
Networks, 10(2):124–136, 2015, http://www.inderscienceonline.com/doi/abs/10.1504/IJSN.2015.070426
A. Liaw and M. Wiener , Classification and regression by randomForest, R News, 2/3:18–22, December 2002,
http://www.bios.unc.edu/~dzeng/BIOS740/randomforest.pdf
Y. Lin and Y. Jeon , Random forests and adaptive nearest neighbors, Technical Report 1055, Department of Statistics, University of
Wisconsin, 2002, https://www.stat.wisc.edu/sites/default/files/tr1055.pdf
D. Lin and M. Stamp , Hunting for undetectable metamorphic viruses, Journal in Computer Virology, 7(3):201–214, August 2011,
http://www.springerlink.com/content/3231224064522083/
Malicia Project, 2015, http://malicia-project.com/
Marvin the Martian, IMDb, http://www.imdb.com/character/ch0030547/
S. McKenzie , CNN, Who was the real fifth Beatle?, March 2016, http://www.cnn.com/2016/03/09/entertainment/who-was-real-fifth-beatle/
Mean vector and covariance matrix, NIST, http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc541.htm
The Mental Driller, Metamorphism in practice or “How I made MetaPHOR and what I’ve learnt,” 2002,
http://download.adamas.ai/dlbase/Stuff/VX%20Heavens%20Library/vmd01.html
B. Mirkin , Choosing the number of clusters, http://www.hse.ru/data/2011/06/23/1215441450/noc.pdf
MITRE, Malware attribute enumeration and characterization, 2013, http://maec.mitre.org
E. Mooi and M. Sarstedt , Cluster analysis, Chapter 9 in A Concise Guide to Market Research: The Process, Data, and Methods Using
IBM SPSS Statistics, Springer 2011
A. W. Moore , K-means and hierarchical clustering, 2001, http://www.autonlab.org/tutorials/kmeans11.pdf
J. Morrow , Linear least squares problems, https://www.math.washington.edu/~morrow/498_13/demmelsvd.pdf
G. Myles and C. S. Collberg , k-gram based software birthmarks, Proceedings of ACM Symposium on Applied Computing, pp. 314–318,
March 2005, Santa Fe, New Mexico
A. Nappa , M. Zubair Rafique , and J. Caballero , Driving in the cloud: An analysis of drive-by download operations and abuse reporting,
Proceedings of the 10th Conference on Detection of Intrusions and Malware & Vulnerability Assessment, Berlin, Germany, July 2013
U. Narra , Clustering versus SVM for malware detection, Master’s Report, Department of Computer Science, San Jose State University,
2015, http://scholarworks.sjsu.edu/etd_projects/405/
U. Narra , F. Di Troia , V. A. Corrado , T. H. Austin , and M. Stamp , Clustering versus SVM for malware detection, Journal of Computer
Virology and Hacking Techniques, 2(4):213–224, November 2016, http://link.springer.com/article/10.1007/s11416-015-0253-z
Next Generation Virus Construction Kit (NGVCK), http://vxheaven.org/vx.php?id=tn02
A. Ng and J. Duchi , The simplified SMO algorithm, http://cs229.stanford.edu/materials/smo.pdf
M. Nielson , How the backpropagation algorithm works, http://neuralnetworksanddeeplearning.com/chap2.html
S. Pai , A comparison of malware clustering techniques, Master’s Report, Department of Computer Science, San Jose State University,
2015, http://scholarworks.sjsu.edu/etd_projects/404/
S. Pai , F. Di Troia , V. A. Corrado , T. H. Austin , and M. Stamp , Clustering for malware classification, Journal of Computer Virology and
Hacking Techniques, 2016, http://link.springer.com/article/10.1007%2Fs11416-016-0265-3
J. C. Platt , Sequential minimal optimization: A fast algorithm for training support vector machines, Microsoft Research, 1998,
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-98-14.pdf
P. Ponnambalam , Measuring malware evolution, Master’s Report, Department of Computer Science, San Jose State University, 2015,
http://scholarworks.sjsu.edu/etd_projects/449/
N. Ponomareva , P. Rosso , F. Pla , and A. Molina , Conditional random fields vs. hidden Markov models in a biomedical named entity
recognition task, http://users.dsic.upv.es/~prosso/resources/PonomarevaEtAl_RANLP07.pdf
R. C. Prim , Shortest connection networks and some generalizations, Bell System Technical Journal, 36(6):1389–1401, November 1957
A. Quattoni , Tutorial on conditional random fields for sequence prediction,
http://www.cs.upc.edu/~aquattoni/AllMyPapers/crf_tutorial_talk.pdf
L. R. Rabiner , A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE,
77(2):257–286, February 1989, http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
M. Ramek , Mathematics tutorial: The Jacobi method, http://fptchlx02.tu-graz.ac.at/cgi-
bin/access.com?c1=0000&c2=0000&c3=0000&file=0638
H. Rana and M. Stamp , Hunting for pirated software using metamorphic analysis, Information Security Journal: A Global Perspective,
23(3):68–85, 2014, http://www.tandfonline.com/doi/abs/10.1080/19393555.2014.975557
S. Raschka , Linear discriminant analysis — bit by bit, 2014, http://sebastianraschka.com/Articles/2014_python_lda.html
R. Rojas , AdaBoost and the Super Bowl of classifiers: A tutorial introduction to adaptive boosting, http://www.inf.fu-berlin.de/inst/ag-
ki/adaboost4.pdf
N. Runwal , R. M. Low , and M. Stamp , Opcode graph similarity and metamorphic detection, Journal in Computer Virology, 8(1–2): 37–52,
2012, http://www.springerlink.com/content/h0g1768766071046/
M. Saleh , A. Mohamed , and A. Nabi , Eigenviruses for metamorphic virus recognition, IET Information Security, 5(4):191–198, 2011
M. Schonlau , Masquerading user data, http://www.schonlau.net/intrusion.html
M. Schonlau , et al., Computer intrusion: Detecting masquerades, Statistical Science, 15(1):1–17, 2001
Security Shield, Microsoft Malware Protection Center,
http://www.microsoft.com/security/portal/threat/encyclopedia/Entry.aspx?Name=SecurityShield
A. A. Shabalin , K-means clustering, http://shabal.in/visuals/kmeans/1.html
C. Shalizi , Logistic regression, Chapter 12 in Advanced Data Analysis from an Elementary Point of View,
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
C. Shalizi , Principal component analysis, https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch18.pdf
G. Shanmugam , R. M. Low , and M. Stamp , Simple substitution distance and metamorphic detection, Journal of Computer Virology and
Hacking Techniques, 9(3):159–170, 2013, http://link.springer.com/article/10.1007/s11416-013-0184-5
J. Shlens , A tutorial on principal component analysis, http://www.cs.cmu.edu/~elaw/papers/pca.pdf
Singular value decomposition, Wolfram MathWorld, http://mathworld.wolfram.com/SingularValueDecomposition.html
T. Singh , Support vector machines and metamorphic malware detection, Master’s Report, San Jose State University, 2015,
http://scholarworks.sjsu.edu/etd_projects/409/
T. Singh , F. Di Troia , V. A. Corrado , T. H. Austin , and M. Stamp , Support vector machines and malware detection, Journal of Computer
Virology and Hacking Techniques, 12(4):203–212, November 2016, http://link.springer.com/article/10.1007/s11416-015-0252-0
F. Skulason , A. Solomon , and V. Bontchev , CARO naming scheme, 1991, http://www.caro.org/naming/scheme.html
Smart HDD, 2015, http://support.kaspersky.com/viruses/rogue?qid=208286454
I. Sorokin , Comparing files using structural entropy, Journal in Computer Virology, 7(4):259–265, 2011
D. Spinellis , Reliable identification of bounded-length viruses is NP-complete, IEEE Transactions on Information Theory, 49(1):280–284,
January 2003
S. Sridhara and M. Stamp , Metamorphic worm that carries its own morphing engine, Journal of Computer Virology and Hacking
Techniques, 9(2): 49–58, 2013, http://link.springer.com/article/10.1007/s11416-012-0174-z
Stack Exchange, Making sense of principal component analysis, eigenvectors & eigenvalues,
http://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues
M. Stamp , Information Security: Principles and Practice, second edition, Wiley, 2011
M. Stamp , Heatmaps for the paper [3], 2014, http://cs.sjsu.edu/~stamp/heatmap/heatmapsBig.pdf
Support vector machines (SVMs), The Karush-Kuhn-Tucker (KKT) conditions, http://www.svms.org/kkt/
C. Sutton and A. McCallum , An introduction to conditional random fields, Foundations and Trends in Machine Learning, 4(4):267–373,
2011, http://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf
M. Swimmer , Response to the proposal for a “C virus” database, 1990, ACM SIGSAC Review, 8:1–5,
http://www.odysci.com/article/1010112993890087
Symantec Trend Report, 2016, https://www.symantec.com/security_response/publications/monthlythreatreport.jsp#Spam
P. Szor and P. Ferrie , Hunting for metamorphic, Symantec Security Response,
http://www.symantec.com/avcenter/reference/hunting.for.metamorphic.pdf
Talking Heads, Once in a lifetime, http://www.azlyrics.com/lyrics/talkingheads/onceinalifetime.html
H. Tamada , K. Okamoto , M. Nakamura , A. Monden , and K. Matsumoto , Design and evaluation of dynamic software birthmarks based
on API calls, Nara Institute of Science and Technology, Technical Report, 2007
H. Tamada , K. Okamoto , M. Nakamura , A. Monden , and K. Matsumoto , Dynamic software birthmarks to detect the theft of Windows
applications, International Symposium on Future Software Technology 2004 (ISFST 2004), October 2004, Xian, China
P.-N. Tan , M. Steinbach , and V. Kumar , Cluster analysis: Basic concepts and algorithms, in Introduction to Data Mining, Addison-
Wesley, 2005, http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf
A. H. Toderici and M. Stamp , Chi-squared distance and metamorphic virus detection, Journal of Computer Virology and Hacking
Techniques, 9(1):1–14, 2013, http://link.springer.com/article/10.1007/s11416-012-0171-2
Trojan.Cridex, Symantec, 2012, http://www.symantec.com/security_response/writeup.jsp?docid=2012-012103-0840-99
Trojan.Zbot, Symantec, 2010, http://www.symantec.com/security_response/writeup.jsp?docid=2010-011016-3514-99
Trojan.ZeroAccess, Symantec, 2013, http://www.symantec.com/security_response/writeup.jsp?docid=2011-071314-0410-99
M. A. Turk and A. P. Pentland , Eigenfaces for recognition, Journal of Cognitive Neuroscience, 3(1):71–86, 1991
R. Unterberger , AllMusic, Creedence Clearwater Revival biography, http://www.allmusic.com/artist/creedence-clearwater-revival-
mn0000131627/biography
Vector quantization, http://www.data-compression.com/vq.shtml
S. Vemparala , Malware detection using dynamic analysis, Master’s Report, Department of Computer Science, San Jose State University,
2010, http://scholarworks.sjsu.edu/etd_projects/403/
S. Vemparala , F. Di Troia , V. A. Corrado , T. H. Austin , and M. Stamp . Malware detection using dynamic birthmarks, 2nd International
Workshop on Security & Privacy Analytics (IWSPA 2016), co-located with ACM CODASPY 2016, March 9–11, 2016
S. Venkatachalam , Detecting undetectable computer viruses, Master’s Report, Department of Computer Science, San Jose State
University, 2010, http://scholarworks.sjsu.edu/etd_projects/156/
R. Vobbilisetty , Classic cryptanalysis using hidden Markov models, Master’s Report, Department of Computer Science, San Jose State
University, 2015, http://scholarworks.sjsu.edu/etd_projects/407/
R. Vobbilisetty , F. Di Troia , R. M. Low , V. A. Corrado , and M. Stamp , Classic cryptanalysis using hidden Markov models, to appear in
Cryptologia, http://www.tandfonline.com/doi/abs/10.1080/01611194.2015.1126660?journalCode=ucry20
VX Heavens, http://vxheaven.org/
H. M. Wallach , Conditional random fields: An introduction, 2004, www.inference.phy.cam.ac.uk/hmw26/papers/crf_intro.pdf
X. Wang , Y. Jhi , S. Zhu , and P. Liu , Detecting software theft via system call based birthmarks, in Proceedings of 25th Annual Computer
Security Applications Conference, December 2009, Honolulu, Hawaii
M. Welling , Fisher linear discriminant analysis, http://www.ics.uci.edu/~welling/classnotes/papers_class/Fisher-LDA.pdf
Wheel of Fortune, http://www.wheeloffortune.com
Winwebsec, Microsoft, Malware Protection Center,
http://www.microsoft.com/security/portal/threat/encyclopedia/Entry.aspx?Name=Rogue:Win32/Winwebsec
W. Wong and M. Stamp , Hunting for metamorphic engines, Journal in Computer Virology, 2(3):211–229, 2006,
http://www.springerlink.com/content/448852234n14u112/
P. Zbitskiy , Code mutation techniques by means of formal grammars and automatons, Journal in Computer Virology, 5(3):199–207, 2009,
http://link.springer.com/article/10.1007/s11416-009-0121-9
Y. Zhou and M. Inge , Malware detection using adaptive data compression, AISec’08 Proceedings of the 1st ACM Workshop on AISec, pp.
53–60, 2008
X. Zhou , X. Sun , G. Sun , and Y. Yang , A combined static and dynamic software birthmark based on component dependence graph,
Proceedings of International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 1416–1421, August 2008,
Harbin, China
