AlgoXY: Elementary Algorithms

Larry LIU Xinyu
Email: [email protected]
Edition: 0.6180339887498949
June 7, 2014
Contents

0.1 Why?
0.2 The smallest free ID problem, the power of algorithm
    0.2.1 Improvement 1
    0.2.2 Improvement 2, Divide and Conquer
    0.2.3 Expressiveness vs. Performance
0.3 The number puzzle, the power of data structure
    0.3.1 The brute-force solution
    0.3.2 Improvement 1
    0.3.3 Improvement 2
0.4 Notes and short summary
0.5 Structure of the contents
0.6 Appendix

I Trees

1 Binary search tree, the 'hello world' data structure
    1.1 Introduction
    1.2 Data Layout
    1.3 Insertion
    1.4 Traversing
    1.5 Querying a binary search tree
        1.5.1 Looking up
        1.5.2 Minimum and maximum
        1.5.3 Successor and predecessor
    1.6 Deletion
    1.7 Randomly build binary search tree
    1.8 Appendix

2 The evolution of insertion sort
    2.1 Introduction
    2.2 Insertion
    2.3 Improvement 1
    2.4 Improvement 2
    2.5 Final improvement by binary search tree
    2.6 Short summary

3 Red-black tree, not so complex as it was thought
    3.1 Introduction
        3.1.1 Exploit the binary search tree
        3.1.2 How to ensure the balance of the tree
        3.1.3 Tree rotation
    3.2 Definition of red-black tree
    3.3 Insertion
    3.4 Deletion
    3.5 Imperative red-black tree algorithm
    3.6 More words

4 AVL tree
    4.1 Introduction
        4.1.1 How to measure the balance of a tree?
    4.2 Definition of AVL tree
    4.3 Insertion
        4.3.1 Balancing adjustment
        4.3.2 Pattern Matching
    4.4 Deletion
    4.5 Imperative AVL tree algorithm
    4.6 Chapter note

5 Trie and Patricia with functional and imperative implementation
    5.1 Abstract
    5.2 Introduction
    5.3 Integer Trie
        5.3.1 Definition of integer Trie
        5.3.2 Insertion of integer trie
        5.3.3 Look up in integer binary trie
    5.4 Integer Patricia Tree
        5.4.1 Definition of integer Patricia tree
        5.4.2 Insertion of integer Patricia tree
        5.4.3 Look up in integer Patricia tree
    5.5 Alphabetic Trie
        5.5.1 Definition of alphabetic Trie
        5.5.2 Insertion of alphabetic trie
        5.5.3 Look up in alphabetic trie
    5.6 Alphabetic Patricia Tree
        5.6.1 Definition of alphabetic Patricia tree
        5.6.2 Insertion of alphabetic Patricia tree
        5.6.3 Look up in alphabetic Patricia tree
    5.7 Trie and Patricia used in industry
        5.7.1 E-dictionary and word auto-completion
        5.7.2 T9 input method
    5.8 Short summary
    5.9 Appendix
        5.9.1 Prerequisite software
        5.9.2 Haskell source files
        5.9.3 C++/C source files
        5.9.4 Python source files
        5.9.5 Scheme/Lisp source files
        5.9.6 Tools

6 Suffix Tree with functional and imperative implementation
    6.1 Abstract
    6.2 Introduction
    6.3 Suffix Trie
        6.3.1 Trivial construction methods of suffix tree
        6.3.2 On-line construction of suffix Trie
        6.3.3 Alternative functional algorithm
    6.4 Suffix Tree
        6.4.1 On-line construction of suffix tree
    6.5 Suffix tree applications
        6.5.1 String/Pattern searching
        6.5.2 Find the longest repeated sub-string
        6.5.3 Find the longest common sub-string
        6.5.4 Find the longest palindrome in a string
        6.5.5 Others
    6.6 Notes and short summary
    6.7 Appendix
        6.7.1 Prerequisite software
        6.7.2 Tools

7 B-Trees with functional and imperative implementation
    7.1 Abstract
    7.2 Introduction
    7.3 Definition
    7.4 Insertion
        7.4.1 Splitting
        7.4.2 Split before insert method
        7.4.3 Insert then fix method
    7.5 Deletion
        7.5.1 Merge before delete method
        7.5.2 Delete and fix method
    7.6 Searching
        7.6.1 Imperative search algorithm
        7.6.2 Functional search algorithm
    7.7 Notes and short summary
    7.8 Appendix
        7.8.1 Prerequisite software
        7.8.2 Tools

II Heaps

8 Binary Heaps with functional and imperative implementation
    8.1 Abstract
    8.2 Introduction
    8.3 Implicit binary heap by array
        8.3.1 Definition
        8.3.2 Heapify
        8.3.3 Build a heap
        8.3.4 Basic heap operations
        8.3.5 Heap sort
    8.4 Leftist heap and Skew heap, explicit binary heaps
        8.4.1 Definition
        8.4.2 Merge
        8.4.3 Basic heap operations
        8.4.4 Heap sort by Leftist Heap
        8.4.5 Skew heaps
    8.5 Splay heap, another explicit binary heap
        8.5.1 Definition
        8.5.2 Basic heap operations
        8.5.3 Heap sort
    8.6 Notes and short summary
    8.7 Appendix
        8.7.1 Prerequisite software

9 From grape to the world cup, the evolution of selection sort
    9.1 Introduction
    9.2 Finding the minimum
        9.2.1 Labeling
        9.2.2 Grouping
        9.2.3 Performance of the basic selection sorting
    9.3 Minor improvement
        9.3.1 Parameterize the comparator
        9.3.2 Trivial fine tune
        9.3.3 Cock-tail sort
    9.4 Major improvement
        9.4.1 Tournament knock out
        9.4.2 Final improvement by using heap sort
    9.5 Short summary

10 Binomial heap, Fibonacci heap, and pairing heap
    10.1 Introduction
    10.2 Binomial Heaps
        10.2.1 Definition
        10.2.2 Basic heap operations
    10.3 Fibonacci Heaps
        10.3.1 Definition
        10.3.2 Basic heap operations
        10.3.3 Running time of pop
        10.3.4 Decreasing key
        10.3.5 The name of Fibonacci Heap
    10.4 Pairing Heaps
        10.4.1 Definition
        10.4.2 Basic heap operations
    10.5 Notes and short summary

III Queues and Sequences

11 Queue, not so simple as it was thought
    11.1 Introduction
    11.2 Queue by linked-list and circular buffer
        11.2.1 Singly linked-list solution
        11.2.2 Circular buffer solution
    11.3 Purely functional solution
        11.3.1 Paired-list queue
        11.3.2 Paired-array queue - a symmetric implementation
    11.4 A small improvement, Balanced Queue
    11.5 One more step improvement, Real-time Queue
    11.6 Lazy real-time queue
    11.7 Notes and short summary

12 Sequences, the last brick
    12.1 Introduction
    12.2 Binary random access list
        12.2.1 Review of plain-array and list
        12.2.2 Represent sequence by trees
        12.2.3 Insertion to the head of the sequence
    12.3 Numeric representation for binary random access list
        12.3.1 Imperative binary access list
    12.4 Imperative paired-array list
        12.4.1 Definition
        12.4.2 Insertion and appending
        12.4.3 Random access
        12.4.4 Removing and balancing
    12.5 Concatenate-able list
    12.6 Finger tree
        12.6.1 Definition
        12.6.2 Insert element to the head of sequence
        12.6.3 Remove element from the head of sequence
        12.6.4 Handling the ill-formed finger tree when removing
        12.6.5 Append element to the tail of the sequence
        12.6.6 Remove element from the tail of the sequence
        12.6.7 Concatenate
        12.6.8 Random access of finger tree
    12.7 Notes and short summary

IV Sorting and Searching

13 Divide and conquer, Quick sort vs. Merge sort
    13.1 Introduction
    13.2 Quick sort
        13.2.1 Basic version
        13.2.2 Strict weak ordering
        13.2.3 Partition
        13.2.4 Minor improvement in functional partition
    13.3 Performance analysis for quick sort
        13.3.1 Average case analysis
    13.4 Engineering improvement
        13.4.1 Engineering solution to duplicated elements
    13.5 Engineering solution to the worst case
    13.6 Other engineering practice
    13.7 Side words
    13.8 Merge sort
        13.8.1 Basic version
    13.9 In-place merge sort
        13.9.1 Naive in-place merge
        13.9.2 In-place working area
        13.9.3 In-place merge sort vs. linked-list merge sort
    13.10 Nature merge sort
    13.11 Bottom-up merge sort
    13.12 Parallelism
    13.13 Short summary

14 Searching
    14.1 Introduction
    14.2 Sequence search
        14.2.1 Divide and conquer search
        14.2.2 Information reuse
    14.3 Solution searching
        14.3.1 DFS and BFS
        14.3.2 Search the optimal solution
    14.4 Short summary

V Appendix

A Lists
    A.1 Introduction
    A.2 List Definition
        A.2.1 Empty list
        A.2.2 Access the element and the sub list
    A.3 Basic list manipulation
        A.3.1 Construction
        A.3.2 Empty testing and length calculating
        A.3.3 Indexing
        A.3.4 Access the last element
        A.3.5 Reverse indexing
        A.3.6 Mutating
        A.3.7 Sum and product
        A.3.8 Maximum and minimum
    A.4 Transformation
        A.4.1 Mapping and for-each
        A.4.2 Reverse
    A.5 Extract sub-lists
        A.5.1 take, drop, and split-at
        A.5.2 Breaking and grouping
    A.6 Folding
        A.6.1 Folding from right
        A.6.2 Folding from left
        A.6.3 Folding in practice
    A.7 Searching and matching
        A.7.1 Existence testing
        A.7.2 Looking up
        A.7.3 Finding and filtering
        A.7.4 Matching
    A.8 Zipping and unzipping
    A.9 Notes and short summary

GNU Free Documentation License
    1. APPLICABILITY AND DEFINITIONS
    2. VERBATIM COPYING
    3. COPYING IN QUANTITY
    4. MODIFICATIONS
    5. COMBINING DOCUMENTS
    6. COLLECTIONS OF DOCUMENTS
    7. AGGREGATION WITH INDEPENDENT WORKS
    8. TRANSLATION
    9. TERMINATION
    10. FUTURE REVISIONS OF THIS LICENSE
    11. RELICENSING
    ADDENDUM: How to use this License for your documents
Preface

Larry LIU Xinyu
Email: [email protected]
0.1 Why?

'Is algorithm useful?' is a question programmers often ask. Some programmers say that they seldom use any serious data structures or algorithms in real work, such as commercial application development; even when they need some of them, the libraries already provide them. For example, the C++ Standard Template Library (STL) provides sort and selection algorithms as well as the vector, queue, and set data structures. It seems that knowing how to use the library as a tool is quite enough.

Instead of answering this question directly, I would like to say that algorithms and data structures are critical in solving 'interesting problems', while whether the problem itself is useful is another matter.

Let's start with two problems. It looks like both of them can be solved in a brute-force way even by a fresh programmer.
0.2 The smallest free ID problem, the power of algorithm

This problem is discussed in Chapter 1 of Richard Bird's book [1]. It's common for applications and systems to use IDs (identifiers) to manage objects and entities. At any time, some IDs are in use and some are available for allocation. When a client tries to acquire a new ID, we want to always allocate the smallest available one. Suppose IDs are non-negative integers and all IDs in use are maintained in a list (or an array) which is not ordered. For example:

[18, 4, 8, 9, 16, 1, 14, 7, 19, 3, 0, 5, 2, 11, 6]

How can you find the smallest free ID, which is 10, from the list?
It seems the solution is quite easy, without the need for any serious algorithms.

1: function Min-Free(A)
2:     x ← 0
3:     loop
4:         if x ∉ A then
5:             return x
6:         else
7:             x ← x + 1

where the '∉' test is realized as below. Here we use the mathematical notation [a, b) to define a range from a to b with b excluded.

1: function '∉'(x, X)
2:     for i ∈ [1, Length(X)] do
3:         if x = X[i] then
4:             return False
5:     return True
Some languages provide handy tools which wrap this linear-time process. For example, in Python this algorithm can be directly translated as the following.

def brute_force(lst):
    i = 0
    while True:
        if i not in lst:
            return i
        i = i + 1
It seems this problem is trivial. However, there will be tens of millions of IDs in a large system, and the speed of this solution is poor in such cases: it takes O(N²) time, where N is the length of the ID list. On my computer (2 cores, 2.10 GHz, with 2 GB RAM), a C program with this solution takes 5.4 seconds on average to search the minimum free number among 100,000 IDs¹, and it takes more than 8 minutes to handle a million numbers.

¹ All programs can be downloaded along with this series of posts.
0.2.1 Improvement 1

The key idea to improve the solution is based on the following fact: for a series of N numbers x1, x2, ..., xN, if there are free numbers, not all of the xi are in the range [0, N); otherwise the list is exactly a permutation of 0, 1, ..., N − 1 (so that max(xi) = N − 1) and N should be returned as the minimum free number. So we have the following fact:

minfree(x1, x2, ..., xN) ≤ N    (1)

One solution is to use an array of N + 1 flags to mark whether each number in the range [0, N] is free.
1: function Min-Free(A)
2:     F ← [False, False, ..., False] where Length(F) = N + 1
3:     for x ∈ A do
4:         if x < N then
5:             F[x] ← True
6:     for i ∈ [0, N] do
7:         if F[i] = False then
8:             return i
Line 2 initializes the flag array with all False values, which takes O(N) time. Then the algorithm scans all numbers in A and marks the corresponding flag True if the value is less than N; this step also takes O(N) time. Finally, the algorithm performs a linear search to find the first flag with a False value, so the total performance of this algorithm is O(N). Note that we use N + 1 flags instead of N flags to cover the special case that sorted(A) = [0, 1, 2, ..., N − 1].

Although this algorithm takes only O(N) time, it needs extra O(N) space to store the flags.
This solution is much faster than the brute-force one. On my computer, the corresponding Python program takes 0.02 seconds on average when dealing with 100,000 numbers.
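The Python program mentioned above is not listed in this section; a minimal sketch along the lines of the flag-array pseudocode (the function name is my own) could be:

def min_free(lst):
    n = len(lst)
    flags = [False] * (n + 1)  # flags cover the range [0, n]
    for x in lst:
        if x < n:
            flags[x] = True
    for i in range(n + 1):
        if not flags[i]:
            return i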
We haven't fine-tuned this algorithm yet. Observe that each time we have to allocate memory to create an (N + 1)-element array of flags and release the memory when finished. The memory allocation and release are so expensive that they cost a lot of processing time.

There are two ways to provide minor improvements to this solution. One is to allocate the flag array in advance and reuse it for all subsequent calls of finding the smallest free number; the other is to use bit-wise flags instead of a flag array. The following C program is based on these two minor improvement points.
#define N 1000000 // 1 million
#define WORDLENGTH (sizeof(int) * 8)

void setbit(unsigned int* bits, unsigned int i) {
    bits[i / WORDLENGTH] |= 1 << (i % WORDLENGTH);
}

int testbit(unsigned int* bits, unsigned int i) {
    return bits[i / WORDLENGTH] & (1 << (i % WORDLENGTH));
}

unsigned int bits[N / WORDLENGTH + 1];

int min_free(int* xs, int n) {
    int i, len = N / WORDLENGTH + 1;
    for (i = 0; i < len; ++i)
        bits[i] = 0;
    for (i = 0; i < n; ++i)
        if (xs[i] < n)
            setbit(bits, xs[i]);
    for (i = 0; i <= n; ++i)
        if (!testbit(bits, i))
            return i;
}
This C program can handle 1,000,000 (1 million) IDs in just 0.023 seconds on my computer.

The last for-loop can be further improved as below; this is just minor fine-tuning. Note the test against ~0: a word whose bits are all ones contains no free number and is skipped as a whole.

for (i = 0; ; ++i)
    if (bits[i] != ~0)
        for (j = 0; ; ++j)
            if (!testbit(bits, i * WORDLENGTH + j))
                return i * WORDLENGTH + j;
0.2.2 Improvement 2, Divide and Conquer

Although the above improvement looks perfect, it costs O(N) extra space to keep the flags. If N is a huge number, a huge amount of space is needed.

The typical divide and conquer strategy is to break the problem into smaller ones, then solve them to get the final answer.
Based on formula (1), if we halve the series of numbers at position ⌊N/2⌋, we can put all numbers xi ≤ ⌊N/2⌋ into the first half sub-list A′, and all the others into the second half sub-list A′′. If the length of A′ is exactly ⌊N/2⌋, it means the first half of the numbers is 'full', which indicates that the minimum free number must be in A′′; we recursively search in the shorter list A′′. Otherwise, the minimum free number is located in A′, which again leads to a smaller problem.

When we search the minimum free number in A′′, the condition changes a little bit: we are not searching for the smallest free number from 0, but from ⌊N/2⌋ + 1 as the lower bound. So the algorithm takes the form minfree(A, l, u), where l is the lower bound and u is the upper bound index of the element.

Note there is a trivial case: if the number list is empty, we merely return the lower bound as the result.

The divide and conquer solution can be formally expressed as a function rather than pseudo-code:

minfree(A) = search(A, 0, |A| − 1)

search(A, l, u) = l : A = ∅
                = search(A′′, m + 1, u) : |A′| = m − l + 1
                = search(A′, l, m) : otherwise

where

m = ⌊(l + u)/2⌋
A′ = {x ∈ A | x ≤ m}
A′′ = {x ∈ A | x > m}
It is obvious that this algorithm doesn't need any extra space². In each call it performs O(|A|) comparisons to build A′ and A′′, after which the problem scale halves. So the time needed for this algorithm is T(N) = T(N/2) + O(N), which reduces to O(N). Another way to analyze the performance is to observe that the first call takes O(N) to build A′ and A′′, the second call takes O(N/2), the third O(N/4), and so on. The total time is O(N + N/2 + N/4 + ...) = O(2N) = O(N).

² Procedural programmers may note that it actually takes O(lg N) stack space for book-keeping. As we'll see later, this can be eliminated either by tail recursion optimization (for instance, gcc -O2), or by manually changing the recursion into iteration.

In functional programming languages such as Haskell, partitioning a list is already provided in the library. This algorithm can be translated as the following.

import Data.List

minFree xs = bsearch xs 0 (length xs - 1)

bsearch xs l u | xs == [] = l
               | length as == m - l + 1 = bsearch bs (m+1) u
               | otherwise = bsearch as l m
    where
      m = (l + u) `div` 2
      (as, bs) = partition (<= m) xs
0.2.3 Expressiveness vs. Performance

Imperative language programmers may be concerned about the performance of this kind of implementation. For instance, in this minimum free number problem, the function calls itself recursively to a depth of O(lg N), which means the stack space consumed is O(lg N); it's not free in terms of space. If we go one step further, we can eliminate the recursion with iteration, which yields the following C program.
int min_free(int* xs, int n) {
    int l = 0;
    int u = n - 1;
    while (n) {
        int m = (l + u) / 2;
        int right, left = 0;
        for (right = 0; right < n; ++right)
            if (xs[right] <= m) {
                swap(xs[left], xs[right]); /* swap is provided in the accompanying source */
                ++left;
            }
        if (left == m - l + 1) {
            xs = xs + left;
            n = n - left;
            l = m + 1;
        }
        else {
            n = left;
            u = m;
        }
    }
    return l;
}
This program uses a quick-sort-like approach to re-arrange the array so that all the elements before left are less than or equal to m, while the elements between left and right are greater than m. This is shown in figure 1.

This program is fast and doesn't need extra stack space. However, compared to the previous Haskell program, it's hard to read and the expressiveness has decreased. We have to strike a balance between performance and expressiveness.
0.3 The number puzzle, the power of data structure

If the first problem, finding the minimum free number, is somewhat useful in practice, this one is purely for fun. The puzzle is to find the 1,500th number which contains only 2, 3, or 5 as factors. The first 3 such numbers are, of course, 2, 3, and 5. Number 60 = 2² × 3¹ × 5¹ is such a number; it is the 25th one. Number 21 = 2⁰ × 3¹ × 7¹ isn't a valid number, because it contains the factor 7. The first 10 such numbers are listed as the following.

2, 3, 4, 5, 6, 8, 9, 10, 12, 15
Figure 1: Divide the array: all x[i] ≤ m where 0 ≤ i < left; all x[i] > m where left ≤ i < right; the rest of the elements are unknown.
If we consider 1 = 2⁰ × 3⁰ × 5⁰, then 1 is also a valid number, and it is the first one.
0.3.1 The brute-force solution

It seems the solution is quite easy, without the need for any serious algorithms. We can check numbers one by one starting from 1, extracting all factors of 2, 3, and 5 to see if the remaining part equals 1.

1: function Get-Number(n)
2:     x ← 0
3:     i ← 0
4:     loop
5:         x ← x + 1
6:         if Valid?(x) then
7:             i ← i + 1
8:             if i = n then
9:                 return x

10: function Valid?(x)
11:     while x mod 2 = 0 do
12:         x ← x / 2
13:     while x mod 3 = 0 do
14:         x ← x / 3
15:     while x mod 5 = 0 do
16:         x ← x / 5
17:     if x = 1 then
18:         return True
19:     else
20:         return False
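For a rough illustration, a direct Python translation of this brute-force approach might look like the following sketch (the function names are mine, not from the original posts; note that it counts 1 as the first valid number, matching the remark above).

def valid(x):
    # strip all factors of 2, 3, and 5; x is valid if nothing else remains
    for f in [2, 3, 5]:
        while x % f == 0:
            x //= f
    return x == 1

def get_number(n):
    x, i = 0, 0
    while True:
        x += 1
        if valid(x):
            i += 1
            if i == n:
                return x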
This brute-force algorithm works for most small n. However, to find the 1,500th number (which is 859963392), the C program based on this algorithm takes 40.39 seconds on my computer. I had to kill the program after 10 minutes when I increased n to 15,000.
0.3.2 Improvement 1

Analysis of the above algorithm shows that modulo and division calculations are very expensive [2], and they are executed many times in loops. Instead of checking whether a number contains only 2, 3, or 5 as factors, an alternative is to construct such numbers from these factors.

We start from 1 and multiply it by 2, 3, or 5 to generate the rest of the numbers. The problem turns into: how do we generate the candidate numbers in order? One handy way is to utilize the queue data structure.

A queue allows elements to be pushed at one end and popped at the other end, so the element pushed first is also popped first. This property is called FIFO (First-In-First-Out).

The idea is to push 1 as the only element into the queue; then we pop an element and multiply it by 2, 3, and 5 to get 3 new elements, which we push back into the queue in order. Note that a new element may already exist in the queue; in such a case we just drop it. A new element may also be smaller than others in the queue, so we must insert it at the correct position. Figure 2 illustrates this idea.
Figure 2: First 4 steps of constructing numbers with a queue.
1. The queue is initialized with 1 as the only element;
2. New elements 2, 3, and 5 are pushed back;
3. New elements 4, 6, and 10 are pushed back in order;
4. New elements 9 and 15 are pushed back; element 6 already exists.
This algorithm is shown as the following.

1: function Get-Number(n)
2:     Q ← NIL
3:     Enqueue(Q, 1)
4:     while n > 0 do
5:         x ← Dequeue(Q)
6:         Unique-Enqueue(Q, 2x)
7:         Unique-Enqueue(Q, 3x)
8:         Unique-Enqueue(Q, 5x)
9:         n ← n − 1
10:    return x

11: function Unique-Enqueue(Q, x)
12:     i ← 0
13:     while i < |Q| ∧ Q[i] < x do
14:         i ← i + 1
15:     if i < |Q| ∧ x = Q[i] then
16:         return
17:     Insert(Q, i, x)
The insert function takes O(|Q|) time to find the proper position and insert the element. If the element already exists, Unique-Enqueue just returns.

A rough estimation tells us that the length of the queue increases in proportion to n (each time we extract one element and push at most 3 new ones, an increase ratio of at most 2), so the total running time is O(1 + 2 + 3 + ... + n) = O(n²).
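For reference, a direct Python sketch of this queue-based construction, using a plain sorted list as the queue (the names are mine, not from the original posts):

from bisect import bisect_left

def get_number(n):
    q = [1]
    while n > 0:
        x = q.pop(0)  # the queue is kept sorted, so this is the smallest
        for v in (2 * x, 3 * x, 5 * x):
            i = bisect_left(q, v)
            if i == len(q) or q[i] != v:  # drop duplicates
                q.insert(i, v)
        n -= 1
    return x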
Figure 3 shows the number of queue accesses against n; it is a quadratic curve, which reflects the O(n²) performance.

Figure 3: Queue access count vs. n.

The C program based on this algorithm takes only 0.016 seconds to get the right answer, 859963392, which is about 2,500 times faster than the brute-force solution.
Improvement 1 can also be considered in a recursive way. Suppose X is the infinite series of all numbers which only contain factors of 2, 3, or 5. The following formula shows an interesting relationship.

X = {1} ∪ {2x : x ∈ X} ∪ {3x : x ∈ X} ∪ {5x : x ∈ X}    (2)

where we define ∪ in a special form, so that all elements are stored in order as well as being unique. Suppose X = {x1, x2, x3, ...}, Y = {y1, y2, y3, ...}, X′ = {x2, x3, ...} and Y′ = {y2, y3, ...}. We have

X ∪ Y = X : Y = ∅
      = Y : X = ∅
      = {x1} ∪ (X′ ∪ Y) : x1 < y1
      = {x1} ∪ (X′ ∪ Y′) : x1 = y1
      = {y1} ∪ (X ∪ Y′) : x1 > y1

In a functional programming language such as Haskell, which supports lazy evaluation, the above infinite series can be translated into the following program.
ns = 1 : merge (map (*2) ns) (merge (map (*3) ns) (map (*5) ns))

merge [] l = l
merge l [] = l
merge (x:xs) (y:ys) | x < y = x : merge xs (y:ys)
                    | x == y = x : merge xs ys
                    | otherwise = y : merge (x:xs) ys

By evaluating ns !! (n-1), we can get the 1,500th number as below.

>ns !! (1500-1)
859963392
0.3.3 Improvement 2

Considering the above solution, although it is much faster than the brute-force one, it still has some drawbacks. It produces many duplicated numbers, which are eventually dropped when examining the queue. Secondly, it does a linear scan and insertion to keep the order of all elements in the queue, which degrades the ENQUEUE operation from O(1) to O(|Q|).

If we use three queues instead of only one, we can improve the solution one step further. Denote these queues as Q2, Q3, and Q5, and initialize them as Q2 = {2}, Q3 = {3} and Q5 = {5}. Each time we DEQUEUE the smallest element x from among the heads of Q2, Q3, and Q5, and do the following test:

- If x comes from Q2, we ENQUEUE 2x, 3x, and 5x back to Q2, Q3, and Q5 respectively;
- If x comes from Q3, we only need to ENQUEUE 3x to Q3 and 5x to Q5. We needn't ENQUEUE 2x, because 2x is already covered by Q3 (it is also 3 times some earlier number);
- If x comes from Q5, we only need to ENQUEUE 5x to Q5; there is no need to ENQUEUE 2x and 3x, because they are already covered by the other queues.

We repeatedly extract the smallest head until we find the n-th element.
The algorithm based on this idea is implemented as below.

1: function Get-Number(n)
2:     if n = 1 then
3:         return 1
4:     else
5:         Q2 ← {2}
6:         Q3 ← {3}
7:         Q5 ← {5}
8:         while n > 1 do
9:             x ← min(Head(Q2), Head(Q3), Head(Q5))
10:            if x = Head(Q2) then
11:                Dequeue(Q2)
12:                Enqueue(Q2, 2x)
13:                Enqueue(Q3, 3x)
14:                Enqueue(Q5, 5x)
15:            else if x = Head(Q3) then
16:                Dequeue(Q3)
17:                Enqueue(Q3, 3x)
18:                Enqueue(Q5, 5x)
19:            else
20:                Dequeue(Q5)
21:                Enqueue(Q5, 5x)
22:            n ← n − 1
23:        return x

Figure 4: First 4 steps of constructing numbers with Q2, Q3, and Q5.
1. The queues are initialized with 2, 3, and 5 as their only elements;
2. New elements 4, 6, and 10 are pushed back;
3. New elements 9 and 15 are pushed back;
4. New elements 8, 12, and 20 are pushed back;
5. New element 25 is pushed back.
This algorithm loops n times. Within each loop it extracts one head element from the three queues, which takes constant time; it then appends one to three new elements at the ends of the queues, which is bound by constant time too. So the total time of the algorithm is bound to O(n). The C++ program translated from this algorithm, shown below, takes well under a second to produce the 1,500th number, 859963392.
#include <queue>
#include <algorithm>
using namespace std;

typedef unsigned long Integer;

Integer get_number(int n) {
    if (n == 1)
        return 1;
    queue<Integer> Q2, Q3, Q5;
    Q2.push(2);
    Q3.push(3);
    Q5.push(5);
    Integer x;
    while (n-- > 1) {
        x = min(min(Q2.front(), Q3.front()), Q5.front());
        if (x == Q2.front()) {
            Q2.pop();
            Q2.push(x * 2);
            Q3.push(x * 3);
            Q5.push(x * 5);
        }
        else if (x == Q3.front()) {
            Q3.pop();
            Q3.push(x * 3);
            Q5.push(x * 5);
        }
        else {
            Q5.pop();
            Q5.push(x * 5);
        }
    }
    return x;
}
This solution can also be implemented in a functional way. We define a function take(n), which returns the first n numbers containing only 2, 3, or 5 as factors.

take(n) = f(n, {1}, {2}, {3}, {5})

where

f(n, X, Q2, Q3, Q5) = X : n = 1
                    = f(n − 1, X ∪ {x}, Q2′, Q3′, Q5′) : otherwise

x = min(Q2[1], Q3[1], Q5[1])

(Q2′, Q3′, Q5′) = ({Q2[2], Q2[3], ...} ∪ {2x}, Q3 ∪ {3x}, Q5 ∪ {5x}) : x = Q2[1]
                = (Q2, {Q3[2], Q3[3], ...} ∪ {3x}, Q5 ∪ {5x}) : x = Q3[1]
                = (Q2, Q3, {Q5[2], Q5[3], ...} ∪ {5x}) : x = Q5[1]

This functional definition can be realized in Haskell as the following.
ks 1 xs _ = xs
ks n xs (q2, q3, q5) = ks (n-1) (xs ++ [x]) update
    where
      x = minimum $ map head [q2, q3, q5]
      update | x == head q2 = (tail q2 ++ [x*2], q3 ++ [x*3], q5 ++ [x*5])
             | x == head q3 = (q2, tail q3 ++ [x*3], q5 ++ [x*5])
             | otherwise = (q2, q3, tail q5 ++ [x*5])

takeN n = ks n [1] ([2], [3], [5])

Invoking last (takeN 1500) generates the correct answer, 859963392.
0.4 Notes and short summary

Reviewing the two puzzles, we find that in both cases the brute-force solutions are weak. In the first problem, the brute-force solution is quite poor at dealing with a long ID list, while in the second problem it doesn't work at all.

The first problem shows the power of algorithms, while the second tells why data structures are important. There are plenty of interesting problems which were hard to solve before computers were invented. With the aid of computers and programming, we are able to find answers in quite a different way. Compared to what we learned in mathematics courses in school, we haven't been taught methods like these.

While there are already a lot of wonderful books about algorithms, data structures, and math, few of them provide a comparison between the procedural solution and the functional solution. From the above discussion, it can be seen that functional solutions are sometimes very expressive and close to what we are familiar with in mathematics.

This series of posts focuses on providing both imperative and functional algorithms and data structures. Many of the functional data structures can be referenced from Okasaki's book [6], while the imperative ones can be found in classic textbooks [2] or even on Wikipedia. Multiple programming languages, including C, C++, Python, Haskell, and Scheme/Lisp, will be used. In order to make it easy to read for programmers with different backgrounds, pseudo-code and mathematical functions are the regular descriptions in each post.

The author is NOT a native English speaker; the reason why this book is only available in English for the time being is that the contents are still changing frequently. Any feedback, comments, or criticism is welcome.
0.5 Structure of the contents

In the following series of posts, I'll first introduce elementary data structures before algorithms, because many algorithms need knowledge of data structures as a prerequisite.

The 'hello world' data structure, the binary search tree, is the first topic. Then we introduce how to solve the balance problem of binary search trees. After that, I'll show other interesting trees: Trie, Patricia, and suffix trees are useful in text manipulation, while B-trees are commonly used in file system and database implementation.

The second part of the data structures is about heaps. We'll provide a general heap definition and introduce binary heaps, both by array and by explicit binary trees. Then we'll extend to K-ary heaps, including binomial heaps, Fibonacci heaps, and pairing heaps.

Arrays and queues are typically considered among the easiest data structures; however, we'll show how difficult it is to implement them in the third part.

As the elementary sort algorithms, we'll introduce insertion sort, quick sort, merge sort, etc., in both imperative and functional ways.

The final part is about searching; besides element searching, we'll also show string-matching algorithms such as KMP.

All the posts are provided under the GNU FDL (Free Documentation License), and programs are under the GNU GPL.
0.6 Appendix

All programs provided along with this article are free for download at http://sites.google.com/site/algoxy/introduction.
Bibliography

[1] Richard Bird. Pearls of Functional Algorithm Design. Cambridge University Press; 1st edition (November 1, 2010). ISBN-10: 0521513383
[2] Jon Bentley. Programming Pearls (2nd Edition). Addison-Wesley Professional; 2nd edition (October 7, 1999). ISBN-13: 978-0201657883
[3] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press (July 1, 1999). ISBN-13: 978-0521663502
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937
Part I

Trees
Chapter 1

Binary search tree, the 'hello world' data structure

1.1 Introduction

Arrays or lists are typically considered the 'hello world' data structures. However, we'll see that they are actually not so easy to implement. In some procedural settings, arrays are the elementary representation, and it is possible to realize a linked list using an array (section 10.3 in [2]); while in some functional settings, linked lists are the elementary bricks used to build arrays and other data structures.

Considering these factors, we start with the binary search tree (BST) as the 'hello world' data structure. Jon Bentley mentioned an interesting problem in Programming Pearls [2]: count the number of times each word occurs in a big text. The solution is something like the C++ code below.
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main(int, char**) {
    map<string, int> dict;
    string s;
    while (cin >> s)
        ++dict[s];
    map<string, int>::iterator it = dict.begin();
    for (; it != dict.end(); ++it)
        cout << it->first << ": " << it->second << "\n";
}
And we can run it to produce the word-counting result as the following¹.

$ g++ wordcount.cpp -o wordcount
$ cat bbe.txt | ./wordcount > wc.txt

The map provided in the standard template library is a kind of balanced binary search tree with augmented data. Here we use the words in the text as keys and the number of occurrences as the augmented data. This program is fast, and it reflects the power of the binary search tree. We'll introduce how to implement a BST in this post and show how to solve the balancing problem in later posts.

¹ This is not a Unix-only command; in Windows it can be achieved by: type bbe.txt | wordcount.exe > wc.txt

Before we dive into the binary search tree, let's first introduce the more general binary tree.
The concept of the binary tree is a recursive definition, and the binary search tree is just a special type of binary tree. The binary tree is typically defined as the following. A binary tree is

- either an empty node;
- or a node containing 3 parts: a value, a left child which is a binary tree, and a right child which is also a binary tree.

Figure 1.1 shows this concept and an example binary tree.

Figure 1.1: Binary tree concept and an example. (a) Concept of binary tree; (b) an example binary tree.
A binary search tree is a binary tree which satisfies the following criteria: for each node in a binary search tree,

- all the values in its left child tree are less than the value of this node;
- the value of this node is less than any value in its right child tree.

Figure 1.2 shows an example of a binary search tree. Comparing with Figure 1.1, we can see the difference in key ordering between them.

Figure 1.2: A binary search tree example.
1.2 Data Layout

Based on the recursive definition of the binary search tree, we can draw the data layout in a procedural setting with pointer support as in figure 1.3.

The node contains a key field, which can be augmented with satellite data, a field containing a pointer to the left child, and a field pointing to the right child. In order to back-track to an ancestor easily, a parent field can be provided as well. In this post, we'll ignore the satellite data for simple illustration purposes. Based on this layout, the node of a binary search tree can be defined in a procedural language, such as C++, as the following.

template <class T>
struct node {
    node(T x) : key(x), left(0), right(0), parent(0) {}
    ~node() {
        delete left;
        delete right;
    }
    node* left;
    node* right;
    node* parent; // parent is optional; it's helpful for succ/pred
    T key;
};
There is another setting; for instance, in Scheme/Lisp languages the elementary data structure is the linked list. Figure 1.4 shows how a binary search tree node can be built on top of linked lists.

Figure 1.3: Layout of nodes with a parent field (each node holds its key plus satellite data and left, right, and parent pointers).

Figure 1.4: Binary search tree node layout on top of a linked list, where 'left ...' and 'right ...' are either empty or binary search tree nodes composed in the same way.

Because in a pure functional setting it's hard to use pointers for back-tracking the ancestors (and typically there is no need to do back-tracking, since we can provide a top-down solution in a recursive way), there is no parent field in such a layout.
For simplicity, we'll skip the detailed layout in the future and only focus on the logical layout of data structures. For example, below is the definition of a binary search tree node in Haskell.

data Tree a = Empty
            | Node (Tree a) a (Tree a)
1.3 Insertion

To insert a key k (possibly along with a value in practice) into a binary search tree T, we can follow a quite straightforward way:

- If the tree is empty, construct a leaf node with key = k;
- If k is less than the key of the root node, insert it into the left child;
- If k is greater than the key of the root, insert it into the right child.

There is an exceptional case: if k is equal to the key of the root, the key already exists; we can either overwrite the data or just do nothing. For simplicity, this case is skipped in this post.

The algorithm is described recursively. It is so simple that this is why we consider the binary search tree a 'hello world' data structure. Formally, the algorithm can be represented with a recursive function.

insert(T, k) = node(φ, k, φ) : T = φ
             = node(insert(L, k), Key, R) : k < Key
             = node(L, Key, insert(R, k)) : otherwise    (1.1)

where

L = left(T)
R = right(T)
Key = key(T)

The node function creates a new node from a given left sub-tree, a key, and a right sub-tree. φ means NIL or empty. The functions left, right, and key are accessors which get the left sub-tree, the right sub-tree, and the key of a node.

Translating the above function directly to Haskell yields the following program.
insert :: (Ord a) => Tree a -> a -> Tree a
insert Empty k = Node Empty k Empty
insert (Node l x r) k | k < x = Node (insert l k) x r
                      | otherwise = Node l x (insert r k)

This program utilizes the pattern matching features provided by the language. However, even in functional settings without this feature, for instance Scheme/Lisp, the program is still expressive.
(define (insert tree x)
  (cond ((null? tree) (list '() x '()))
        ((< x (key tree))
         (make-tree (insert (left tree) x)
                    (key tree)
                    (right tree)))
        ((> x (key tree))
         (make-tree (left tree)
                    (key tree)
                    (insert (right tree) x)))))
It is possible to turn the algorithm completely into an imperative way without recursion.

1: function Insert(T, k)
2:     root ← T
3:     x ← Create-Leaf(k)
4:     parent ← NIL
5:     while T ≠ NIL do
6:         parent ← T
7:         if k < Key(T) then
8:             T ← Left(T)
9:         else
10:            T ← Right(T)
11:    Parent(x) ← parent
12:    if parent = NIL then    ▷ tree T is empty
13:        return x
14:    else if k < Key(parent) then
15:        Left(parent) ← x
16:    else
17:        Right(parent) ← x
18:    return root

19: function Create-Leaf(k)
20:     x ← Empty-Node
21:     Key(x) ← k
22:     Left(x) ← NIL
23:     Right(x) ← NIL
24:     Parent(x) ← NIL
25:     return x
Compared with the functional algorithm, it is obvious that this one is more complex, although it is fast and can handle very deep trees. A complete C++ program and a Python program are available along with this post for reference.
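As a taste of the referenced Python version, a minimal iterative insert over a simple node class might look like this sketch (class and helper names are mine, not necessarily those of the accompanying source):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def insert(t, key):
    root, parent = t, None
    x = Node(key)
    while t is not None:  # walk down to find the insertion point
        parent = t
        t = t.left if key < t.key else t.right
    x.parent = parent
    if parent is None:  # the tree was empty
        return x
    if key < parent.key:
        parent.left = x
    else:
        parent.right = x
    return root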
1.4 Traversing

Traversing means visiting every element one by one in a binary search tree. There are 3 ways to traverse a binary tree: pre-order tree walk, in-order tree walk, and post-order tree walk. The names of these traversing methods highlight when we visit the root of a binary search tree.

Since there are three parts in a tree, the left child, the root (which contains the key and satellite data), and the right child, if we denote them as (left, current, right), the three traversing methods are defined as the following.

- pre-order traverse: visit current, then left, finally right;
- in-order traverse: visit left, then current, finally right;
- post-order traverse: visit left, then right, finally current.

Note that each 'visit' operation is recursive, and the order of visiting current determines the name of the traversing method.

For the binary search tree shown in figure 1.2, below are the three different traversing results.

- pre-order traverse result: 4, 3, 1, 2, 8, 7, 16, 10, 9, 14;
- in-order traverse result: 1, 2, 3, 4, 7, 8, 9, 10, 14, 16;
- post-order traverse result: 2, 1, 3, 7, 9, 14, 10, 16, 8, 4.

It can be found that the in-order walk of a binary search tree outputs the elements in increasing order, which is particularly helpful. The definition of the binary search tree ensures this interesting property; the proof of this fact is left as an exercise of this post.

The in-order tree walk algorithm can be described as the following:

- If the tree is empty, just return;
- otherwise traverse the left child by in-order walk, then access the key, finally traverse the right child by in-order walk.
Translating the above description yields a generic map function.

map(f, T) = φ : T = φ
          = node(l′, k′, r′) : otherwise    (1.2)

where

l′ = map(f, left(T))
r′ = map(f, right(T))
k′ = f(key(T))
If we only need to access the keys without creating the transformed tree, we can realize this algorithm in a procedural way, like the C++ program below.

template <class T, class F>
void in_order_walk(node<T>* t, F f) {
    if (t) {
        in_order_walk(t->left, f);
        f(t->key);
        in_order_walk(t->right, f);
    }
}
The function takes a parameter f, which can be a real function or a function object; the program applies f to each node by an in-order tree walk.

We can simplify this algorithm one step further to define a function which turns a binary search tree into a sorted list by in-order traversing.

toList(T) = φ : T = φ
          = toList(left(T)) ∪ {key(T)} ∪ toList(right(T)) : otherwise    (1.3)

Below is the Haskell program based on this definition.

toList :: (Ord a) => Tree a -> [a]
toList Empty = []
toList (Node l x r) = toList l ++ [x] ++ toList r
This provides us a method to sort a list of elements: we can first build a binary search tree from the list, then output the tree by in-order traversing. This method is called 'tree sort'. Let's denote the list X = {x1, x2, x3, ..., xn}.

sort(X) = toList(fromList(X))    (1.4)

And we can write it in function composition form.

sort = toList . fromList

where the function fromList repeatedly inserts every element into a binary search tree.

fromList(X) = foldL(insert, φ, X)    (1.5)

It can also be written in partial application form like below.

fromList = foldL insert φ

For readers who are not familiar with folding from the left, this function can also be defined recursively as the following.

fromList(X) = φ : X = φ
            = insert(fromList({x2, x3, ..., xn}), x1) : otherwise

We'll make intensive use of the folding function as well as function composition and partial evaluation in the future; please refer to the appendix of this book or [6], [7] and [8] for more information.
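To make the tree-sort pipeline concrete, here is a small Python sketch combining the pieces above, reusing the Node class and the iterative insert sketched in section 1.3 (a hypothetical illustration, not the accompanying source code):

def to_list(t):
    # in-order walk yields the keys in sorted order
    return [] if t is None else to_list(t.left) + [t.key] + to_list(t.right)

def from_list(xs):
    # repeatedly insert every element into an initially empty tree
    root = None
    for x in xs:
        root = insert(root, x)
    return root

def tree_sort(xs):
    return to_list(from_list(xs))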
Exercise 1.1

- Given the in-order traverse result and pre-order traverse result, can you re-construct the tree and figure out the post-order traversing result? Pre-order result: 1, 2, 4, 3, 5, 6; In-order result: 4, 2, 1, 5, 3, 6; Post-order result: ?
- Write a program in your favorite language to re-construct the binary tree from the pre-order result and in-order result.
- Prove that the in-order walk outputs the elements stored in a binary search tree in increasing order.
- Can you analyze the performance of tree sort with big-O notation?
1.5 Querying a binary search tree
There are three types of queries on a binary search tree: searching a key in the tree, finding the minimum or maximum element in the tree, and finding the predecessor or successor of an element in the tree.
1.5.1 Looking up
According to the definition of binary search tree, searching a key in a tree can be realized as the following.

- If the tree is empty, the search fails;
- If the key of the root is equal to the value to be found, the search succeeds, and the root is returned as the result;
- If the value is less than the key of the root, search in the left child;
- Else, which means that the value is greater than the key of the root, search in the right child.
This algorithm can be described with a recursive function as below.

lookup(T, x) = { ∅ : T = ∅
               { T : key(T) = x
               { lookup(left(T), x) : x < key(T)
               { lookup(right(T), x) : otherwise        (1.6)
In a real application, we may return the satellite data instead of the node as the search result. This algorithm is simple and straightforward. Here is the corresponding Haskell program.
lookup :: (Ord a) => Tree a -> a -> Tree a
lookup Empty _ = Empty
lookup t@(Node l k r) x | k == x = t
| x < k = lookup l x
| otherwise = lookup r x
If the binary search tree is well balanced, which means that almost all nodes have both non-NIL left and right children, then for N elements the search algorithm takes O(lg N) time. This is not a formal definition of balance; we'll formalize it in the later chapter about red-black tree. If the tree is poorly balanced, the worst case takes O(N) time to search for a key. If we denote the height of the tree as h, we can uniformly state the performance of the algorithm as O(h).
The search algorithm can also be realized without recursion, in a procedural manner.

1: function Search(T, x)
2:     while T ≠ NIL ∧ Key(T) ≠ x do
3:         if x < Key(T) then
4:             T ← Left(T)
5:         else
6:             T ← Right(T)
7:     return T
Below is the C++ program based on this algorithm.

template<class T>
node<T>* search(node<T>* t, T x){
    while(t && t->key != x){
        if(x < t->key) t = t->left;
        else t = t->right;
    }
    return t;
}
1.5.2 Minimum and maximum
Minimum and maximum can be implemented from the property of binary search tree: lesser keys are always in the left child, and greater keys are in the right.
For the minimum, we keep traversing the left sub tree until it is empty, while for the maximum we traverse the right.
min(T) = { key(T) : left(T) = ∅
         { min(left(T)) : otherwise        (1.7)

max(T) = { key(T) : right(T) = ∅
         { max(right(T)) : otherwise        (1.8)
Both functions are bound to O(h) time, where h is the height of the tree. For a balanced binary search tree, min/max are bound to O(lg N) time, while they are O(N) in the worst case.
We skip translating them to programs; it's also possible to implement them in a pure procedural way without using recursion.
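For reference, one possible non-recursive realization in Python. These helpers also match the tree_min/tree_max functions that the succ/pred programs below assume (the names are ours):

def tree_min(t):
    # keep going left; assumes t is a non-empty node with left/right fields
    while t.left is not None:
        t = t.left
    return t

def tree_max(t):
    # symmetric: keep going right
    while t.right is not None:
        t = t.right
    return t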
1.5.3 Successor and predecessor
The last kind of querying, finding the successor or predecessor of an element, is useful when a tree is treated as a generic container and traversed with an iterator. It is relatively easier to implement if the parent of a node can be accessed directly.
It seems that a functional solution is hard to find, because there is no pointer-like field linking to the parent node. One solution is to leave 'breadcrumbs' when we visit the tree, and use this information to back-track or even re-construct the whole tree. Such a data structure, which contains both the tree and the breadcrumbs, is called a zipper; please refer to [9] for details.
However, if we consider the original purpose of providing succ/pred functions, namely to traverse all the binary search tree elements one by one as a generic container, we realize that they don't make significant sense in functional settings, because we can traverse the tree in increasing order by the map function we defined previously.
We'll meet many problems in this book that are only valid in imperative settings and are not meaningful problems in functional settings at all. One good example is how to delete an element in red-black tree [3].
In this section, we'll only present the imperative algorithms for finding the successor and predecessor in a binary search tree.
When finding the successor of element x, which is the smallest element y satisfying y > x, there are two cases. If the node with value x has a non-NIL right child, the minimum element in the right child is the answer. For example, in Figure 1.2, in order to find the successor of 8, we search its right sub tree for the minimum element, which yields 9 as the result. If node x doesn't have a right child, we need to back-track to find the closest ancestor whose left child is also an ancestor of x. In Figure 1.2, since 2 doesn't have a right sub tree, we go back to its parent 1. However, node 1 doesn't have a left child, so we go back again and reach node 3; the left child of 3 is also an ancestor of 2, thus 3 is the successor of node 2.
Based on this description, the algorithm can be given as the following.
1: function Succ(x)
2:     if Right(x) ≠ NIL then
3:         return Min(Right(x))
4:     else
5:         p ← Parent(x)
6:         while p ≠ NIL and x = Right(p) do
7:             x ← p
8:             p ← Parent(p)
9:         return p
The predecessor case is quite similar to the successor algorithm; they are symmetric to each other.
1: function Pred(x)
2:     if Left(x) ≠ NIL then
3:         return Max(Left(x))
4:     else
5:         p ← Parent(x)
6:         while p ≠ NIL and x = Left(p) do
7:             x ← p
8:             p ← Parent(p)
9:         return p
Below are the Python programs based on these algorithms. They are changed
a bit in while loop conditions.
def succ(x):
    if x.right is not None: return tree_min(x.right)
    p = x.parent
    while p is not None and p.left != x:
        x = p
        p = p.parent
    return p
def pred(x):
    if x.left is not None: return tree_max(x.left)
    p = x.parent
    while p is not None and p.right != x:
        x = p
        p = p.parent
    return p
Exercise 1.2
- Can you figure out how to iterate a tree as a generic container by using pred()/succ()? What's the performance of such a traversing process in terms of big-O?
- A reader discussed traversing all elements inside a range [a, b]. In C++, the algorithm looks like the below code: for_each(m.lower_bound(12), m.upper_bound(26), f); Can you provide a purely functional solution for this problem?
1.6 Deletion
Deletion is another 'imperative only' topic for binary search tree. This is because deletion mutates the tree, while in purely functional settings we don't modify the tree after building it in most applications.
However, one method of deleting an element from a binary search tree in a purely functional way is shown in this section. It actually reconstructs the tree rather than modifying it.
Deletion is the most complex operation for binary search tree, because we must keep the BST property: for any node, all keys in its left sub tree are less than the key of this node, which in turn is less than any key in its right sub tree. Deleting a node can break this property.
In this chapter, different from the algorithm described in [2], a simpler one from the SGI STL implementation is used [6].
To delete a node x from a tree:

- If x has no child or only one child, splice x out;
- Otherwise (x has two children), use the minimum element of its right sub tree to replace x, and splice the original minimum element out.

The simplicity comes from the fact that the minimum element is stored in a node of the right sub tree which can't have two non-NIL children, so it ends up in the trivial case: that node can be directly spliced out from the tree.
Figures 1.5, 1.6, and 1.7 illustrate these different cases when deleting a node from the tree.
Figure 1.5: x can be spliced out.
Figure 1.6: Delete a node which has only one non-NIL child: x is spliced out and replaced by its left child (a, b), or by its right child (c, d).
Figure 1.7: Delete a node which has both children: x is replaced by splicing the minimum element from its right child.
Based on this idea, the deletion can be defined as the below function.

delete(T, x) = { ∅ : T = ∅
              { node(delete(L, x), K, R) : x < K
              { node(L, K, delete(R, x)) : x > K
              { R : x = K ∧ L = ∅
              { L : x = K ∧ R = ∅
              { node(L, y, delete(R, y)) : otherwise        (1.9)

where

L = left(T)
R = right(T)
K = key(T)
y = min(R)
Translating the function to Haskell yields the below program.

delete :: (Ord a) => Tree a -> a -> Tree a
delete Empty _ = Empty
delete (Node l k r) x | x < k = Node (delete l x) k r
                      | x > k = Node l k (delete r x)
                      -- x == k
                      | isEmpty l = r
                      | isEmpty r = l
                      | otherwise = Node l k' (delete r k') where k' = min r
Function isEmpty is used to test if a tree is empty (∅). Note that the algorithm first performs a search to locate the node where the element is to be deleted, and after that it executes the deletion. This algorithm takes O(h) time where h is the height of the tree.
It's also possible to pass the node, rather than the element, to the algorithm for deletion; thus the searching is no longer needed.
The imperative algorithm is more complex because it needs to set the parent pointers properly. The function returns the root of the resulting tree.
1: function Delete(T, x)
2:     root ← T
3:     x′ ← x        ▷ save x
4:     parent ← Parent(x)
5:     if Left(x) = NIL then
6:         x ← Right(x)
7:     else if Right(x) = NIL then
8:         x ← Left(x)
9:     else        ▷ both children are non-NIL
10:        y ← Min(Right(x))
11:        Key(x) ← Key(y)
12:        Copy other satellite data from y to x
13:        if Parent(y) ≠ x then        ▷ y hasn't a left sub tree
14:            Left(Parent(y)) ← Right(y)
15:        else        ▷ y is the root of the right child of x
16:            Right(x) ← Right(y)
17:        Remove y
18:        return root
19:    if x ≠ NIL then
20:        Parent(x) ← parent
21:    if parent = NIL then        ▷ we are removing the root of the tree
22:        root ← x
23:    else
24:        if Left(parent) = x′ then
25:            Left(parent) ← x
26:        else
27:            Right(parent) ← x
28:    Remove x′
29:    return root
Here we assume the node to be deleted is not empty (otherwise we can simply return the original tree). The algorithm first records the root of the tree, and creates copies of the pointers to x and its parent.
If either of the children is empty, the algorithm just splices x out. If it has two non-NIL children, we first locate the minimum of the right child, replace the key of x with y's, copy the satellite data as well, then splice y out. Note that there is a special case that y is the root node of x's right sub tree.
Finally, we need to reset the stored parent if the original x has at most one non-NIL child. If the parent pointer we copied before is empty, it means that we are deleting the root node, so we need to return the new root. After the parent is set properly, we finally remove the old x from memory.
The corresponding Python program for the deleting algorithm is given below. Because Python provides GC, we needn't explicitly remove the node from memory.
def tree_delete(t, x):
    if x is None:
        return t
    [root, old_x, parent] = [t, x, x.parent]
    if x.left is None:
        x = x.right
    elif x.right is None:
        x = x.left
    else:
        y = tree_min(x.right)
        x.key = y.key
        if y.parent != x:
            y.parent.left = y.right
        else:
            x.right = y.right
        return root
    if x is not None:
        x.parent = parent
    if parent is None:
        root = x
    else:
        if parent.left == old_x:
            parent.left = x
        else:
            parent.right = x
    return root
Because the procedure seeks the minimum element, it runs in O(h) time on a tree of height h.
Exercise 1.3
There is a symmetric solution for deleting a node which has two non-NIL children: replace the element by splicing the maximum one out of the left sub-tree. Write a program to implement this solution.
1.7 Randomly build binary search tree
All operations given in this chapter are bound to O(h) time for a tree of height h. The height affects the performance a lot. For a very unbalanced tree, h tends to be O(N), which leads to the worst case, while for a balanced tree h is close to O(lg N), and we gain good performance.
How to make the binary search tree balanced will be discussed in the next chapter. However, there exists a simple way: the binary search tree can be randomly built as described in [2]. Random building helps to avoid (or decrease the possibility of) unbalanced binary trees. The idea is that, before building the tree, we can call a random process to shuffle the elements.
Exercise 1.4
Write a randomly building process for binary search tree.
1.8 Appendix
All programs are provided along with this book. They are free for downloading. We provide C, C++, Python, Haskell, and Scheme/Lisp programs as examples.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. ISBN: 0262032937. The MIT Press. 2001

[2] Jon Bentley. Programming Pearls (2nd Edition). Addison-Wesley Professional; 2nd edition (October 7, 1999). ISBN-13: 978-0201657883

[3] Chris Okasaki. Ten Years of Purely Functional Data Structures. http://okasaki.blogspot.com/2008/02/ten-years-of-purely-functional-data.html

[4] SGI. Standard Template Library Programmer's Guide. http://www.sgi.com/tech/stl/

[5] http://en.literateprograms.org/Category:Binary_search_tree

[6] http://en.wikipedia.org/wiki/Foldl

[7] http://en.wikipedia.org/wiki/Function_composition

[8] http://en.wikipedia.org/wiki/Partial_application

[9] Miran Lipovaca. Learn You a Haskell for Great Good! A Beginner's Guide (the last chapter). No Starch Press; 1st edition, April 2011, 400 pp. ISBN: 978-1-59327-283-8
Chapter 2
The evolution of insertion
sort
2.1 Introduction
In the previous chapter, we introduced the 'hello world' data structure, binary search tree. In this chapter, we explain insertion sort, which can be thought of as the 'hello world' sorting algorithm¹. It's straightforward, but its performance is not as good as that of divide and conquer sorting approaches, such as quick sort and merge sort; thus insertion sort is seldom used as the generic sorting utility in modern software libraries. We'll analyze why it is slow, and try to improve it bit by bit until we reach the best bound of comparison based sorting, O(N lg N), by evolving it to tree sort. We finally show the connection between the 'hello world' data structure and the 'hello world' sorting algorithm.
The idea of insertion sort can be vividly illustrated by a real life poker game [2]. Suppose the cards are shuffled, and a player starts taking cards one by one. At any time, all cards in the player's hand are well sorted. When the player gets a new card, he inserts it in the proper position according to the order of points. Figure 2.1 shows this insertion example.
Based on this idea, the algorithm of insertion sort can be directly given as the following.

function Sort(A)
    X ← ∅
    for each x ∈ A do
        Insert(X, x)
    return X
It's easy to express this process with folding, which we mentioned in the chapter of binary search tree.

sort = foldL insert ∅        (2.1)
¹ Some readers may argue that bubble sort is the easiest sorting algorithm. Bubble sort isn't covered in this book as we don't think it's a valuable algorithm [1].
Figure 2.1: Insert card 8 to proper position in a deck.
Note that in the above algorithm, we store the sorted result in X, so this isn't in-place sorting. It's easy to change it into an in-place algorithm.

function Sort(A)
    for i ← 2 to Length(A) do
        insert A[i] into the sorted sequence {A′[1], A′[2], ..., A′[i−1]}
At any time, when we process the i-th element, all elements before i have already been sorted. We continuously insert the current element until all the unsorted data are consumed. This idea is illustrated in figure 2.2.

Figure 2.2: The left part is sorted data; continuously insert elements into the sorted part.
We can find a recursive concept in this definition. Thus it can be expressed as the following.

sort(A) = { ∅ : A = ∅
          { insert(sort({A2, A3, ...}), A1) : otherwise        (2.2)
2.2 Insertion
We haven't answered the question of how to realize insertion, however. It's a puzzle how humans locate the proper position so quickly.
For a computer, the obvious option is to perform a scan. We can either scan from left to right or vice versa. However, if the sequence is stored in a plain array, it's necessary to scan from right to left.

function Sort(A)
    for i ← 2 to Length(A) do        ▷ Insert A[i] into the sorted sequence A[1...i−1]
        x ← A[i]
        j ← i − 1
        while j > 0 ∧ x < A[j] do
            A[j + 1] ← A[j]
            j ← j − 1
        A[j + 1] ← x
One may think scanning from left to right is more natural. However, it isn't as effective as the above algorithm for a plain array. The reason is that it's expensive to insert an element at an arbitrary position in an array, since an array stores its elements continuously. If we want to insert a new element x at position i, we must shift all elements after i, including those at i+1, i+2, ..., one position to the right. After that, the cell at position i is empty, and we can put x in it. This is illustrated in figure 2.3.
Figure 2.3: Insert x to array A at position i.
If the length of the array is N, this indicates we need to examine the first i elements, then perform N − i + 1 moves, and then insert x into the i-th cell. So insertion from left to right needs to traverse the whole array anyway, while if we scan from right to left, we examine only the last j = N − i + 1 elements and perform the same amount of moves. If j is small (e.g. less than N/2), there is a chance to perform fewer operations than scanning from left to right.
Translating the above algorithm to Python yields the following code.

def isort(xs):
    n = len(xs)
    for i in range(1, n):
        x = xs[i]
        j = i - 1
        while j >= 0 and x < xs[j]:
            xs[j+1] = xs[j]
            j = j - 1
        xs[j+1] = x
Equivalent programs can be given in other languages, for instance the following ANSI C program. However, this version isn't as effective as the pseudo code.

void isort(Key* xs, int n){
    int i, j;
    for(i = 1; i < n; ++i)
        for(j = i - 1; j >= 0 && xs[j+1] < xs[j]; --j)
            swap(xs, j, j+1);
}
This is because the swapping function, which exchanges two elements, typically uses a temporary variable like the following:

void swap(Key* xs, int i, int j){
    Key temp = xs[i];
    xs[i] = xs[j];
    xs[j] = temp;
}
So the ANSI C program presented above performs 3M assignments, where M is the number of inner loop iterations, while the pseudo code as well as the Python program use a shift operation instead of swapping, which takes only M + 2 assignments.
We can also provide the Insert() function explicitly, and call it from the general insertion sort algorithm in the previous section. We skip the detailed realization here and leave it as an exercise.
All the insertion algorithms are bound to O(N) time, where N is the length of the sequence, no matter whether they scan from left or from right. Thus the overall performance for insertion sort is quadratic, O(N²).
Exercise 2.1
Provide an explicit insertion function, and call it with the general insertion sort algorithm. Please realize it in both a procedural way and a functional way.
2.3 Improvement 1
Let's go back to the question of why a human can find the proper position for insertion so quickly. We have shown a solution based on scanning. Note the fact that at any time, all cards at hand are well sorted; another possible solution is to use binary search to find the location.
We'll explain search algorithms in a dedicated chapter; binary search is only briefly introduced here for illustration purpose.
The algorithm will be changed to call a binary search procedure.

function Sort(A)
    for i ← 2 to Length(A) do
        x ← A[i]
        p ← Binary-Search(A[1...i−1], x)
        for j ← i down to p do
            A[j] ← A[j − 1]
        A[p] ← x
Instead of scanning elements one by one, binary search utilizes the information that all elements in the array slice {A[1], ..., A[i−1]} are sorted. Let's assume the order is monotonically increasing. To find a position j that satisfies A[j−1] ≤ x ≤ A[j], we can first examine the middle element, for example A[⌊i/2⌋]. If x is less than it, we recursively perform binary search in the first half of the sequence; otherwise, we only need to search the last half.
Every time, we halve the elements to be examined; this search process runs in O(lg N) time to locate the insertion position.
function Binary-Search(A, x)
    l ← 1
    u ← 1 + Length(A)
    while l < u do
        m ← ⌊(l + u)/2⌋
        if A[m] = x then
            return m        ▷ found a duplicated element
        else if A[m] < x then
            l ← m + 1
        else
            u ← m
    return l
The improved insertion sort algorithm is still bound to O(N²): compared to the previous section, where we used O(N²) comparisons and O(N²) moves, with binary search we use only O(N lg N) comparisons, but still O(N²) moves.
The Python program for this algorithm is given below.

def isort(xs):
    n = len(xs)
    for i in range(1, n):
        x = xs[i]
        p = binary_search(xs[:i], x)
        for j in range(i, p, -1):
            xs[j] = xs[j-1]
        xs[p] = x

def binary_search(xs, x):
    l = 0
    u = len(xs)
    while l < u:
        m = (l+u)//2
        if xs[m] == x:
            return m
        elif xs[m] < x:
            l = m + 1
        else:
            u = m
    return l
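A quick usage check of the two programs above (the sample data is ours):

xs = [5, 2, 9, 1, 7, 3]
isort(xs)       # sorts in place, using binary search to locate positions
print(xs)       # [1, 2, 3, 5, 7, 9]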
Exercise 2.2
Write the binary search in a recursive manner. You needn't use a purely functional programming language.
2.4 Improvement 2
Although we improved the search time to O(N lg N) in the previous section, the number of moves is still O(N²). The reason movement takes so long is that the sequence is stored in a plain array. The nature of an array is a continuous data layout, so the insertion operation is expensive. This hints that we can use a linked-list setting to represent the sequence; it improves the insertion operation from O(N) to constant time O(1).
insert(A, x) = { {x} : A = ∅
              { {x} ∪ A : x < A1
              { {A1} ∪ insert({A2, A3, ..., An}, x) : otherwise        (2.3)
Translating the algorithm to Haskell yields the below program.

insert :: (Ord a) => [a] -> a -> [a]
insert [] x = [x]
insert (y:ys) x = if x < y then x:y:ys else y:insert ys x
And we can complete the two versions of the insertion sort program based on the first two equations of this chapter.
isort [] = []
isort (x:xs) = insert (isort xs) x
Or we can represent the recursion with folding.
isort = foldl insert []
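For comparison, the same fold-based definition can be sketched in Python with functools.reduce; the function names here are ours:

from functools import reduce

def insert(xs, x):
    # insert x into the sorted list xs, returning a new list (equation 2.3)
    if xs == [] or x < xs[0]:
        return [x] + xs
    return [xs[0]] + insert(xs[1:], x)

def isort(xs):
    return reduce(insert, xs, [])

print(isort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]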
The linked-list solution can also be described imperatively. Suppose the function Key(x) returns the value of the element stored in node x, and Next(x) accesses the next node in the linked-list.

function Insert(L, x)
    p ← NIL
    H ← L
    while L ≠ NIL ∧ Key(L) < Key(x) do
        p ← L
        L ← Next(L)
    Next(x) ← L
    if p = NIL then
        H ← x
    else
        Next(p) ← x
    return H
For example in ANSI C, the linked-list can be defined as the following.

struct node{
    Key key;
    struct node* next;
};
Thus the insert function can be given as below.

struct node* insert(struct node* lst, struct node* x){
    struct node *p, *head;
    p = NULL;
    for(head = lst; lst && x->key > lst->key; lst = lst->next)
        p = lst;
    x->next = lst;
    if(!p)
        return x;
    p->next = x;
    return head;
}
Instead of using an explicit linked-list, such as a pointer or reference based structure, a linked-list can also be realized with an extra index array. For any array element A[i], Next[i] stores the index of the element that follows A[i]; that is, A[Next[i]] is the next element after A[i].
The insertion algorithm based on this solution is given like below.

function Insert(A, Next, i)
    j ← ⊥
    while Next[j] ≠ NIL ∧ A[Next[j]] < A[i] do
        j ← Next[j]
    Next[i] ← Next[j]
    Next[j] ← i
Here ⊥ means the head of the Next table. The corresponding Python program for this algorithm is given as the following.
def isort(xs):
    n = len(xs)
    next = [-1]*(n+1)
    for i in range(n):
        insert(xs, next, i)
    return next

def insert(xs, next, i):
    j = -1
    while next[j] != -1 and xs[next[j]] < xs[i]:
        j = next[j]
    next[j], next[i] = i, next[j]
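To see what the program produces, here is a small usage sketch (our example). Since Python indexes -1 as the last element, the extra final slot next[-1] conveniently plays the role of the ⊥ head:

xs = [4, 2, 7, 1]
nxt = isort(xs)
i = nxt[-1]            # index of the smallest element
while i != -1:         # follow the chain: keys are visited in sorted order
    print(xs[i])       # prints 1, 2, 4, 7
    i = nxt[i]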
Although we changed the insertion operation to constant time by using a linked-list, we still have to traverse the linked-list to find the position, which results in O(N²) comparisons. This is because a linked-list, unlike an array, doesn't support random access, which means we can't use binary search in the linked-list setting.
Exercise 2.3
- Complete the insertion sort by using the linked-list insertion function in your favorite imperative programming language.
- The index based linked-list returns the sequence of re-arranged indexes as the result. Write a program to re-order the original array of elements from this result.
2.5 Final improvement by binary search tree
It seems that we have driven into a corner: we must improve both the comparison and the insertion at the same time, or we will end up with O(N²) performance.
We must use binary search; this is the only way to improve the comparison time to O(lg N). On the other hand, we must change the data structure, because we can't achieve constant time insertion at an arbitrary position with a plain array.
This reminds us of our 'hello world' data structure, binary search tree. It naturally supports binary search by its definition. At the same time, we can insert a new leaf into a binary search tree in O(1) constant time if we have already found the location.
So the algorithm changes to this.

function Sort(A)
    T ← ∅
    for each x ∈ A do
        T ← Insert-Tree(T, x)
    return To-List(T)
Where Insert-Tree() and To-List() are described in the previous chapter about binary search tree.
As we have analyzed for binary search tree, the performance of tree sort is bound to O(N lg N), which is the lower limit of comparison based sorting [3].
2.6 Short summary
In this chapter, we presented the evolution process of insertion sort. Insertion sort is well explained in most textbooks as the first sorting algorithm. It has a simple and straightforward idea, but its performance is quadratic. Some textbooks stop here, but we wanted to show that there exist ways to improve it from different points of view. We first tried to save comparison time by using binary search, then tried to save the insertion operation by changing the data structure to a linked-list. Finally, we combined these two ideas and evolved insertion sort into tree sort.
Bibliography

[1] http://en.wikipedia.org/wiki/Bubble_sort

[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. ISBN: 0262032937. The MIT Press. 2001

[3] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition). Addison-Wesley Professional; 2nd edition (May 4, 1998). ISBN-10: 0201896850, ISBN-13: 978-0201896855
Chapter 3
Red-black tree, not so
complex as it was thought
3.1 Introduction
3.1.1 Exploit the binary search tree
We have shown the power of binary search tree by using it to count the occurrence of every word in the Bible. The idea is to use the binary search tree as a dictionary for counting.
One may come to the idea of feeding a yellow page book¹ to a binary search tree, and using it to look up the phone number of a contact.
Modifying the word occurrence counting program a bit yields the following code.
int main(int, char**){
    ifstream f("yp.txt");
    map<string, string> dict;
    string name, phone;
    while(f >> name && f >> phone)
        dict[name] = phone;
    for(;;){
        cout << "\nname: ";
        cin >> name;
        if(dict.find(name) == dict.end())
            cout << "not found";
        else
            cout << "phone: " << dict[name];
    }
}
This program works well. However, if you replace the STL map with the binary search tree mentioned previously, the performance will be bad, especially when you search for names such as Zara, Zed, Zulu.
This is because the content of a yellow page book is typically listed in lexicographic order, which means the name list is in increasing order. If we try to insert a sequence of numbers 1, 2, 3, ..., n into a binary search tree, we will get a tree like in Figure 3.1.

¹ A name-phone number contact list book
Figure 3.1: unbalanced tree
This is an extremely unbalanced binary search tree. Looking up takes O(h) time for a tree with height h. In the balanced case, we benefit from the binary search tree with O(lg N) search time. But in this extreme case, the search time degrades to O(N); it's no better than a normal linked-list.
Exercise 3.1
- For a very big yellow page list, one may want to speed up the dictionary building process with two concurrent tasks (threads or processes): one task reads the name-phone pairs from the head of the list, while the other reads from the tail. The building terminates when the two tasks meet at the middle of the list. What will the binary search tree look like after building? What if you split the list into more than two sections and use more tasks?
- Can you find any more cases that exploit a binary search tree? Please consider the unbalanced trees shown in figure 3.2.
3.1.2 How to ensure the balance of the tree
In order to avoid such cases, we can shuffle the input sequence with a randomized algorithm, such as described in Section 12.4 of [2]. However, this method doesn't always work, for example when the input is fed from the user interactively, and the tree needs to be built/updated after each input.
There are many solutions that people have found to keep a binary search tree balanced. Many of them rely on rotation operations on the binary search tree. Rotation operations change the tree structure while maintaining the ordering of the elements; thus they either improve or keep the balance property of the binary search tree.
Figure 3.2: Some unbalanced trees
In this chapter, we'll first introduce the red-black tree, which is one of the most popular and widely used self-adjusting balanced binary search trees. In the next chapter, we'll introduce the AVL tree, which is another intuitive solution. In a later chapter about binary heaps, we'll show another interesting tree called the splay tree, which can gradually adjust the tree to make it more and more balanced.
3.1.3 Tree rotation

Figure 3.3: Tree rotation. Rotate-left transforms the tree from the left side to the right side, and rotate-right does the inverse transformation.
Tree rotation is a kind of special operation that transforms the tree structure without changing the in-order traverse result. It is based on the fact that for a specified ordering, there are multiple binary search trees corresponding to it. Figure 3.3 shows the tree rotation: for the binary search tree on the left side, left rotation transforms it to the tree on the right, and right rotation does the inverse transformation.
Although tree rotation can be realized in a procedural way, there exists a quite simple function description if we use pattern matching.

rotateL(T) = { node(node(a, X, b), Y, c) : pattern(T) = node(a, X, node(b, Y, c))
             { T : otherwise        (3.1)

rotateR(T) = { node(a, X, node(b, Y, c)) : pattern(T) = node(node(a, X, b), Y, c)
             { T : otherwise        (3.2)
However, the imperative pseudo code has to set all the fields accordingly.

1: function Left-Rotate(T, x)
2:     p ← Parent(x)
3:     y ← Right(x)        ▷ Assume y ≠ NIL
4:     a ← Left(x)
5:     b ← Left(y)
6:     c ← Right(y)
7:     Replace(x, y)
8:     Set-Children(x, a, b)
9:     Set-Children(y, x, c)
10:    if p = NIL then
11:        T ← y
12:    return T

13: function Right-Rotate(T, y)
14:    p ← Parent(y)
15:    x ← Left(y)        ▷ Assume x ≠ NIL
16:    a ← Left(x)
17:    b ← Right(x)
18:    c ← Right(y)
19:    Replace(y, x)
20:    Set-Children(y, b, c)
21:    Set-Children(x, a, y)
22:    if p = NIL then
23:        T ← x
24:    return T

25: function Set-Left(x, y)
26:    Left(x) ← y
27:    if y ≠ NIL then Parent(y) ← x

28: function Set-Right(x, y)
29:    Right(x) ← y
30:    if y ≠ NIL then Parent(y) ← x

31: function Set-Children(x, L, R)
32:    Set-Left(x, L)
33:    Set-Right(x, R)

34: function Replace(x, y)
35:    if Parent(x) = NIL then
36:        if y ≠ NIL then Parent(y) ← NIL
37:    else if Left(Parent(x)) = x then Set-Left(Parent(x), y)
38:    else Set-Right(Parent(x), y)
39:    Parent(x) ← NIL
Comparing these pseudo codes with the pattern matching functions, the former focus on the changing of structure states, while the latter focus on the rotation process. As the title of this chapter indicates, red-black tree needn't be so complex as it was thought. Most traditional algorithm text books use the classic procedural way to teach red-black tree; there are several cases to handle, and all need carefulness in manipulating the node fields. However, by switching the mind to functional settings, things become intuitive and simple, although there is some performance overhead.
Most of the content in this chapter is based on Chris Okasaki's work in [2].
3.2 Definition of red-black tree

A red-black tree is a type of self-balancing binary search tree [3].² By using color changing and rotation, red-black tree provides a very simple and straightforward way to keep the tree balanced.
For a binary search tree, we can augment the nodes with a color field; a node can be colored either red or black. We call a binary search tree a red-black tree if it satisfies the following 5 properties [2].
1. Every node is either red or black.
2. The root is black.
3. Every leaf (NIL) is black.
4. If a node is red, then both its children are black.
5. For each node, all paths from the node to descendant leaves contain the
same number of black nodes.
Why do these 5 properties ensure that the red-black tree is well balanced? Because they have a key characteristic: the longest path from the root to a leaf can't be more than twice as long as the shortest path.
Note the 4-th property, which means there can't be two adjacent red nodes. So the shortest path contains only black nodes; any path longer than the shortest one has red nodes interspersed. According to property 5, all paths have the same number of black nodes; this finally ensures that no path is more than twice as long as any other [3]. Figure 3.4 shows an example red-black tree.
Figure 3.4: An example red-black tree
² Red-black tree is one of the equivalent forms of the 2-3-4 tree (see the chapter about B-tree for 2-3-4 trees). That is to say, for any 2-3-4 tree, there is at least one red-black tree with the same data order.
All read-only operations, such as search and min/max, are the same as in binary search tree; only the insertion and deletion are special.
As we have shown in the word occurrence example, many implementations of set or map containers are based on red-black tree. One example is the C++ Standard Template Library (STL) [6].
As mentioned previously, the only change in the data layout is the color information augmented to the binary search tree. This can be represented as a data field in imperative languages such as C++, like below.

enum Color {Red, Black};

template <class T>
struct node{
    Color color;
    T key;
    node* left;
    node* right;
    node* parent;
};
In functional settings, we can add the color information to the constructors; below is the Haskell example of a red-black tree definition.
data Color = R | B
data RBTree a = Empty
| Node Color (RBTree a) a (RBTree a)
Exercise 3.2
Can you prove that a red-black tree with n nodes has height at most
2 lg(n + 1)?
3.3 Insertion
Inserting a new node, as described for binary search tree, may cause the tree to become unbalanced. The red-black properties have to be maintained, so we need to do some fixing by transforming the tree after insertion.
When we insert a new key, one good practice is to always insert it as a red node. As long as the new inserted node isn't the root of the tree, we can keep all properties except the 4-th one: the insertion may bring two adjacent red nodes.
Functional and procedural implementations have different fixing methods. One is intuitive but has some overhead; the other is a bit complex but has higher performance. Most text books about algorithms introduce the latter. In this chapter, we focus on the former to show how easily a red-black tree insertion algorithm can be realized. The traditional procedural method will be given only for comparison purpose.
As described by Chris Okasaki, there are in total 4 cases which violate property 4. All of them have 2 adjacent red nodes. However, they have a uniform form after fixing [2], as shown in figure 3.5.
Note that this transformation moves the redness one level up, so this is a bottom-up recursive fixing; the last step may turn the root node red.
Figure 3.5: 4 cases for balancing a red-black tree after insertion
According to property 2, the root is always black; thus we need a final fixing to revert the root color to black.
Observing that the 4 cases and the fixed result have strong pattern features, the fixing function can be defined by using the similar method we mentioned in tree rotation. To avoid overly long formulas, we abbreviate Color as C, Black as B, and Red as R.
balance(T) = { node(R, node(B, A, x, B), y, node(B, C, z, D)) : match(T)
             { T : otherwise        (3.3)

where function node() constructs a red-black tree node from 4 parameters: the color, the left child, the key and the right child. Function match() tests if a tree is one of the following 4 possible patterns.

match(T) =    pattern(T) = node(B, node(R, node(R, A, x, B), y, C), z, D)
            ∨ pattern(T) = node(B, node(R, A, x, node(R, B, y, C)), z, D)
            ∨ pattern(T) = node(B, A, x, node(R, B, y, node(R, C, z, D)))
            ∨ pattern(T) = node(B, A, x, node(R, node(R, B, y, C), z, D))
With the function balance() defined, we can modify the previous binary search tree insertion functions to make them work for red-black tree.

insert(T, k) = makeBlack(ins(T, k))        (3.4)

where

ins(T, k) = { node(R, ∅, k, ∅) : T = ∅
            { balance(node(C, ins(L, k), Key, R)) : k < Key
            { balance(node(C, L, Key, ins(R, k))) : otherwise        (3.5)

Here C, L, R, Key represent the color, the left child, the right child and the key of the tree T:

C = color(T)
L = left(T)
R = right(T)
Key = key(T)
Function makeBlack() is defined as the following; it forces the color of a non-empty tree to be black.

makeBlack(T) = node(B, L, Key, R)        (3.6)
Summarizing the above functions and using language supported pattern matching features, we come to the following Haskell program.

insert :: (Ord a) => RBTree a -> a -> RBTree a
insert t x = makeBlack $ ins t where
    ins Empty = Node R Empty x Empty
    ins (Node color l k r)
        | x < k = balance color (ins l) k r
        | otherwise = balance color l k (ins r)
    makeBlack (Node _ l k r) = Node B l k r
balance :: Color -> RBTree a -> a -> RBTree a -> RBTree a
balance B (Node R (Node R a x b) y c) z d =
Node R (Node B a x b) y (Node B c z d)
balance B (Node R a x (Node R b y c)) z d =
Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R b y (Node R c z d)) =
Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R (Node R b y c) z d) =
Node R (Node B a x b) y (Node B c z d)
balance color l k r = Node color l k r
Note that the balance function is changed a bit from the original definition. Instead of passing the tree, we pass the color, the left child, the key and the right child to it. This saves a pair of boxing and un-boxing operations.
This program doesn't handle the case of inserting a duplicated key. However, it is possible to handle it either by overwriting, or by skipping. Another option is to augment the data with a linked list [2].
Figure 3.6 shows two red-black trees built from feeding the lists 11, 2, 14, 1, 7, 15, 5, 8, 4 and 1, 2, ..., 8.
Figure 3.6: insert results generated by the Haskell algorithm
This algorithm shows great simplicity by summarizing the uniform feature of the four different unbalanced cases. It is more expressive than the traditional tree rotation approach; even in programming languages which don't support pattern matching, the algorithm can still be implemented by manually checking the patterns. A Scheme/Lisp program, available along with this book, can be referenced as an example.
The insertion algorithm takes O(lg N) time to insert a key into a red-black tree of N nodes.
Exercise 3.3
Write a program in an imperative language, such as C, C++ or Python, to realize the same algorithm in this section. Note that, because there is no language supported pattern matching, you need to test the 4 different cases manually.
3.4 Deletion
Recall the deletion section of binary search tree: deletion is 'imperative only' for red-black tree as well. In typical practice, one often builds the tree just once, and performs lookups frequently after that. Okasaki explained why he didn't provide red-black tree deletion in his work in [3]; one reason is that deletion is much messier than insertion.
The purpose of this section is just to show that red-black tree deletion is possible in purely functional settings, although it actually rebuilds the tree, because trees are read-only in terms of purely functional data structures. In the real world, it's up to the user (or actually the programmer) to adopt the proper solution. One option is to mark the node to be deleted with a flag, and perform a tree rebuild when the number of deleted nodes exceeds 50% of the total number of nodes.
Not only in functional settings: even in imperative settings, deletion is more complex than insertion. We face more cases to fix. Deletion may also violate the red-black tree properties, so we need to fix them after the normal deletion as described in binary search tree.
The deletion algorithm in this book is based on top of a handout in [5]. The problem only happens when we try to delete a black node, because doing so breaks the property that all paths contain the same number of black nodes: the black-height is no longer uniform.
When deleting a black node, we can resume the black-height property by introducing a 'doubly-black' concept [2]. It means that although the node is deleted, its blackness is kept, by storing it in the parent node. If the parent node is red, it turns black; however, if it has already been black, it turns 'doubly-black'.
In order to express the doubly-black node, the definition needs some modification accordingly.

data Color = R | B | BB -- BB: doubly black for deletion
data RBTree a = Empty | BBEmpty -- doubly black empty
              | Node Color (RBTree a) a (RBTree a)

When deleting a node, we first perform the same deleting algorithm as in binary search tree, mentioned in the previous chapter. After that, if the node to be sliced out is black, we need to fix the tree to keep the red-black properties. Let's denote the empty tree as ∅; a non-empty tree can be decomposed to node(C, L, Key, R): its color, left sub-tree, key and right sub-tree. The delete function is defined as the following.

delete(T, k) = blackenRoot(del(T, k))        (3.7)
where

del(T, k) = { ∅ : T = ∅
            { fixBlack²(node(C, del(L, k), Key, R)) : k < Key
            { fixBlack²(node(C, L, Key, del(R, k))) : k > Key
            { mkBlk(R) if C = B, otherwise R : k = Key ∧ L = ∅
            { mkBlk(L) if C = B, otherwise L : k = Key ∧ R = ∅
            { fixBlack²(node(C, L, k′, del(R, k′))) : otherwise        (3.8)

where k′ = min(R).
The real deleting happens inside function del. For the trivial case that the tree is empty, the deletion result is ∅. If the key to be deleted is less than the key of the current node, we recursively perform deletion on the left sub-tree; if it is bigger than the key of the current node, we recursively delete the key from the right sub-tree. Because the recursive deletion may bring in doubly-blackness, we need to fix it.
If the key to be deleted is equal to the key of the current node, we need to splice it out. If one of its children is empty, we just replace the node by the other one and preserve the blackness of this node; otherwise we cut and paste the minimum element k′ = min(R) from the right sub-tree.
Function delete just forces the result tree of del to have a black root. This is realized by function blackenRoot.

blackenRoot(T) = { ∅ : T = ∅
                 { node(B, L, Key, R) : otherwise        (3.9)
Compared with the makeBlack function, which was defined in the red-black tree insertion section, they are almost the same, except for the case of empty tree. This is only valid in deletion, because insertion can't result in an empty tree, while deletion may.
Function mkBlk is defined to preserve the blackness of a node. If the node to be sliced isn't black, this function won't be applied; otherwise, it turns a red node to black and a black node to doubly-black. This function also marks an empty tree as doubly-black empty.
mkBlk(T) = { Φ : T = ∅
           { node(B, L, Key, R) : C = R
           { node(B², L, Key, R) : C = B
           { T : otherwise        (3.10)

where Φ means the doubly-black empty node and B² is the doubly-black color.
Summarizing the above functions yields the following Haskell program.

delete :: (Ord a) => RBTree a -> a -> RBTree a
delete t x = blackenRoot(del t x) where
    del Empty _ = Empty
    del (Node color l k r) x
        | x < k = fixDB color (del l x) k r
        | x > k = fixDB color l k (del r x)
        -- x == k, delete this node
        | isEmpty l = if color == B then makeBlack r else r
        | isEmpty r = if color == B then makeBlack l else l
        | otherwise = fixDB color l k' (del r k') where k' = min r
    blackenRoot (Node _ l k r) = Node B l k r
    blackenRoot _ = Empty

makeBlack :: RBTree a -> RBTree a
makeBlack (Node B l k r) = Node BB l k r -- doubly black
makeBlack (Node _ l k r) = Node B l k r
makeBlack Empty = BBEmpty
makeBlack t = t
The final attack on the red-black tree deletion algorithm is to realize the fixBlack² function. The purpose of this function is to eliminate the doubly-black colored node by rotation and color changing.
Let's solve the doubly-black empty node first. For any node, if one of its children is doubly-black empty, and the other child is non-empty, we can safely replace the doubly-black empty with a normal empty node.
As in figure 3.7, if we are going to delete node 4 from the tree (instead of showing the whole tree, only part of it is shown), the program will use a doubly-black empty node to replace node 4. In the figure, the doubly-black node is shown as a black circle with 2 edges. Node 5 then has a doubly-black empty left child and a non-empty right child (a leaf node with key 6). In such a case we can safely change the doubly-black empty to a normal empty node, which won't violate any red-black properties.
(a) Delete 4 from the tree. (b) After 4 is sliced off, it is doubly-black empty. (c) We can safely change it to normal NIL.
Figure 3.7: One child is doubly-black empty node, the other child is non-empty
On the other hand, if a node has a doubly-black empty child and the other child is empty, we have to push the doubly-blackness up one level. For example, in figure 3.8, if we want to delete node 1 from the tree, the program will use a doubly-black empty node to replace 1. Then node 2 has a doubly-black empty left child and an empty right child. In such a case we must mark node 2 as doubly-black after changing its left child back to empty.
(a) Delete 1 from the tree. (b) After 1 is sliced off, it is doubly-black empty. (c) We must push the doubly-blackness up to node 2.
Figure 3.8: One child is doubly-black empty node, the other child is empty.
Based on the above analysis, in order to fix the doubly-black empty node, we define the function partially like the following.

fixBlack²(T) = { node(B², ∅, Key, ∅) : (L = Φ ∧ R = ∅) ∨ (L = ∅ ∧ R = Φ)
              { node(C, ∅, Key, R) : L = Φ ∧ R ≠ ∅
              { node(C, L, Key, ∅) : R = Φ ∧ L ≠ ∅
              { ... : ...        (3.11)
After dealing with the doubly-black empty node, we need to fix the case where the sibling of the doubly-black node is black and has one red child. In this situation, we can fix the doubly-blackness with one rotation. Actually there are 4 different sub-cases, all of which can be transformed to one uniform pattern. They are shown in figure 3.9. These cases are described in [2] as case 3 and case 4.
Figure 3.9: Fix the doubly-black by rotation: the sibling of the doubly-black node is black, and it has one red child.
The handling of these 4 sub-cases can be defined on top of formula 3.11.

fixBlack²(T) = { ... : ...
              { node(C, node(B, mkBlk(A), x, B), y, node(B, C, z, D)) : p1.1
              { node(C, node(B, A, x, B), y, node(B, C, z, mkBlk(D))) : p1.2
              { ... : ...        (3.12)
where p1.1 and p1.2 each represent 2 patterns as the following.

p1.1 = { pattern(T) = node(C, A, x, node(B, node(R, B, y, C), z, D)) ∧ Color(A) = B²
       ∨ pattern(T) = node(C, A, x, node(B, B, y, node(R, C, z, D))) ∧ Color(A) = B² }

p1.2 = { pattern(T) = node(C, node(B, A, x, node(R, B, y, C)), z, D) ∧ Color(D) = B²
       ∨ pattern(T) = node(C, node(B, node(R, A, x, B), y, C), z, D) ∧ Color(D) = B² }
Besides the above cases, there is another one where not only the sibling of the doubly-black node is black, but also its two children are black. We can change the color of the sibling node to red, resume the doubly-black node to black, and propagate the doubly-blackness one level up to the parent node, as shown in figure 3.10. Note that there are two symmetric sub-cases. This case is described in [2] as case 2.
We go on adding this fixing after formula 3.12.
fixBlack²(T) = { ... : ...
              { mkBlk(node(C, mkBlk(A), x, node(R, B, y, C))) : p2.1
              { mkBlk(node(C, node(R, A, x, B), y, mkBlk(C))) : p2.2
              { ... : ...        (3.13)
where p2.1 and p2.2 are two patterns as below.

p2.1 = { pattern(T) = node(C, A, x, node(B, B, y, C)) ∧ Color(A) = B² ∧ Color(B) = Color(C) = B }

p2.2 = { pattern(T) = node(C, node(B, A, x, B), y, C) ∧ Color(C) = B² ∧ Color(A) = Color(B) = B }
There is a final case left: the sibling of the doubly-black node is red. We can do a rotation to change this case to pattern p1.1 or p1.2. Figure 3.11 illustrates it.
We can finish formula 3.13 with 3.14.
fixBlack²(T) = { ... : ...
              { fixBlack²(node(B, fixBlack²(node(R, A, x, B)), y, C)) : p3.1
              { fixBlack²(node(B, A, x, fixBlack²(node(R, B, y, C)))) : p3.2
              { T : otherwise        (3.14)
(a) Color of x can be either black or red. (b) If x was red, it becomes black, otherwise it becomes doubly-black. (c) Color of y can be either black or red. (d) If y was red, it becomes black, otherwise it becomes doubly-black.
Figure 3.10: propagate the blackness up.
Figure 3.11: The sibling of the doubly-black node is red.
where p3.1 and p3.2 are two patterns as the following.

p3.1 = { Color(T) = B ∧ Color(L) = B² ∧ Color(R) = R }
p3.2 = { Color(T) = B ∧ Color(L) = R ∧ Color(R) = B² }
These two cases are described in [2] as case 1.
Fixing the doubly-black node with all the above different cases is a recursive function. There are two termination conditions: one consists of patterns p1.1 and p1.2, where the doubly-black node is eliminated; the other cases may continuously propagate the doubly-blackness from bottom to top till the root. Finally the algorithm marks the root node as black anyway, and the doubly-blackness is removed.
Putting formulas 3.11, 3.12, 3.13, and 3.14 together, we can write the final Haskell program.
fixDB :: Color -> RBTree a -> a -> RBTree a -> RBTree a
fixDB color BBEmpty k Empty = Node BB Empty k Empty
fixDB color BBEmpty k r = Node color Empty k r
fixDB color Empty k BBEmpty = Node BB Empty k Empty
fixDB color l k BBEmpty = Node color l k Empty
-- the sibling is black, and it has one red child
fixDB color a@(Node BB _ _ _) x (Node B (Node R b y c) z d) =
Node color (Node B (makeBlack a) x b) y (Node B c z d)
fixDB color a@(Node BB _ _ _) x (Node B b y (Node R c z d)) =
Node color (Node B (makeBlack a) x b) y (Node B c z d)
fixDB color (Node B a x (Node R b y c)) z d@(Node BB _ _ _) =
Node color (Node B a x b) y (Node B c z (makeBlack d))
fixDB color (Node B (Node R a x b) y c) z d@(Node BB _ _ _) =
Node color (Node B a x b) y (Node B c z (makeBlack d))
-- the sibling and its 2 children are all black, propagate the blackness up
fixDB color a@(Node BB _ _ _) x (Node B b@(Node B _ _ _) y c@(Node B _ _ _))
= makeBlack (Node color (makeBlack a) x (Node R b y c))
fixDB color (Node B a@(Node B _ _ _) x b@(Node B _ _ _)) y c@(Node BB _ _ _)
= makeBlack (Node color (Node R a x b) y (makeBlack c))
-- the sibling is red
fixDB B a@(Node BB _ _ _) x (Node R b y c) = fixDB B (fixDB R a x b) y c
fixDB B (Node R a x b) y c@(Node BB _ _ _) = fixDB B a x (fixDB R b y c)
-- otherwise
fixDB color l k r = Node color l k r
The deletion algorithm takes O(lg N) time to delete a key from a red-black
tree with N nodes.
Exercise 3.4
As we mentioned in this section, deletion can be implemented by just marking the node as deleted without actually removing it. Once the number of marked nodes exceeds 50% of the total node number, a tree re-build is performed. Try to implement this method in your favorite programming language.
3.5 Imperative red-black tree algorithm
We have almost finished all the content of this chapter. By summarizing the patterns, we can implement the red-black tree in a simple way compared to the imperative tree rotation solution. However, we show the traditional imperative algorithm here for comparison and completeness.
For insertion, the basic idea is to use the same algorithm as described in binary search tree, and then fix the balance problem by rotation before returning the final result.
1: function Insert(T, k)
2:     root ← T
3:     x ← Create-Leaf(k)
4:     Color(x) ← RED
5:     parent ← NIL
6:     while T ≠ NIL do
7:         parent ← T
8:         if k < Key(T) then
9:             T ← Left(T)
10:        else
11:            T ← Right(T)
12:    Parent(x) ← parent
13:    if parent = NIL then        ▷ tree T is empty
14:        return x
15:    else if k < Key(parent) then
16:        Left(parent) ← x
17:    else
18:        Right(parent) ← x
19:    return Insert-Fix(root, x)
The only difference from the binary search tree insertion algorithm is that we set the color of the new node to red, and perform fixing before returning. It is easy to translate the pseudo code to a real imperative programming language, for instance Python³.
.
def rb_insert(t, key):
    root = t
    x = Node(key)
    parent = None
    while(t):
        parent = t
        if(key < t.key):
            t = t.left
        else:
            t = t.right
    if parent is None: # tree is empty
        root = x
    elif key < parent.key:
        parent.set_left(x)
    else:
        parent.set_right(x)
    return rb_insert_fix(root, x)
³ C and C++ source codes are available along with this book.
There are 3 base cases for fixing, and if we take the left-right symmetry into consideration, there are 6 cases in total. Among them, two cases can be merged, because they both have the uncle node in red color: we can toggle the parent color and uncle color to black, and set the grandparent color to red. With this merging, the fixing algorithm can be realized as the following.
1: function Insert-Fix(T, x)
2:     while Parent(x) ≠ NIL ∧ Color(Parent(x)) = RED do
3:         if Color(Uncle(x)) = RED then        ▷ Case 1, x's uncle is red
4:             Color(Parent(x)) ← BLACK
5:             Color(Grand-Parent(x)) ← RED
6:             Color(Uncle(x)) ← BLACK
7:             x ← Grand-Parent(x)
8:         else        ▷ x's uncle is black
9:             if Parent(x) = Left(Grand-Parent(x)) then
10:                if x = Right(Parent(x)) then        ▷ Case 2, x is a right child
11:                    x ← Parent(x)
12:                    T ← Left-Rotate(T, x)
                   ▷ Case 3, x is a left child
13:                Color(Parent(x)) ← BLACK
14:                Color(Grand-Parent(x)) ← RED
15:                T ← Right-Rotate(T, Grand-Parent(x))
16:            else
17:                if x = Left(Parent(x)) then        ▷ Case 2, symmetric
18:                    x ← Parent(x)
19:                    T ← Right-Rotate(T, x)
                   ▷ Case 3, symmetric
20:                Color(Parent(x)) ← BLACK
21:                Color(Grand-Parent(x)) ← RED
22:                T ← Left-Rotate(T, Grand-Parent(x))
23:    Color(T) ← BLACK
24:    return T
This program takes O(lg N) time to insert a new key into the red-black tree. Compare this pseudo code with the balance function we defined in the previous section: they differ not only in terms of simplicity, but also in logic. Even if we feed the same series of keys to the two algorithms, they may build different red-black trees. There is a bit of performance overhead in the pattern matching algorithm; Okasaki discussed the difference in detail in his paper [2].
Translating the above algorithm to Python yields the below program.
# Fix the red-red violation
def rb_insert_fix(t, x):
    while(x.parent and x.parent.color==RED):
        if x.uncle().color == RED:
            #case 1: ((a:R x:R b) y:B c:R) ==> ((a:R x:B b) y:R c:B)
            set_color([x.parent, x.grandparent(), x.uncle()],
                      [BLACK, RED, BLACK])
            x = x.grandparent()
        else:
            if x.parent == x.grandparent().left:
                if x == x.parent.right:
                    #case 2: ((a x:R b:R) y:B c) ==> case 3
                    x = x.parent
                    t = left_rotate(t, x)
                # case 3: ((a:R x:R b) y:B c) ==> (a:R x:B (b y:R c))
                set_color([x.parent, x.grandparent()], [BLACK, RED])
                t = right_rotate(t, x.grandparent())
            else:
                if x == x.parent.left:
                    #case 2: (a x:B (b:R y:R c)) ==> case 3
                    x = x.parent
                    t = right_rotate(t, x)
                # case 3: (a x:B (b y:R c:R)) ==> ((a x:R b) y:B c:R)
                set_color([x.parent, x.grandparent()], [BLACK, RED])
                t = left_rotate(t, x.grandparent())
    t.color = BLACK
    return t
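As a usage sketch (added here, with an arbitrary example key sequence), the insertion program can be exercised as below; the Node class with color, uncle(), grandparent(), set_left() and set_right(), plus set_color, left_rotate, right_rotate, RED and BLACK, are assumed to come from the book's accompanying source code:

# A hedged usage sketch: build a red-black tree by repeated insertion.
t = None
for k in [11, 2, 14, 1, 7, 15, 5, 8]:   # arbitrary example keys
    t = rb_insert(t, k)
print t.color == BLACK   # the root of a red-black tree is always black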
Figure 3.12 shows the results of feeding the same series of keys to the above Python insertion program. Comparing them with figure 3.6, one can tell the difference clearly.
Figure 3.12: Red-black trees created by the imperative algorithm.
We skip the red-black tree deletion algorithm in the imperative settings, because it is even more complex than the insertion. The implementation of deletion is left as an exercise of this chapter.

Exercise 3.5

Implement the red-black tree deletion algorithm in your favorite imperative programming language. You can refer to [2] for algorithm details.
3.6 More words

Red-black tree is the most popular implementation of the balanced binary search tree. Another one is the AVL tree, which we'll introduce in the next chapter. Red-black tree can be a good starting point for more data structures: if we extend the number of children from 2 to K and keep the balance, it leads to the B-tree; if we store the data along the edges instead of inside the nodes, it leads to the Trie. However, the handling of multiple cases and the long program tend to make newcomers think the red-black tree is complex.

Okasaki's work helps make the red-black tree much easier to understand. There are many implementations in other programming languages in that manner [7]. It also inspired me to find the pattern matching solutions for the Splay tree and the AVL tree, etc.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. "Introduction to Algorithms, Second Edition". ISBN: 0262032937. The MIT Press. 2001

[2] Chris Okasaki. "FUNCTIONAL PEARLS: Red-Black Trees in a Functional Setting". J. Functional Programming. 1998

[3] Chris Okasaki. "Ten Years of Purely Functional Data Structures". http://okasaki.blogspot.com/2008/02/ten-years-of-purely-functional-data.html

[4] Wikipedia. "Red-black tree". http://en.wikipedia.org/wiki/Red-black_tree

[5] Lyn Turbak. "Red-Black Trees". cs.wellesley.edu/~cs231/fall01/red-black.pdf Nov. 2, 2001.

[6] SGI STL. http://www.sgi.com/tech/stl/

[7] Pattern matching. http://rosettacode.org/wiki/Pattern_matching
Chapter 4
AVL tree
4.1 Introduction

4.1.1 How to measure the balance of a tree?

Besides the red-black tree, are there any other intuitive solutions for self-balancing binary search trees? In order to measure how balanced a binary search tree is, one idea is to compare the heights of the left and right sub-trees. If they differ a lot, the tree isn't well balanced. Let's denote the height difference between the two children as

δ(T) = |R| − |L|    (4.1)

where |T| means the height of tree T, and L, R denote the left and right sub-trees.

If δ(T) = 0, the tree is definitely balanced. For example, a complete binary tree has N = 2^h − 1 nodes for height h; there are no empty branches except at the leaves. Another trivial case is the empty tree: δ(∅) = 0. The smaller the absolute value of δ(T), the more balanced the tree is.

We define δ(T) as the balance factor of a binary search tree.
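As a small illustration (a sketch, not part of the original text), the balance factor can be computed recursively from the heights; the Node fields used here are hypothetical:

# A minimal sketch computing delta(T) = |R| - |L|, assuming a hypothetical
# Node class with left/right fields; None denotes the empty tree.
def height(t):
    if t is None:
        return 0
    return 1 + max(height(t.left), height(t.right))

def delta(t):
    if t is None:
        return 0
    return height(t.right) - height(t.left)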
4.2 Definition of AVL tree

An AVL tree is a special binary search tree whose sub-trees all satisfy the following criterion:

|δ(T)| ≤ 1    (4.2)

The absolute value of the balance factor is less than or equal to 1, which means there are only three valid values: -1, 0 and 1. Figure 4.1 shows an example AVL tree.

Figure 4.1: An example AVL tree

Why can the AVL tree keep itself balanced? In other words, can this definition ensure that the height of the tree is O(lg N), where N is the number of nodes in the tree? Let's prove this fact.

For an AVL tree of height h, the number of nodes varies. It can have at most 2^h − 1 nodes, the complete binary tree case. We are interested in how many nodes there are at least. Let's denote the minimum number of nodes of an AVL tree of height h as N(h). The trivial cases are obvious:

- For the empty tree, h = 0, N(0) = 0;
- For a singleton root, h = 1, N(1) = 1.
What's the situation in the common case N(h)? Figure 4.2 shows an AVL tree T of height h. It contains three parts: the root node and two sub-trees A and B. We have the following fact:

h = max(height(A), height(B)) + 1    (4.3)

We immediately know that one child must have height h − 1; let's say height(A) = h − 1. According to the definition of the AVL tree, we have |height(A) − height(B)| ≤ 1. This leads to the fact that the height of the other sub-tree B can't be lower than h − 2. So the total number of nodes of T is the number of nodes in trees A and B, plus 1 (for the root node). We obtain

N(h) = N(h − 1) + N(h − 2) + 1    (4.4)

Figure 4.2: An AVL tree of height h; one sub-tree has height h − 1, the other h − 2.
This recursion reminds us of the famous Fibonacci series. Actually we can transform it into the Fibonacci series by defining N'(h) = N(h) + 1. Then equation (4.4) changes to

N'(h) = N'(h − 1) + N'(h − 2)    (4.5)
Lemma 4.2.1. Let N(h) be the minimum number of nodes of an AVL tree of height h, and N'(h) = N(h) + 1. Then

N'(h) ≥ φ^h    (4.6)

where φ = (√5 + 1)/2 is the golden ratio.
Proof. For the trivial cases, we have

- h = 0: N'(0) = 1 ≥ φ^0 = 1
- h = 1: N'(1) = 2 ≥ φ^1 = 1.618...

For the induction case, suppose N'(h) ≥ φ^h. Then

N'(h + 1) = N'(h) + N'(h − 1)    {Fibonacci}
          ≥ φ^h + φ^(h−1)
          = φ^(h−1)(φ + 1)    {φ + 1 = φ^2 = (√5 + 3)/2}
          = φ^(h+1)
From Lemma 4.2.1, we immediately get

h ≤ log_φ(N + 1) = log_φ 2 · lg(N + 1) ≈ 1.44 lg(N + 1)    (4.7)

It tells us that the height of an AVL tree is bounded by O(lg N), which means the AVL tree is balanced.
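As a quick numeric sanity check (a sketch added here, not from the original text), we can compute the minimal node count N(h) by the recursion (4.4) and confirm the bound (4.7):

# Verify h <= 1.44 * lg(N(h) + 1) for the minimal AVL trees, where N(h)
# follows the recursion N(h) = N(h-1) + N(h-2) + 1 with N(0)=0, N(1)=1.
import math

def min_nodes(h):
    if h <= 1:
        return h
    return min_nodes(h - 1) + min_nodes(h - 2) + 1

for h in range(1, 25):
    n = min_nodes(h)
    print h, n, h <= 1.44 * math.log(n + 1, 2)   # prints True for every h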
During the basic mutable tree operations such as insertion and deletion, if the balance factor changes to any invalid value, some fixing has to be performed to bring |δ| back within 1. Most implementations utilize tree rotations. In this chapter, we'll show the pattern matching solution, which is inspired by Okasaki's red-black tree solution [2]. Because of this modify-and-fix approach, the AVL tree is also a kind of self-balancing binary search tree. For comparison purposes, we'll also show the procedural algorithms.

Of course we can compute the δ value recursively; another option is to store the balance factor inside each node, and update it when we modify the tree. The latter avoids computing the same value every time.
Based on this idea, we can add one data field to the original binary search tree node, as in the following C++ code example (some implementations store the height of the tree instead of δ, as in [5]).

template <class T>
struct node{
    int delta;   // balance factor
    T key;
    node* left;
    node* right;
    node* parent;
};
In a purely functional setting, some implementations use different constructors to store the information. For example in [1], there are 4 constructors defined: E, N, P and Z. E is for the empty tree, N for a tree with balance factor -1, P for a tree with balance factor +1, and Z for the zero case.

In this chapter, we'll explicitly store the balance factor inside the node.
data AVLTree a = Empty
| Br (AVLTree a) a (AVLTree a) Int
The immutable operations, including looking up and finding the maximum and minimum elements, are all the same as for the binary search tree. We'll skip them and focus on the mutable operations.
4.3 Insertion

Inserting a new element into an AVL tree may violate the AVL tree property, i.e. the absolute value of δ may exceed 1. To restore it, one option is to do tree rotations according to the different insertion cases. Most implementations are based on this approach.

Another way is to use a pattern matching method similar to the one mentioned by Okasaki in his red-black tree implementation [2]. Inspired by this idea, it is possible to provide a simple and intuitive solution.

When we insert a new key into the AVL tree, the balance factor of the root may change within the range [−1, 1], and the height may increase by at most one. We need to use this information recursively to update the balance factors in the upper level nodes. Thus we define the result of the insertion algorithm as a pair (T', ΔH), where T' is the new tree and ΔH is the increment in height. Let's denote by first(pair) the function that returns the first element of a pair. We can modify the binary search tree insertion algorithm as the following to handle the AVL tree.

insert(T, k) = first(ins(T, k))    (4.8)
where

ins(T, k) = { (node(∅, k, ∅, 0), 1)           : T = ∅
            { tree(ins(L, k), Key, (R, 0), δ)  : k < Key
            { tree((L, 0), Key, ins(R, k), δ)  : otherwise    (4.9)

L, R, Key and δ represent the left child, right child, key and balance factor of the tree:

L = left(T)
R = right(T)
Key = key(T)
δ = δ(T)
When we insert a new key k into an AVL tree T, if the tree is empty, we just need to create a leaf node with k and set the balance factor to 0; the height is increased by one. This is the trivial case. Function node() is defined to build a tree from a left sub-tree, a right sub-tree, a key and a balance factor.

If T isn't empty, we need to compare Key with k. If k is less than the key, we recursively insert it into the left child; otherwise we insert it into the right child.
As we defined above, the result of the recursive insertion is a pair like (L', ΔHl); we need to do the balancing adjustment as well as update the increment in height. Function tree() is defined to deal with this task. It takes 4 parameters: (L', ΔHl), Key, (R', ΔHr), and δ. The result of this function is a pair (T', ΔH), where T' is the new tree after adjustment, and ΔH is the new increment in height, defined as

ΔH = |T'| − |T|    (4.10)
This can be deduced further into 4 cases:

ΔH = |T'| − |T|
   = 1 + max(|R'|, |L'|) − (1 + max(|R|, |L|))
   = max(|R'|, |L'|) − max(|R|, |L|)

   = { ΔHr      : δ ≥ 0 ∧ δ' ≥ 0
     { δ + ΔHr  : δ ≤ 0 ∧ δ' ≥ 0
     { ΔHl − δ  : δ ≥ 0 ∧ δ' ≤ 0
     { ΔHl      : otherwise    (4.11)
To prove this equation, note the fact that the height cannot increase in both the left and the right sub-tree with only one insertion.

These 4 cases can be explained from the definition of the balance factor: it equals the height of the right sub-tree minus the height of the left sub-tree.

- If δ ≥ 0 and δ' ≥ 0, the height of the right sub-tree isn't less than the height of the left sub-tree both before and after insertion. In this case, the increment in height of the tree is only contributed by the right sub-tree, which is ΔHr.

- If δ ≤ 0, the height of the left sub-tree isn't less than the height of the right sub-tree before insertion, and since δ' ≥ 0 afterwards, the height of the right sub-tree increased due to insertion while the left side kept the same (|L'| = |L|). So the increment in height is

  ΔH = max(|R'|, |L'|) − max(|R|, |L|)    {δ ≤ 0 ∧ δ' ≥ 0}
     = |R'| − |L'|    {|L| = |L'|}
     = |R| + ΔHr − |L|
     = δ + ΔHr

- For the case δ ≥ 0 and δ' ≤ 0, similarly to the second one, we get

  ΔH = max(|R'|, |L'|) − max(|R|, |L|)    {δ ≥ 0 ∧ δ' ≤ 0}
     = |L'| − |R|
     = |L| + ΔHl − |R|
     = ΔHl − δ

- For the last case, both δ and δ' are no bigger than zero, which means the height of the left sub-tree is always greater than or equal to that of the right sub-tree, so the increment in height is only contributed by the left sub-tree, which is ΔHl.
The next problem is how to determine the new balance factor value δ' before performing the balancing adjustment. According to the definition of the AVL tree, the balance factor is the height of the right sub-tree minus the height of the left sub-tree. We have the following fact.

δ' = |R'| − |L'|
   = |R| + ΔHr − (|L| + ΔHl)
   = |R| − |L| + ΔHr − ΔHl
   = δ + ΔHr − ΔHl    (4.12)
With all these changes in height and balance factor made clear, it's possible to define the tree() function mentioned in (4.9).

tree((L', ΔHl), Key, (R', ΔHr), δ) = balance(node(L', Key, R', δ'), ΔH)    (4.13)
Before we move into the details of the balancing adjustment, let's translate the above equations to real programs in Haskell.

First is the insert function.

insert :: (Ord a) => AVLTree a -> a -> AVLTree a
insert t x = fst $ ins t where
    ins Empty = (Br Empty x Empty 0, 1)
    ins (Br l k r d)
        | x < k     = tree (ins l) k (r, 0) d
        | x == k    = (Br l k r d, 0)
        | otherwise = tree (l, 0) k (ins r) d
Here we also handle the case of inserting a duplicated key (i.e. the key already exists) by simply overwriting.

tree :: (AVLTree a, Int) -> a -> (AVLTree a, Int) -> Int -> (AVLTree a, Int)
tree (l, dl) k (r, dr) d = balance (Br l k r d', delta) where
    d' = d + dr - dl
    delta = deltaH d d' dl dr
And the definition of the height increment is as below.

deltaH :: Int -> Int -> Int -> Int -> Int
deltaH d d' dl dr
    | d >= 0 && d' >= 0 = dr
    | d <= 0 && d' >= 0 = d + dr
    | d >= 0 && d' <= 0 = dl - d
    | otherwise         = dl
4.3.1 Balancing adjustment

As the pattern matching approach is adopted for re-balancing, we need to consider what kinds of patterns violate the AVL tree property.

Figure 4.3 shows the 4 cases which need fixing. In all these 4 cases the balance factors are either -2 or +2, which exceed the range [-1, 1]. After the balancing adjustment, this factor turns to 0, which means the height of the left sub-tree equals that of the right sub-tree.

We call these four cases the left-left lean, right-right lean, right-left lean, and left-right lean cases, in clock-wise direction from top-left. We denote the balance factors before fixing as δ(x), δ(y) and δ(z); after fixing, they change to δ'(x), δ'(y) and δ'(z) respectively.

Figure 4.3: 4 cases of balancing an AVL tree after insertion

We'll next prove that, after fixing, we have δ'(y) = 0 for all four cases, and we'll provide the resulting values of δ'(x) and δ'(z).
Left-left lean case

As the structure of sub-tree x doesn't change due to fixing, we immediately get δ'(x) = δ(x).

Since δ(y) = −1 and δ(z) = −2, we have

δ(y) = |C| − |x| = −1 ⟹ |C| = |x| − 1
δ(z) = |D| − |y| = −2 ⟹ |D| = |y| − 2    (4.14)

After fixing,

δ'(z) = |D| − |C|    {from (4.14)}
      = (|y| − 2) − (|x| − 1)
      = |y| − |x| − 1    {x is a child of y ⟹ |y| − |x| = 1}
      = 0    (4.15)

For δ'(y), we have the following fact after fixing.

δ'(y) = |z| − |x|
      = 1 + max(|C|, |D|) − |x|    {by (4.15), |C| = |D|}
      = 1 + |C| − |x|    {by (4.14)}
      = 1 + |x| − 1 − |x|
      = 0    (4.16)

Summarizing the above results, the left-left lean case adjusts the balance factors as follows.

δ'(x) = δ(x)
δ'(y) = 0
δ'(z) = 0    (4.17)
Right-right lean case

Since the right-right case is symmetric to the left-left case, we can easily obtain the resulting balance factors:

δ'(x) = 0
δ'(y) = 0
δ'(z) = δ(z)    (4.18)
Right-left lean case

First let's consider δ'(x). After the balance fixing, we have

δ'(x) = |B| − |A|    (4.19)

Before fixing, if we calculate the height of z, we get

|z| = 1 + max(|y|, |D|)    {δ(z) = −1 ⟹ |y| > |D|}
    = 1 + |y|
    = 2 + max(|B|, |C|)    (4.20)

Since δ(x) = 2, we can deduce that

δ(x) = 2 ⟹ |z| − |A| = 2    {by (4.20)}
         ⟹ 2 + max(|B|, |C|) − |A| = 2
         ⟹ max(|B|, |C|) − |A| = 0    (4.21)

If δ(y) = 1, which means |C| − |B| = 1, it follows that

max(|B|, |C|) = |C| = |B| + 1    (4.22)

Taking this into (4.21) yields

|B| + 1 − |A| = 0 ⟹ |B| − |A| = −1    {by (4.19)}
                 ⟹ δ'(x) = −1    (4.23)

If δ(y) ≠ 1, it means max(|B|, |C|) = |B|; taking this into (4.21) yields

|B| − |A| = 0    {by (4.19)}
⟹ δ'(x) = 0    (4.24)

Summarizing these 2 cases, we get the relationship between δ'(x) and δ(y) as follows.

δ'(x) = { −1 : δ(y) = 1
        {  0 : otherwise    (4.25)

For δ'(z), according to the definition, it equals

δ'(z) = |D| − |C|    {δ(z) = −1 ⟹ |D| = |y| − 1}
      = |y| − |C| − 1    {|y| = 1 + max(|B|, |C|)}
      = max(|B|, |C|) − |C|    (4.26)

If δ(y) = −1, then we have |C| − |B| = −1, so max(|B|, |C|) = |B| = |C| + 1. Taking this into (4.26), we get δ'(z) = 1.

If δ(y) ≠ −1, then max(|B|, |C|) = |C|, and we get δ'(z) = 0.

Combining these two cases, the relationship between δ'(z) and δ(y) is as below.

δ'(z) = { 1 : δ(y) = −1
        { 0 : otherwise    (4.27)
Finally, for δ'(y), we deduce it as below.

δ'(y) = |z| − |x|
      = max(|C|, |D|) − max(|A|, |B|)    (4.28)

There are three cases.

- If δ(y) = 0, it means |B| = |C|, and according to (4.25) and (4.27), we have δ'(x) = 0 ⟹ |A| = |B| and δ'(z) = 0 ⟹ |C| = |D|. These lead to δ'(y) = 0.

- If δ(y) = 1, from (4.27) we have δ'(z) = 0 ⟹ |C| = |D|.

  δ'(y) = max(|C|, |D|) − max(|A|, |B|)    {|C| = |D|}
        = |C| − max(|A|, |B|)    {from (4.25): δ'(x) = −1 ⟹ |B| − |A| = −1}
        = |C| − (|B| + 1)    {δ(y) = 1 ⟹ |C| − |B| = 1}
        = 0

- If δ(y) = −1, from (4.25) we have δ'(x) = 0 ⟹ |A| = |B|.

  δ'(y) = max(|C|, |D|) − max(|A|, |B|)    {|A| = |B|}
        = max(|C|, |D|) − |B|    {from (4.27): |D| − |C| = 1}
        = |C| + 1 − |B|    {δ(y) = −1 ⟹ |C| − |B| = −1}
        = 0

All three cases lead to the same result δ'(y) = 0. Collecting all the above results, we get the new balance factors after fixing as follows.

δ'(x) = { −1 : δ(y) = 1
        {  0 : otherwise

δ'(y) = 0

δ'(z) = { 1 : δ(y) = −1
        { 0 : otherwise    (4.29)
Left-right lean case

The left-right lean case is symmetric to the right-left lean case. By a similar deduction, we can find that the new balance factors are identical to the result in (4.29).
4.3.2 Pattern Matching

All the sub-problems have been solved, and it's time to define the final pattern matching fixing function.

balance(T, ΔH) =
  { (node(node(A, x, B, δ(x)), y, node(C, z, D, 0), 0), 0)       : Pll(T)
  { (node(node(A, x, B, 0), y, node(C, z, D, δ(z)), 0), 0)       : Prr(T)
  { (node(node(A, x, B, δ'(x)), y, node(C, z, D, δ'(z)), 0), 0)  : Prl(T) ∨ Plr(T)
  { (T, ΔH)                                                      : otherwise    (4.30)

where Pll(T) means the pattern of tree T is left-left lean, and so on; δ'(x) and δ'(z) are defined in (4.29). The four patterns are tested as below.

Pll(T): T = node(node(node(A, x, B, δ(x)), y, C, −1), z, D, −2)
Prr(T): T = node(A, x, node(B, y, node(C, z, D, δ(z)), 1), 2)
Prl(T): T = node(node(A, x, node(B, y, C, δ(y)), 1), z, D, −2)
Plr(T): T = node(A, x, node(node(B, y, C, δ(y)), z, D, −1), 2)    (4.31)
4.3. INSERTION 93
Translating the above function denition to Haskell yields a simple and in-
tuitive program.
balance :: (AVLTree a, Int) (AVLTree a, Int)
balance (Br (Br (Br a x b dx) y c (-1)) z d (-2), _) =
(Br (Br a x b dx) y (Br c z d 0) 0, 0)
balance (Br a x (Br b y (Br c z d dz) 1) 2, _) =
(Br (Br a x b 0) y (Br c z d dz) 0, 0)
balance (Br (Br a x (Br b y c dy) 1) z d (-2), _) =
(Br (Br a x b dx) y (Br c z d dz) 0, 0) where
dx = if dy == 1 then -1 else 0
dz = if dy == -1 then 1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1)) 2, _) =
(Br (Br a x b dx) y (Br c z d dz) 0, 0) where
dx = if dy == 1 then -1 else 0
dz = if dy == -1 then 1 else 0
balance (t, d) = (t, d)
The insertion algorithm takes time proportional to the height of the tree, and according to the result we proved above, its performance is O(lg N), where N is the number of elements stored in the AVL tree.
Verification

One can easily create a function to verify that a tree is an AVL tree. Actually we need to verify two things: first, that it's a binary search tree; second, that it satisfies the AVL tree property.

We leave the first verification problem as an exercise to the reader.

In order to test whether a binary tree satisfies the AVL tree property, we can test the difference in height between its two children, and recursively test that both children conform to the AVL property until we arrive at an empty leaf.
avl?(T) = { True                                : T = ∅
          { avl?(L) ∧ avl?(R) ∧ ||R| − |L|| ≤ 1 : otherwise    (4.32)

And the height of an AVL tree can also be calculated from the definition.

|T| = { 0                 : T = ∅
      { 1 + max(|R|, |L|) : otherwise    (4.33)
The corresponding Haskell program is given as the following.

isAVL :: (AVLTree a) -> Bool
isAVL Empty = True
isAVL (Br l _ r d) = and [isAVL l, isAVL r, abs (height r - height l) <= 1]

height :: (AVLTree a) -> Int
height Empty = 0
height (Br l _ r _) = 1 + max (height l) (height r)
Exercise 4.1

Write a program to verify that a binary tree is a binary search tree in your favorite programming language. If you choose to use an imperative language, please consider realizing this program without recursion. One possible sketch is given below.
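The following is one possible non-recursive sketch (an addition to the text, not the book's reference solution). It runs an in-order traversal with an explicit stack and checks that the keys come out in strictly increasing order; the Node fields are hypothetical.

# A sketch: verify the binary search tree property without recursion,
# assuming a hypothetical Node class with key/left/right fields.
def is_bst(t):
    stack, prev = [], None
    while stack or t is not None:
        while t is not None:       # go as far left as possible
            stack.append(t)
            t = t.left
        t = stack.pop()            # visit nodes in in-order sequence
        if prev is not None and t.key <= prev:
            return False           # keys must be strictly increasing
        prev = t.key
        t = t.right
    return True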
4.4 Deletion

As we mentioned before, deletion doesn't make significant sense in purely functional settings. As the tree is read-only, the typical usage is frequent lookups after the tree is built.

Even if we implement deletion, it's actually re-building the tree, as we presented in the chapter about the red-black tree. We leave the deletion of the AVL tree as an exercise to the reader.
Exercise 4.2

- Taking the red-black tree deletion algorithm as an example, write the AVL tree deletion program in a purely functional approach in your favorite programming language.

- Write the deletion algorithm in an imperative approach in your favorite programming language.
4.5 Imperative AVL tree algorithm

We have almost finished all the content about the AVL tree in this chapter. However, it is necessary to show the traditional insert-and-rotate approach as a comparison to the pattern matching algorithm.

Similar to the imperative red-black tree algorithm, the strategy is first to do the insertion just as for the binary search tree, then fix the balance problem by rotation, and return the final result.
1: function Insert(T, k)
2:     root ← T
3:     x ← Create-Leaf(k)
4:     δ(x) ← 0
5:     parent ← NIL
6:     while T ≠ NIL do
7:         parent ← T
8:         if k < Key(T) then
9:             T ← Left(T)
10:        else
11:            T ← Right(T)
12:    Parent(x) ← parent
13:    if parent = NIL then    ▷ tree T is empty
14:        return x
15:    else if k < Key(parent) then
16:        Left(parent) ← x
17:    else
18:        Right(parent) ← x
19:    return AVL-Insert-Fix(root, x)
Note that after insertion, the height of the tree may increase, so the balance factor δ may also change: inserting on the right side will increase δ by 1, while inserting on the left side will decrease it. By the end of this algorithm, we need to perform bottom-up fixing from node x towards the root.

We can translate the pseudo code to a real programming language, such as Python (C and C++ source code are available along with this book).
def avl_insert(t, key):
root = t
x = Node(key)
parent = None
while(t):
parent = t
if(key < t.key):
t = t.left
else:
t = t.right
if parent is None: #tree is empty
root = x
elif key < parent.key:
parent.set_left(x)
else:
parent.set_right(x)
return avl_insert_fix(root, x)
This is a top-down algorithm: it searches the tree from the root down to the proper position and inserts the new key as a leaf. At the end, it calls the fixing procedure, passing the root and the newly inserted node.

Note that we reuse the same methods set_left() and set_right() as defined in the chapter about the red-black tree.
In order to restore the AVL tree balance property by fixing, we first determine whether the new node was inserted on the left hand or the right hand side. If it is on the left, the balance factor δ decreases; otherwise it increases. If we denote the new value as δ', there are 3 cases for the relationship between δ and δ'.

- If |δ| = 1 and |δ'| = 0, this means adding the new node makes the tree perfectly balanced; the height of the parent node doesn't change, and the algorithm can terminate.

- If |δ| = 0 and |δ'| = 1, it means that either the height of the left sub-tree or that of the right sub-tree increased; we need to go on checking the upper level of the tree.

- If |δ| = 1 and |δ'| = 2, it means the AVL tree property is violated due to the new insertion. We need to perform rotation to fix it.
1: function AVL-Insert-Fix(T, x)
2:     while Parent(x) ≠ NIL do
3:         δ ← δ(Parent(x))
4:         if x = Left(Parent(x)) then
5:             δ' ← δ − 1
6:         else
7:             δ' ← δ + 1
8:         δ(Parent(x)) ← δ'
9:         P ← Parent(x)
10:        L ← Left(P)
11:        R ← Right(P)
12:        if |δ| = 1 and |δ'| = 0 then    ▷ Height doesn't change, terminate
13:            return T
14:        else if |δ| = 0 and |δ'| = 1 then    ▷ Go on with bottom-up updating
15:            x ← P
16:        else if |δ| = 1 and |δ'| = 2 then
17:            if δ' = 2 then
18:                if δ(R) = 1 then    ▷ Right-right case
19:                    δ(P) ← 0    ▷ By (4.18)
20:                    δ(R) ← 0
21:                    T ← Left-Rotate(T, P)
22:                if δ(R) = −1 then    ▷ Right-left case
23:                    δy ← δ(Left(R))    ▷ By (4.29)
24:                    if δy = 1 then
25:                        δ(P) ← −1
26:                    else
27:                        δ(P) ← 0
28:                    δ(Left(R)) ← 0
29:                    if δy = −1 then
30:                        δ(R) ← 1
31:                    else
32:                        δ(R) ← 0
33:                    T ← Right-Rotate(T, R)
34:                    T ← Left-Rotate(T, P)
35:            if δ' = −2 then
36:                if δ(L) = −1 then    ▷ Left-left case
37:                    δ(P) ← 0
38:                    δ(L) ← 0
39:                    T ← Right-Rotate(T, P)
40:                else    ▷ Left-right case
41:                    δy ← δ(Right(L))
42:                    if δy = 1 then
43:                        δ(L) ← −1
44:                    else
45:                        δ(L) ← 0
46:                    δ(Right(L)) ← 0
47:                    if δy = −1 then
48:                        δ(P) ← 1
49:                    else
50:                        δ(P) ← 0
51:                    T ← Left-Rotate(T, L)
52:                    T ← Right-Rotate(T, P)
53:            break
54:    return T
Here we reuse the rotation algorithms mentioned in the red-black tree chapter. The rotation operation doesn't update the balance factors at all; however, since rotation changes (actually improves) the balance situation, we should update these factors. Here we refer to the results from the above section: among the four cases, the right-right case and the left-left case need only one rotation, while the right-left case and the left-right case need two rotations.

The corresponding Python program is shown as the following.
def avl_insert_fix(t, x):
while x.parent is not None:
d2 = d1 = x.parent.delta
if x == x.parent.left:
d2 = d2 - 1
else:
d2 = d2 + 1
x.parent.delta = d2
(p, l, r) = (x.parent, x.parent.left, x.parent.right)
if abs(d1) == 1 and abs(d2) == 0:
return t
elif abs(d1) == 0 and abs(d2) == 1:
x = x.parent
elif abs(d1)==1 and abs(d2) == 2:
if d2 == 2:
if r.delta == 1: # Right-right case
p.delta = 0
r.delta = 0
t = left_rotate(t, p)
if r.delta == -1: # Right-Left case
dy = r.left.delta
if dy == 1:
p.delta = -1
else:
p.delta = 0
r.left.delta = 0
if dy == -1:
r.delta = 1
else:
r.delta = 0
t = right_rotate(t, r)
t = left_rotate(t, p)
if d2 == -2:
if l.delta == -1: # Left-left case
p.delta = 0
l.delta = 0
t = right_rotate(t, p)
if l.delta == 1: # Left-right case
dy = l.right.delta
if dy == 1:
l.delta = -1
else:
l.delta = 0
l.right.delta = 0
if dy == -1:
p.delta = 1
else:
p.delta = 0
t = left_rotate(t, l)
t = right_rotate(t, p)
break
return t
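As a usage sketch (added here; the key sequence is an arbitrary example), the program can be exercised as below, assuming the Node class and the left_rotate/right_rotate helpers from the book's accompanying source code; the is_avl check is the Haskell isAVL from the previous section rewritten in Python:

# A hedged usage sketch: build an AVL tree and check the balance property.
def height(t):
    if t is None:
        return 0
    return 1 + max(height(t.left), height(t.right))

def is_avl(t):
    if t is None:
        return True
    return is_avl(t.left) and is_avl(t.right) and \
           abs(height(t.right) - height(t.left)) <= 1

t = None
for k in [11, 2, 14, 1, 7, 15, 5, 8]:   # arbitrary example keys
    t = avl_insert(t, k)
print is_avl(t)   # expect True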
We skip the AVL tree deletion algorithm and leave it as an exercise to the reader.
4.6 Chapter note

The AVL tree was invented in 1962 by Adelson-Velskii and Landis [3], [4]. The name AVL tree comes from the two inventors' names. It is earlier than the red-black tree.

It's very common to compare the AVL tree and the red-black tree: both are self-balancing binary search trees, and all the major operations take O(lg N) time for both. From the result of (4.7), the AVL tree is more rigidly balanced, hence it is faster than the red-black tree in lookup-intensive applications [3]. However, the red-black tree could perform better in cases of frequent insertion and removal.

Many popular self-balancing binary search tree libraries, such as the STL, are implemented on top of the red-black tree. However, the AVL tree provides an intuitive and effective solution to the balance problem as well.

After this chapter, we'll extend the tree data structure from storing data in nodes to storing information on edges, which leads to the Trie and Patricia, etc. If we extend the number of children from two to more, we get the B-tree. These data structures will be introduced next.
Bibliography

[1] Data.Tree.AVL. http://hackage.haskell.org/packages/archive/AvlTree/4.2/doc/html/Data-Tree-AVL.html

[2] Chris Okasaki. "FUNCTIONAL PEARLS: Red-Black Trees in a Functional Setting". J. Functional Programming. 1998

[3] Wikipedia. "AVL tree". http://en.wikipedia.org/wiki/AVL_tree

[4] Guy Cousineau, Michel Mauny. "The Functional Approach to Programming". Cambridge University Press; English Ed edition (October 29, 1998). ISBN-13: 978-0521576819

[5] Pavel Grafov. "Implementation of an AVL tree in Python". http://github.com/pgrafov/python-avl-tree
Chapter 5
Trie and Patricia with
Functional and imperative
implementation
5.1 Abstract

Trie and Patricia are important data structures for information retrieval and manipulation. Neither of these data structures is new; they were invented in the 1960s. This post collects some existing knowledge about them. Some functional and imperative implementations are given in order to show the basic idea of these data structures. Multiple programming languages are used, including C++, Haskell, Python and Scheme/Lisp. C++ and Python are mostly used to show the imperative implementations, while Haskell and Scheme are used for the functional purpose.

There may be mistakes in the post; please feel free to point them out.

This post is generated by LaTeX2e, and provided with the GNU FDL (GNU Free Documentation License). Please refer to http://www.gnu.org/copyleft/fdl.html for details.

Keywords: Trie, Patricia, Radix tree
5.2 Introduction

There isn't a separate chapter about Trie or Patricia in the CLRS book, although these data structures are very basic, especially in information retrieval. Some of them are also widely used in compiler design [2] and in the bio-informatics area, such as DNA pattern matching [2].

In the CLRS book index, Trie is redirected to Radix tree, and the Radix tree is described in Problem 12-1 [2].
Figure 5.1: A radix tree example in CLRS

Figure 5.1 shows a radix tree containing the bit strings 1011, 10, 011, 100 and 0. When searching for a key k = b0 b1 ... bn, we take the first bit b0 (the MSB, from the left) and check whether it is 0 or 1: if it is 0, we turn left, and we turn right for 1. Then we take the second bit and repeat this search until we either meet a leaf or finish all n bits.

Note that a radix tree needn't store the keys in the nodes at all. The information is in fact represented by the edges. The nodes with key strings in the above figure are only for illustration.
It is very natural to come to the idea: is it possible to represent the keys as integers instead of strings, since an integer can be denoted in binary format? Such an approach can save space, and it is fast if we can use bit-wise operations.

I'll first show the integer based Trie data structure and implementation in section 5.3. Then we can point out the drawbacks and go to the improved data structure, the integer based Patricia, in section 5.4. After that, I'll show the alphabetic Trie and Patricia and list some typical uses of them in textual manipulation engineering problems.

This article provides example implementations of Trie and Patricia in C, C++, Haskell, Python, and Scheme/Lisp. Some functional implementations can be referenced from existing Haskell packages.

All source code can be downloaded as described in appendix 8.7; please refer to the appendix for detailed information about how to build and run it.
5.3 Integer Trie

Let's give a definition of the data structure in figure 5.1. To be more accurate, it should be called a binary trie: a binary trie is a binary tree in which the placement of each key is controlled by its bits; each 0 means 'go left' at the next node and each 1 means 'go right' at the next node [2].

Because integers are represented in binary format in the computer, it is possible to store integer keys rather than 0-1 strings. When we insert an integer as a new key into the trie, we take its first bit; if it is 0, we recursively insert the remaining bits into the left sub-tree; if it is 1, we insert them into the right sub-tree.
However, there is a problem if we treat the key as an integer. Consider the binary trie shown in figure 5.2. If the keys are represented as strings of 0 and 1, all three keys are different. But if they are turned into integers, they are identical. So if we want to insert data with the integer key 3, where should we put it into the trie?

Figure 5.2: A big-endian trie containing the keys 0011, 011, and 11
One approach is to treat all prefix zeros as effective bits. Suppose an integer is represented in 32 bits. If we want to insert the key 1 into a trie, we end up with a 32-level tree: there are 31 nodes each having only a left sub-tree, and the last node has a right sub-tree. It is very inefficient in terms of space.

Chris Okasaki shows a method to solve this problem in [2]. Instead of the normal big-endian integer, we can use the little-endian integer as the key. Using little-endian, the decimal integer 1 is represented as the binary 1; if we insert it into an empty binary trie, we get a trie with a root and a right leaf, so there is only 1 level. For the integer 3, which is 11 in binary, we needn't add any prefix 0, so its position in the trie is unique.
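The following small sketch (an addition to the text) shows the little-endian bit decomposition used throughout this section; note how no prefix zeros are needed:

# Decompose a key into its little-endian bit list, LSB first.
def little_endian_bits(k):
    bs = []
    while k != 0:
        bs.append(k & 1)   # take the lowest bit
        k >>= 1            # drop it
    return bs

print little_endian_bits(1)   # [1]     -- only one level in the trie
print little_endian_bits(3)   # [1, 1]  -- no prefix zeros required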
5.3.1 Definition of Integer Trie

The trie was invented by Edward Fredkin. The name comes from 'retrieval'; it was pronounced /tri:/ ('tree') by the inventor, while it is pronounced /trai/ ('try') by other authors [4].

The definition of the little-endian binary trie is simple: we can reuse the structure of the binary tree, with the left sub-tree storing the '0' part and the right sub-tree storing the '1' part. The augmented data can be stored as the value.
Definition of little-endian integer Trie in C++

We can utilize a C++ template to abstract the type of the value stored in the trie. The type of the key is integer. Each node contains a value, a left child and a right child.

template<class T>
struct IntTrie{
    IntTrie(): value(), left(0), right(0) {}

    ~IntTrie(){
        delete left;
        delete right;
    }

    T value;
    IntTrie* left;
    IntTrie* right;
};
In order to simplify releasing the memory, recursive destruction is added in the destructor.
Definition of little-endian integer Trie in Haskell

In a trie, a node may not contain a value, so we use the Haskell Maybe type to represent this situation. An IntTrie node is either an empty node or a branch node. The branch node contains a left child, a Maybe value and a right child.

data IntTrie a = Empty
               | Branch (IntTrie a) (Maybe a) (IntTrie a) -- left, value, right

type Key = Int

-- helpers
left :: IntTrie a -> IntTrie a
left (Branch l _ _) = l
left Empty = Empty

right :: IntTrie a -> IntTrie a
right (Branch _ _ r) = r
right Empty = Empty

value :: IntTrie a -> Maybe a
value (Branch _ v _) = v
value Empty = Nothing

Some helper functions are given in order to access the children and the value.
Definition of little-endian integer Trie in Python

The definition of the integer trie in Python is shown below. All fields are initialized as empty values.
class IntTrie:
def __init__(self):
self.value = None
self.left = self.right = None
The left and right children represent the sub-trie branches, and value is used to store the actual data.
Definition of little-endian integer Trie in Scheme/Lisp

In Scheme/Lisp, we provide some helper functions to create and access the trie data. The underlying data structure is still the list.
;; Definition
(define (make-int-trie l v r) ;; left, value, right
  (list l v r))

;; Helpers
(define (left trie)
  (if (null? trie) '() (car trie)))

(define (value trie)
  (if (null? trie) '() (cadr trie)))

(define (right trie)
  (if (null? trie) '() (caddr trie)))
Some helper functions are provided for easy access to children and value.
5.3.2 Insertion of integer trie

Iterative insertion algorithm

Since the key is little-endian, when we insert a key into the trie, we take its bits starting from the lowest one (LSB). If the bit is 0, we go to the left child; we go right for 1. If the child is empty, we need to create a new node. We repeat this until we meet the last bit (MSB) of the integer. Below is the iterative insertion algorithm.

1: procedure Int-Trie-Insert(T, x, data)
2:     if T = NIL then
3:         T ← Empty-Node
4:     p ← T
5:     while x ≠ 0 do
6:         if Even(x) then
7:             if Left(p) = NIL then
8:                 Left(p) ← Empty-Node
9:             p ← Left(p)
10:        else
11:            if Right(p) = NIL then
12:                Right(p) ← Empty-Node
13:            p ← Right(p)
14:        x ← ⌊x/2⌋
15:    Data(p) ← data
Insertion of integer Trie in C++

With the C++ language, we can speed up the even/odd test and the key update above with bit-wise operations.

template<class T>
IntTrie<T>* insert(IntTrie<T>* t, int key, T value = T()){
    if(!t)
        t = new IntTrie<T>();
    IntTrie<T>* p = t;
    while(key){
        if( (key & 0x1) == 0){
            if(!p->left) p->left = new IntTrie<T>();
            p = p->left;
        }
        else{
            if(!p->right) p->right = new IntTrie<T>();
            p = p->right;
        }
        key >>= 1;
    }
    p->value = value;
    return t;
}
In order to verify this program, some helper functions are provided to simplify repeated insertions. We also provide a function to convert the trie to a readable string.
template<class T, class Iterator>
IntTrie<T>* list_to_trie(Iterator first, Iterator last){
    IntTrie<T>* t(0);
    for(; first != last; ++first)
        t = insert(t, *first);
    return t;
}

template<class T, class Iterator>
IntTrie<T>* map_to_trie(Iterator first, Iterator last){
    IntTrie<T>* t(0);
    for(; first != last; ++first)
        t = insert(t, first->first, first->second);
    return t;
}

template<class T>
std::string trie_to_str(IntTrie<T>* t, int prefix = 0, int depth = 0){
    std::stringstream s;
    s << "(" << prefix;
    if(t->value != T())
        s << ":" << t->value;
    if(t->left)
        s << ", " << trie_to_str(t->left, prefix, depth + 1);
    if(t->right)
        s << ", " << trie_to_str(t->right, (1 << depth) + prefix, depth + 1);
    s << ")";
    return s.str();
}
Function list_to_trie just inserts keys, with all values treated as the default value of type T, while map_to_trie inserts both keys and values repeatedly. Function trie_to_str helps to convert a trie to a literal string by a modified pre-order traversal.

The verification cases are as the following.
const int lst[] = {1, 4, 5};
std::list<int> l(lst, lst + sizeof(lst) / sizeof(int));
IntTrie<int>* ti = list_to_trie<int, std::list<int>::iterator>(l.begin(), l.end());
std::copy(l.begin(), l.end(),
          std::ostream_iterator<int>(std::cout, ", "));
std::cout << "==>" << trie_to_str(ti) << "\n";

IntTrie<char>* tc;
typedef std::list<std::pair<int, char> > Dict;
const int keys[] = {4, 1, 5, 9};
const char vals[] = "bacd";
Dict m;
for(int i = 0; i < sizeof(keys) / sizeof(int); ++i)
    m.push_back(std::make_pair(keys[i], vals[i]));
tc = map_to_trie<char, Dict::iterator>(m.begin(), m.end());
std::copy(keys, keys + sizeof(keys) / sizeof(int),
          std::ostream_iterator<int>(std::cout, ", "));
std::cout << "==>" << trie_to_str(tc);
The above code will output results to the console like this.
1, 4, 5, ==>(0, (0, (0, (4))), (1, (1, (5))))
4, 1, 5, 9, ==>(0, (0, (0, (4:b))), (1:a, (1, (1, (9:d)), (5:c))))
Insertion of integer trie in Python

An imperative implementation of insertion in Python can be easily given by translating the pseudo code of the algorithm.
def trie_insert(t, key, value = None):
if t is None:
t = IntTrie()
p = t
while key != 0:
if key & 1 == 0:
if p.left is None:
p.left = IntTrie()
p = p.left
else:
if p.right is None:
p.right = IntTrie()
p = p.right
key = key>>1
p.value = value
return t
In order to test this insertion program, some test helpers are provided:
def trie_to_str(t, prefix=0, depth=0):
to_str = lambda x: "%s" %x
str="("+to_str(prefix)
if t.value is not None:
str += ":"+t.value
if t.left is not None:
str += ","+trie_to_str(t.left, prefix, depth+1)
if t.right is not None:
str += ","+trie_to_str(t.right, (1<<depth)+prefix, depth+1)
str+=")"
return str
def list_to_trie(l):
t = None
for x in l:
t = trie_insert(t, x)
return t
def map_to_trie(m):
t = None
for k, v in m.items():
t = trie_insert(t, k, v)
return t
Function trie_to_str prints the contents of the trie in pre-order. It doesn't only print the value of each node, but also prints the edge information.

Function list_to_trie repeatedly inserts a list of keys into a trie; since the default value parameter is None, all the data relative to the keys is empty. If the data isn't empty, function map_to_trie can insert a list of key-value pairs into the trie.
Then a test class is given to encapsulate the test cases:

class IntTrieTest:
    def run(self):
        self.test_insert()

    def test_insert(self):
        t = None
        t = trie_insert(t, 0)
        t = trie_insert(t, 1)
        t = trie_insert(t, 4)
        t = trie_insert(t, 5)
        print trie_to_str(t)
        t1 = list_to_trie([1, 4, 5])
        print trie_to_str(t1)
        t2 = map_to_trie({4:'b', 1:'a', 5:'c', 9:'d'})
        print trie_to_str(t2)

if __name__ == "__main__":
    IntTrieTest().run()
Running this program will print the following result.
(0, (0, (0, (4))), (1, (1, (5))))
(0, (0, (0, (4))), (1, (1, (5))))
(0, (0, (0, (4:b))), (1:a, (1, (1, (9:d)), (5:c))))
Please note that a pre-order traversal of the trie yields the keys in lexical order. For instance, the last result prints the little-endian formats of keys 1, 4, 5, 9 as below.

001
1
1001
101
They are in lexical order. We'll come back to this feature of the trie later in the alphabetic trie section.
Recursive insertion algorithm

Insertion can also be implemented in a recursive manner. If the LSB is 0, the key to be inserted is an even number, and we recursively insert the data into the left child; we can divide the key by 2, rounding down, to get rid of the LSB. If the LSB is 1, the key is an odd number, and the recursive insertion happens on the right child. This algorithm is described as below.

1: function Int-Trie-Insert(T, x, data)
2:     if T = NIL then
3:         T ← Create-Empty-Node()
4:     if x = 0 then
5:         Value(T) ← data
6:     else
7:         if Even(x) then
8:             Left(T) ← Int-Trie-Insert(Left(T), ⌊x/2⌋, data)
9:         else
10:            Right(T) ← Int-Trie-Insert(Right(T), ⌊x/2⌋, data)
11:    return T
Insertion of integer Trie in Haskell

To simplify the problem, if the user inserts data with a key that already exists, we just overwrite the previously stored data. This approach can easily be replaced with other methods, such as storing the data as a linked list, etc.

Inserting an integer key into a trie can be implemented in Haskell as below.

insert :: IntTrie a -> Key -> a -> IntTrie a
insert t 0 x = Branch (left t) (Just x) (right t)
insert t k x =
    if even k
    then Branch (insert (left t) (k `div` 2) x) (value t) (right t)
    else Branch (left t) (value t) (insert (right t) (k `div` 2) x)

If the key is zero, we just insert the data into the current node; in the other cases, the program goes down the trie according to the last bit of the key.
To test this program, some test helper functions are provided.

fromList :: [(Key, a)] -> IntTrie a
fromList xs = foldl ins Empty xs where
    ins t (k, v) = insert t k v

-- key = ... a2 a1 a0 ==> key = ai * m + k, where m = 2^i
toString :: (Show a) => IntTrie a -> String
toString t = toStr t 0 1 where
    toStr Empty k m = "."
    toStr tr k m = "(" ++ (toStr (left tr) k (2 * m)) ++
                   " " ++ (show k) ++ (valueStr (value tr)) ++
                   " " ++ (toStr (right tr) (m + k) (2 * m)) ++ ")"
    valueStr (Just x) = ":" ++ (show x)
    valueStr _ = ""
The fromList function creates a trie from a list of integer-data pairs. The toString function turns a trie data structure into a readable string for printing. This is a modified in-order tree traversal: since the stored numbers are little-endian, the program doubles m at each level to calculate the keys. The following code shows a test.

main = do
    putStrLn (toString (fromList [(1, 'a'), (4, 'b'), (5, 'c'), (9, 'd')]))
This will output:
(((. 0 (. 4:b .)) 0 .) 0 (((. 1 (. 9:d .)) 1 (. 5:c .)) 1:a .))
Figure 5.3 shows this result.

Figure 5.3: A little-endian integer binary trie for the map {1 → 'a', 4 → 'b', 5 → 'c', 9 → 'd'}.
Insertion of integer Trie in Scheme/Lisp

The insertion program implemented in Scheme/Lisp is quite similar. Since Scheme has an exact numeric system, dividing an odd number by 2 would produce a fraction instead of rounding to an integer, so we subtract 1 from the key before dividing it by 2 in the odd case.
;; Insertion
;; if user insert an existed value, just overwrite the old value
;; usage: (insert t k x) t: trie, k: key, x: value
(define (insert t k x)
(if (= k 0)
(make-int-trie (left t) x (right t))
(if (even? k)
(make-int-trie (insert (left t) (/ k 2) x) (value t) (right t))
(make-int-trie (left t) (value t) (insert (right t) (/ (- k 1) 2) x)))))
In order to make the creation of a trie from a list of key-value pairs easy, here is a helper function.

(define (list->trie lst) ;; lst is a list of pairs
  (define (insert-pair t p)
    (insert t (car p) (cadr p)))
  (fold-left insert-pair '() lst))
In order to convert the trie to a readable string, a converter function is given as the following.

(define (trie->string trie)
  (define (value->string x)
    (cond ((null? x) ".")
          ((number? x) (number->string x))
          ((string? x) x)
          (else "unknown value")))
  (define (trie->str t k m)
    (if (null? t)
        "."
        (string-append "(" (trie->str (left t) k (* m 2)) " "
                       (number->string k) (value->string (value t)) " "
                       (trie->str (right t) (+ m k) (* m 2)) ")")))
  (trie->str trie 0 1))
To verify the program, we can test it with an easy test case.

(define (test-int-trie)
  (define t (list->trie (list '(1 "a") '(4 "b") '(5 "c") '(9 "d"))))
  (display (trie->string t)) (newline))
Evaluating this test function generates the output below.
(test-int-trie)
(((. 0. (. 4b )) 0. ) 0. (((. 1. (. 9d )) 1. (. 5c )) 1a ))
This is essentially identical to the Haskell output.
5.3.3 Look up in integer binary trie

Iterative looking up algorithm

To look up a key in a little-endian integer binary trie, we take each bit of the key starting from the lowest (LSB), and go left or right according to whether the bit is 0, until we consume all the bits. This algorithm can be described with the pseudo code below.

1: function Int-Trie-Lookup(T, x)
2:     while x ≠ 0 and T ≠ NIL do
3:         if Even(x) then
4:             T ← Left(T)
5:         else
6:             T ← Right(T)
7:         x ← ⌊x/2⌋
8:     if T ≠ NIL then
9:         return Data(T)
10:    else
11:        return NIL
Look up implemented in C++

In C++, we can test the LSB with a bit-wise operation. The following code snippet searches for a key in an integer trie. If the target node is found, the value of that node is returned; otherwise the default value of the value type is returned (one good alternative is to raise an exception if the key is not found).

template<class T>
T lookup(IntTrie<T>* t, int key){
    while(key && t){
        if( (key & 0x1) == 0)
            t = t->left;
        else
            t = t->right;
        key >>= 1;
    }
    if(t)
        return t->value;
    else
        return T();
}
To verify this program, some simple test cases are provided.

std::cout << "\nlook up 4: " << lookup(tc, 4)
          << "\nlook up 9: " << lookup(tc, 9)
          << "\nlook up 0: " << lookup(tc, 0);

Here tc is the trie we created in the insertion section. The output is like below.
look up 4: b
look up 9: d
look up 0:
Look up implemented in Python

By translating the pseudo code of the algorithm, we can easily get a Python implementation.
def lookup(t, key):
while key != 0 and (t is not None):
if key & 1 == 0:
t = t.left
else:
t = t.right
key = key>>1
if t is not None:
return t.value
else:
return None
In this implementation, instead of using the even-odd property, bit-wise manipulation is used to test whether a bit is 0 or 1.

Here is the smoke test of the lookup function.

class IntTrieTest:
    #...
    def test_lookup(self):
        t = map_to_trie({4:'y', 1:'x', 5:'z'})
        print "look up 4:", lookup(t, 4)
        print "look up 5:", lookup(t, 5)
        print "look up 0:", lookup(t, 0)
The output of the test cases is as below.
look up 4: y
look up 5: z
look up 0: None
Recursive looking up algorithm

Looking up in an integer trie can also be implemented in a recursive manner. We take the LSB of the key to be found; if it is 0, we recursively look it up in the left child, otherwise in the right child.

1: function Int-Trie-Lookup(T, x)
2:     if T = NIL then
3:         return NIL
4:     else if x = 0 then
5:         return Value(T)
6:     else if Even(x) then
7:         return Int-Trie-Lookup(Left(T), ⌊x/2⌋)
8:     else
9:         return Int-Trie-Lookup(Right(T), ⌊x/2⌋)
Look up implemented in Haskell
In Haskell, we can use pattern matching to realize the above long if-then-else statements. The program is as the following.

search :: IntTrie a -> Key -> Maybe a
search Empty k = Nothing
search t 0 = value t
search t k = if even k then search (left t) (k `div` 2)
             else search (right t) (k `div` 2)

If the trie is empty, we simply return Nothing; if the key is zero, we return the value of the current node; in the other cases, we recursively search either the left or the right child according to whether the LSB is 0 or not.
To test this program, we can write a smoke test case as following.
testIntTrie = "t=" ++ (toString t) ++
"nsearch t 4: " ++ (show $ search t 4) ++
"nsearch t 0: " ++ (show $ search t 0)
where
t = fromList [(1, a), (4, b), (5, c), (9, d)]
main = do
putStrLn testIntTrie
This program will output these result.
114CHAPTER 5. TRIE AND PATRICIA WITH FUNCTIONAL AND IMPERATIVE IMPLEMENTATION
t=(((. 0 (. 4:b .)) 0 .) 0 (((. 1 (. 9:d .)) 1 (. 5:c .)) 1:a .))
search t 4: Just b
search t 0: Nothing
Look up implemented in Scheme/Lisp

The Scheme/Lisp implementation is quite similar. Note that we subtract 1 from the key before dividing it by 2 when the key is odd.

(define (lookup t k)
  (if (null? t) '()
      (if (= k 0) (value t)
          (if (even? k)
              (lookup (left t) (/ k 2))
              (lookup (right t) (/ (- k 1) 2))))))
The test cases use the same trie created in the insertion section.

(define (test-int-trie)
  (define t (list->trie (list '(1 "a") '(4 "b") '(5 "c") '(9 "d"))))
  (display (trie->string t)) (newline)
  (display "lookup 4: ") (display (lookup t 4)) (newline)
  (display "lookup 0: ") (display (lookup t 0)) (newline))

The result is the same as the one output by the Haskell program.
(test-int-trie)
(((. 0. (. 4b )) 0. ) 0. (((. 1. (. 9d )) 1. (. 5c )) 1a ))
lookup 4: b
lookup 0: ()
5.4 Integer Patricia Tree

It's very easy to find the drawbacks of the integer binary trie: the trie wastes a lot of space. Note in figure 5.3 that few nodes other than the leaves store real data. Typically, an integer binary trie contains many nodes that have only one child. It is very easy to come to the idea, for improvement, of compressing the chained nodes which have only one child. Patricia is such a data structure, invented by Donald R. Morrison in 1968. Patricia stands for 'practical algorithm to retrieve information coded in alphanumeric' [3]. Wikipedia redirects Patricia to Radix tree.

Chris Okasaki gave his implementation of the integer Patricia tree in paper [2]. If we merge together the chained nodes which have only one child in figure 5.3, we get the Patricia shown in figure 5.4.

From this figure, we can find that the keys of sibling nodes share the longest common prefix; they only branch out at a certain bit. It means that we can save a lot of space by storing the common prefix.

Different from the integer trie, using big-endian integers in Patricia doesn't cause the problem mentioned in section 5.3, because all zero bits before the MSB are simply omitted to save space. The big-endian integer is also more natural than the little-endian one. Chris Okasaki lists some significant advantages of big-endian Patricia trees in [2].
Figure 5.4: Little-endian Patricia for the map {1 → 'a', 4 → 'b', 5 → 'c', 9 → 'd'}.
5.4.1 Definition of Integer Patricia tree

An integer Patricia tree is a special kind of binary tree. It is

- either a leaf node containing an integer key and a value,
- or a branch node containing a left child and a right child. The integer keys of the two children share the longest common prefix bits; the next bit of the left child's key is zero, while it is one for the right child's key.
Definition of big-endian integer Patricia tree in Haskell

If we translate the above recursive definition to Haskell, we get the integer Patricia tree code below.
data IntTree a = Empty
| Leaf Key a
| Branch Prefix Mask (IntTree a) (IntTree a) -- prefix, mask, left, right
type Key = Int
type Prefix = Int
type Mask = Int
In order to tell from which bit the keys of the left and right children differ, a mask is recorded in the branch node. Typically, a mask is 2^n; all the bits lower than n don't belong to the common prefix.
Definition of big-endian integer Patricia tree in Python

Such a definition can be represented in Python similarly. Some helper functions are provided for easy operation later on.
class IntTree:
def __init__(self, key = None, value = None):
self.key = key
self.value = value
self.prefix = self.mask = None
self.left = self.right = None
def set_children(self, l, r):
self.left = l
self.right = r
def replace_child(self, x, y):
if self.left == x:
self.left = y
else:
self.right = y
def is_leaf(self):
return self.left is None and self.right is None
def get_prefix(self):
if self.prefix is None:
return self.key
else:
return self.prefix
Some helper member functions are provided in this definition. When initialized, the prefix, mask and children are all set to invalid values. Note the get_prefix() function: in case the prefix hasn't been initialized, which means it is a leaf node, the key itself is returned.
Definition of big-endian integer Patricia tree in C++

With ISO C++, the type of the data stored in Patricia can be abstracted as a template parameter. The definition is similar to the Python version.
template<class T>
struct IntPatricia{
    IntPatricia(int k = 0, T v = T()):
        key(k), value(v), prefix(k), mask(1), left(0), right(0) {}

    ~IntPatricia(){
        delete left;
        delete right;
    }

    bool is_leaf(){
        return left == 0 && right == 0;
    }

    bool match(int x){
        return (!is_leaf()) && (maskbit(x, mask) == prefix);
    }

    void replace_child(IntPatricia<T>* x, IntPatricia<T>* y){
        if(left == x)
            left = y;
        else
            right = y;
    }

    void set_children(IntPatricia<T>* l, IntPatricia<T>* r){
        left = l;
        right = r;
    }

    int key;
    T value;
    int prefix;
    int mask;
    IntPatricia* left;
    IntPatricia* right;
};
In order to release the memory easily, the program just recursively deletes the children in the destructor. The default value of type T is used for initialization, and the prefix is initialized to the same value as the key.

The member function match() will be explained in a later part.
Definition of big-endian integer Patricia tree in Scheme/Lisp

In the Scheme/Lisp program, the data structure behind is still the list. We provide creator functions and accessors to create Patricia trees and to access the children, key, value, prefix and mask.

(define (make-leaf k v) ;; key and value
  (list k v))

(define (make-branch p m l r) ;; prefix, mask, left and right
  (list p m l r))

;; Helpers
(define (leaf? t)
  (= (length t) 2))

(define (branch? t)
  (= (length t) 4))

(define (key t)
  (if (leaf? t) (car t) '()))

(define (value t)
  (if (leaf? t) (cadr t) '()))

(define (prefix t)
  (if (branch? t) (car t) '()))

(define (mask t)
  (if (branch? t) (cadr t) '()))

(define (left t)
  (if (branch? t) (caddr t) '()))

(define (right t)
  (if (branch? t) (cadddr t) '()))
The key and value functions are only applicable to leaf nodes, while the prefix, mask and children accessors are only applicable to branch nodes, so we test the node type in these functions.
5.4.2 Insertion of Integer Patricia tree

When we insert a key into an integer Patricia tree, if the tree is empty, we can just create a leaf node with the given key and data, as shown in figure 5.5.

Figure 5.5: (a). Insert key 12 into an empty Patricia tree.
If the tree only contains a leaf node x, we can create a branch and put the new key and data as a leaf y of that branch. To determine whether the new leaf y should be the left or the right node, we need to find the longest common prefix of x and y. For example, if key(x) is 12 (1100 in binary) and key(y) is 15 (1111 in binary), then the longest common prefix is 11oo, where o denotes the bits we don't care about. We can use an integer to mask those bits; in this case, the mask number is 4 (100 in binary). The next bit after the prefix represents 2^1. It is 0 in key(x), while it is 1 in key(y). So we put x as the left child and y as the right child. Figure 5.6 shows this case.

Figure 5.6: (b). Insert key 15 into the result tree of (a).
If the tree is neither empty nor a single leaf node, we first need to check whether the key to be inserted matches the common prefix recorded in the root node. If it does, then we can recursively insert the key into the left or the right child according to the next bit. For instance, if we want to insert key 14 (1110 in binary) into the result tree of figure 5.6: since it shares the common prefix 11oo, and the next bit (the bit of 2^1) is 1, we insert 14 into the right child. Otherwise, if the key to be inserted doesn't match the common prefix of the root node, we need to branch out a new leaf node. Figure 5.7 shows these 2 different cases.
Figure 5.7: (c). Insert key 14 into the result tree of (b); (d). Insert key 5 into the result tree of (b).
Iterative insertion algorithm for integer Patricia

Summarizing the above cases, the insertion for integer Patricia can be described with the following algorithm.

1: function Int-Patricia-Insert(T, x, data)
2:     if T = NIL then
3:         T ← Create-Leaf(x, data)
4:         return T
5:     y ← T
6:     p ← NIL
7:     while y is not a leaf and Match(x, Prefix(y), Mask(y)) do
8:         p ← y
9:         if Zero(x, Mask(y)) then
10:            y ← Left(y)
11:        else
12:            y ← Right(y)
13:    if Leaf(y) and x = Key(y) then
14:        Data(y) ← data
15:    else
16:        z ← Branch(y, Create-Leaf(x, data))
17:        if p = NIL then
18:            T ← z
19:        else
20:            if Left(p) = y then
21:                Left(p) ← z
22:            else
23:                Right(p) ← z
24:    return T
120CHAPTER 5. TRIE AND PATRICIA WITH FUNCTIONAL AND IMPERATIVE IMPLEMENTATION
In the above algorithm, the MATCH procedure tests whether an integer key x has the same prefix as node y above the mask bit. For instance, suppose the prefix of node y is denoted as p(n), p(n-1), ..., p(i), ..., p(0) in binary, key x is k(n), k(n-1), ..., k(i), ..., k(0), and the mask of node y is 100...0 = 2^i. We say the key matches if and only if p(j) = k(j) for all i ≤ j ≤ n.
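As a minimal sketch of this test (match_bits is a hypothetical name; it uses the same bit trick as the maskbit function defined below), MATCH amounts to comparing the masked key against the prefix.

# Key x matches prefix p above mask m = 2^i iff clearing the
# lowest i bits of x yields exactly p.
def match_bits(x, p, m):
    return (x & ~(m - 1)) == p

print match_bits(0b1110, 0b1100, 0b100)   # True: 14 matches prefix 1100(b)
print match_bits(0b0101, 0b1100, 0b100)   # False: 5 does not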
Insertion of big-endian integer Patricia tree in Python
Based on the above algorithm, the main insertion program can be realized as
the following.
def insert(t, key, value = None):
    if t is None:
        t = IntTree(key, value)
        return t
    node = t
    parent = None
    while True:
        if match(key, node):
            parent = node
            if zero(key, node.mask):
                node = node.left
            else:
                node = node.right
        else:
            if node.is_leaf() and key == node.key:
                node.value = value
            else:
                new_node = branch(node, IntTree(key, value))
                if parent is None:
                    t = new_node
                else:
                    parent.replace_child(node, new_node)
            break
    return t
The sub-procedures match, branch, lcp, etc. are given below.
def maskbit(x, mask):
    return x & (~(mask - 1))

def match(key, tree):
    if tree.is_leaf():
        return False
    return maskbit(key, tree.mask) == tree.prefix

def zero(x, mask):
    return x & (mask >> 1) == 0

def lcp(p1, p2):
    diff = (p1 ^ p2)
    mask = 1
    while diff != 0:
        diff >>= 1
        mask <<= 1
    return (maskbit(p1, mask), mask)

def branch(t1, t2):
    t = IntTree()
    (t.prefix, t.mask) = lcp(t1.get_prefix(), t2.get_prefix())
    if zero(t1.get_prefix(), t.mask):
        t.set_children(t1, t2)
    else:
        t.set_children(t2, t1)
    return t
Function maskbit() clears all bits covered by a mask to 0. For instance, if x = 101101(b) and mask = 2^2 = 100(b), the lowest 2 bits will be cleared to 0, which means maskbit(x, mask) = 101100(b). This can be easily done by bit-wise operations.

Function zero() is used to check whether the bit next to the mask bit is 0. For instance, if x = 101101(b), y = 101111(b), and mask = 2^2 = 100(b), zero will check whether the 2nd lowest bit is 0. So zero(x, mask) = true and zero(y, mask) = false.

Function lcp extracts the longest common prefix of two integers. For the x and y in the above example, because only the last 2 bits are different, lcp(x, y) = 101100(b), and we set the mask to 2^2 = 100(b) to indicate that the last 2 bits are not effective for the prefix value.
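The following throwaway checks (plain Python, calling the functions defined above) confirm these examples.

x, y = 0b101101, 0b101111
print bin(maskbit(x, 0b100))             # 0b101100: the lowest 2 bits cleared
print zero(x, 0b100), zero(y, 0b100)     # True False: the bit below the mask
print map(bin, lcp(x, y))                # ['0b101100', '0b100']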
To convert a list or a map into a Patricia tree, we can repeatedly insert the elements one by one. Since the program is the same except for the insert function, we can abstract it into the utility functions from_list and from_map.
# in trieutil.py
def from_list(l, insert_func):
    t = None
    for x in l:
        t = insert_func(t, x)
    return t

def from_map(m, insert_func):
    t = None
    for k, v in m.items():
        t = insert_func(t, k, v)
    return t
With these high level functions, we can provide list_to_patricia and map_to_patricia as below.

def list_to_patricia(l):
    return from_list(l, insert)

def map_to_patricia(m):
    return from_map(m, insert)

In order to smoke test the above insertion program, some test cases and an output helper are given.
def to_string(t):
    to_str = lambda x: "%s" % x
    if t is None:
        return ""
    if t.is_leaf():
        str = to_str(t.key)
        if t.value is not None:
            str += ":" + to_str(t.value)
        return str
    str = "[" + to_str(t.prefix) + "@" + to_str(t.mask) + "]"
    str += "(" + to_string(t.left) + "," + to_string(t.right) + ")"
    return str

class IntTreeTest:
    def run(self):
        self.test_insert()

    def test_insert(self):
        print "test insert"
        t = list_to_patricia([6])
        print to_string(t)
        t = list_to_patricia([6, 7])
        print to_string(t)
        t = map_to_patricia({1:'x', 4:'y', 5:'z'})
        print to_string(t)

if __name__ == "__main__":
    IntTreeTest().run()
The program will output a result as the following.

test insert
6
[6@2](6,7)
[0@8](1:x,[4@2](4:y,5:z))

This result means the program creates the Patricia tree shown in Figure 5.8.

Figure 5.8: Insert map 1 → x, 4 → y, 5 → z into a big-endian integer Patricia tree.
Insertion of big-endian integer Patricia tree in C++
In the below C++ program, the default value of the data type is used if the user doesn't provide data. It is a nearly strict translation of the pseudo code.
template<class T>
IntPatricia<T>* insert(IntPatricia<T>* t, int key, T value = T()){
  if(!t)
    return new IntPatricia<T>(key, value);

  IntPatricia<T>* node = t;
  IntPatricia<T>* parent(0);
  while(node->is_leaf() == false && node->match(key)){
    parent = node;
    if(zero(key, node->mask))
      node = node->left;
    else
      node = node->right;
  }
  if(node->is_leaf() && key == node->key)
    node->value = value;
  else{
    IntPatricia<T>* p = branch(node, new IntPatricia<T>(key, value));
    if(!parent)
      return p;
    parent->replace_child(node, p);
  }
  return t;
}
Let's review the implementation of the member function match().
bool match(int x){
return (!is_leaf()) && (maskbit(x, mask) == prefix);
}
If a node is not a leaf, and it has the same (bit-wise) common prefix as the key to be inserted, we say the node matches the key. It is realized by a maskbit() function as below.

int maskbit(int x, int mask){
  return x & (~(mask-1));
}

Since mask is always 2^n, subtracting 1 flips it to 11...1(b) (n ones); we then reverse it by bit-wise not, and clear the lowest n bits of x by bit-wise and.

The branch() function in the above program is as the following.
template<class T>
IntPatricia<T>* branch(IntPatricia<T>* t1, IntPatricia<T>* t2){
  IntPatricia<T>* t = new IntPatricia<T>();
  t->mask = lcp(t->prefix, t1->prefix, t2->prefix);
  if(zero(t1->prefix, t->mask))
    t->set_children(t1, t2);
  else
    t->set_children(t2, t1);
  return t;
}
It extracts the longest common prefix, creates a new node, and puts the 2 nodes to be merged as its children. Function lcp() is implemented as below.
int lcp(int& p, int p1, int p2){
  int diff = p1 ^ p2;
  int mask = 1;
  while(diff){
    diff >>= 1;
    mask <<= 1;
  }
  p = maskbit(p1, mask);
  return mask;
}
Because we can only return one value in C++, we set the reference parameter p to the common prefix result and return the mask value.
To decide which child is left and which one is right when branching, we need to test whether the bit next to the mask bit is zero.
bool zero(int x, int mask){
return (x & (mask>>1)) == 0;
}
To verify the C++ program, some simple test cases are provided.
IntPatricia<int>* ti(0);
const int lst[] = {6, 7};
ti = std::accumulate(lst, lst + sizeof(lst)/sizeof(int), ti,
                     std::ptr_fun(insert_key<int>));
std::copy(lst, lst + sizeof(lst)/sizeof(int),
          std::ostream_iterator<int>(std::cout, ", "));
std::cout<<"==>"<<patricia_to_str(ti)<<"\n";

const int keys[] = {1, 4, 5};
const char vals[] = "xyz";
IntPatricia<char>* tc(0);
for(unsigned int i = 0; i < sizeof(keys)/sizeof(int); ++i)
    tc = insert(tc, keys[i], vals[i]);
std::copy(keys, keys + sizeof(keys)/sizeof(int),
          std::ostream_iterator<int>(std::cout, ", "));
std::cout<<"==>"<<patricia_to_str(tc);
To avoid repeating ourselves, we provide a different way instead of writing a list_to_patricia(), which would be very similar to list_to_trie in the previous section. In the C++ STL, std::accumulate() plays a role similar to fold-left. But the functor we provide to accumulate must take 2 parameters, so we provide a wrapper function as below.

template<class T>
IntPatricia<T>* insert_key(IntPatricia<T>* t, int key){
    return insert(t, key);
}

With all these code lines, we get the following result.
6, 7, ==>[6@2](6,7)
1, 4, 5, ==>[0@8](1:x,[4@2](4:y,5:z))
Recursive insertion algorithm for integer Patricia
To implement insertion in a recursive way, we treat the different cases separately. If the tree is empty, we just create a leaf node and return; if the tree is a leaf node, we check whether the key of the node is the same as the key to be inserted: we overwrite the data in case they are the same, otherwise we branch out a new node after extracting the longest common prefix and the mask bit. In the remaining case, we examine whether the key shares a common prefix with the branch node, and recursively perform insertion either into the left or the right child according to whether the next different bit is 0 or 1. The below recursive algorithm describes this approach.
1: function INT-PATRICIA-INSERT(T, x, data)
2:   if T = NIL or (T is a leaf and x = KEY(T)) then
3:     return CREATE-LEAF(x, data)
4:   else if MATCH(x, PREFIX(T), MASK(T)) then
5:     if ZERO(x, MASK(T)) then
6:       LEFT(T) ← INT-PATRICIA-INSERT(LEFT(T), x, data)
7:     else
8:       RIGHT(T) ← INT-PATRICIA-INSERT(RIGHT(T), x, data)
9:     return T
10:  else
11:    return BRANCH(T, CREATE-LEAF(x, data))
Insertion of big-endian integer Patricia tree in Haskell
Insertion into a big-endian integer Patricia tree can be implemented in Haskell by changing the above algorithm into the recursive approach.
-- usage: insert tree key x
insert :: IntTree a -> Key -> a -> IntTree a
insert t k x
   = case t of
       Empty -> Leaf k x
       Leaf k' x' -> if k == k' then Leaf k x
                     else join k (Leaf k x) k' t -- t@(Leaf k' x')
       Branch p m l r
          | match k p m -> if zero k m
                           then Branch p m (insert l k x) r
                           else Branch p m l (insert r k x)
          | otherwise -> join k (Leaf k x) p t -- t@(Branch p m l r)
The match, zero and join functions in this program are defined as below.
-- join 2 nodes together.
-- (prefix1, tree1) ++ (prefix2, tree2)
--  1. find the longest common prefix == lcp(prefix1, prefix2), where
--       prefix1 = a(n),a(n-1),...a(i+1),a(i),x...
--       prefix2 = a(n),a(n-1),...a(i+1),a(i),y...
--       prefix  = a(n),a(n-1),...a(i+1),a(i),00...0
--  2. mask bit = 100...0b (=2^i)
--     so mask is something like, 1,2,4,...,128,256,...
--  3. if x==0, y==1 then (tree1 -> left, tree2 -> right),
--     else if x==1, y==0 then (tree2 -> left, tree1 -> right).
join :: Prefix -> IntTree a -> Prefix -> IntTree a -> IntTree a
join p1 t1 p2 t2 = if zero p1 m then Branch p m t1 t2
                   else Branch p m t2 t1
    where
      (p, m) = lcp p1 p2

-- lcp means longest common prefix
lcp :: Prefix -> Prefix -> (Prefix, Mask)
lcp p1 p2 = (p, m) where
      m = bit (highestBit (p1 `xor` p2))
      p = mask p1 m

-- get the order of the highest bit of 1.
-- For a number x = 00...0,1,a(i-1)...a(1)
-- the result is i
highestBit :: Int -> Int
highestBit x = if x == 0 then 0 else 1 + highestBit (shiftR x 1)

-- For a number x = a(n),a(n-1)...a(i),a(i-1),...,a(0)
-- and a mask m = 100..0 (=2^i)
-- the result of mask x m is a(n),a(n-1)...a(i),00..0
mask :: Int -> Mask -> Int
mask x m = (x .&. complement (m-1)) -- complement means bit-wise not

-- Test if the next bit after the mask bit is zero
-- For a number x = a(n),a(n-1)...a(i),1,...a(0)
-- and a mask m = 100..0 (=2^i)
-- because the bit next to a(i) is 1, the result is False.
-- For a number y = a(n),a(n-1)...a(i),0,...a(0) the result is True.
zero :: Int -> Mask -> Bool
zero x m = x .&. (shiftR m 1) == 0

-- Test if a key matches a prefix above the mask bit
-- For a prefix: p(n),p(n-1)...p(i)...p(0)
-- a key:        k(n),k(n-1)...k(i)...k(0)
-- and a mask:   100..0 (=2^i)
-- If and only if p(j)==k(j) for all i <= j <= n, the result is True
match :: Key -> Prefix -> Mask -> Bool
match k p m = (mask k m) == p
In order to test the above insertion program, some test helper functions are
provided.
-- Generate an integer Patricia tree from a list
-- Usage: fromList [(k1, x1), (k2, x2), ..., (kn, xn)]
fromList :: [(Key, a)] -> IntTree a
fromList xs = foldl ins Empty xs where
    ins t (k, v) = insert t k v

toString :: (Show a) => IntTree a -> String
toString t =
    case t of
      Empty -> "."
      Leaf k x -> (show k) ++ ":" ++ (show x)
      Branch p m l r -> "[" ++ (show p) ++ "@" ++ (show m) ++ "]" ++
                        "(" ++ (toString l) ++ ", " ++ (toString r) ++ ")"
With these helpers, insertion can be test as the following.
testIntTree = "t=" ++ (toString t)
where
t = fromList [(1, x), (4, y), (5, z)]
main = do
putStrLn testIntTree
This test will output:

t=[0@8](1:'x', [4@2](4:'y', 5:'z'))

This result means the program creates the Patricia tree shown in Figure 5.8.
Insertion of big-endian integer Patricia tree in Scheme/Lisp
In Scheme/Lisp, we use a switch-case like condition to test whether the node is empty, a leaf, or a branch.
(define (insert t k x) ;; t: patricia, k: key, x: value
  (cond ((null? t) (make-leaf k x))
        ((leaf? t) (if (= (key t) k)
                       (make-leaf k x) ;; overwrite
                       (branch k (make-leaf k x) (key t) t)))
        ((branch? t) (if (match? k (prefix t) (mask t))
                         (if (zero-bit? k (mask t))
                             (make-branch (prefix t)
                                          (mask t)
                                          (insert (left t) k x)
                                          (right t))
                             (make-branch (prefix t)
                                          (mask t)
                                          (left t)
                                          (insert (right t) k x)))
                         (branch k (make-leaf k x) (prefix t) t)))))
The functions match?, zero-bit?, and branch are given as the following. We use the Scheme fixnum bit-wise operations to mask the number and to test bits.
(define (mask-bit x m)
  (fix:and x (fix:not (- m 1))))

(define (zero-bit? x m)
  (= (fix:and x (fix:lsh m -1)) 0))

(define (lcp x y) ;; get the longest common prefix
  (define (count-mask z)
    (if (= z 0) 1 (* 2 (count-mask (fix:lsh z -1)))))
  (let* ((m (count-mask (fix:xor x y)))
         (p (mask-bit x m)))
    (cons p m)))

(define (match? k p m)
  (= (mask-bit k m) p))

(define (branch p1 t1 p2 t2) ;; pi: prefix i, ti: Patricia i
  (let* ((pm (lcp p1 p2))
         (p (car pm))
         (m (cdr pm)))
    (if (zero-bit? p1 m)
        (make-branch p m t1 t2)
        (make-branch p m t2 t1))))
We can reuse the very same list->trie function defined for the integer Trie. Below is an example that creates an integer Patricia tree.
(define (test-int-patricia)
  (define t (list->trie (list '(1 "x") '(4 "y") '(5 "z"))))
  (display t) (newline))
Evaluating it will generate a Patricia tree like below.

(test-int-patricia)
(0 8 (1 x) (4 2 (4 y) (5 z)))

It is identical to the insertion result output by the Haskell program.
5.4.3 Look up in Integer Patricia tree
Considering the property of the integer Patricia tree, to look up a key we test whether the key shares a common prefix with the root; if yes, we then check whether the next bit after the common prefix is zero or one. If it is zero, we look up in the left child, otherwise we turn to the right.
Iterative looking up in integer Patricia tree
In case we reach a leaf node, we can directly check if the key of the leaf is equal
to what we are looking up. This algorithm can be described with the following
pseudo code.
1: function INT-PATRICIA-LOOK-UP(T, x)
2:   if T = NIL then
3:     return NIL
4:   while T is not LEAF and MATCH(x, PREFIX(T), MASK(T)) do
5:     if ZERO(x, MASK(T)) then
6:       T ← LEFT(T)
7:     else
8:       T ← RIGHT(T)
9:   if T is LEAF and KEY(T) = x then
10:    return DATA(T)
11:  else
12:    return NIL
Look up in big-endian integer Patricia tree in Python
With Python, we can directly translate the pseudo code into valid program.
def lookup(t, key):
    if t is None:
        return None
    while (not t.is_leaf()) and match(key, t):
        if zero(key, t.mask):
            t = t.left
        else:
            t = t.right
    if t.is_leaf() and t.key == key:
        return t.value
    else:
        return None
We can verify this program by some simple smoke test cases.
print "testlookup"
t = map_to_patricia({1:x, 4:y, 5:z})
print "lookup4:", lookup(t, 4)
print "lookup0:", lookup(t, 0)
We can get similar output as below.
test look up
look up 4: y
look up 0: None
Look up in big-endian integer Patricia tree in C++
With the C++ language, if the program doesn't find the key, we can either raise an exception to indicate a search failure or return a special value.
template<class T>
T lookup(IntPatricia<T>* t, int key){
  if(!t)
    return T(); // or throw an exception
  while((!t->is_leaf()) && t->match(key)){
    if(zero(key, t->mask))
      t = t->left;
    else
      t = t->right;
  }
  if(t->is_leaf() && t->key == key)
    return t->value;
  else
    return T(); // or throw an exception
}
We can try some test cases to search keys in the Patricia tree we created when testing insertion.
std::cout<<"\nlook up 4: "<<lookup(tc, 4)
         <<"\nlook up 0: "<<lookup(tc, 0)<<"\n";
The output result is as the following.
look up 4: y
look up 0:
Recursive looking up in integer Patricia tree
We can easily change the while-loop in the above iterative algorithm into recursive calls, so that we have a functional approach.
1: function INT-PATRICIA-LOOK-UP(T, x)
2:   if T = NIL then
3:     return NIL
4:   else if T is a leaf and x = KEY(T) then
5:     return VALUE(T)
6:   else if MATCH(x, PREFIX(T), MASK(T)) then
7:     if ZERO(x, MASK(T)) then
8:       return INT-PATRICIA-LOOK-UP(LEFT(T), x)
9:     else
10:      return INT-PATRICIA-LOOK-UP(RIGHT(T), x)
11:  else
12:    return NIL
Look up in big-endian integer Patricia tree in Haskell
By changing the above if-then-else into pattern matching, we can get Haskell
version of looking up program.
-- look up a key
search :: IntTree a -> Key -> Maybe a
search t k
  = case t of
      Empty -> Nothing
      Leaf k' x -> if k == k' then Just x else Nothing
      Branch p m l r
         | match k p m -> if zero k m then search l k
                          else search r k
         | otherwise -> Nothing
And we can test this program with looking up some keys in the previously
created Patricia tree.
testIntTree = "t=" ++ (toString t) ++ "nsearch t 4: " ++ (show $ search t 4) ++
"nsearch t 0: " ++ (show $ search t 0)
where
t = fromList [(1, x), (4, y), (5, z)]
main = do
putStrLn testIntTree
The output result is as the following.
t=[0@8](1:'x', [4@2](4:'y', 5:'z'))
search t 4: Just 'y'
search t 0: Nothing
Look up in big-endian integer Patricia tree in Scheme/Lisp
The Scheme/Lisp program for looking up is similar. In case the tree is empty, we just return nothing; if it is a leaf node and its key is equal to the number we are looking for, we have found the result; if it is a branch, we test whether the binary prefix matches the number, then recursively search either the left or the right child according to whether the next bit after the mask is zero or not.
(define (lookup t k)
  (cond ((null? t) '())
        ((leaf? t) (if (= (key t) k) (value t) '()))
        ((branch? t) (if (match? k (prefix t) (mask t))
                         (if (zero-bit? k (mask t))
                             (lookup (left t) k)
                             (lookup (right t) k))
                         '()))))
We can test it with the Patricia tree we created in the insertion program.
(define (test-int-patricia)
  (define t (list->trie (list '(1 "x") '(4 "y") '(5 "z"))))
  (display t) (newline)
  (display "lookup 4: ") (display (lookup t 4)) (newline)
  (display "lookup 0: ") (display (lookup t 0)) (newline))
The result is like below.
(test-int-patricia)
(0 8 (1 x) (4 2 (4 y) (5 z)))
lookup 4: y
lookup 0: ()
5.5 Alphabetic Trie
Integer based Trie and Patricia trees are a good starting point. Such techniques play an important role in compiler implementation. Okasaki pointed out that the widely used Haskell compiler GHC (Glasgow Haskell Compiler) utilized a similar implementation for several years before 1998 [2].

If we extend the type of the key from integer to alphabetic values, Trie and Patricia trees become very useful in textual manipulation engineering problems.
5.5.1 Definition of alphabetic Trie

If the key is an alphabetic value, just left and right children can't represent all values. For English, there are 26 letters and each can be lower case or upper case. If we don't care about the case, one solution is to limit the number of branches (children) to 26. Some simplified ANSI C implementations of Trie are defined by using an array of 26 letters. This can be illustrated as in Figure 5.9.

Figure 5.9: A Trie with 26 branches, with keys a, an, another, bool, boy and zoo inserted.
In each node, not all branches contain data. For instance, in the above figure, only the branches of the root node representing the letters a, b, and z have sub-trees. Other branches, such as the one for letter c, are empty. For the other nodes, empty branches (pointing to nil) are not shown.

I'll give such a simplified implementation in ANSI C in a later section; however, before we go to the detailed source code, let's consider some alternatives.
For languages other than English, there may be more than 26 letters, and if we need to solve the case-sensitive problem, we face a dynamic number of sub-branches. There are 2 typical methods to represent the children: one is using a hash table, the other is using a map. We'll show these two methods in Python and C++.
Definition of alphabetic Trie in ANSI C

The ANSI C implementation illustrates a simplified approach limited to the case-insensitive English language. The program can't deal with characters other than lower case a to z, such as digits, space, tab etc.
struct Trie{
  struct Trie* children[26];
  void* data;
};
In order to initialize/destroy the children and data, I also provide 2 helper
functions.
struct Trie* create_node(){
  struct Trie* t = (struct Trie*)malloc(sizeof(struct Trie));
  int i;
  for(i=0; i<26; ++i)
    t->children[i] = 0;
  t->data = 0;
  return t;
}

void destroy(struct Trie* t){
  if(!t)
    return;
  int i;
  for(i=0; i<26; ++i)
    destroy(t->children[i]);
  if(t->data)
    free(t->data);
  free(t);
}
Note that the destroy function uses a recursive approach to free all children nodes.
Definition of alphabetic Trie in C++

With C++ and the STL, we can abstract the language characters as a type parameter. Since the number of characters varies with the language, we can use std::map to store the children of a node.
template<class Char, class Value>
struct Trie{
  typedef Trie<Char, Value> Self;
  typedef std::map<Char, Self*> Children;
  typedef Value ValueType;

  Trie():value(Value()){}

  virtual ~Trie(){
    for(typename Children::iterator it = children.begin();
        it != children.end(); ++it)
      delete it->second;
  }

  Value value;
  Children children;
};
For simple illustration purposes, a recursive destructor is used to release the memory.
Definition of alphabetic Trie in Haskell
We can use Haskell record syntax to get some free accessor functions[4].
data Trie a = Trie { value :: Maybe a
                   , children :: [(Char, Trie a)] }

empty = Trie Nothing []
Neither Map nor hash table is used; just a list of pairs realizes the same purpose. Function empty helps to create an empty Trie node. This implementation doesn't constrain the key values to lower case English letters; it can actually contain any values of Char type.
Definition of alphabetic Trie in Python

In the Python version, we can use a hash table as the data structure to represent the children nodes.
class Trie:
    def __init__(self):
        self.value = None
        self.children = {}
Definition of alphabetic Trie in Scheme/Lisp

The definition of the alphabetic Trie in Scheme/Lisp is a list of two elements: one is the value of the node, the other is the children list. The children list is a list of pairs; in each pair, one element is the character binding to the child, the other is a Trie node.
(define (make-trie v lop) ;; v: value, lop: children, list of char-trie pairs
  (cons v lop))

(define (value t)
  (if (null? t) '() (car t)))

(define (children t)
  (if (null? t) '() (cdr t)))
In order to create a child and access it easily, we also provide functions for this purpose.
(define (make-child k t)
  (cons k t))

(define (key child)
  (if (null? child) '() (car child)))

(define (tree child)
  (if (null? child) '() (cdr child)))
5.5.2 Insertion of alphabetic trie
To insert a key of string type into a Trie, we pick the first letter from the key string. Then, starting from the root node, we examine which branch among the children represents this letter. If the branch is null, we create an empty node. After that, we pick the next letter from the key string and pick the proper branch from the grand children of the root.
We repeat the above process till finishing all the letters of the key. At this point, we can finally set the data to be inserted as the value of that node. Note that the value of the root node of a Trie is always empty.
Iterative algorithm of trie insertion
The below pseudo code describes the above insertion algorithm.
1: function TRIE-INSERT(T, key, data)
2:   if T = NIL then
3:     T ← Empty-Node
4:   p ← T
5:   for each c in key do
6:     if CHILDREN(p)[c] = NIL then
7:       CHILDREN(p)[c] ← Empty-Node
8:     p ← CHILDREN(p)[c]
9:   DATA(p) ← data
10:  return T
Simplified insertion of alphabetic trie in ANSI C

Going on with the above ANSI C definition, because only lower case English letters are supported, we can use plain array manipulation to do the insertion.
struct Trie* insert(struct Trie* t, const char* key, void* value){
  if(!t)
    t = create_node();
  struct Trie* p = t;
  while(*key){
    int c = *key - 'a';
    if(!p->children[c])
      p->children[c] = create_node();
    p = p->children[c];
    ++key;
  }
  p->data = value;
  return t;
}
In order to test the above program, a helper function to print the content of the Trie is provided as the following.
void print_trie(struct Trie* t, const char* prefix){
  printf("(%s", prefix);
  if(t->data)
    printf(":%s", (char*)(t->data));
  int i;
  for(i=0; i<26; ++i){
    if(t->children[i]){
      printf(", ");
      /* reserve room for the prefix, one more letter and the terminator */
      char* new_prefix = (char*)malloc((strlen(prefix) + 2) * sizeof(char));
      sprintf(new_prefix, "%s%c", prefix, i + 'a');
      print_trie(t->children[i], new_prefix);
      free(new_prefix);
    }
  }
  printf(")");
}
After that, we can test the insertion program with such test cases.
struct Trie* test_insert(){
  struct Trie* t = 0;
  t = insert(t, "a", 0);
  t = insert(t, "an", 0);
  t = insert(t, "another", 0);
  t = insert(t, "boy", 0);
  t = insert(t, "bool", 0);
  t = insert(t, "zoo", 0);
  print_trie(t, "");
  return t;
}

int main(int argc, char** argv){
  struct Trie* t = test_insert();
  destroy(t);
  return 0;
}
This program will output a Trie like this.
(, (a, (an, (ano, (anot, (anoth, (anothe, (another))))))),
(b, (bo, (boo, (bool)), (boy))), (z, (zo, (zoo))))
It is exactly the Trie shown in figure 5.9.
Insertion of alphabetic Trie in C++
With the above C++ definition, we can utilize the STL-provided search function of std::map to locate a child quickly. The program is implemented as the following; note that if the user only provides a key to insert, we insert a default value of the value type.
template<class Char, class Value, class Key>
Trie<Char, Value>* insert(Trie<Char, Value>* t, Key key, Value value = Value()){
  if(!t)
    t = new Trie<Char, Value>();
  Trie<Char, Value>* p(t);
  for(typename Key::iterator it = key.begin(); it != key.end(); ++it){
    if(p->children.find(*it) == p->children.end())
      p->children[*it] = new Trie<Char, Value>();
    p = p->children[*it];
  }
  p->value = value;
  return t;
}
template<class T, class K>
T* insert_key(T* t, K key){
  return insert(t, key);
}
Where insert_key() acts as an adapter; we'll use a similar accumulation method to create a trie from a list later.

To test this program, we provide a helper function to print the trie on the console.
template<class T>
std::string trie_to_str(T* t, std::string prefix = ""){
  std::ostringstream s;
  s<<"("<<prefix;
  if(t->value != typename T::ValueType())
    s<<":"<<t->value;
  for(typename T::Children::iterator it = t->children.begin();
      it != t->children.end(); ++it)
    s<<", "<<trie_to_str(it->second, prefix + it->first);
  s<<")";
  return s.str();
}
After that, we can test our program with some simple test cases.
typedef Trie<char, std::string> TrieType;
TrieType* t(0);
const char* lst[] = {"a", "an", "another", "b", "bob", "bool", "home"};
t = std::accumulate(lst, lst + sizeof(lst)/sizeof(char*), t,
                    std::ptr_fun(insert_key<TrieType, std::string>));
std::copy(lst, lst + sizeof(lst)/sizeof(char*),
          std::ostream_iterator<std::string>(std::cout, ", "));
std::cout<<"\n==>"<<trie_to_str(t)<<"\n";
delete t;

t = 0;
const char* keys[] = {"001", "100", "101"};
const char* vals[] = {"y", "x", "z"};
for(unsigned int i = 0; i < sizeof(keys)/sizeof(char*); ++i)
    t = insert(t, std::string(keys[i]), std::string(vals[i]));
std::copy(keys, keys + sizeof(keys)/sizeof(char*),
          std::ostream_iterator<std::string>(std::cout, ", "));
std::cout<<"==>"<<trie_to_str(t)<<"\n";
delete t;
It will output a result like this.
a, an, another, b, bob, bool, home,
==>(, (a, (an, (ano, (anot, (anoth, (anothe, (another))))))), (b, (bo,
(bob), (boo, (bool)))), (h, (ho, (hom, (home)))))
001, 100, 101, ==>(, (0, (00, (001:y))), (1, (10, (100:x), (101:z))))
Insertion of alphabetic trie in Python
In Python the implementation is very similar to the pseudo code.
def trie_insert(t, key, value = None):
    if t is None:
        t = Trie()
    p = t
    for c in key:
        if not c in p.children:
            p.children[c] = Trie()
        p = p.children[c]
    p.value = value
    return t
And we define the helper functions as the following.
def trie_to_str(t, prefix = ""):
    str = "(" + prefix
    if t.value is not None:
        str += ":" + t.value
    for k, v in sorted(t.children.items()):
        str += ", " + trie_to_str(v, prefix + k)
    str += ")"
    return str

def list_to_trie(l):
    return from_list(l, trie_insert)

def map_to_trie(m):
    return from_map(m, trie_insert)
With these helpers, we can test the insert program as below.
class TrieTest:
    #...
    def test_insert(self):
        t = None
        t = trie_insert(t, "a")
        t = trie_insert(t, "an")
        t = trie_insert(t, "another")
        t = trie_insert(t, "b")
        t = trie_insert(t, "bob")
        t = trie_insert(t, "bool")
        t = trie_insert(t, "home")
        print trie_to_str(t)
It will print a trie to the console.
(, (a, (an, (ano, (anot, (anoth, (anothe, (another))))))),
(b, (bo, (bob), (boo, (bool)))), (h, (ho, (hom, (home)))))
Recursive algorithm of Trie insertion
The iterative algorithm can be transformed into a recursive one by the following approach: we take one character from the key and locate the child branch, then recursively insert the rest of the key into that branch. If the branch is empty, we create a new node and add it to the children before doing the recursive insertion.
1: function TRIE-INSERT(T, key, data)
2:   if T = NIL then
3:     T ← Empty-Node
4:   if key = NIL then
5:     VALUE(T) ← data
6:   else
7:     p ← FIND(CHILDREN(T), FIRST(key))
8:     if p = NIL then
9:       p ← APPEND(CHILDREN(T), FIRST(key), Empty-Node)
10:    TRIE-INSERT(p, REST(key), data)
11:  return T
Insertion of alphabetic trie in Haskell
To realize the insertion in Haskell, the only thing we need to do is to translate the for-each loop into a recursive call.
insert :: Trie a -> String -> a -> Trie a
insert t []     x = Trie (Just x) (children t)
insert t (k:ks) x = Trie (value t) (ins (children t) k ks x) where
    ins [] k ks x = [(k, (insert empty ks x))]
    ins (p:ps) k ks x = if fst p == k
                        then (k, insert (snd p) ks x):ps
                        else p:(ins ps k ks x)
If the key is empty, the program reaches the trivial terminating case; it just sets the value. In the other case, it examines the children recursively. Each element of the children list is a pair containing a character and a branch.

Some helper functions are provided as the following.
fromList :: [(String, a)] -> Trie a
fromList xs = foldl ins empty xs where
    ins t (k, v) = insert t k v

toString :: (Show a) => Trie a -> String
toString t = toStr t "" where
    toStr t prefix = "(" ++ prefix ++ showMaybe (value t) ++
                     (concat (map (\(k, v) -> ", " ++ toStr v (prefix ++ [k]))
                                  (sort (children t))))
                     ++ ")"
    showMaybe Nothing  = ""
    showMaybe (Just x) = ":" ++ show x

sort :: (Ord a) => [(a, b)] -> [(a, b)]
sort [] = []
sort (p:ps) = sort xs ++ [p] ++ sort ys where
    xs = [x | x <- ps, fst x <= fst p]
    ys = [y | y <- ps, fst y > fst p]
The fromList function provides an easy way to repeatedly extract key-value pairs from a list and insert them into a Trie.

Function toString prints the Trie in a modified pre-order way. Because the children are stored in an unsorted list, a sort function is provided to sort the branches; the quick-sort algorithm is used.
We can test the above program with the below test cases.
testTrie = "t=" ++ (toString t)
    where
      t = fromList [("a", 1), ("an", 2), ("another", 7), ("boy", 3),
                    ("bool", 4), ("zoo", 3)]

main = do
    putStrLn testTrie
The program outputs:
t=(, (a:1, (an:2, (ano, (anot, (anoth, (anothe, (another:7))))))),
(b, (bo, (boy:3), (boo, (bool:4)))), (z, (zo, (zoo:3))))
It is identical to the ANSI C result except the values we inserted.
Insertion of alphabetic trie in Scheme/Lisp
In order to manipulate strings like lists, we provide two helper functions which provide car and cdr like operations for strings.
(define (string-car s)
(string-head s 1))
(define (string-cdr s)
(string-tail s 1))
After that, we can implement the insert program as the following.
(define (insert t k x)
  (define (ins lst k ks x) ;; returns a list of children
    (if (null? lst)
        (list (make-child k (insert '() ks x)))
        (if (string=? (key (car lst)) k)
            (cons (make-child k (insert (tree (car lst)) ks x)) (cdr lst))
            (cons (car lst) (ins (cdr lst) k ks x)))))
  (if (string-null? k)
      (make-trie x (children t))
      (make-trie (value t)
                 (ins (children t) (string-car k) (string-cdr k) x))))
In order to print a readable string for a Trie, we provide a pre-order Trie traversal function. It converts a Trie to a string.
(define (trie->string t)
  (define (value->string x)
    (cond ((null? x) ".")
          ((number? x) (number->string x))
          ((string? x) x)
          (else "unknown value")))
  (define (trie->str t prefix)
    (define (child->str c)
      (string-append ", " (trie->str (tree c) (string-append prefix (key c)))))
    (let ((lst (map child->str (sort-children (children t)))))
      (string-append "(" prefix (value->string (value t))
                     (fold-left string-append "" lst) ")")))
  (trie->str t ""))
Where sort-children is a quick sort algorithm to sort all children of a node
based on keys.
(define (sort-children lst)
  (if (null? lst) '()
      (let ((xs (filter (lambda (c) (string<? (key c) (key (car lst))))
                        (cdr lst)))
            (ys (filter (lambda (c) (string>? (key c) (key (car lst))))
                        (cdr lst))))
        (append (sort-children xs)
                (list (car lst))
                (sort-children ys)))))
Function filter is only available from R6RS; for R5RS, we define the filter function manually.
(define (filter pred lst)
(keep-matching-items lst pred))
With all of these definitions, we can test our insert program with some simple test cases.
(define (test-trie)
  (define t (list->trie (list '("a" 1) '("an" 2) '("another" 7)
                              '("boy" 3) '("bool" 4) '("zoo" 3))))
  (define t2 (list->trie (list '("zoo" 3) '("bool" 4) '("boy" 3)
                               '("another" 7) '("an" 2) '("a" 1))))
  (display (trie->string t)) (newline)
  (display (trie->string t2)) (newline))
In the above test program, the function list->trie is reused; it was previously defined for the integer Trie.

Evaluating the test-trie function outputs the following result.
(test-trie)
(., (a1, (an2, (ano., (anot., (anoth., (anothe., (another7))))))),
(b., (bo., (boo., (bool4)), (boy3))), (z., (zo., (zoo3))))
(., (a1, (an2, (ano., (anot., (anoth., (anothe., (another7))))))),
(b., (bo., (boo., (bool4)), (boy3))), (z., (zo., (zoo3))))
5.5.3 Look up in alphabetic trie
To look up a key in a Trie, we extract the characters from the key string one by one. For each character, we search among the children branches to see whether there is a branch represented by this character. In case there is no such child, the look up process terminates immediately to indicate a failure. If we reach the last character, the data stored in the current node is the result we are looking for.
Iterative look up algorithm for alphabetic Trie
This process can be described in pseudo code as below.
1: function TRIE-LOOK-UP(T, key)
2:   if T = NIL then
3:     return NIL
4:   p ← T
5:   for each c in key do
6:     if CHILDREN(p)[c] = NIL then
7:       return NIL
8:     p ← CHILDREN(p)[c]
9:   return DATA(p)
Look up in alphabetic Trie in C++
We can easily translate the iterative algorithm to C++. If the specified key can't be found in the Trie, our program returns a default value of the data type. As an alternative, it is also a choice to raise an exception.
template<class T, class Key>
typename T::ValueType lookup(T* t, Key key){
  if(!t)
    return typename T::ValueType(); // or throw an exception
  T* p(t);
  for(typename Key::iterator it = key.begin(); it != key.end(); ++it){
    if(p->children.find(*it) == p->children.end())
      return typename T::ValueType(); // or throw an exception
    p = p->children[*it];
  }
  return p->value;
}
To verify the look up program, we can test it with some simple test cases.
Trie<char, int>* t(0);
const char* keys[] = {"a", "an", "another", "b", "bool", "bob", "home"};
const int vals[] = {1, 2, 7, 1, 4, 3, 4};
for(unsigned int i = 0; i < sizeof(keys)/sizeof(char*); ++i)
    t = insert(t, std::string(keys[i]), vals[i]);
std::cout<<"\nlookup another: "<<lookup(t, std::string("another"))
         <<"\nlookup home: "<<lookup(t, std::string("home"))
         <<"\nlookup the: "<<lookup(t, std::string("the"))<<"\n";
delete t;
We get the result as below.
lookup another: 7
lookup home: 4
lookup the: 0
We see that the keyword the isn't contained in the Trie; in our program, the default value of integer, 0, is returned.
Look up in alphabetic trie in Python
By translating the algorithm into the Python language, we get an imperative program.
def lookup(t, key):
    if t is None:
        return None
    p = t
    for c in key:
        if not c in p.children:
            return None
        p = p.children[c]
    return p.value
We can use similar test cases to test the looking up function.
class TrieTest:
    #...
    def test_lookup(self):
        t = map_to_trie({"a":1, "an":2, "another":7, "b":1,
                         "bool":4, "bob":3, "home":4})
        print "find another:", lookup(t, "another")
        print "find home:", lookup(t, "home")
        print "find the:", lookup(t, "the")
The results of these test cases are:
find another: 7
find home: 4
find the: None
Recursive look up algorithm for alphabetic Trie
In the recursive algorithm, we take the first character from the key to be looked up. If it can be found among the children of the current node, we recursively search the rest of the key in that child branch; otherwise the key can't be found.
1: function TRIE-LOOK-UP(T, key)
2:   if key = NIL then
3:     return VALUE(T)
4:   p ← FIND(CHILDREN(T), FIRST(key))
5:   if p = NIL then
6:     return NIL
7:   else
8:     return TRIE-LOOK-UP(p, REST(key))
Look up in alphabetic trie in Haskell
To express this algorithm in Haskell, we can utilize the lookup function from the Haskell standard library [4].
find :: Trie a -> String -> Maybe a
find t [] = value t
find t (k:ks) = case lookup k (children t) of
                  Nothing -> Nothing
                  Just t' -> find t' ks
We can append some search test cases right after insert.
testTrie = "t=" ++ (toString t) ++
           "\nsearch t an: " ++ (show (find t "an")) ++
           "\nsearch t boy: " ++ (show (find t "boy")) ++
           "\nsearch t the: " ++ (show (find t "the"))
...
Here is the search result.
search t an: Just 2
search t boy: Just 3
search t the: Nothing
Look up in alphabetic trie in Scheme/Lisp
In the Scheme/Lisp program, if the key is empty, we just return the value of the current node; otherwise we recursively search the children of the node to see whether there is a child bound to a character matching the first character of the key. We repeat this process till we have examined all characters of the key.
(define (lookup t k)
  (define (find k lst)
    (if (null? lst) '()
        (if (string=? k (key (car lst)))
            (tree (car lst))
            (find k (cdr lst)))))
  (if (string-null? k) (value t)
      (let ((child (find (string-car k) (children t))))
        (if (null? child) '()
            (lookup child (string-cdr k))))))
We can test this look up with similar test cases as in the Haskell program.
(define (test-trie)
  (define t (list->trie (list '("a" 1) '("an" 2) '("another" 7)
                              '("boy" 3) '("bool" 4) '("zoo" 3))))
  (display (trie->string t)) (newline)
  (display "lookup an: ") (display (lookup t "an")) (newline)
  (display "lookup boy: ") (display (lookup t "boy")) (newline)
  (display "lookup the: ") (display (lookup t "the")) (newline))
This program will output the following result.
(test-trie)
(., (a1, (an2, (ano., (anot., (anoth., (anothe., (another7))))))),
(b., (bo., (boo., (bool4)), (boy3))), (z., (zo., (zoo3))))
lookup an: 2
lookup boy: 3
lookup the: ()
5.6 Alphabetic Patricia Tree
The alphabetic Trie has the same problem as the integer Trie: it is not memory efficient. We can use the same method to compress an alphabetic Trie into a Patricia tree.
5.6.1 Definition of alphabetic Patricia Tree

An alphabetic Patricia tree is a special prefix tree in which each node can contain multiple branches, and all the keys stored below a branch share the common prefix bound to that branch. No node has only one child, because that would conflict with the longest common prefix property.

If we turn the Trie shown in figure 5.9 into a Patricia tree by compressing the nodes which have only one child, we get a Patricia tree like the one in figure 5.10.
Figure 5.10: A Patricia tree, with keys a, an, another, bool, boy and zoo inserted.
Note that the root node always contains an empty value.
Definition of alphabetic Patricia Tree in Haskell

We can use a definition similar to the Trie in Haskell; we only need to change the type of the first element of the children pairs from a single character to a string.
type Key = String

data Patricia a = Patricia { value :: Maybe a
                           , children :: [(Key, Patricia a)] }

empty = Patricia Nothing []

leaf :: a -> Patricia a
leaf x = Patricia (Just x) []
Besides the definition, helper functions to create an empty Patricia node and to create a leaf node are provided.
Definition of alphabetic Patricia tree in Python

The definition of the Patricia tree is the same as the Trie in Python.

class Patricia:
    def __init__(self, value = None):
        self.value = value
        self.children = {}
Definition of alphabetic Patricia tree in C++

With ISO C++, we abstract the key type and value type as type parameters, and utilize the STL-provided map container to represent the children of a node.
template<class Key, class Value>
struct Patricia{
  typedef Patricia<Key, Value> Self;
  typedef std::map<Key, Self*> Children;
  typedef Key KeyType;
  typedef Value ValueType;

  Patricia(Value v = Value()):value(v){}

  virtual ~Patricia(){
    for(typename Children::iterator it = children.begin();
        it != children.end(); ++it)
      delete it->second;
  }

  Value value;
  Children children;
};
For illustration purpose, we simply release the memory in a recursive way.
Definition of alphabetic Patricia tree in Scheme/Lisp

We can fully reuse the definition of the alphabetic Trie in Scheme/Lisp. In order to provide an easy way to create a leaf node, we define an extra helper function.

(define (make-leaf x)
  (make-trie x '()))
5.6.2 Insertion of alphabetic Patricia Tree
When inserting a key s into a Patricia tree, if the tree is empty, we can just create a leaf node. Otherwise, we need to check each child of the Patricia tree. Every branch of the children is bound to a key; we denote them as s1, s2, ..., sn, which means there are n branches. If s and si have a common prefix, we then need to branch out 2 new sub-branches: the branch itself is represented by the common prefix, and each new sub-branch is represented by its differing part. Note there are 2 special cases: one is that s is a substring of si, the other is that si is a substring of s. Figure 5.11 shows these different cases.

Figure 5.11: (a). Insert key boy into an empty Patricia tree; the result is a leaf node. (b). Insert key bool into (a); the result is a branch with common prefix bo. (c). Insert an, with value y, into node x with prefix another. (d). Insert another into a node with prefix an; the key to be inserted is updated to other, and insertion continues.
Iterative insertion algorithm for alphabetic Patricia
The insertion algorithm can be described as below pseudo code.
1: function PATRICIA-INSERT(T, key, value)
2:   if T = NIL then
3:     T ← Empty-Node
4:   p ← T
5:   loop
6:     match ← FALSE
7:     for each i in CHILDREN(p) do
8:       if key = KEY(i) then
9:         VALUE(TREE(i)) ← value
10:        return T
11:      prefix ← LONGEST-COMMON-PREFIX(key, KEY(i))
12:      key1 ← key subtract prefix
13:      key2 ← KEY(i) subtract prefix
14:      if prefix ≠ NIL then
15:        match ← TRUE
16:        if key2 = NIL then
17:          p ← TREE(i)
18:          key ← key1
19:          break
20:        else
21:          CHILDREN(p)[prefix] ← BRANCH(key1, value, key2, TREE(i))
22:          DELETE CHILDREN(p)[KEY(i)]
23:          return T
24:    if match = FALSE then
25:      CHILDREN(p)[key] ← CREATE-LEAF(value)
26:      return T
In the above algorithm, the LONGEST-COMMON-PREFIX function finds the longest common prefix of two given strings; for example, the strings bool and boy have the longest common prefix bo. The BRANCH function creates a branch node and updates the keys accordingly.
Insertion of alphabetic Patricia in C++
In C++, to support implicit type conversion we utilize the KeyType and ValueType as parameter types. If we define Patricia<std::string, std::string>, we can directly provide char* parameters. The algorithm is implemented as the following.
template<class K, class V>
Patricia<K, V>* insert(Patricia<K, V>* t,
                       typename Patricia<K, V>::KeyType key,
                       typename Patricia<K, V>::ValueType value = V()){
  if(!t)
    t = new Patricia<K, V>();
  Patricia<K, V>* p = t;
  typedef typename Patricia<K, V>::Children::iterator Iterator;
  for(;;){
    bool match(false);
    for(Iterator it = p->children.begin(); it != p->children.end(); ++it){
      K k = it->first;
      if(key == k){
        it->second->value = value; // identical key, overwrite the child's value
        return t;
      }
      K prefix = lcp(key, k); // key and k now hold the differing suffixes
      if(!prefix.empty()){
        match = true;
        if(k.empty()){ // e.g. insert "another" into "an"
          p = it->second;
          break;
        }
        else{
          p->children[prefix] = branch(key, new Patricia<K, V>(value),
                                       k, it->second);
          p->children.erase(it);
          return t;
        }
      }
    }
    if(!match){
      p->children[key] = new Patricia<K, V>(value);
      break;
    }
  }
  return t;
}
Where the lcp and branch functions are defined like this.
template<class K>
K lcp(K& s1, K& s2){
  typename K::iterator it1(s1.begin()), it2(s2.begin());
  for(; it1 != s1.end() && it2 != s2.end() && *it1 == *it2; ++it1, ++it2);
  K res(s1.begin(), it1);
  s1 = K(it1, s1.end());
  s2 = K(it2, s2.end());
  return res;
}

template<class T>
T* branch(typename T::KeyType k1, T* t1,
          typename T::KeyType k2, T* t2){
  if(k1.empty()){ // e.g. insert "an" into "another"
    t1->children[k2] = t2;
    return t1;
  }
  T* t = new T();
  t->children[k1] = t1;
  t->children[k2] = t2;
  return t;
}
Function lcp() extracts the longest common prefix and modifies its parameters. Function branch() creates a new node and sets the 2 nodes to be merged as its children. There is a special case: if the key of one node is a sub-string of the other, it chains them together.
We find that the implementation of patricia_to_str() would be the very same as trie_to_str(), so we can reuse it. The conversion from a list of keys to a trie can also be reused.
// list_to_trie
template<class Iterator, class T>
T* list_to_trie(Iterator first, Iterator last, T* t){
  typedef typename T::ValueType ValueType;
  return std::accumulate(first, last, t,
                         std::ptr_fun(insert_key<T, ValueType>));
}
We put all of the helper function templates into a utility header file, and we can test the Patricia insertion program as below.
template<class Iterator>
void test_list_to_patricia(Iterator first, Iterator last){
  typedef Patricia<std::string, std::string> PatriciaType;
  PatriciaType* t(0);
  t = list_to_trie(first, last, t);
  std::copy(first, last,
            std::ostream_iterator<std::string>(std::cout, ", "));
  std::cout<<"\n==>"<<trie_to_str(t)<<"\n";
  delete t;
}

void test_insert(){
  const char* lst1[] = {"a", "an", "another", "b", "bob", "bool", "home"};
  test_list_to_patricia(lst1, lst1 + sizeof(lst1)/sizeof(char*));
  const char* lst2[] = {"home", "bool", "bob", "b", "another", "an", "a"};
  test_list_to_patricia(lst2, lst2 + sizeof(lst2)/sizeof(char*));
  const char* lst3[] = {"romane", "romanus", "romulus"};
  test_list_to_patricia(lst3, lst3 + sizeof(lst3)/sizeof(char*));

  typedef Patricia<std::string, std::string> PatriciaType;
  PatriciaType* t(0);
  const char* keys[] = {"001", "100", "101"};
  const char* vals[] = {"y", "x", "z"};
  for(unsigned int i = 0; i < sizeof(keys)/sizeof(char*); ++i)
    t = insert(t, std::string(keys[i]), std::string(vals[i]));
  std::copy(keys, keys + sizeof(keys)/sizeof(char*),
            std::ostream_iterator<std::string>(std::cout, ", "));
  std::cout<<"==>"<<trie_to_str(t)<<"\n";
  delete t;
}
Running the test_insert() function generates the following output.
a, an, another, b, bob, bool, home,
==>(, (a, (an, (another))), (b, (bo, (bob), (bool))), (home))
home, bool, bob, b, another, an, a,
==>(, (a, (an, (another))), (b, (bo, (bob), (bool))), (home))
romane, romanus, romulus,
==>(, (rom, (roman, (romane), (romanus)), (romulus)))
001, 100, 101, ==>(, (001:y), (10, (100:x), (101:z)))
Insertion of alphabetic Patricia Tree in Python

By translating the insertion algorithm into the Python language, we get a program as below.
def insert(t, key, value = None):
    if t is None:
        t = Patricia()
    node = t
    while True:
        match = False
        for k, tr in node.children.items():
            if key == k: # identical key, overwrite the child's value
                tr.value = value
                return t
            (prefix, k1, k2) = lcp(key, k)
            if prefix != "":
                match = True
                if k2 == "":
                    # example: insert "another" into "an", go on traversing
                    node = tr
                    key = k1
                    break
                else: # branch out a new leaf
                    node.children[prefix] = branch(k1, Patricia(value), k2, tr)
                    del node.children[k]
                    return t
        if not match: # add a new leaf
            node.children[key] = Patricia(value)
            break
    return t
The longest common prefix finding and branching functions are implemented as the following.
# longest common prefix
# returns (p, s1', s2'), where p is the lcp, s1' = s1 - p, s2' = s2 - p
def lcp(s1, s2):
    j = 0
    while j <= len(s1) and j <= len(s2) and s1[0:j] == s2[0:j]:
        j += 1
    j -= 1
    return (s1[0:j], s1[j:], s2[j:])

def branch(key1, tree1, key2, tree2):
    if key1 == "":
        # example: insert "an" into "another"
        tree1.children[key2] = tree2
        return tree1
    t = Patricia()
    t.children[key1] = tree1
    t.children[key2] = tree2
    return t
Function lcp checks the characters of the two strings one by one till it meets a different one or either of the strings finishes.
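For example, a quick check with the functions just defined:

print lcp("bool", "boy")     # ('bo', 'ol', 'y')
print lcp("an", "another")   # ('an', '', 'other')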
In order to test the insertion program, some helper functions are provided.
def to_string(t):
    return trie_to_str(t)

def list_to_patricia(l):
    return from_list(l, insert)

def map_to_patricia(m):
    return from_map(m, insert)
We can reuse trie_to_str since the implementations are the same: the to_string function turns a Patricia tree into a string by traversing it in pre-order. list_to_patricia converts a list of keys into a Patricia tree by repeatedly inserting every element into the tree, while map_to_patricia does a similar thing except that it converts key-value pairs into a Patricia tree.
Then we can test the insertion program with the below test cases.
class PatriciaTest:
    #...
    def test_insert(self):
        print "test insert"
        t = list_to_patricia(["a", "an", "another", "b", "bob", "bool", "home"])
        print to_string(t)
        t = list_to_patricia(["romane", "romanus", "romulus"])
        print to_string(t)
        t = map_to_patricia({"001":"y", "100":"x", "101":"z"})
        print to_string(t)
        t = list_to_patricia(["home", "bool", "bob", "b", "another", "an", "a"])
        print to_string(t)
These test cases output a series of results like this.
(, (a, (an, (another))), (b, (bo, (bob), (bool))), (home))
(, (rom, (roman, (romane), (romanus)), (romulus)))
(, (001:y), (10, (100:x), (101:z)))
(, (a, (an, (another))), (b, (bo, (bob), (bool))), (home))
Recursive insertion algorithm for alphabetic Patricia
The insertion can also be implemented recursively. When doing insertion, the program checks all the children of the Patricia node to see whether there is a node that matches the key; matching means they have a common prefix. One special case is that the keys are the same: the program just overwrites the value of that child. If no child matches the key, the program creates a new leaf and adds it as a new child.
1: function PATRICIA-INSERT(T, key, value)
2:   if T = NIL then
3:     T ← Empty-Node
4:   p ← FIND-MATCH(CHILDREN(T), key)
5:   if p = NIL then
6:     ADD(CHILDREN(T), CREATE-LEAF(key, value))
7:   else if KEY(p) = key then
8:     VALUE(p) ← value
9:   else
10:    q ← BRANCH(CREATE-LEAF(key, value), p)
11:    ADD(CHILDREN(T), q)
12:    DELETE(CHILDREN(T), p)
13:  return T
The recursion happens inside the call to BRANCH. The longest common prefix of the 2 nodes is extracted. If the key to be inserted is a sub-string of the node's key, we just chain them together; if the prefix of the node is a sub-string of the key, we recursively insert the rest of the key into the node. In the remaining case, we create a new node with the common prefix and set its two children.
1: function BRANCH(T1, T2)
2:   prefix ← LONGEST-COMMON-PREFIX(T1, T2)
3:   p ← Empty-Node
4:   if prefix = KEY(T1) then
5:     KEY(T2) ← KEY(T2) subtract prefix
6:     p ← CREATE-LEAF(prefix, VALUE(T1))
7:     ADD(CHILDREN(p), T2)
8:   else if prefix = KEY(T2) then
9:     KEY(T1) ← KEY(T1) subtract prefix
10:    p ← PATRICIA-INSERT(T2, KEY(T1), VALUE(T1))
11:    KEY(p) ← prefix
12:  else
13:    KEY(T2) ← KEY(T2) subtract prefix
14:    KEY(T1) ← KEY(T1) subtract prefix
15:    ADD(CHILDREN(p), T1, T2)
16:    KEY(p) ← prefix
17:  return p
Insertion of alphabetic Patricia Tree in Haskell

By implementing the above algorithm in a recursive way, we get a Haskell program of Patricia insertion.
insert :: Patricia a -> Key -> a -> Patricia a
insert t k x = Patricia (value t) (ins (children t) k x) where
    ins []     k x = [(k, Patricia (Just x) [])]
    ins (p:ps) k x
        | (fst p) == k
            = (k, Patricia (Just x) (children (snd p))):ps -- overwrite
        | match (fst p) k
            = (branch k x (fst p) (snd p)):ps
        | otherwise
            = p:(ins ps k x)
Function insert takes a Patricia tree, a key and a value. It calls an internal function ins to insert the data into the children of the tree. If the tree has no children, ins simply creates a leaf node and puts it as the single child of the tree. In the other case it checks each child to see whether any one has a common prefix with the key. There is a special case: if a child has the very same key, we can overwrite the data. If a child with a common prefix is located, we branch out a new node.
A function match is provided to determine whether two keys have a common prefix, as below.
match :: Key -> Key -> Bool
match [] _ = False
match _ [] = False
match x y = head x == head y
This function is straightforward: only if the first characters of the two keys are identical do we say they have a common prefix.

The branch out function and the longest common prefix function are implemented like the following.
branch :: Key -> a -> Key -> Patricia a -> (Key, Patricia a)
branch k1 x k2 t2
    | k1 == k
        -- ex: insert "an" into "another"
        = (k, Patricia (Just x) [(k2', t2)])
    | k2 == k
        -- ex: insert "another" into "an"
        = (k, insert t2 k1' x)
    | otherwise = (k, Patricia Nothing [(k1', leaf x), (k2', t2)])
    where
      k = lcp k1 k2
      k1' = drop (length k) k1
      k2' = drop (length k) k2

lcp :: Key -> Key -> Key
lcp [] _ = []
lcp _ [] = []
lcp (x:xs) (y:ys) = if x == y then x:(lcp xs ys) else []
Function branch takes a key k1, a value x, another key k2, and a Patricia tree t2. It first calls lcp to get the longest common prefix k and the differing parts of the original keys. If k equals k1, which means k1 is a sub-string of k2, we create a new Patricia node holding the value, then set the rest of k2 together with t2 as the single child of this new node. If k equals k2, which means k2 is a sub-string of k1, we recursively insert the updated key and value into t2. Otherwise, we create a new node along with the longest common prefix; the new node has 2 children, one is t2, the other is a leaf node with the data to be inserted, each bound to an updated key.
In order to test the above program, we provide some helper functions.
fromList :: [(Key, a)] -> Patricia a
fromList xs = foldl ins empty xs where
    ins t (k, v) = insert t k v

sort :: (Ord a) => [(a, b)] -> [(a, b)]
sort [] = []
sort (p:ps) = sort xs ++ [p] ++ sort ys where
    xs = [x | x <- ps, fst x <= fst p]
    ys = [y | y <- ps, fst y > fst p]

toString :: (Show a) => Patricia a -> String
toString t = toStr t "" where
    toStr t prefix = "(" ++ prefix ++ showMaybe (value t) ++
                     (concat $ map (\(k, v) -> ", " ++ toStr v (prefix ++ k))
                                   (sort $ children t))
                     ++ ")"
    showMaybe Nothing  = ""
    showMaybe (Just x) = ":" ++ show x
Function fromList recursively inserts each key-value pair into a Patricia tree. Function sort sorts a list of key-value pairs based on the keys, using the quick sort algorithm. toString turns a Patricia tree into a string by a modified pre-order traversal.

After that, we can test our insert program with the following test cases.
testPatricia = "t1=" ++ (toString t1) ++ "\n" ++
               "t2=" ++ (toString t2)
    where
      t1 = fromList [("a", 1), ("an", 2), ("another", 7),
                     ("boy", 3), ("bool", 4), ("zoo", 3)]
      t2 = fromList [("zoo", 3), ("bool", 4), ("boy", 3),
                     ("another", 7), ("an", 2), ("a", 1)]

main = do
    putStrLn testPatricia
No matter what the insertion order is, the 2 test cases output an identical result.
t1=(, (a:1, (an:2, (another:7))), (bo, (bool:4), (boy:3)), (zoo:3))
t2=(, (a:1, (an:2, (another:7))), (bo, (bool:4), (boy:3)), (zoo:3))
Insertion of alphabetic Patricia Tree in Scheme/Lisp

In Scheme/Lisp, if the root doesn't have a child, we create a leaf node with the value and bind the key to this node; if the key bound to one child is equal to the string we want to insert, we just overwrite the current value; if the key has a common prefix with the string to be inserted, we branch out a new node.
(define (insert t k x)
  (define (ins lst k x) ;; lst: [(key . patricia)]
    (if (null? lst) (list (make-child k (make-leaf x)))
        (cond ((string=? (key (car lst)) k)
               (cons (make-child k (make-trie x (children (tree (car lst)))))
                     (cdr lst)))
              ((match? (key (car lst)) k)
               (cons (branch k x (key (car lst)) (tree (car lst)))
                     (cdr lst)))
              (else (cons (car lst) (ins (cdr lst) k x))))))
  (make-trie (value t) (ins (children t) k x)))
The match? function just tests if two strings have a common prefix.
(define (match? x y)
  (and (not (or (string-null? x) (string-null? y)))
       (string=? (string-car x) (string-car y))))
Function branch takes 4 parameters: the first key, the value to be inserted,
the second key, and the Patricia tree to be branched out. It first finds the
longest common prefix of the two keys. If it is equal to the first key, it means
that the first key is a prefix of the second key, so we create a new node
with the value and chain the Patricia tree to it by setting it as the only child
of this new node. If the longest common prefix is equal to the second key, it
means that the second key is a prefix of the first key, so we recursively insert
the differing part (the key with the prefix removed) into this Patricia tree. In the
other case, we just create a branch node and set its two children: one is the leaf node
with the value to be inserted, the other is the Patricia tree passed as the fourth
parameter.
(define (branch k1 x k2 t2) ;; returns (key tree)
  (let ((k (lcp k1 k2))
        (k1-new (string-tail k1 (string-length k)))
        (k2-new (string-tail k2 (string-length k))))
    (cond ((string=? k1 k) ;; e.g. insert "an" into "another"
           (make-child k (make-trie x (list (make-child k2-new t2)))))
          ((string=? k2 k) ;; e.g. insert "another" into "an"
           (make-child k (insert t2 k1-new x)))
          (else (make-child k (make-trie
                               '()
                               (list (make-child k1-new (make-leaf x))
                                     (make-child k2-new t2))))))))
The longest common prefix is extracted as follows.
(define (lcp x y)
  (let ((len (string-match-forward x y)))
    (string-head x len)))
We can reuse the list->trie and trie->string functions to test our program.
(define (test-patricia)
  (define t (list->trie (list '("a" 1) '("an" 2) '("another" 7)
                              '("boy" 3) '("bool" 4) '("zoo" 3))))
  (define t2 (list->trie (list '("zoo" 3) '("bool" 4) '("boy" 3)
                               '("another" 7) '("an" 2) '("a" 1))))
  (display (trie->string t)) (newline)
  (display (trie->string t2)) (newline))
Evaluating this function prints t and t2 as below.
(test-patricia)
(., (a1, (an2, (another7))), (bo., (bool4), (boy3)), (zoo3))
(., (a1, (an2, (another7))), (bo., (bool4), (boy3)), (zoo3))
5.6.3 Look up in alphabetic Patricia tree
Different from Trie, we can't take one character at a time from the key to look up. We
need to check each child to see if its key is a prefix of the key to be found. If there is
such a child, we then remove the prefix from the key, and search for this updated
key in that child. If we can't find any child whose key is a prefix of the key, the
lookup fails.
Iterative look up algorithm for alphabetic Patricia tree
This algorithm can be described in pseudo code as below.
1: function PATRICIA-LOOK-UP(T, key)
2:   if T = NIL then
3:     return NIL
4:   repeat
5:     match ← FALSE
6:     for each i in CHILDREN(T) do
7:       if key = KEY(i) then
8:         return VALUE(TREE(i))
9:       if KEY(i) IS-PREFIX-OF key then
10:        match ← TRUE
11:        key ← key - KEY(i)
12:        T ← TREE(i)
13:        break
14:  until match = FALSE
15:  return NIL
Look up in alphabetic Patricia Tree in C++
In C++, we abstract the key type as a template parameter. By referring to the
KeyType defined in Patricia, we get support for implicit type conversion.
If we can't find the key in a Patricia tree, the program below returns the
default value of the data type. One alternative is to throw an exception.
template<class K, class V>
V lookup(Patricia<K, V>* t, typename Patricia<K, V>::KeyType key){
  typedef typename Patricia<K, V>::Children::iterator Iterator;
  if(!t)
    return V(); //or throw exception
  for(;;){
    bool match(false);
    for(Iterator it=t->children.begin(); it!=t->children.end(); ++it){
      K k = it->first;
      if(key == k)
        return it->second->value;
      K prefix = lcp(key, k);
      if((!prefix.empty()) && k.empty()){
        match = true;
        t = it->second;
        break;
      }
    }
    if(!match)
      return V(); //or throw exception
  }
}
To verify the lookup program, we test it with the following simple test cases.
Patricia<std::string, int>* t(0);
const char* keys[] = {"a", "an", "another", "boy", "bool", "home"};
const int vals[] = {1, 2, 7, 3, 4, 4};
for(unsigned int i=0; i<sizeof(keys)/sizeof(char*); ++i)
  t = insert(t, keys[i], vals[i]);
std::cout<<"\nlookup another: "<<lookup(t, "another")
         <<"\nlookup boo: "<<lookup(t, "boo")
         <<"\nlookup boy: "<<lookup(t, "boy")
         <<"\nlookup by: "<<lookup(t, "by")
         <<"\nlookup boolean: "<<lookup(t, "boolean")<<"\n";
delete t;
This program outputs the result below.
lookup another: 7
lookup boo: 0
lookup boy: 3
lookup by: 0
lookup boolean: 0
Look up in alphabetic Patricia Tree in Python
The implementation of lookup in Python is similar to the pseudo code.
Because Python doesn't support a repeat-until loop directly, a while loop is used
instead.
def lookup(t, key):
    if t is None:
        return None
    while(True):
        match = False
        for k, tr in t.children.items():
            if k == key:
                return tr.value
            (prefix, k1, k2) = lcp(key, k)
            if prefix != "" and k2 == "":
                match = True
                key = k1
                t = tr
                break
        if not match:
            return None
We can verify the lookup program as below.
class PatriciaTest:
    # ..
    def test_lookup(self):
        t = map_to_patricia({"a":1, "an":2, "another":7, "b":1, "bob":3,
                             "bool":4, "home":4})
        print "search\tanother", lookup(t, "another")
        print "search\tboo", lookup(t, "boo")
        print "search\tbob", lookup(t, "bob")
        print "search\tboolean", lookup(t, "boolean")
The test result printed to the console is as follows.
search t another 7
search t boo None
search t bob 3
search t boolean None
Recursive look up algorithm for alphabetic Patricia tree
To implement the lookup recursively, we just look up among the children of
the Patricia tree.
1: function PATRICIA-LOOK-UP(T, key)
2:   if T = NIL then
3:     return NIL
4:   else
5:     return FIND-IN-CHILDREN(CHILDREN(T), key)
The real recursion happens in the FIND-IN-CHILDREN call; we pass the children
list as an argument. If it is not empty, we take the first child and check if the
key of this child is equal to the key; the value of this child is returned if
they are the same; if the key of the child is just a prefix of the key, we recursively
search in this child with the updated key.
1: function FIND-IN-CHILDREN(l, key)
2:   if l = NIL then
3:     return NIL
4:   else if KEY(FIRST(l)) = key then
5:     return VALUE(FIRST(l))
6:   else if KEY(FIRST(l)) is prefix of key then
7:     key ← key - KEY(FIRST(l))
8:     return PATRICIA-LOOK-UP(TREE(FIRST(l)), key)
9:   else
10:    return FIND-IN-CHILDREN(REST(l), key)
Look up in alphabetic Patricia Tree in Haskell
In the Haskell implementation, the above algorithm is expressed in a recursive
way.
-- lookup
import qualified Data.List

find :: Patricia a -> Key -> Maybe a
find t k = find' (children t) k where
    find' [] _ = Nothing
    find' (p:ps) k
        | (fst p) == k = value (snd p)
        | (fst p) `Data.List.isPrefixOf` k = find (snd p) (diff (fst p) k)
        | otherwise = find' ps k
    diff k1 k2 = drop (length (lcp k1 k2)) k2
When we search for a given key in a Patricia tree, we recursively check each of
the children. If there are no children at all, we stop the recursion and indicate a
lookup failure. Otherwise, we pick the prefix-node pairs one by one. If a
prefix is the same as the given key, it means the target node is found and the
value of the node is returned. If the key has a common prefix with the child, the
key is updated by removing the longest common prefix and we perform the
lookup recursively.
We can verify the above Haskell program with the following simple cases.
testPatricia = "t1=" ++ (toString t1) ++ "\n" ++
               "find t1 another =" ++ (show (find t1 "another")) ++ "\n" ++
               "find t1 bo = " ++ (show (find t1 "bo")) ++ "\n" ++
               "find t1 boy = " ++ (show (find t1 "boy")) ++ "\n" ++
               "find t1 boolean = " ++ (show (find t1 "boolean"))
    where
      t1 = fromList [("a", 1), ("an", 2), ("another", 7), ("boy", 3),
                     ("bool", 4), ("zoo", 3)]

main = do
    putStrLn testPatricia
The output is as below.
t1=(, (a:1, (an:2, (another:7))), (bo, (bool:4), (boy:3)), (zoo:3))
find t1 another =Just 7
find t1 bo = Nothing
find t1 boy = Just 3
find t1 boolean = Nothing
Look up in alphabetic Patricia Tree in Scheme/Lisp
The Scheme/Lisp program is given as follows. The function delegates the
lookup to an inner function which checks each child to see if the key
bound to the child matches the string we are looking for.
(define (lookup t k)
  (define (find lst k) ;; lst, [(k patricia)]
    (if (null? lst) '()
        (cond ((string=? (key (car lst)) k) (value (tree (car lst))))
              ((string-prefix? (key (car lst)) k)
               (lookup (tree (car lst))
                       (string-tail k (string-length (key (car lst))))))
              (else (find (cdr lst) k)))))
  (find (children t) k))
In order to verify this program, some simple test cases are given to search
in the Patricia tree we created in the previous section.
(define (test-patricia)
  (define t (list->trie (list '("a" 1) '("an" 2) '("another" 7)
                              '("boy" 3) '("bool" 4) '("zoo" 3))))
  (display (trie->string t)) (newline)
  (display "lookup another: ") (display (lookup t "another")) (newline)
  (display "lookup bo: ") (display (lookup t "bo")) (newline)
  (display "lookup boy: ") (display (lookup t "boy")) (newline)
  (display "lookup by: ") (display (lookup t "by")) (newline)
  (display "lookup boolean: ") (display (lookup t "boolean")) (newline))
This program will output the same result as the Haskell one.
(test-patricia)
(., (a1, (an2, (another7))), (bo., (bool4), (boy3)), (zoo3))
lookup another: 7
lookup bo: ()
lookup boy: 3
lookup by: ()
lookup boolean: ()
5.7 Trie and Patricia used in Industry
Trie and Patricia are widely used in the software industry. The integer-based Patricia
tree is widely used in compilers. Some everyday software has very interesting
features that can be realized with Trie and Patricia. In the following sections,
I'll list some of them, including the e-dictionary, word auto-completion, and the T9 input
method. Commercial implementations typically don't adopt Trie or
Patricia directly; however, Trie and Patricia serve well as example
realizations.
5.7.1 e-dictionary and word auto-completion
Figure 5.12 shows a screen shot of an English-Chinese dictionary. In order to
provide a good user experience, when the user inputs something, the dictionary
searches its word library, and lists all candidate words and phrases similar to what
the user has entered.
Figure 5.12: e-dictionary. All candidates starting with what the user input are
listed.
Typically such a dictionary contains hundreds of thousands of words, so performing a
whole-word search is expensive. Commercial software adopts complex approaches,
including caching, indexing etc. to speed up this process.
Similar to the e-dictionary, figure 5.13 shows a popular Internet search engine:
when the user inputs something, it provides a candidate list, with all items starting
with what the user has entered. And these candidates are shown in order of
popularity: the more people search for a word, the higher the position it is
shown at in the list.
Figure 5.13: Search engine. All candidate key words starting with what the user
input are listed.
In both cases, we say the software provides a kind of word auto-completion
support. In some modern IDEs, the editor can even help the user auto-complete
program code.
In this section, I'll show a very simple implementation of an e-dictionary with
Trie and Patricia. To simplify the problem, let us assume the dictionary only
supports English-English information.
Typically, a dictionary contains a lot of key-value pairs; the keys are English
words or phrases, and the corresponding values are the meanings of the words.
We can store all words and their meanings in a Trie; the drawback of this
approach is that it isn't space effective. We'll use Patricia as an alternative later
on.
As an example, when the user wants to look up 'a', the dictionary does not only
return the meaning of the English word 'a', but also provides a list of candidate
words which all start with 'a', including 'abandon', 'about', 'accent', 'adam',
... Of course all these words are stored in the Trie.
If there are too many candidates, one solution is to display only the top 10
words to the user, who can then browse more if desired.
The pseudo code below reuses the lookup program from previous sections and
expands all potential top N candidates.
1: function TRIE-LOOK-UP-TOP-N(T, key, N)
2:   p ← TRIE-LOOK-UP(T, key)
3:   return EXPAND-TOP-N(p, key, N)
Note that we should modify TRIE-LOOK-UP a bit: instead of returning
the value of the node, TRIE-LOOK-UP returns the node itself.
Another alternative is to use Patricia instead of Trie. It can save much space.
Iterative algorithm to search the top N candidates in Patricia
The algorithm is similar to the Patricia lookup, but when we find a node
whose key starts with the string we are looking for, we expand all its children
until we get N candidates.
1: function PATRICIA-LOOK-UP-TOP-N(T, key, N)
2:   if T = NIL then
3:     return NIL
4:   prefix ← NIL
5:   repeat
6:     match ← FALSE
7:     for each i in CHILDREN(T) do
8:       if key is prefix of KEY(i) then
9:         return EXPAND-TOP-N(TREE(i), prefix + KEY(i), N)
10:      if KEY(i) is prefix of key then
11:        match ← TRUE
12:        key ← key - KEY(i)
13:        T ← TREE(i)
14:        prefix ← prefix + KEY(i)
15:        break
16:  until match = FALSE
17:  return NIL
An e-dictionary in Python
In the Python implementation, a function trie_lookup is provided to search
for all top N candidates starting with a given string.
def trie_lookup(t, key, n):
    if t is None:
        return None
    p = t
    for c in key:
        if not c in p.children:
            return None
        p = p.children[c]
    return expand(key, p, n)

def expand(prefix, t, n):
    res = []
    q = [(prefix, t)]
    while len(res) < n and len(q) > 0:
        (s, p) = q.pop(0)
        if p.value is not None:
            res.append((s, p.value))
        for k, tr in p.children.items():
            q.append((s+k, tr))
    return res
Compared with the Trie lookup function, the first part of this program is
almost the same. The difference is that after we successfully locate the node which
matches the key, all sub-trees are expanded from this node in a breadth-first search
manner, and the top n candidates are returned.
This program can be veried by below simple test cases.
class LookupTest:
    def __init__(self):
        dict = {"a":"the first letter of English",
                "an":"... same dict as in Haskell example"}
        self.tt = trie.map_to_trie(dict)

    def run(self):
        self.test_trie_lookup()

    def test_trie_lookup(self):
        print "test lookup top 5"
        print "search a", trie_lookup(self.tt, "a", 5)
        print "search ab", trie_lookup(self.tt, "ab", 5)
The test will output the following result.
test lookup top 5
search a [(a, the first letter of English), (an, "used instead of a
when the following word begins with a vowel sound"), (adam, a character in
the Bible who was the first man made by God), (about, on the subject of;
connected with), (abandon, to leave a place, thing or person forever)]
search ab [(about, on the subject of; connected with), (abandon, to
leave a place, thing or person forever)]
To save space, we can also implement such a dictionary search by using
Patricia.
def patricia_lookup(t, key, n):
    if t is None:
        return None
    prefix = ""
    while(True):
        match = False
        for k, tr in t.children.items():
            if string.find(k, key) == 0: #key is prefix of k
                return expand(prefix+k, tr, n)
            if string.find(key, k) == 0: #k is prefix of key
                match = True
                key = key[len(k):]
                t = tr
                prefix += k
                break
        if not match:
            return None
In this program, we call Python's string functions to test if a string x is a prefix
of a string y. Once we locate a child whose key either equals the key we are
looking up or begins with it, we expand that sub-tree until we find n candidates.
Function expand() can be reused here.
We can test this program with the very same test cases, and the results are
identical to the previous ones.
An e-dictionary in C++
In the C++ implementation, we overload the lookup function with an extra
integer n to indicate that we want to search for the top n candidates. The result is a
list of key-value pairs.
//lookup top n candidates with prefix key in Trie
template<class K, class V>
std::list<std::pair<K, V> > lookup(Trie<K, V>* t,
                                   typename Trie<K, V>::KeyType key,
                                   unsigned int n)
{
  typedef std::list<std::pair<K, V> > Result;
  if(!t)
    return Result();
  Trie<K, V>* p(t);
  for(typename K::iterator it=key.begin(); it!=key.end(); ++it){
    if(p->children.find(*it) == p->children.end())
      return Result();
    p = p->children[*it];
  }
  return expand(key, p, n);
}
The program is almost the same as the Trie lookup, except that it calls the
expand function once it has located the node with the key. Function expand is as
follows.
template<class T>
std::list<std::pair<typename T::KeyType, typename T::ValueType> >
expand(typename T::KeyType prefix, T* t, unsigned int n)
{
  typedef typename T::KeyType KeyType;
  typedef typename T::ValueType ValueType;
  typedef std::list<std::pair<KeyType, ValueType> > Result;
  Result res;
  std::queue<std::pair<KeyType, T*> > q;
  q.push(std::make_pair(prefix, t));
  while(res.size()<n && (!q.empty())){
    std::pair<KeyType, T*> i = q.front();
    KeyType s = i.first;
    T* p = i.second;
    q.pop();
    if(p->value != ValueType()){
      res.push_back(std::make_pair(s, p->value));
    }
    for(typename T::Children::iterator it = p->children.begin();
        it!=p->children.end(); ++it)
      q.push(std::make_pair(s+it->first, it->second));
  }
  return res;
}
This function uses a breadth-first search approach to expand the top N candidates.
It maintains a queue to store the nodes it is currently dealing with. Each time, the
program picks a candidate node from the queue, expands all its children, and
puts them into the queue. The program terminates when the queue is empty or
we have already found N candidates.
Function expand is generic; we'll use it in later sections.
Then we can provide a helper function to convert the candidate list to a readable
string. Note that this list is actually a list of pairs, so we can write a
generic function.
//list of pairs to string
template<class Container>
std::string lop_to_str(Container coll){
  typedef typename Container::iterator Iterator;
  std::ostringstream s;
  s<<"[";
  for(Iterator it=coll.begin(); it!=coll.end(); ++it)
    s<<"("<<it->first<<", "<<it->second<<"), ";
  s<<"]";
  return s.str();
}
After that, we can test the program with some simple test cases.
Trie<std::string, std::string>* t(0);
const char* dict[] = {
  "a", "the first letter of English",
  "an", "used instead of a when the following word begins with a vowel sound",
  "another", "one more person or thing or an extra amount",
  "abandon", "to leave a place, thing or person forever",
  "about", "on the subject of; connected with",
  "adam", "a character in the Bible who was the first man made by God",
  "boy", "a male child or, more generally, a male of any age",
  "body", "the whole physical structure that forms a person or animal",
  "zoo", "an area in which animals, especially wild animals, are kept"
         " so that people can go and look at them, or study them"};
const char** first=dict;
const char** last =dict + sizeof(dict)/sizeof(char*);
for(; first!=last; ++first, ++first)
  t = insert(t, *first, *(first+1));
std::cout<<"test lookup top 5 in Trie\n"
         <<"search a "<<lop_to_str(lookup(t, "a", 5))<<"\n"
         <<"search ab "<<lop_to_str(lookup(t, "ab", 5))<<"\n";
delete t;
The result printed to the console is something like this:
test lookup top 5 in Trie
search a [(a, the first letter of English), (an, used instead of a
when the following word begins with a vowel sound), (adam, a character
in the Bible who was the first man made by God), (about, on the
subject of; connected with), (abandon, to leave a place, thing or
person forever), ]
search ab [(about, on the subject of; connected with), (abandon, to
leave a place, thing or person forever), ]
To save space with Patricia, we provide a C++ program to search the
top N candidates as below.
template<class K, class V>
std::list<std::pair<K, V> > lookup(Patricia<K, V>* t,
                                   typename Patricia<K, V>::KeyType key,
                                   unsigned int n)
{
  typedef typename std::list<std::pair<K, V> > Result;
  typedef typename Patricia<K, V>::Children::iterator Iterator;
  if(!t)
    return Result();
  K prefix;
  for(;;){
    bool match(false);
    for(Iterator it=t->children.begin(); it!=t->children.end(); ++it){
      K k(it->first);
      if(is_prefix_of(key, k))
        return expand(prefix+k, it->second, n);
      if(is_prefix_of(k, key)){
        match = true;
        prefix += k;
        lcp<K>(key, k); //update key
        t = it->second;
        break;
      }
    }
    if(!match)
      return Result();
  }
}
The program iterates over all children. If the string we are looking up is a prefix of
one child's key, we expand this child to find the top N candidates; in the opposite
case, we update the string and go on to examine this child Patricia tree.
The function is_prefix_of() is defined as below.
// x is prefix of y?
template<class T>
bool is_prefix_of(T x, T y){
  if(x.size() <= y.size())
    return std::equal(x.begin(), x.end(), y.begin());
  return false;
}
We use STL equal function to check if x is prefix of y.
The test case is nearly the same as the one for Trie.
Patricia<std::string, std::string>* t(0);
const char* dict[] = {
  "a", "the first letter of English",
  "an", "used instead of a when the following word begins with a vowel sound",
  "another", "one more person or thing or an extra amount",
  "abandon", "to leave a place, thing or person forever",
  "about", "on the subject of; connected with",
  "adam", "a character in the Bible who was the first man made by God",
  "boy", "a male child or, more generally, a male of any age",
  "body", "the whole physical structure that forms a person or animal",
  "zoo", "an area in which animals, especially wild animals, are kept"
         " so that people can go and look at them, or study them"};
const char** first=dict;
const char** last =dict + sizeof(dict)/sizeof(char*);
for(; first!=last; ++first, ++first)
  t = insert(t, *first, *(first+1));
std::cout<<"test lookup top 5 in Patricia\n"
         <<"search a "<<lop_to_str(lookup(t, "a", 5))<<"\n"
         <<"search ab "<<lop_to_str(lookup(t, "ab", 5))<<"\n";
delete t;
This test case outputs the very same result to the console.
Recursive algorithm to search the top N candidates in Patricia
This algorithm can also be implemented recursively. If the string we are looking
for is empty, we expand all children until we get N candidates; otherwise we recursively
examine the children of the node, matching them against the string by prefix.
1: function PATRICIA-LOOK-UP-TOP-N(T, key, N)
2:   if T = NIL then
3:     return NIL
4:   if key = NIL then
5:     return EXPAND-TOP-N(T, NIL, N)
6:   else
7:     return FIND-IN-CHILDREN-TOP-N(CHILDREN(T), key, N)

8: function FIND-IN-CHILDREN-TOP-N(l, key, N)
9:   if l = NIL then
10:    return NIL
11:  else if KEY(FIRST(l)) = key then
12:    return EXPAND-TOP-N(FIRST(l), key, N)
13:  else if KEY(FIRST(l)) is prefix of key then
14:    return PATRICIA-LOOK-UP-TOP-N(FIRST(l), key - KEY(FIRST(l)), N)
15:  else if key is prefix of KEY(FIRST(l)) then
16:    return PATRICIA-LOOK-UP-TOP-N(FIRST(l), NIL, N)
17:  else
18:    return FIND-IN-CHILDREN-TOP-N(REST(l), key, N)
An e-dictionary in Haskell
In the Haskell implementation, we provide a function named findAll. Thanks to
lazy evaluation, findAll won't produce all candidate words until we
need them. We can use something like take 10 $ findAll ... to get the top 10 words
easily.
findAll is given as follows.
findAll :: Trie a -> String -> [(String, a)]
findAll t [] =
    case value t of
      Nothing -> enum (children t)
      Just x  -> ("", x):(enum (children t))
    where
      enum [] = []
      enum (p:ps) = (mapAppend (fst p) (findAll (snd p) [])) ++ (enum ps)
findAll t (k:ks) =
    case lookup k (children t) of
      Nothing -> []
      Just t' -> mapAppend k (findAll t' ks)

mapAppend x lst = map (\p -> (x:(fst p), snd p)) lst
Function findAll takes a Trie and a word to be looked up; it outputs a list of
pairs. The first element of each pair is a candidate word; the second element of
the pair is the meaning of the word.
Compared with the find function of Trie, the non-trivial case is very similar.
We take a letter from the word to be looked up; if there is no child starting
with this letter, the program returns an empty list. If there is such a child, it
should be a candidate. We use function mapAppend
to add this letter in front of all elements of the recursively found candidate words.
In case we have consumed all letters, we next return all potential words, which
means the program will traverse all children of the current node.
Note that only a node with a value field not equal to Nothing is a meaningful
word in our dictionary. We prepend such a word, with its meaning, to the list.
With this function, we can construct a very simple dictionary and return the
top 5 candidates to the user. Here is the test program.
testFindAll = "\nlook up a: " ++ (show $ take 5 $ findAll t "a") ++
              "\nlook up ab: " ++ (show $ take 5 $ findAll t "ab")
    where
      t = fromList [
            ("a", "the first letter of English"),
            ("an", "used instead of a when the following word begins with" ++
                   " a vowel sound"),
            ("another", "one more person or thing or an extra amount"),
            ("abandon", "to leave a place, thing or person forever"),
            ("about", "on the subject of; connected with"),
            ("adam", "a character in the Bible who was the first man made by God"),
            ("boy", "a male child or, more generally, a male of any age"),
            ("body", "the whole physical structure that forms a person or animal"),
            ("zoo", "an area in which animals, especially wild animals, are kept" ++
                    " so that people can go and look at them, or study them")]
main = do
    putStrLn testFindAll
This program will output a result like this:
look up a: [("a","the first letter of English"),("an","used instead of a
when the following word begins with a vowel sound"),("another","one more
person or thing or an extra amount"),("abandon","to leave a place, thing
or person forever"),("about","on the subject of; connected with")]
look up ab: [("abandon","to leave a place, thing or person forever"),
("about","on the subject of; connected with")]
The Trie solution wastes a lot of space. It is very easy to improve the above
program with Patricia. The source code below shows the Patricia approach.
findAll :: Patricia a -> Key -> [(Key, a)]
findAll t [] =
    case value t of
      Nothing -> enum $ children t
      Just x  -> ("", x):(enum $ children t)
    where
      enum [] = []
      enum (p:ps) = (mapAppend (fst p) (findAll (snd p) [])) ++ (enum ps)
findAll t k = find (children t) k where
    find [] _ = []
    find (p:ps) k
        | (fst p) == k
            = mapAppend k (findAll (snd p) [])
        | (fst p) `Data.List.isPrefixOf` k
            = mapAppend (fst p) (findAll (snd p) (k `diff` (fst p)))
        | k `Data.List.isPrefixOf` (fst p)
            = findAll (snd p) []
        | otherwise = find ps k
    diff x y = drop (length y) x

mapAppend s lst = map (\p -> (s++(fst p), snd p)) lst
If we compare this program with the one implemented with Trie, we find they
are very similar to each other. In the non-trivial case, we just examine each child
to see if it matches the key to be looked up. If one child's key is exactly equal
to the key, we expand all its sub-branches and put them into the candidate
list. If the child's key corresponds to a prefix of the key, the program goes on to find
the rest of the key along this child and concatenates this prefix to all later
results. If the current key is a prefix of a child's key, the program traverses this
child and returns all its sub-branches as the candidate list.
This program can be tested with the very same case as above, and it will
output the same result.
An e-dictionary in Scheme/Lisp
In the Scheme/Lisp implementation with Trie, a function named find is used to
search for all candidates starting with a given string. If the string is empty, the program
enumerates all sub-trees as the result; otherwise the program calls an inner function
find-child to search for a child which matches the first character of the given string.
Then the program recursively applies the find function to this child with the rest of the
characters of the string to be searched.
(define (find t k)
  (define (find-child lst k)
    (tree (find-matching-item lst (lambda (c) (string=? (key c) k)))))
  (if (string-null? k)
      (enumerate t)
      (let ((t-new (find-child (children t) (string-car k))))
        (if (null? t-new) '()
            (map-string-append (string-car k) (find t-new (string-cdr k)))))))
Note that map-string-append inserts the first character into all the
elements of the result returned by the recursive call (more accurately, each element is
a pair of a key and a value; map-string-append inserts the character in front of each
key). It is defined like this.
(define (map-string-append x lst) ;; lst: [(key value)]
  (map (lambda (p) (cons (string-append x (car p)) (cdr p))) lst))
The enumerate function, which expands all sub-trees, is implemented as
follows.
(define (enumerate t) ;; enumerate all sub trees
  (if (null? t) '()
      (let ((res (append-map
                  (lambda (p) (map-string-append (key p) (enumerate (tree p))))
                  (children t))))
        (if (null? (value t)) res
            (cons (cons "" (value t)) res)))))
The test case is a very simple list of word-meaning pairs.
(define dict
  (list '("a" "the first letter of English")
        '("an" "used instead of a when the following word begins with a vowel sound")
        '("another" "one more person or thing or an extra amount")
        '("abandon" "to leave a place, thing or person forever")
        '("about" "on the subject of; connected with")
        '("adam" "a character in the Bible who was the first man made by God")
        '("boy" "a male child or, more generally, a male of any age")
        '("body" "the whole physical structure that forms a person or animal")
        '("zoo" "an area in which animals, especially wild animals,
 are kept so that people can go and look at them, or study them")))
After feeding this dict to a Trie, the user can try to find a* or ab* like below.
(define (test-trie-find-all)
  (define t (list->trie dict))
  (display "find a*: ") (display (find t "a")) (newline)
  (display "find ab*: ") (display (find t "ab")) (newline))
The result is a list of all candidates starting with the given string.
(test-trie-find-all)
find a*: ((a . the first letter of English) (an . used instead of a
when the following word begins with a vowel sound) (another . one more
person or thing or an extra amount) (abandon . to leave a place, thing
or person forever) (about . on the subject of; connected with) (adam
. a character in the Bible who was the first man made by God))
find ab*: ((abandon . to leave a place, thing or person forever)
(about . on the subject of; connected with))
The Trie approach isn't space effective. Patricia is an alternative that improves
on it in terms of space.
We can fully reuse the functions enumerate and map-string-append which are
defined for the trie. The find function for Patricia is implemented as follows.
(define (find t k)
  (define (find-child lst k)
    (if (null? lst) '()
        (cond ((string=? (key (car lst)) k)
               (map-string-append k (enumerate (tree (car lst)))))
              ((string-prefix? (key (car lst)) k)
               (let ((k-new (string-tail k (string-length (key (car lst))))))
                 (map-string-append (key (car lst)) (find (tree (car lst)) k-new))))
              ((string-prefix? k (key (car lst))) (enumerate (tree (car lst))))
              (else (find-child (cdr lst) k)))))
  (if (string-null? k)
      (enumerate t)
      (find-child (children t) k)))
If the same test cases of searching for all candidates of a* and ab* are fed in, we
get the very same result.
5.7.2 T9 input method
Most mobile phones around the year 2000 had a keypad. To edit a short message
or email with such a keypad, users typically have quite a different experience
from a PC, because a mobile-phone keypad, the so-called ITU-T keypad, has few
keys. Figure 5.14 shows an example.
Figure 5.14: an ITU-T keypad for mobile phone.
There are typically 2 methods to input an English word/phrase with an ITU-T
keypad. For instance, if the user wants to enter the word 'home', he can press the
keys in the sequence below.
Press key 4 twice to enter the letter 'h';
Press key 6 three times to enter the letter 'o';
Press key 6 twice to enter the letter 'm';
Press key 3 twice to enter the letter 'e';
Another, more efficient, way is to simplify the key press sequence like the
following.
Press keys 4, 6, 6, 3; the word 'home' appears on top of the candidate
list;
Press key * to change to another candidate word, so the word 'good' appears;
Press key * again to change to another candidate word; next the word 'gone'
appears;
...
Comparing these 2 methods, we can see method 2 is much easier for the end
user, and more efficient in key presses ('home' needs 2+3+2+2 = 9 presses with
the first method, but only 4 with the second). The only overhead is to store a
candidate word dictionary.
Method 2 is called the T9 input method, or predictive input method [6],
[7]. The abbreviation T9 stands for 'text on 9 keys'. In this section, I'll show an
example implementation of T9 using Trie and Patricia.
In order to provide candidate words to the user, a dictionary must be prepared
in advance. Trie or Patricia can be used to store the dictionary. Real
commercial software uses complex indexed dictionaries. We show the very
simple Trie and Patricia only for illustration purposes.
Iterative algorithm of T9 looking up
Below pseudo code shows how to realize T9 with Trie.
1: function TRIE-LOOK-UP-T9(T, key)
2:   PUSH-BACK(Q, (NIL, key, T))
3:   r ← NIL
4:   while Q is not empty do
5:     (p, k, t) ← POP-FRONT(Q)
6:     i ← FIRST-LETTER(k)
7:     for each c in T9-MAPPING(i) do
8:       if c is in CHILDREN(t) then
9:         k' ← k - i
10:        if k' is empty then
11:          APPEND(r, p + c)
12:        else
13:          PUSH-BACK(Q, (p + c, k', CHILDREN(t)[c]))
14:  return r
This is actually a breadth-first search program. It utilizes a queue to store
the current node and the key string we are examining. The algorithm takes the
first digit from the key and looks it up in the T9 mapping to get all English letters
corresponding to this digit. For each letter that can be found in the children
of the current node, the child node, along with the English string found so far, is pushed
to the back of the queue. In case all digits have been examined, a candidate is found, and we
append it to the result list. The loop terminates when the queue
is empty.
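To see why pruning against the trie matters, the following minimal, self-contained Haskell sketch (an illustration of mine, not from the book's sources) enumerates every letter string a digit sequence could stand for before any dictionary filtering. Even the 2-digit sequence "46" already yields 9 combinations, and the count grows exponentially with the key length; the trie discards all combinations that are not prefixes of dictionary words.

import Data.Maybe (fromMaybe)

mapT9 :: [(Char, String)]
mapT9 = [ ('2', "abc"), ('3', "def"), ('4', "ghi"), ('5', "jkl")
        , ('6', "mno"), ('7', "pqrs"), ('8', "tuv"), ('9', "wxyz") ]

-- mapM in the list monad takes the cross product of the letter
-- choices, yielding every string the digits could spell.
spell :: String -> [String]
spell = mapM (\d -> fromMaybe "" (lookup d mapT9))

main :: IO ()
main = print (spell "46")
-- ["gm","gn","go","hm","hn","ho","im","in","io"]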
Since Trie is not space effective, a minor modification of the above program
can make it work with Patricia, which helps to save extra space.
1: function PATRICIA-LOOK-UP-T9(T, key)
2:   PUSH-BACK(Q, (NIL, key, T))
3:   r ← NIL
4:   while Q is not empty do
5:     (p, k, t) ← POP-FRONT(Q)
6:     for each child in CHILDREN(t) do
7:       k' ← CONVERT-T9(KEY(child))
8:       if k' IS-PREFIX-OF k then
9:         if k' = k then
10:          APPEND(r, p + KEY(child))
11:        else
12:          PUSH-BACK(Q, (p + KEY(child), k - k', child))
13:  return r
T9 implementation in Python
In the Python implementation, T9 lookup is realized as a typical breadth-first
search algorithm as follows.
T9MAP={'2':"abc", '3':"def", '4':"ghi", '5':"jkl",
       '6':"mno", '7':"pqrs", '8':"tuv", '9':"wxyz"}
def trie_lookup_t9(t, key):
    if t is None or key == "":
        return None
    q = [("", key, t)]
    res = []
    while len(q) > 0:
        (prefix, k, t) = q.pop(0)
        i = k[0]
        if not i in T9MAP:
            return None #invalid input
        for c in T9MAP[i]:
            if c in t.children:
                if k[1:] == "":
                    res.append((prefix+c, t.children[c].value))
                else:
                    q.append((prefix+c, k[1:], t.children[c]))
    return res
Function trie_lookup_t9 first checks if the parameters are valid. Then it pushes
the initial data into a queue. The program repeatedly pops an item from the
queue; the item records the node to examine next, the remaining number sequence,
and the alphabetic string searched so far.
For each popped item, the program takes the next digit from the number
sequence, and looks it up in the T9 map to find the corresponding English letters.
For each of these letters that can be found among the children of the current node,
we push the child, along with the updated number sequence and updated
alphabetic string, into the queue. In case we have processed all numbers, we have
found a candidate result.
We can verify the above program with the following test cases.
class LookupTest:
    def __init__(self):
        t9dict = ["home", "good", "gone", "hood", "a", "another", "an"]
        self.t9t = trie.list_to_trie(t9dict)

    def test_trie_t9(self):
        print "search 4", trie_lookup_t9(self.t9t, "4")
        print "search 46", trie_lookup_t9(self.t9t, "46")
        print "search 4663", trie_lookup_t9(self.t9t, "4663")
        print "search 2", trie_lookup_t9(self.t9t, "2")
        print "search 22", trie_lookup_t9(self.t9t, "22")
If we run the test, it will output the very same result as the Haskell
program.
search 4 [(g, None), (h, None)]
search 46 [(go, None), (ho, None)]
search 4663 [(gone, None), (good, None), (home, None), (hood, None)]
search 2 [(a, None)]
search 22 []
To save space, Patricia can be used instead of Trie.
def patricia_lookup_t9(t, key):
    if t is None or key == "":
        return None
    q = [("", key, t)]
    res = []
    while len(q) > 0:
        (prefix, key, t) = q.pop(0)
        for k, tr in t.children.items():
            digits = toT9(k)
            if string.find(key, digits) == 0: #is prefix of
                if key == digits:
                    res.append((prefix+k, tr.value))
                else:
                    q.append((prefix+k, key[len(k):], tr))
    return res
Compared with the Trie implementation, this is very similar. We also use a
breadth-first search approach. The different part is that we convert the key
string of each child to a number sequence according to the T9 mapping; if it is a
prefix of the key we are looking for, we push this child along with the updated key
and prefix. In case we have examined all digits, we have found a candidate result.
The convert function is a reverse mapping process as below.
def toT9(s):
    res = ""
    for c in s:
        for k, v in T9MAP.items():
            if string.find(v, c) >= 0:
                res += k
                break
    #error handling skipped.
    return res
For illustration purposes, the error handling for invalid letters is skipped.
If we feed the program with the same test cases, we get a result as
follows.
search 4 []
search 46 [(go, None), (ho, None)]
search 466 []
search 4663 [(good, None), (gone, None), (home, None), (hood, None)]
search 2 [(a, None)]
search 22 []
The result is slightly different from the one output by Trie. The reason is
the same as what we analyzed in the Haskell implementation. It is easy to modify
the program to output a similar result.
T9 implemented in C++
First we define the T9 mapping as a singleton object, because we want it
to be usable in both the Trie lookup and the Patricia lookup programs.
struct t9map{
  typedef std::map<char, std::string> Map;
  Map map;

  t9map(){
    map['2']="abc";
    map['3']="def";
    map['4']="ghi";
    map['5']="jkl";
    map['6']="mno";
    map['7']="pqrs";
    map['8']="tuv";
    map['9']="wxyz";
  }

  static t9map& inst(){
    static t9map i;
    return i;
  }
};
Note that for other languages or keypad layouts, we can define different mappings
and pass them as an argument to the lookup function.
With this mapping, the lookup in Trie can be given as below. Although
we want to keep the program generic, for illustration purposes we just
use the T9 mapping directly.
In order to keep the code as short as possible, a boost library tool, boost::tuple,
is used. For more about boost::tuple, please refer to [8].
template<class K, class V>
std::list<std::pair<K, V> > lookup_t9(Trie<K, V>* t,
                                      typename Trie<K, V>::KeyType key)
{
  typedef std::list<std::pair<K, V> > Result;
  typedef typename Trie<K, V>::KeyType Key;
  typedef typename Trie<K, V>::Char Char;
  if((!t) || key.empty())
    return Result();
  Key prefix;
  std::map<Char, Key> m = t9map::inst().map;
  std::queue<boost::tuple<Key, Key, Trie<K, V>*> > q;
  q.push(boost::make_tuple(prefix, key, t));
  Result res;
  while(!q.empty()){
    boost::tie(prefix, key, t) = q.front();
    q.pop();
    Char c = *key.begin();
    key = Key(key.begin()+1, key.end());
    if(m.find(c) == m.end())
      return Result();
    Key cs = m[c];
    for(typename Key::iterator it=cs.begin(); it!=cs.end(); ++it)
      if(t->children.find(*it)!=t->children.end()){
        if(key.empty())
          res.push_back(std::make_pair(prefix+*it, t->children[*it]->value));
        else
          q.push(boost::make_tuple(prefix+*it, key, t->children[*it]));
      }
  }
  return res;
}
This program first checks if the trie or the key is empty, to deal
with the trivial case. It next initializes a queue and pushes one tuple to it. The
tuple contains 3 elements: a prefix representing the string the program has
searched so far, the current key it needs to look up, and the node it will examine.
Then the program repeatedly pops a tuple from the queue, takes the first
character from the key, and looks it up in the T9 map to get a candidate English
letter list. For each letter in this list, the program examines if it exists among the
children of the current node. In case it finds such a child: if there is no letter left
to look up, we have found a candidate result, and we push it to the result list;
else, we create a new tuple with the updated prefix, key and this child, then push it
to the queue for later processing.
Below are some simple test cases for verication.
Trie<std::string, std::string>* t9trie(0);
const char* t9dict[] = {"home", "good", "gone", "hood", "a", "another", "an"};
t9trie = list_to_trie(t9dict, t9dict+sizeof(t9dict)/sizeof(char*), t9trie);
std::cout<<"test t9 lookup in Trie\n"
         <<"search 4 "<<lop_to_str(lookup_t9(t9trie, "4"))<<"\n"
         <<"search 46 "<<lop_to_str(lookup_t9(t9trie, "46"))<<"\n"
         <<"search 4663 "<<lop_to_str(lookup_t9(t9trie, "4663"))<<"\n"
         <<"search 2 "<<lop_to_str(lookup_t9(t9trie, "2"))<<"\n"
         <<"search 22 "<<lop_to_str(lookup_t9(t9trie, "22"))<<"\n\n";
delete t9trie;
It will output the same result as the Python program.
test t9 lookup in Trie
search 4 [(g, ), (h, ), ]
search 46 [(go, ), (ho, ), ]
search 4663 [(gone, ), (good, ), (home, ), (hood, ), ]
search 2 [(a, ), ]
search 22 []
In order to save space, a lookup program for Patricia is also provided.
template<class K, class V>
std::list<std::pair<K, V> > lookup_t9(Patricia<K, V>* t,
                                      typename Patricia<K, V>::KeyType key)
{
  typedef std::list<std::pair<K, V> > Result;
  typedef typename Patricia<K, V>::KeyType Key;
  typedef typename Key::value_type Char;
  typedef typename Patricia<K, V>::Children::iterator Iterator;
  if((!t) || key.empty())
    return Result();
  Key prefix;
  std::map<Char, Key> m = t9map::inst().map;
  std::queue<boost::tuple<Key, Key, Patricia<K, V>*> > q;
  q.push(boost::make_tuple(prefix, key, t));
  Result res;
  while(!q.empty()){
    boost::tie(prefix, key, t) = q.front();
    q.pop();
    for(Iterator it=t->children.begin(); it!=t->children.end(); ++it){
      Key digits = t9map::inst().to_t9(it->first);
      if(is_prefix_of(digits, key)){
        if(digits == key)
          res.push_back(std::make_pair(prefix+it->first, it->second->value));
        else{
          key = Key(key.begin()+it->first.size(), key.end());
          q.push(boost::make_tuple(prefix+it->first, key, it->second));
        }
      }
    }
  }
  return res;
}
The program is very similar to the one with Trie; it is a typical
breadth-first search approach. Note that we added a member function to_t9()
to convert an English word/phrase back to a digit string. This member
function is implemented as follows.
struct t9map{
  //...
  std::string to_t9(std::string s){
    std::string res;
    for(std::string::iterator c=s.begin(); c!=s.end(); ++c){
      for(Map::iterator m=map.begin(); m!=map.end(); ++m){
        std::string val = m->second;
        if(std::find(val.begin(), val.end(), *c)!=val.end()){
          res.push_back(m->first);
          break;
        }
      }
    } // skip error handling.
    return res;
  }
};
The error handling for invalid letters is omitted in order to keep the code
short and easy to understand. We can use very similar test cases as above,
except we need to change the Trie to Patricia. The output is as below.
test t9 lookup in Patricia
search 4 []
search 46 [(go, ), (ho, ), ]
search 466 []
search 4663 [(gone, ), (good, ), ]
search 2 [(a, ), ]
search 22 []
The result is slightly different; please refer to the Haskell section for the
reason for this difference. It is very easy to modify the program to output the
very same result as Trie's.
Recursive algorithm of T9 looking up
T9 implemented in Haskell
In Haskell, we rst dene a map from key pad to English letter. When user
input a key pad number sequence, we take each number and check from the
Trie. All children match the number should be investigated. Below is a Haskell
program to realize T9 input.
mapT9 = [('2', "abc"), ('3', "def"), ('4', "ghi"), ('5', "jkl"),
         ('6', "mno"), ('7', "pqrs"), ('8', "tuv"), ('9', "wxyz")]
lookupT9 :: Char -> [(Char, b)] -> [(Char, b)]
lookupT9 c children = case lookup c mapT9 of
    Nothing -> []
    Just s  -> foldl f [] s
  where
    f lst x = case lookup x children of
        Nothing -> lst
        Just t  -> (x, t):lst
-- T9-find in Trie
findT9 :: Trie a -> String -> [(String, Maybe a)]
findT9 t [] = [("", Trie.value t)]
findT9 t (k:ks) = foldl f [] (lookupT9 k (children t))
    where
      f lst (c, tr) = (mapAppend c (findT9 tr ks)) ++ lst
findT9 is the main function. It takes 2 parameters, a Trie and a number
sequence string. In the non-trivial case, it calls the lookupT9 function to examine all
children which match the first number.
For each matched child, the program recursively calls findT9 on it with the
remaining numbers, and we use mapAppend to insert the currently found letter in
front of all results. The program uses foldl to combine these together.
Function lookupT9 is used to filter all possible children which match a
number. It first calls the lookup function on mapT9, so that a string of possible
English letters can be identified. Next we call lookup for each candidate letter
to see if there is a child matching the letter. We use foldl to collect all such
children together.
This program can be veried by using some simple test cases.
testFindT9 = "press 4: " ++ (show $ take 5 $ findT9 t "4") ++
             "\npress 46: " ++ (show $ take 5 $ findT9 t "46") ++
             "\npress 4663: " ++ (show $ take 5 $ findT9 t "4663") ++
             "\npress 2: " ++ (show $ take 5 $ findT9 t "2") ++
             "\npress 22: " ++ (show $ take 5 $ findT9 t "22")
    where
      t = Trie.fromList lst
      lst = [("home", 1), ("good", 2), ("gone", 3), ("hood", 4),
             ("a", 5), ("another", 6), ("an", 7)]
The program will output below result.
press 4: [("g",Nothing),("h",Nothing)]
press 46: [("go",Nothing),("ho",Nothing)]
press 4663: [("gone",Just 3),("good",Just 2),("home",Just 1),("hood",Just 4)]
press 2: [("a",Just 5)]
press 22: []
The value of each child is just for illustration; we could put empty values instead
and only return candidate keys for a real input application.
Trie consumes too much space, so we can provide a Patricia version as an
alternative.
findPrefixT9 :: String -> [(String, b)] -> [(String, b)]
findPrefixT9 s lst = filter f lst where
    f (k, _) = (toT9 k) `Data.List.isPrefixOf` s

toT9 :: String -> String
toT9 [] = []
toT9 (x:xs) = (unmapT9 x mapT9):(toT9 xs) where
    unmapT9 x (p:ps) = if x `elem` (snd p) then (fst p) else unmapT9 x ps

findT9 :: Patricia a -> String -> [(String, Maybe a)]
findT9 t [] = [("", value t)]
findT9 t k = foldl f [] (findPrefixT9 k (children t))
    where
      f lst (s, tr) = (mapAppend s (findT9 tr (k `diff` s))) ++ lst
      diff x y = drop (length y) x
In this program, we don't check one digit at a time; we take the whole digit
sequence and examine all children of the Patricia node. For each child, the
program converts the key string to a number sequence using function toT9. If
the result is a prefix of what the user input, we go on searching in this child and
prepend the key in front of all further results.
If we try the same test cases, we find the result is a bit different.
press 4: []
press 46: [("go",Nothing),("ho",Nothing)]
press 466: []
press 4663: [("good",Just 2),("gone",Just 3),("home",Just 1),("hood",Just 4)]
press 2: [("a",Just 5)]
press 22: []
If the user presses key 4, because the dictionary (represented by Patricia) doesn't
contain any candidate matching it, the user gets an empty candidate list. The
same situation happens when he enters 466. In a real input method implementation,
such a user experience isn't good, because nothing is displayed although the
user presses the key several times. One improvement is to predict what the user
will input next by displaying a partial result. This can easily be achieved by
modifying the above program. (Hint: not only check
findPrefixT9 s lst = filter f lst where
    f (k, _) = (toT9 k) `Data.List.isPrefixOf` s
but also check
    f (k, _) = s `Data.List.isPrefixOf` (toT9 k)
)
T9 implemented in Scheme/Lisp
In Scheme/Lisp, the T9 map is defined as a list of pairs.
(define map-T9 (list '("2" "abc") '("3" "def") '("4" "ghi") '("5" "jkl")
                     '("6" "mno") '("7" "pqrs") '("8" "tuv") '("9" "wxyz")))
The main search function is implemented as follows.
(define (find-T9 t k) ;; return [(key value)]
  (define (accumulate-find lst child)
    (append (map-string-append (key child) (find-T9 (tree child) (string-cdr k)))
            lst))
  (define (lookup-child lst c) ;; lst, list of children [(key tree)], c, char
    (let ((res (find-matching-item map-T9 (lambda (x) (string=? c (car x))))))
      (if (not res) '()
          (filter (lambda (x) (substring? (key x) (cadr res))) lst))))
  (if (string-null? k) (list (cons k (value t)))
      (fold-left accumulate-find '() (lookup-child (children t) (string-car k)))))
This function contains 2 inner functions. If the string is empty, the program
returns a one-element list; the element is a string-value pair. For the non-trivial
case, the program calls the inner function to search in each child and then
puts the results together by using the fold-left higher-order function.
To test this T9 search function, a very simple dictionary is built by
using Trie insertion. Then we test by calling the find-T9 function on several digit
sequences.
(define dict-T9 (list '("home" ()) '("good" ()) '("gone" ()) '("hood" ())
                      '("a" ()) '("another" ()) '("an" ())))

(define (test-trie-T9)
  (define t (list->trie dict-T9))
  (display "find 4: ") (display (find-T9 t "4")) (newline)
  (display "find 46: ") (display (find-T9 t "46")) (newline)
  (display "find 4663: ") (display (find-T9 t "4663")) (newline)
  (display "find 2: ") (display (find-T9 t "2")) (newline)
  (display "find 22: ") (display (find-T9 t "22")) (newline))
Evaluating this test function outputs the result below.
find 4: ((g) (h))
find 46: ((go) (ho))
find 4663: ((gone) (good) (hood) (home))
find 2: ((a))
find 22: ()
In order to be more space effective, Patricia can be used to replace Trie. The
search program is modified as follows.
(define (find-T9 t k)
  (define (accumulate-find lst child)
    (append (map-string-append (key child)
                               (find-T9 (tree child) (string- k (key child))))
            lst))
  (define (lookup-child lst k)
    (filter (lambda (child) (string-prefix? (str->t9 (key child)) k)) lst))
  (if (string-null? k) (list (cons "" (value t)))
      (fold-left accumulate-find '() (lookup-child (children t) k))))
In this program, a string helper function string- is defined to get the differing
part of two strings. It is defined like below.
(define (string- x y)
  (string-tail x (string-length y)))
Another function is str->t9; it converts an alphabetic string back to a digit
sequence based on the T9 mapping.
(define (str->t9 s)
  (define (unmap-t9 c)
    (car (find-matching-item map-T9 (lambda (x) (substring? c (cadr x))))))
  (if (string-null? s) ""
      (string-append (unmap-t9 (string-car s)) (str->t9 (string-cdr s)))))
We can feed in almost the same test cases, and the result is output as
follows.
find 4: ()
find 46: ((go) (ho))
find 466: ()
find 4663: ((good) (gone) (home) (hood))
find 2: ((a))
find 22: ()
Note the result is a bit different; the reason is described in the Haskell section.
It is easy to modify the program so that the Trie and Patricia approaches give the
very same result.
5.8 Short summary
In this post, we started from the integer-based Trie and Patricia; the map data
structure based on integer Patricia plays an important role in compiler
implementation. Next, the alphabetic Trie and Patricia were given, and I provided
example implementations to illustrate how to realize a predictive e-dictionary
and a T9 input method. Although they are far from the real implementations in
commercial software, they show a very simple approach to manipulating text.
There are still some interesting problems that cannot be solved by Trie or Patricia
directly; however, some other data structures, such as the suffix tree, have a close
relationship with them. I'll note something about the suffix tree in another post.
5.9 Appendix
All programs provided along with this article are free for downloading.
5.9.1 Prerequisite software
GNU Make is used for easily building some of the programs. For the C++ and ANSI C
programs, GNU GCC and G++ 3.4.4 are used. I use boost::tuple to reduce the
number of code lines; the boost library version I am using is 1.33.1. The path
is in the CXX variable in the Makefile; please change it to your path when compiling.
For the Haskell programs, GHC 6.10.4 is used for building. For the Python programs,
Python 2.5 is used for testing; for the Scheme/Lisp programs, MIT Scheme 14.9 is
used.
All source files are put in one folder. Invoking make or make all will build the
C++ and Haskell programs.
Run make Haskell to build the Haskell programs separately. Two
executable files are generated: one is htest, the other is happ (with .exe on Windows-
like OSes). Running htest will test the functions in IntTrie.hs, IntPatricia.hs, Trie.hs and
Patricia.hs. Running happ will execute the e-dictionary and T9 test cases in EDict.hs.
Run make cpp to build the C++ program. It creates an executable file
named cpptest (with .exe on Windows-like OSes). Running this program will test
inttrie.hpp, intpatricia.hpp, trie.hpp, patricia.hpp, and edict.hpp.
Run make c to build the ANSI C program for Trie. It creates an
executable file named triec (with .exe on Windows-like OSes).
Python programs can be run directly with the interpreter.
Scheme/Lisp programs need to be loaded into a Scheme evaluator, then evaluate the
final function in the program. Note that patricia.scm will hide some functions
defined in trie.scm.
Here is a detailed list of source files.
5.9.2 Haskell source files
IntTrie.hs, Haskell version of little-endian integer Trie.
IntPatricia.hs, integer Patricia tree implemented in Haskell.
Trie.hs, Alphabetic Trie, implemented in Haskell.
Patricia.hs, Alphabetic Patricia, implemented in Haskell.
TestMain.hs, main module to test the above 4 programs.
EDict.hs, Haskell program for e-dictionary and T9.
5.9.3 C++/C source files
inttrie.hpp, Integer base Trie;
intpatricia.hpp, Integer based Patricia tree;
trie.c, Alphabetic Trie only for lowercase English language, implemented
in ANSI C.
trie.hpp, Alphabetic Trie;
patricia.hpp, Alphabetic Patricia;
trieutil.hpp, Some generic utilities;
edict.hpp, e-dictionary and T9 implemented in C++;
test.cpp, main program to test all above programs.
5.9.4 Python source files
inttrie.py, Python version of little-endian integer Trie, with test cases;
intpatricia.py, integer Patricia tree implemented in Python;
trie.py, Alphabetic Trie, implemented in Python;
patricia.py, Alphabetic Patricia implemented in Python;
trieutil.py, Common utilities;
edict.py, e-dictionary and T9 implemented in Python.
5.9.5 Scheme/Lisp source files
inttrie.scm, Little-endian integer Trie, implemented in Scheme/Lisp;
intpatricia.scm, Integer based Patricia tree;
trie.scm, Alphabetic Trie;
patricia.scm, Alphabetic Patricia, reusing many definitions from trie.scm;
trieutil.scm, common functions and utilities.
5.9.6 Tools
Besides these, I use graphviz to draw most of the figures in this post. In order
to translate the Trie, Patricia and Suffix Tree output into dot language scripts, I
wrote a Python program. It can be used like this:
trie2dot.py -o foo.dot -t patricia "1:x, 4:y, 5:z"
trie2dot.py -o foo.dot -t trie "001:one, 101:five, 100:four"
This helper script can also be downloaded with this article.
download position: http://sites.google.com/site/algoxy/trie/trie.zip
Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford
Stein. Introduction to Algorithms, Second Edition. ISBN: 0262032937.
The MIT Press. 2001
[2] Chris Okasaki and Andrew Gill. Fast Mergeable Integer Maps. Workshop
on ML, September 1998, pages 77-86. http://www.cse.ogi.edu/~andy/pub/finite.htm
[3] D.R. Morrison. PATRICIA - Practical Algorithm To Retrieve Information
Coded In Alphanumeric. Journal of the ACM, 15(4), October 1968, pages
514-534.
[4] Suffix tree, Wikipedia. http://en.wikipedia.org/wiki/Suffix_tree
[5] Trie, Wikipedia. http://en.wikipedia.org/wiki/Trie
[6] T9 (predictive text), Wikipedia. http://en.wikipedia.org/wiki/T9_(predictive_text)
[7] Predictive text, Wikipedia. http://en.wikipedia.org/wiki/Predictive_text
[8] Bjorn Karlsson. Beyond the C++ Standard Library: An Introduction to
Boost. Addison Wesley Professional, August 31, 2005, ISBN: 0321133544
Suffix Tree with Functional and Imperative Implementation
Larry LIU Xinyu
Email: [email protected]
Chapter 6
Suffix Tree with Functional and Imperative Implementation
6.1 Abstract
Suffix Tree is an important data structure. It is quite powerful in string and DNA information manipulation. The suffix tree was introduced in 1973. The latest on-line construction algorithm was found in 1995. This post collects some existing results about the suffix tree, including the construction algorithms as well as some typical applications. Both imperative and functional implementations are given. Multiple programming languages are used, including C++, Haskell, Python, and Scheme/Lisp.
There may be mistakes in the post; please feel free to point them out.
This post is generated by LaTeX2e, and provided with GNU FDL (GNU Free Documentation License). Please refer to http://www.gnu.org/copyleft/fdl.html for detail.
Keywords: Suffix Tree
6.2 Introduction
Suffix Tree is a special Patricia. There is no such chapter in the CLRS book. Introducing the suffix tree together with Trie and Patricia makes it a bit easier to understand.
As a data structure, the suffix tree allows for particularly fast implementations of many important string operations[2]. It is also widely used in bio-informatics, for example in DNA pattern matching[3].
The suffix tree for a string S is a Patricia tree, with each edge labeled with some sub-string of S. Each suffix of S corresponds to exactly one path from the root to a leaf. Figure 6.1 shows the suffix tree for the English word 'banana'.
Figure 6.1: The suffix tree for 'banana'
Note that all suffixes, 'banana', 'anana', 'nana', 'ana', 'na', 'a', can be looked up in the above tree. Among them the first 3 suffixes are explicitly shown; the others are implicitly represented. The reason why 'ana', 'na', 'a', and the empty string are not shown explicitly is that they are prefixes of some edges. In order to make all suffixes show up explicitly, we can append a special pad terminal symbol, which does not occur in the string. Such a terminator is typically denoted as '$'. With this method, no suffix will be a prefix of another. In this post, we won't use the terminal symbol for most cases.
It's very interesting that, compared to the simple suffix tree for 'banana', the suffix tree for 'bananas' is quite different, as shown in figure 6.2.
Figure 6.2: The suffix tree for 'bananas'
In this post, I'll first introduce the suffix Trie, and give the trivial methods for constructing suffix Trie and suffix tree. The trivial methods utilize the insertion algorithms for the normal Trie and Patricia; they need much computation and space. Then I'll explain the on-line construction of the suffix Trie by using the suffix link concept. After that, I'll show Ukkonen's method, which is a linear time on-line construction algorithm. For both suffix Trie and suffix tree, a functional approach is provided as well as the imperative one. In the last section, I'll list some typical string manipulation problems and show how to solve them with the suffix tree.
This article provides example implementations in C, C++, Haskell, Python, and Scheme/Lisp.
All source code can be downloaded in appendix 8.7; please refer to the appendix for detailed information about how to build and run the programs.
6.3 Suffix Trie
Just like the relationship between Trie and Patricia, the suffix Trie has a much simpler structure than the suffix tree. Figure 6.3 shows the suffix Trie of 'banana'.
Figure 6.3: Suffix Trie of 'banana'
Compared with figure 6.1, the difference is that, instead of representing a word, each edge in the suffix Trie represents only one character. Thus the suffix Trie needs more space. If we pack all nodes which have only one child, the suffix Trie can be turned into a suffix tree.
The suffix Trie is a good starting point for explaining the suffix tree construction algorithm.
6.3.1 Trivial construction methods of suffix tree
By repeatedly applying the insertion algorithms[5] for Trie and Patricia to each suffix of a word, suffix Trie and suffix tree can be built in a trivial way.
The algorithm below illustrates this approach for the suffix tree.
1: function TRIVIAL-SUFFIX-TREE(S)
2:   T ← NIL
3:   for i from 1 to LENGTH(S) do
4:     T ← PATRICIA-INSERT(T, RIGHT(S, i))
5:   return T
Where function RIGHT(S, i) extracts the sub-string of S from the i-th character to the right-most one. A similar functional algorithm can also be provided in this way.

1: function TRIVIAL-SUFFIX-TREE(S)
2:   return FOLD-LEFT(PATRICIA-INSERT, NIL, TAILS(S))
Function TAILS() returns the list of all suffixes of string S. In Haskell, the module Data.List provides this function already. In Scheme/Lisp, it can be implemented as below.

(define (tails s)
  (if (string-null? s)
      '("")
      (cons s (tails (string-tail s 1)))))
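For comparison, a minimal Python sketch of the same helper (a hypothetical function, not taken from the book's source files) could be written as:

def tails(s):
    # all suffixes of s, from the longest one down to the empty string
    return [s[i:] for i in range(len(s) + 1)]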
The trivial suffix Trie/tree construction method takes O(n^2) time, where n is the length of the word. Although string manipulation can be very fast by using the suffix tree, such slow construction would be the bottleneck of the whole process.
6.3.2 On-line construction of suffix Trie
Analyzing the construction of the suffix Trie is a good starting point for finding the linear time suffix tree construction algorithm. In Ukkonen's paper[1], a finite-state automaton, a transition function, and a suffix function are used to build the mathematical model of the suffix Trie/tree.
In order to make it easy to understand, let's explain the above concepts with the elements of the Trie data structure.
Over a given alphabet, a string of length n can be defined as S = s_1 s_2 ... s_n. We define S_i = s_1 s_2 ... s_i, the prefix which contains the first i characters.
In a suffix Trie, each node represents a suffix string. For example, in figure 6.4, node X represents the suffix 'a'; by adding the character 'c', node X transfers to node Y, which represents the suffix 'ac'. We say node X and the edge labeled 'c' transfer to node Y. This relationship can be denoted in pseudo code as below.

Y ← CHILDREN(X)[c]

It's equal to the following C++ and Python code.

y = x.children[c]

We also say that node X has a c-child Y.
If a node A in a suffix Trie represents the suffix s_i s_{i+1} ... s_n, and node B represents the suffix s_{i+1} s_{i+2} ... s_n, we say node B represents the suffix of node A. We can create a link from A to B. This link is defined as the suffix link of node A. In this post, suffix links are drawn in dotted style. In figure 6.4, the suffix link of node A points to node B, and the suffix link of node B points to node C.
Figure 6.4: Node X represents 'a'; node Y represents 'ac'; X transfers to Y with character 'c'.
suffix string
s_1 s_2 ... s_i
s_2 s_3 ... s_i
...
s_{i-1} s_i
s_i
ε (the empty string)

Table 6.1: suffixes of S_i
Suffix link is an important tool in Ukkonen's on-line construction algorithm; it is also used in some other algorithms running on the suffix tree.
On-line construction algorithm for suffix Trie
For a string S, suppose we have constructed the suffix Trie for its i-th prefix S_i = s_1 s_2 ... s_i. We denote the suffix Trie for this i-th prefix as SuffixTrie(S_i). Let's consider how we can obtain SuffixTrie(S_{i+1}) from SuffixTrie(S_i).
If we list all suffixes corresponding to SuffixTrie(S_i), from the longest, S_i, down to the shortest, the empty string, we get table 6.1. There are i + 1 suffixes in total.
The most straightforward way is to append s_{i+1} to each of the suffixes in the above table, and add a new empty suffix. This operation can be implemented as creating a new node, and appending it as a child along the edge bound to character s_{i+1}.
Algorithm 1 Initial version of updating SuffixTrie(S_i) to SuffixTrie(S_{i+1}).
1: for each node in SuffixTrie(S_i) do
2:   CHILDREN(node)[s_{i+1}] ← CREATE-NEW-NODE()
However, some nodes in SuffixTrie(S_i) may already have an s_{i+1}-child. For example, in figure 6.5, nodes X and Y correspond to the suffixes 'cac' and 'ac'; they don't have an 'a'-child. But node Z, which represents the suffix 'c', already has an 'a'-child.
Figure 6.5: Suffix Trie of 'cac' and 'caca'
When we append s_{i+1}, in this case 'a', to SuffixTrie(S_i), we need to create new nodes and append them to X and Y; however, we needn't create a new node for Z, because node Z already has a child along edge 'a'. SuffixTrie(S_{i+1}), in this case the one for 'caca', is shown in the right part of figure 6.5.
If we check each node in the same order as in table 6.1, we can stop immediately once we find a node which already has an s_{i+1}-child. This is because if a node X in SuffixTrie(S_i) already has an s_{i+1}-child, then, according to the definition of suffix link, any suffix node X' of X in SuffixTrie(S_i) must also have an s_{i+1}-child. In other words, let c = s_{i+1}; if wc is a sub-string of S_i, then every suffix of wc is also a sub-string of S_i [1]. The only exception is the root node, which represents the empty string ε.
According to this fact, we can refine algorithm 1 to:

Algorithm 2 Revised version of updating SuffixTrie(S_i) to SuffixTrie(S_{i+1}).
1: for each node in SuffixTrie(S_i), in descending order of suffix length, do
2:   if CHILDREN(node)[s_{i+1}] = NIL then
3:     CHILDREN(node)[s_{i+1}] ← CREATE-NEW-NODE()
4:   else
5:     break
The next question is how to iterate over all nodes in SuffixTrie(S_i) in descending order of suffix string length. We can define the 'top' of a suffix Trie as the deepest leaf node; by following the suffix link of each node, we can traverse the suffix Trie up to the root. Note that the top of SuffixTrie(NIL) is the root, so we can give a final version of the on-line construction algorithm for the suffix Trie.
function INSERT(top, c)
  if top = NIL then
    top ← CREATE-NEW-NODE()
  node ← top
  node' ← CREATE-NEW-NODE()
  while node ≠ NIL and CHILDREN(node)[c] = NIL do
    CHILDREN(node)[c] ← CREATE-NEW-NODE()
    SUFFIX-LINK(node') ← CHILDREN(node)[c]
    node' ← CHILDREN(node)[c]
    node ← SUFFIX-LINK(node)
  if node ≠ NIL then
    SUFFIX-LINK(node') ← CHILDREN(node)[c]
  return CHILDREN(top)[c]
The above function INSERT() can update SuffixTrie(S_i) to SuffixTrie(S_{i+1}). It receives two parameters: one is the top node of SuffixTrie(S_i), the other is the character s_{i+1}. If the top node is NIL, which means there is no root node yet, it creates the root node. Compared to the algorithm given by Ukkonen [1], I use a dummy node node' to keep track of the previously created new node. In the main loop, the algorithm checks whether each node has an s_{i+1}-child; if not, it creates a new node and binds the edge to character s_{i+1}. Then it goes up along the suffix links until it either arrives at the root node, or finds a node which already has an s_{i+1}-child. After the loop, if node still points to somewhere in the Trie, the algorithm makes the last suffix link point to that place. The new top position is returned as the final result.
The main part of the algorithm is as below.

1: procedure SUFFIX-TRIE(S)
2:   t ← NIL
3:   for i from 1 to LENGTH(S) do
4:     t ← INSERT(t, s_i)
Figure 6.6 shows the phases of the on-line construction of the suffix Trie for 'cacao'; only the last layer of suffix links is shown.
According to the suffix Trie on-line construction process, the computation time is proportional to the size of the suffix Trie. However, in the worst case this is O(n^2), where n = LENGTH(S). One example is S = a^n b^n, that is, n characters of 'a' followed by n characters of 'b'.
Suffix Trie on-line construction programs in Python and C++
The above algorithm can easily be implemented in imperative languages such as C++ and Python. In this section, I'll first give the definitions of the suffix Trie node. After that, I'll show the algorithms. In order to test and verify the programs, I'll provide some helper functions to print the Trie as human readable strings, and give some look-up functions as well.
Figure 6.6: Construction of the suffix Trie for 'cacao'. The 6 phases are shown; only the last layer of suffix links is shown (dotted arrows).
Definition of the suffix Trie node in Python
In Python, we can define the suffix Trie node with two fields. One is a dictionary of children: the key is the character bound to an edge, and the value is the child node. The other field is the suffix link; it points to the node which represents the suffix string of this one.
class STrie:
    def __init__(self, suffix=None):
        self.children = {}
        self.suffix = suffix
By default, the suffix link of a node is initialized as empty; it will be set during the main construction algorithm.
Suffix Trie on-line construction algorithm in Python
The main algorithm, updating SuffixTrie(S_i) to SuffixTrie(S_{i+1}), is as below. It takes the top position node and the character to be updated as parameters.
def insert(top, c):
    if top is None:
        top = STrie()
    node = top
    new_node = STrie() # dummy init value
    while (node is not None) and (c not in node.children):
        new_node.suffix = node.children[c] = STrie(node)
        new_node = node.children[c]
        node = node.suffix
    if node is not None:
        new_node.suffix = node.children[c]
    return top.children[c] # update top
The main entry point of the program iterates over all characters of the given string, and calls the insert() function repeatedly.
def suffix_trie(str):
    t = None
    for c in str:
        t = insert(t, c)
    return root(t)
Because the insert() function returns the updated top position, the program calls a function to return the root node as the final result. This function is implemented as the following.
def root(node):
    while node.suffix is not None:
        node = node.suffix
    return node
It goes along the suffix links until it reaches the root node.
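As a quick usage sketch (a hypothetical test, assuming the STrie, insert() and suffix_trie() definitions above):

t = suffix_trie("cacao")
print sorted(t.children.keys()) # ['a', 'c', 'o']: the root branches on 3 characters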
In order to verify the program, we need to convert the suffix Trie to a human readable string. This is realized recursively, for easy illustration.
def to_lines(t):
    if len(t.children)==0:
        return [""]
    res = []
    for c, tr in sorted(t.children.items()):
        lines = to_lines(tr)
        lines[0] = "|--"+c+"-->"+lines[0]
        if len(t.children)>1:
            lines[1:] = map(lambda l: "|      "+l, lines[1:])
        else:
            lines[1:] = map(lambda l: "       "+l, lines[1:])
        if res != []:
            res.append("|")
        res += lines
    return res

def to_str(t):
    return "\n".join(to_lines(t))
With the to_str() helper function, we can test our program with some simple cases.
class SuffixTrieTest:
    def __init__(self):
        print "start suffix trie test"

    def run(self):
        self.test_build()

    def __test_build(self, str):
        print "Suffix Trie (" + str + "):\n", to_str(suffix_trie(str)), "\n"

    def test_build(self):
        str = "cacao"
        for i in range(len(str)):
            self.__test_build(str[:i+1])
Running this test program outputs the result below.
start suffix trie test
Suffix Trie (c):
|--c-->
Suffix Trie (ca):
|--a-->
|
|--c-->|--a-->
Suffix Trie (cac):
|--a-->|--c-->
|
|--c-->|--a-->|--c-->
Suffix Trie (caca):
|--a-->|--c-->|--a-->
|
|--c-->|--a-->|--c-->|--a-->
Suffix Trie (cacao):
|--a-->|--c-->|--a-->|--o-->
| |
| |--o-->
|
|--c-->|--a-->|--c-->|--a-->|--o-->
| |
| |--o-->
|
|--o-->
Comparing with figure 6.6, we can find that the results are identical.
Suffix Trie on-line construction algorithm in C++
With ISO C++, we define the suffix Trie node as a struct.
struct Node{
    typedef std::string::value_type Key;
    typedef std::map<Key, Node*> Children;

    Node(Node* suffix_link=0):suffix(suffix_link){}

    ~Node(){
        for(Children::iterator it=children.begin();
            it!=children.end(); ++it)
            delete it->second;
    }

    Children children;
    Node* suffix;
};
The difference from a standard Trie node definition is the suffix link member pointer.
The insert function updates the suffix Trie from its top position.
Node* insert(Node* top, Node::Key c){
    if(!top)
        top = new Node();
    Node dummy;
    Node *node(top), *prev(&dummy);
    while(node && (node->children.find(c)==node->children.end())){
        node->children[c] = new Node(node);
        prev->suffix = node->children[c];
        prev = node->children[c];
        node = node->suffix;
    }
    if(node)
        prev->suffix = node->children[c];
    return top->children[c];
}
If top is a null pointer, it means the Trie hasn't been initialized yet. Instead of using a sentinel as Ukkonen did in his paper, I explicitly test whether the loop should terminate when it goes back along the suffix links to the root node. A dummy node is used to simplify the logic. At the end of the program, the new top position is returned.
In order to find the root node of a suffix Trie, a helper function is provided as below.
Node* root(Node* node){
    for(; node->suffix; node = node->suffix);
    return node;
}
The main entry for the suffix Trie on-line construction is defined as the following.
Node* suffix_trie(std::string s){
    return root(std::accumulate(s.begin(), s.end(), (Node*)0,
                                std::ptr_fun(insert)));
}
This C++ program generates the same result as the Python program; the output/printing part is skipped here. Please refer to section 6.4.1 for the details about how to convert a suffix Trie to a string.
6.3.3 Alternative functional algorithm
Because the functional approach isn't suitable for on-line updating, I'll provide a declarative style suffix Trie building algorithm in a later section, together with the suffix tree building algorithm.
6.4 Suffix Tree
The suffix Trie is helpful when studying the on-line construction algorithm. However, instead of the suffix Trie, the suffix tree is commonly used in the real world. The above suffix Trie on-line construction algorithm is O(n^2), and needs a lot of memory. One trivial solution is to compress the suffix Trie to a suffix tree[6], but it is possible to find a much better method.
In this section, an O(n) on-line construction algorithm for the suffix tree is introduced, based on Ukkonen's work[1].
6.4.1 On-line construction of suffix tree
Active point and end point
Although the suffix Trie construction algorithm is O(n^2), it shows very important facts about what happens when SuffixTrie(S_i) is updated to SuffixTrie(S_{i+1}). Let's review the last 2 trees when we construct the suffix Trie for 'cacao'.
We can find that there are two different types of updating:
1. All leaves are appended with a new node as s_{i+1}-child;
2. Some non-leaf nodes are branched out with a new node as s_{i+1}-child.
The first type of updating is trivial, because for every newly coming character we need to do this trivial work anyway. Ukkonen defined such leaves as 'open' nodes.
The second type of updating is important. We need to figure out which internal nodes need to be branched out. We only focus on these nodes and apply the updating to them.
Let's review the main algorithm for the suffix Trie. We start from the top position of the Trie, process, and go along the suffix links. If a node hasn't an s_{i+1}-child, we create a new child, then update the suffix link, and traverse on along the suffix links until we arrive at a node which already has an s_{i+1}-child, or at the root node.
Ukkonen defined the path along suffix links from the top to the end as the 'boundary path'. The nodes on the boundary path are denoted as n_1, n_2, ..., n_j, ..., n_k. These nodes start from the leaf node (the first one is the top position); after the j-th node, they are not leaves any longer, and we need to do branching from this point on until we stop at the k-th node.
Ukkonen defined the first non-leaf node n_j as the 'active point' and the last one n_k as the 'end point'. Please note that the end point can be the root node.
Reference pair
In the suffix Trie, we denote that a node X transfers to node Y along an edge labeled with character c as Y ← CHILDREN(X)[c]. However, if we compress the Trie to a Patricia, we can't use this transfer concept directly anymore.
Figure 6.7: Suffix tree of 'bananas'. X transfers to Y with the sub-string 'na'.
Figure 6.7 shows the suffix tree of the English word 'bananas'. Node X represents the suffix 'a'. By adding the sub-string 'na', node X transfers to node Y, which represents the suffix 'ana'. Such a transfer relationship can be denoted as Y ← CHILDREN(X)[w], where w = 'na'. In other words, we can represent Y with a pair of a node and a sub-string, like (X, w). Ukkonen defined such a pair as a reference pair. Not only explicit nodes, but also implicit positions in the suffix tree can be represented with reference pairs. For example, (X, 'n') represents a position which is not an explicit node. By using reference pairs, we can represent every position of the suffix Trie within the suffix tree.
In order to save space, Ukkonen found that, for a given string S, every sub-string can be represented as a pair of indices (l, r), where l is the left index and r is the right index of the characters of the sub-string. For instance, if S = 'bananas', and the index starts from 1, the sub-string 'na' can be represented by the pair (3, 4). As a result, there is only one copy of the complete string, and every position in the suffix tree is refined as (node, (l, r)). This is the final form of the reference pair.
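A tiny Python sketch of this indexing convention (a hypothetical helper, using the 1-based inclusive indices of the text; the book's own Python code later uses 0-based pairs):

S = "bananas"

def ref_str(l, r):
    # the sub-string denoted by the index pair (l, r): only one copy of S is stored
    return S[l-1:r]

print ref_str(3, 4) # "na"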
Let's define the node transfer for the suffix tree as the following.

CHILDREN(X)[s_l] ← ((l, r), Y)  ⟺  Y = (X, (l, r))

If s_l = c, we say that node X has a c-child. Each node can have at most one c-child.
Canonical reference pair
It's obvious that one position in a suffix tree may have multiple reference pairs. For example, node Y in figure 6.7 can be denoted either as (X, (3, 4)) or as (root, (2, 4)). And if we define the empty string as ε = (i, i - 1), Y can also be represented as (Y, ε).
Ukkonen defined the canonical reference pair as the one whose node is closest to the position. So among the reference pairs (root, (2, 3)) and (X, (3, 3)), the latter is the canonical reference pair. Specially, in case a position is an explicit node, the canonical reference pair is (node, ε); so (Y, ε) is the canonical reference pair of the position corresponding to node Y.
It's easy to provide an algorithm to convert a reference pair (node, (l, r)) to the canonical reference pair (node', (l', r)). Note that r won't be changed, so the algorithm only needs to return (node', l') as the result.
Algorithm 3 Convert a reference pair to the canonical reference pair
1: function CANONIZE(node, (l, r))
2:   if node = NIL then
3:     if (l, r) = ε then
4:       return (NIL, l)
5:     else
6:       return CANONIZE(root, (l + 1, r))
7:   while l ≤ r do
8:     ((l', r'), node') ← CHILDREN(node)[s_l]
9:     if r - l ≥ r' - l' then
10:      l ← l + LENGTH((l', r'))
11:      node ← node'
12:    else
13:      break
14:  return (node, l)
The case where the node parameter is NIL is very special; typically it arises as CANONIZE(SUFFIX-LINK(root), (l, r)). Because the suffix link of the root points to NIL, the result should be (root, (l + 1, r)) if (l, r) is not ε; otherwise, (NIL, ε) is returned to indicate a terminal position. I'll explain this special case in more detail later.
The algorithm
In section 6.4.1 above, we mentioned that all updating of leaves is trivial, because we only need to append the newly coming character to the leaf. With reference pairs this means that, when we update SuffixTree(S_i) to SuffixTree(S_{i+1}), all reference pairs in the form (node, (l, i)) are leaves; they will be changed to (node, (l, i+1)) next time. Ukkonen defined a leaf as (node, (l, ∞)), where ∞ means 'open to grow'. We can leave all leaves alone until the suffix tree is completely constructed; after that, we can change every ∞ to the length of the string.
So the main algorithm only cares about the positions from the active point to the end point. However, how do we find the active point and the end point?
When we start from the very beginning, there is only a root node; there are no branches or leaves. The active point should be (root, ε), or (root, (1, 0)) (the string index starts from 1).
As for the end point, it is the position where we can finish updating SuffixTree(S_i). According to the algorithm for the suffix Trie, we know it should be a position which already has an s_{i+1}-child. Because a position in the suffix Trie may not be an explicit node in the suffix tree, if (node, (l, r)) is the end point, there are two cases.
1. (l, r) = ε. It means node itself is the end point, and node has an s_{i+1}-child, that is, CHILDREN(node)[s_{i+1}] ≠ NIL.
2. Otherwise, l ≤ r, and the end point is an implicit position. It must satisfy s_{i+1} = s_{l' + |(l, r)|}, where CHILDREN(node)[s_l] = ((l', r'), node'), and |(l, r)| means the length of the sub-string (l, r), which equals r - l + 1. This is illustrated in figure 6.8. We can also say that (node, (l, r)) has an s_{i+1}-child implicitly.
Figure 6.8: Implicit end point
Ukkonen found a very important fact: if (node, (l, i)) is the end point of SuffixTree(S_i), then (node, (l, i+1)) is the active point of SuffixTree(S_{i+1}).
This is because if (node, (l, i)) is the end point of SuffixTree(S_i), it must have an s_{i+1}-child (either explicitly or implicitly). If the suffix this end point represents is s_k s_{k+1} ... s_i, it is the longest suffix in SuffixTree(S_i) which satisfies that s_k s_{k+1} ... s_i s_{i+1} is a sub-string of S_i. Considering S_{i+1}, s_k s_{k+1} ... s_i s_{i+1} must occur at least twice in S_{i+1}, so the position (node, (l, i+1)) is the active point of SuffixTree(S_{i+1}). Figure 6.9 illustrates this fact.

Figure 6.9: End point in SuffixTree(S_i) and active point in SuffixTree(S_{i+1}).
At this point, Ukkonen's on-line construction algorithm can be given as the following.
1: function UPDATE(node, (l, i))
2:   prev ← CREATE-NEW-NODE()
3:   while TRUE do
4:     (finish, node') ← END-POINT-BRANCH?(node, (l, i - 1), s_i)
5:     if finish = TRUE then
6:       break
7:     CHILDREN(node')[s_i] ← ((i, ∞), CREATE-NEW-NODE())
8:     SUFFIX-LINK(prev) ← node'
9:     prev ← node'
10:    (node, l) ← CANONIZE(SUFFIX-LINK(node), (l, i - 1))
11:  SUFFIX-LINK(prev) ← node
12:  return (node, l)
This algorithm takes a reference pair (node, (l, i)) as its parameter; note that the position (node, (l, i-1)) is the active point of SuffixTree(S_{i-1}). Next we enter a loop; this loop goes along the suffix links until the current position (node, (l, i-1)) is the end point. If it is not the end point, the function END-POINT-BRANCH?() returns a position from which the new leaf node is branched out.
END-POINT-BRANCH?() is implemented as below.
function END-POINT-BRANCH?(node, (l, r), c)
  if (l, r) = ε then
    if node = NIL then
      return (TRUE, root)
    else
      return (CHILDREN(node)[c] ≠ NIL, node)
  else
    ((l', r'), node') ← CHILDREN(node)[s_l]
    pos ← l' + |(l, r)|
    if s_pos = c then
      return (TRUE, node)
    else
      p ← CREATE-NEW-NODE()
      CHILDREN(node)[s_l] ← ((l', pos - 1), p)
      CHILDREN(p)[s_pos] ← ((pos, r'), node')
      return (FALSE, p)
If the position is (root, ε), which means we have gone along the suffix links to the root, we return TRUE to indicate that the updating can be finished for this round. If the position is in the form (node, ε), the reference pair represents an explicit node; we just test whether this node already has a c-child, where c = s_i, and if not, we can simply branch out a leaf from this node.
In the other case, the position (node, (l, r)) points to an implicit node. We need to find the exact position next to it to see whether there is a c-child implicitly. If yes, we have met an end point, and the updating loop can be finished; otherwise, we turn the position into an explicit node, and return it for further branching.
With the previously defined CANONIZE() function, we can finalize Ukkonen's algorithm.
1: function SUFFIX-TREE(S)
2:   root ← CREATE-NEW-NODE()
3:   node ← root, l ← 0
4:   for i ← 1 to LENGTH(S) do
5:     (node, l) ← UPDATE(node, (l, i))
6:     (node, l) ← CANONIZE(node, (l, i))
7:   return root
Figure 6.10 shows the phases of constructing the suffix tree for the string 'cacao' with Ukkonen's algorithm.

Figure 6.10: Construction of the suffix tree for 'cacao'. The 6 phases are shown; only the last layer of suffix links is shown (dotted arrows).

Note that we needn't set up suffix links for leaf nodes; only branch nodes get suffix links.
Implementation of Ukkonen's algorithm in imperative languages
The 2 main features of Ukkonen's algorithm are the intense use of suffix links and on-line updating, so it is very suitable for implementation in imperative languages.
Ukkonen's algorithm in Python
The node definition is the same as for the suffix Trie; however, the exact meaning of the children field is not the same.
class Node:
    def __init__(self, suffix=None):
        self.children = {} # c : ((l, r), Node), where (l, r) is the edge's index pair
        self.suffix = suffix
The children of a suffix tree node actually represent the node transitions with reference pairs: if the transition is CHILDREN(node)[s_l] ← ((l, r), node'), the key of the children dictionary is the character corresponding to s_l, and the value is the reference pair ((l, r), node').
Because there is only one copy of the complete string, all sub-strings are represented as (left, right) index pairs, and the leaves are open pairs (left, ∞); so we provide a tree definition in Python as below.
class STree:
    def __init__(self, s):
        self.str = s
        self.infinity = len(s)+1000
        self.root = Node()
The infinity is defined as the length of the string plus a big number. We benefit from Python's list[a:b] expression: if the right index exceeds the length of the list, the result is taken from the left index to the end of the list.
For convenience, I provide 2 helper functions for later use.
def substr(str, str_ref):
    (l, r) = str_ref
    return str[l:r+1]

def length(str_ref):
    (l, r) = str_ref
    return r-l+1
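Note that, unlike the 1-based (l, r) pairs used in the text, these helpers are 0-based and inclusive at both ends. A quick check (a hypothetical session):

print substr("bananas", (2, 3)) # "na"
print length((2, 3))            # 2
print "banana"[2:100]           # "nana": slicing beyond the end is safe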
The main entry for Ukkonen's algorithm is implemented as the following.
def suffix_tree(str):
    t = STree(str)
    node = t.root # init active point is (root, Empty)
    l = 0
    for i in range(len(str)):
        (node, l) = update(t, node, (l, i))
        (node, l) = canonize(t, node, (l, i))
    return t
In the main entry, we initialize the tree and let node point to the root; at this time, the active point is (root, ε), which is (root, (0, -1)) in Python. We pass the active point to the update() function in a loop, from the left-most to the right-most index of the string. Inside the loop, the update() function returns the end point, and we need to convert it to a canonical reference pair for the next update.
The update() function is realized as the following.
def update(t, node, str_ref):
    (l, i) = str_ref
    c = t.str[i] # current char
    prev = Node() # dummy init
    while True:
        (finish, p) = branch(t, node, (l, i-1), c)
        if finish:
            break
        p.children[c] = ((i, t.infinity), Node())
        prev.suffix = p
        prev = p
        # go up along suffix link
        (node, l) = canonize(t, node.suffix, (l, i-1))
    prev.suffix = node
    return (node, l)
Different from Ukkonen's original program, I didn't use a sentinel node. The reference pair passed in is (node, (l, i)); the active point is actually (node, (l, i-1)), and we pass it to the branch() function. If it is the end point, branch() returns true as the first element of the result, and we terminate the loop immediately. Otherwise, branch() returns, as the second element of the result, the node which needs to branch out a new leaf. The program then creates the new leaf, sets its edge as an open pair, and goes up along the suffix link. The prev variable first points to a dummy node; this simplifies the logic, and it is used to record the previous position along the boundary path. By the end of the loop, we finish the last updating of the suffix link and return the end point. Since the end point is always in the form (node, (l, i-1)), only (node, l) is returned.
Function branch() is used to test whether a position is the end point, and turns an implicit node into an explicit node if necessary.
def branch(t, node, str_ref, c):
    (l, r) = str_ref
    if length(str_ref) <= 0: # (node, empty)
        if node is None: # the virtual node above the root
            return (True, t.root)
        else:
            return ((c in node.children), node)
    else:
        ((l1, r1), node1) = node.children[t.str[l]]
        pos = l1+length(str_ref)
        if t.str[pos]==c:
            return (True, node)
        else: # node--branch_node-->node1
            branch_node = Node()
            node.children[t.str[l1]] = ((l1, pos-1), branch_node)
            branch_node.children[t.str[pos]] = ((pos, r1), node1)
            return (False, branch_node)
Because I don't use a sentinel node, the special case is handled in the first if-clause.
The canonize() function helps to convert a reference pair to a canonical reference pair.
def canonize(t, node, str_ref):
    (l, r) = str_ref
    if node is None:
        if length(str_ref) <= 0:
            return (None, l)
        else:
            return canonize(t, t.root, (l+1, r))
    while l <= r: # str_ref is not empty
        ((l1, r1), child) = node.children[t.str[l]] # node--(l1, r1)-->child
        if r-l >= r1-l1: # node--(l1, r1)-->child--...
            l += r1-l1+1 # remove |(l1, r1)| chars from (l, r)
            node = child
        else:
            break
    return (node, l)
Before testing the suffix tree construction algorithm, some helper functions to convert the suffix tree to a human readable string are given.
def to_lines(t, node):
    if len(node.children)==0:
        return [""]
    res = []
    for c, (str_ref, tr) in sorted(node.children.items()):
        lines = to_lines(t, tr)
        edge_str = substr(t.str, str_ref)
        lines[0] = "|--"+edge_str+"-->"+lines[0]
        if len(node.children)>1:
            lines[1:] = map(lambda l: "|"+" "*(len(edge_str)+5)+l, lines[1:])
        else:
            lines[1:] = map(lambda l: " "*(len(edge_str)+6)+l, lines[1:])
        if res != []:
            res.append("|")
        res += lines
    return res

def to_str(t):
    return "\n".join(to_lines(t, t.root))
They are quite similar to the helper functions for printing the suffix Trie; the different part is mainly caused by the string reference pairs.
In order to verify the implementation, some very simple test cases are fed to the algorithm as below.
class SuffixTreeTest:
    def __init__(self):
        print "start suffix tree test"

    def run(self):
        strs = ["cacao", "mississippi", "banana$"] # '$': special terminator
        for s in strs:
            self.test_build(s)

    def test_build(self, str):
        for i in range(len(str)):
            self.__test_build(str[:i+1])

    def __test_build(self, str):
        print "Suffix Tree (" + str + "):\n", to_str(suffix_tree(str)), "\n"
Here is a result snippet which shows the construction phases for the string 'cacao'.
Suffix Tree (c):
|--c-->
Suffix Tree (ca):
|--a-->
|
|--ca-->
Suffix Tree (cac):
|--ac-->
|
|--cac-->
Suffix Tree (caca):
|--aca-->
|
|--caca-->
Suffix Tree (cacao):
|--a-->|--cao-->
| |
| |--o-->
|
|--ca-->|--cao-->
| |
| |--o-->
|
|--o-->
The result is identical to the one shown in figure 6.10.
Ukkonen's algorithm in C++
Ukkonen's algorithm makes much use of pairs, including reference pairs and sub-string index pairs. Although STL provides the std::pair tool, it lacks the variable binding ability; for example, an assignment like (x, y) = a_pair isn't legal C++ code. boost::tuple provides a handy tool, tie(). I'll give a mimic tool like boost::tie, so that we can bind two variables to a pair.
template<typename T1, typename T2>
struct Bind{
    Bind(T1& r1, T2& r2):x1(r1), x2(r2){}
    Bind(const Bind& r):x1(r.x1), x2(r.x2){}

    // Support implicit type conversion
    template<typename U1, typename U2>
    Bind& operator=(const std::pair<U1, U2>& p){
        x1 = p.first;
        x2 = p.second;
        return *this;
    }

    T1& x1;
    T2& x2;
};

template<typename T1, typename T2>
Bind<T1, T2> tie(T1& r1, T2& r2){
    return Bind<T1, T2>(r1, r2);
}
With this tool, we can tie variables like the following.
int l, r;
tie(l, r) = str_pair;
We dene sub-string index pair and reference pair like below. First is string
index reference pair.
struct StrRef: public std::pair<int, int>{
typedef std::pair<int, int> Pair;
static std::string str;
StrRef():Pair(){}
StrRef(int l, int r):Pair(l, r){}
StrRef(const Pair& ref):Pair(ref){}
std::string substr(){
int l, r;
tie(l, r) = this;
return str.substr(l, len());
}
int len(){
int l, r;
tie(l, r) = this;
return r-l+1;
}
};
std::string StrRef::str = "";

Because there is only one copy of the complete string, a static variable is used to store it. The substr() function converts a pair of left and right indices into the sub-string. The len() function calculates the length of the sub-string.
Ukkonen's reference pair is defined in the same way.
struct Node;

struct RefPair: public std::pair<Node*, StrRef>{
    typedef std::pair<Node*, StrRef> Pair;

    RefPair():Pair(){}
    RefPair(Node* n, StrRef s):Pair(n, s){}
    RefPair(const Pair& p):Pair(p){}

    Node* node(){ return first; }
    StrRef str(){ return second; }
};
With these definitions, the node type of the suffix tree can be defined.
struct Node{
    typedef std::string::value_type Key;
    typedef std::map<Key, RefPair> Children;

    Node():suffix(0){}

    ~Node(){
        for(Children::iterator it=children.begin();
            it!=children.end(); ++it)
            delete it->second.node();
    }

    Children children;
    Node* suffix;
};
The children of a node are defined as a map storing reference pairs. For easy memory management, a recursive deletion approach is used.
The final suffix tree is defined with a string and a root node.
struct STree{
    STree(std::string s):str(s),
                         infinity(s.length()+1000),
                         root(new Node)
    { StrRef::str = str; }

    ~STree() { delete root; }

    std::string str;
    int infinity;
    Node* root;
};
The infinity is defined as the length of the string plus a big number. It is used for leaf nodes with the 'open to append' meaning.
Next is the main entry of Ukkonen's algorithm.
STree* suffix_tree(std::string s){
    STree* t = new STree(s);
    Node* node = t->root; // init active point as (root, empty)
    for(unsigned int i=0, l=0; i<s.length(); ++i){
        tie(node, l) = update(t, node, StrRef(l, i));
        tie(node, l) = canonize(t, node, StrRef(l, i));
    }
    return t;
}
The program starts from the initialized active point, and repeatedly calls update(); the returned end point is canonized and used as the next active point.
Function update() is implemented as below.
std::pair<Node*, int> update(STree* t, Node* node, StrRef str){
    int l, i;
    tie(l, i) = str;
    Node::Key c(t->str[i]); // current char
    Node dummy, *p;
    Node* prev(&dummy);
    while((p = branch(t, node, StrRef(l, i-1), c)) != 0){
        p->children[c] = RefPair(new Node(), StrRef(i, t->infinity));
        prev->suffix = p;
        prev = p;
        // go up along suffix link
        tie(node, l) = canonize(t, node->suffix, StrRef(l, i-1));
    }
    prev->suffix = node;
    return std::make_pair(node, l);
}
In this function, the pair (node, (l, i-1)) is the real active point position. It is fed to the branch() function. If the position is the end point, branch() returns a NULL pointer, so the while loop terminates; otherwise, the node from which a new leaf needs to be branched out is returned. Then the program goes up along the suffix links and updates the previous suffix link accordingly. The end point is returned as the result of this function.
Function branch() is implemented as the following.
Node* branch(STree* t, Node* node, StrRef str, Node::Key c){
    int l, r;
    tie(l, r) = str;
    if(str.len() <= 0){ // (node, empty)
        if(node && node->children.find(c)==node->children.end())
            return node;
        else
            return 0; // either node is null, or it is the end point
    }
    else{
        RefPair rp = node->children[t->str[l]];
        int l1, r1;
        tie(l1, r1) = rp.str();
        int pos = l1+str.len();
        if(t->str[pos]==c)
            return 0;
        else{ // node--branch_node-->node1
            Node* branch_node = new Node();
            node->children[t->str[l1]] = RefPair(branch_node, StrRef(l1, pos-1));
            branch_node->children[t->str[pos]] = RefPair(rp.node(), StrRef(pos, r1));
            return branch_node;
        }
    }
}
If the position is (NULL, empty), it means the program has gone above the root position along the suffix links; a NULL pointer is returned to indicate that the updating can be terminated. If the position is in the form (node, ε), it then checks whether the node already has an s_i-child. In the other case, the position points to an implicit node; we need extra processing to test whether it is the end point, and if not, a split happens to convert the implicit node into an explicit one.
The function to canonize a reference pair is given below.
std::pair<Node*, int> canonize(STree* t, Node* node, StrRef str){
    int l, r;
    tie(l, r) = str;
    if(!node){
        if(str.len() <= 0)
            return std::make_pair(node, l);
        else
            return canonize(t, t->root, StrRef(l+1, r));
    }
    while(l <= r){ // str isn't empty
        RefPair rp = node->children[t->str[l]];
        int l1, r1;
        tie(l1, r1) = rp.str();
        if(r-l >= r1-l1){
            l += rp.str().len(); // remove len() chars from (l, r)
            node = rp.node();
        }
        else
            break;
    }
    return std::make_pair(node, l);
}
In order to test the program, some helper functions are provided to represent the suffix tree as a string. Some of them are very common tools.
// map (x+) coll in Haskell
// boost lambda: transform(first, last, first, x+_1)
template<class Iter, class T>
void map_add(Iter first, Iter last, T x){
    std::transform(first, last, first,
                   std::bind1st(std::plus<T>(), x));
}

// x ++ y in Haskell
template<class Coll>
void concat(Coll& x, Coll& y){
    std::copy(y.begin(), y.end(),
              std::insert_iterator<Coll>(x, x.end()));
}
map_add() adds a value to every element in a collection; concat() concatenates two collections together.
The suffix-tree-to-string function is finally provided as follows.
std::list<std::string> to_lines(Node* node){
    typedef std::list<std::string> Result;
    Result res;
    if(node->children.empty()){
        res.push_back("");
        return res;
    }
    for(Node::Children::iterator it = node->children.begin();
        it!=node->children.end(); ++it){
        RefPair rp = it->second;
        Result lns = to_lines(rp.node());
        std::string edge = rp.str().substr();
        *lns.begin() = "|--" + edge + "-->" + (*lns.begin());
        map_add(++lns.begin(), lns.end(),
                std::string("|") + std::string(edge.length()+5, ' '));
        if(!res.empty())
            res.push_back("|");
        concat(res, lns);
    }
    return res;
}

std::string to_str(STree* t){
    std::list<std::string> ls = to_lines(t->root);
    std::ostringstream s;
    std::copy(ls.begin(), ls.end(),
              std::ostream_iterator<std::string>(s, "\n"));
    return s.str();
}
After that, the program can be verified by some simple test cases.
class SuffixTreeTest{
public:
    SuffixTreeTest(){
        std::cout<<"Start suffix tree test\n";
    }

    void run(){
        test_build("cacao");
        test_build("mississippi");
        test_build("banana$"); // '$' as special terminator
    }
private:
    void test_build(std::string str){
        for(unsigned int i=0; i<str.length(); ++i)
            test_build_step(str.substr(0, i+1));
    }

    void test_build_step(std::string str){
        STree* t = suffix_tree(str);
        std::cout<<"Suffix Tree ("<<str<<"):\n"
                 <<to_str(t)<<"\n";
        delete t;
    }
};
Below is a snippet of the suffix tree construction phases.
Suffix Tree (b):
|--b-->
Suffix Tree (ba):
|--a-->
|
|--ba-->
Suffix Tree (ban):
|--an-->
|
|--ban-->
|
|--n-->
Suffix Tree (bana):
|--ana-->
|
|--bana-->
|
|--na-->
Suffix Tree (banan):
|--anan-->
|
|--banan-->
|
|--nan-->
Suffix Tree (banana):
|--anana-->
|
|--banana-->
|
|--nana-->
Suffix Tree (banana$):
|--$-->
|
|--a-->|--$-->
| |
| |--na-->|--$-->
| | |
| | |--na$-->
|
|--banana$-->
|
|--na-->|--$-->
| |
| |--na$-->
Functional algorithm for suffix tree construction
Ukkonen's algorithm works in an on-line updating manner, and suffix links play a very important role in it. Such properties can't be realized in a functional approach. Giegerich and Kurtz found that Ukkonen's algorithm can be transformed to McCreight's algorithm[7]. These two, and the algorithm found by Weiner, are all O(n) algorithms. They also conjectured (although it isn't proved) that any sequential suffix tree construction not based on the important concepts, such as suffix links, active suffixes, etc., will fail to meet the O(n) criterion.
There is an implementation in PLT/Scheme[10] based on Ukkonen's algorithm; however, it updates suffix links during the processing, so it is not a purely functional approach.
A lazy suffix tree construction method is discussed in [8]. This method was contributed to Haskell Hackage by Bryan O'Sullivan [9]. The method benefits from the lazy evaluation property of the Haskell programming language, so that the tree won't be constructed until it is traversed. However, I think it is still a kind of brute-force method; in other functional programming languages such as ML, it can't be an O(n) algorithm.
I will provide a pure brute-force implementation which is similar, but not 100% the same, in this post.
Brute-force suffix tree construction in Haskell
For the brute-force implementation, we needn't suffix links at all. The definition of the suffix tree node is plainly straightforward.

data Tr = Lf | Br [(String, Tr)] deriving (Eq, Show)

type EdgeFunc = [String] -> (String, [String])

The edge function plays an interesting role. It takes a list of strings and extracts a prefix of these strings. The prefix may not be the longest one, and the empty string is also possible. Whether to extract the longest prefix or to just return trivially is defined by the concrete edge function.
For an easy implementation, we limit the character set as below.

alpha = ['a'..'z'] ++ ['A'..'Z']

This is only for illustration purposes: only the English lower case and upper case letters are included. We can of course include other characters if necessary.
The core algorithm is given in list comprehension style.
lazyTree :: EdgeFunc -> [String] -> Tr
lazyTree edge = build where
    build [[]] = Lf
    build ss = Br [(a:prefix, build ss') |
                       a <- alpha,
                       xs@(x:_) <- [[cs | c:cs <- ss, c==a]],
                       (prefix, ss') <- [edge xs]]
The lazyTree function takes a list of strings and generates a radix tree (for example a Trie or a Patricia) from these strings.
It categorizes all strings by their first letter into several groups, and removes the first letter from each element of every group. For example, the string list ["acac", "cac", "ac", "c"] is categorized into the groups [('a', ["cac", "c"]), ('c', ["ac", ""])]. For ease of understanding, I kept the first letter and wrote the groups as tuples. Then all strings with the same (removed) first letter are fed to the edge function.
Different edge functions produce different radix trees. The most trivial one builds a Trie.
edgeTrie::EdgeFunc
edgeTrie ss = ("", ss)
If the edge function extracts the longest common prefix, then it builds a Patricia.
-- ex:
-- edgeTree ["an", "another", "and"] = ("an", ["", "other", "d"])
-- edgeTree ["bool", "foo", "bar"] = ("", ["bool", "foo", "bar"])
--
-- some helper comments
--   let awss@((a:w):ss) = ["an", "another", "and"]
--   (a:w) = "an", ss = ["another", "and"]
--   a = 'a', w = "n"
--   rests awss = w:[u | _:u <- ss] = ["n", "nother", "nd"]
--
edgeTree :: EdgeFunc
edgeTree [s] = (s, [[]])
edgeTree awss@((a:w):ss) | null [c | c:_ <- ss, a/=c] = (a:prefix, ss')
                         | otherwise = ("", awss)
    where (prefix, ss') = edgeTree (w:[u | _:u <- ss])
edgeTree ss = ("", ss) -- (a:w):ss can't be matched, i.e. head ss == ""
We can build suffix Trie and suffix tree with the above two functions.

suffixTrie :: String -> Tr
suffixTrie = lazyTree edgeTrie . tails -- or init . tails

suffixTree :: String -> Tr
suffixTree = lazyTree edgeTree . tails
The snippet below is the result of constructing the suffix Trie/tree for the string 'mississippi'.
SuffixTree("mississippi")=Br [("i",Br [("ppi",Lf),("ssi",Br [("ppi",Lf),
("ssippi",Lf)])]),("mississippi",Lf),("p",Br [("i",Lf),("pi",Lf)]),("s",
Br [("i",Br [("ppi",Lf),("ssippi",Lf)]),("si",Br [("ppi",Lf),("ssippi",
Lf)])])]
SuffixTrie("mississippi")=Br [("i",Br [("p",Br [("p",Br [("i",Lf)])]),
("s",Br [("s",Br [("i",Br [("p",Br [("p",Br [("i",Lf)])]),("s",Br [("s",
Br [("i",Br [("p",Br [("p",Br [("i",Lf)])])])])])])])])]),("m",Br [("i",
Br [("s",Br [("s",Br [("i",Br [("s",Br [("s",Br [("i",Br [("p",Br [("p",
Br [("i",Lf)])])])])])])])])])]),("p",Br [("i",Lf),("p",Br [("i",Lf)])]),
("s",Br [("i",Br [("p",Br [("p",Br [("i",Lf)])]),("s",Br [("s",Br [("i",
Br [("p",Br [("p",Br [("i",Lf)])])])])])]),("s",Br [("i",Br [("p",Br
[("p",Br [("i",Lf)])]),("s",Br [("s",Br [("i",Br [("p",Br [("p",Br
[("i",Lf)])])])])])])])])]
The lazyTree function is common to all radix trees; the normal Patricia and Trie can also be constructed with it.
trie :: [String] -> Tr
trie = lazyTree edgeTrie

patricia :: [String] -> Tr
patricia = lazyTree edgeTree
Let's test it with some simple cases.
trie ["zoo", "bool", "boy", "another", "an", "a"]
patricia ["zoo", "bool", "boy", "another", "an", "a"]
The results are as below.
Br [("a",Br [("n",Br [("o",Br [("t",Br [("h",Br [("e",
Br [("r",Lf)])])])])])]),("b",Br [("o",Br [("o",Br [("l",Lf)]),
("y",Lf)])]),("z",Br [("o",Br [("o",Lf)])])]
Br [("another",Lf),("bo",Br [("ol",Lf),("y",Lf)]),("zoo",Lf)]
This is the reason why I think the method is brute-force.
Brute-force suffix tree construction in Scheme/Lisp
The functional implementation in Haskell utilizes list comprehension, which is a handy syntax tool. In Scheme/Lisp, we use functions instead.
In MIT Scheme, there are special functions to manipulate strings, which are a bit different from lists. Below are helper functions to simulate the car and cdr functions for strings.
(define (string-car s)
(if (string=? s "")
""
(string-head s 1)))
(define (string-cdr s)
(if (string=? s "")
""
(string-tail s 1)))
The edge functions extract a common prefix from a list of strings. For Trie, only the first common character is extracted.

;; (edge-trie '("an" "another" "and"))
;; = '("a" "n" "nother" "nd")
(define (edge-trie ss)
  (cons (string-car (car ss)) (map string-cdr ss)))
While for the suffix tree, we need to extract the longest common prefix.
;; (edge-tree '("an" "another" "and"))
;; = '("an" "" "other" "d")
(define (edge-tree ss)
  (cond ((= 1 (length ss)) (cons (car ss) '()))
        ((prefix? ss)
         (let* ((res (edge-tree (map string-cdr ss)))
                (prefix (car res))
                (ss1 (cdr res)))
           (cons (string-append (string-car (car ss)) prefix) ss1)))
        (else (cons "" ss))))

;; test if a list of strings has a common first character
;; (prefix? '("an" "another" "and")) = #t
;; (prefix? '("" "other" "d")) = #f
(define (prefix? ss)
  (if (null? ss)
      #f
      (let ((c (string-car (car ss))))
        (null? (filter (lambda (s) (not (string=? c (string-car s))))
                       (cdr ss))))))
For some old versions of MIT Scheme, there is no definition of the partition function, so I defined one as below.

;; overwrite partition if SRFI 1 is not supported
;; (partition (lambda (x) (> x 5)) '(1 6 2 7 3 9 0))
;; = '((6 7 9) 1 2 3 0)
(define (partition pred lst)
  (if (null? lst)
      (cons '() '())
      (let ((res (partition pred (cdr lst))))
        (if (pred (car lst))
            (cons (cons (car lst) (car res)) (cdr res))
            (cons (car res) (cons (car lst) (cdr res)))))))
Function groups can group a list of strings by the first character of each string.

;; group a list of strings based on the first char
;; ss shouldn't contain the "" string, so filtering should be done first.
;; (groups '("an" "another" "bool" "and" "bar" "c"))
;; = '(("an" "another" "and") ("bool" "bar") ("c"))
(define (groups ss)
  (if (null? ss)
      '()
      (let* ((c (string-car (car ss)))
             (res (partition (lambda (x) (string=? c (string-car x))) (cdr ss))))
        (append (list (cons (car ss) (car res)))
                (groups (cdr res))))))
Function remove-empty removes the empty strings from a string list.

(define (remove-empty ss)
  (filter (lambda (s) (not (string=? "" s))) ss))
With all the above tools, the core brute-force algorithm can be implemented as the following.

(define (make-tree edge ss)
  (define (bld-group grp)
    (let* ((res (edge grp))
           (prefix (car res))
           (ss1 (cdr res)))
      (cons prefix (make-tree edge ss1))))
  (let ((ss1 (remove-empty ss)))
    (if (null? ss1) '()
        (map bld-group (groups ss1)))))
The final suffix tree and suffix Trie construction algorithms can now be given.
(define (suffix-tree s)
(make-tree edge-tree (tails s)))
(define (suffix-trie s)
(make-tree edge-trie (tails s)))
The snippets below give a quick verification of this program.
(suffix-trie "cacao")
;Value 66: (("c" ("a" ("c" ("a" ("o"))) ("o"))) ("a" ("c" ("a" ("o"))) ("o")) ("o"))
(suffix-tree "cacao")
;Value 67: (("ca" ("cao") ("o")) ("a" ("cao") ("o")) ("o"))
6.5 Suffix tree applications
The suffix tree can help to solve a lot of string/DNA manipulation problems particularly fast. Typical problems are listed in this section.
6.5.1 String/Pattern searching
There are plenty of string searching problems; among the solutions is the famous KMP algorithm. The suffix tree can perform at the same level as KMP[11]: string searching takes O(m) time, where m is the length of the sub-string. However, O(n) time is required to build the suffix tree in advance[12].
Not only sub-string searching, but also pattern matching, including regular expression matching, can be solved with the suffix tree. Ukkonen summarized this kind of problems as sub-string motifs, and gave the result that, for a string S, SuffixTree(S) gives the complete occurrence counts of all sub-string motifs of S in O(n) time, although S may have O(n^2) sub-strings.
Note two facts about SuffixTree(S): every internal node corresponds to a repeating sub-string of S, and the number of leaves of the sub-tree rooted at the node for a string P is the number of occurrences of P in S.[13]
Algorithm for finding the number of sub-string occurrences
The algorithm is almost the same as the Patricia look-up algorithm (please refer to [5] for details); the only difference is that the number of children is returned when a node matches the pattern.
Find the number of sub-string occurrences in Python
In Ukkonen's algorithm, there is only one copy of the string, and all edges are represented with index pairs. There are some changes because of this.
import string # needed for string.find

def lookup_pattern(t, node, s):
    f = (lambda x: 1 if x==0 else x)
    while True:
        match = False
        for _, (str_ref, tr) in node.children.items():
            edge = t.substr(str_ref)
            if string.find(edge, s)==0: # s isPrefixOf edge
                return f(len(tr.children))
            elif string.find(s, edge)==0: # edge isPrefixOf s
                match = True
                node = tr
                s = s[len(edge):]
                break
        if not match:
            return 0
    return 0 # not found
In case a branch node matches the pattern, it means there is at least one occurrence, even if the number of children is zero. That's why a local lambda function is defined.
I added a member function to STree to convert a string index pair to a string, as below.
class STree:
    #...
    def substr(self, sref):
        return substr(self.str, sref)
The lookup_pattern() function takes a suffix tree built from the string, a node passed as the position to be looked up (the root node when starting), and the parameter s, the string to be searched.
The algorithm iterates over all children of the node. It converts the string index reference pair to the edge sub-string, and checks whether s is a prefix of the edge string; if it matches, the program can be terminated, and the number of branches of this node is returned as the number of occurrences of this sub-string. Note that no branches means there is exactly 1 occurrence. In case the edge is a prefix of s, we update the node and the string to be searched, and go on searching.
Because construction of the suffix tree is expensive, we only do it when necessary. We can do lazy initialization as below.
TERM1 = '$' # '$': special terminator

class STreeUtil:
    def __init__(self):
        self.tree = None

    def find_pattern(self, str, pattern):
        if self.tree is None or self.tree.str != str+TERM1:
            self.tree = stree.suffix_tree(str+TERM1)
        return lookup_pattern(self.tree, self.tree.root, pattern)
We always append the special terminator to the string, so that no suffix will become the prefix of another[2].
Some simple test cases are given to verify the program.
class StrSTreeTest:
    def run(self):
        self.test_find_pattern()

    def test_find_pattern(self):
        util = STreeUtil()
        self.__test_pattern__(util, "banana", "ana")
        self.__test_pattern__(util, "banana", "an")
        self.__test_pattern__(util, "banana", "anan")
        self.__test_pattern__(util, "banana", "nana")
        self.__test_pattern__(util, "banana", "ananan")

    def __test_pattern__(self, u, s, p):
        print "find pattern", p, "in", s, ":", u.find_pattern(s, p)
And the output is like the following.
find pattern ana in banana : 2
find pattern an in banana : 2
find pattern anan in banana : 1
find pattern nana in banana : 1
find pattern ananan in banana : 0
Find the number of sub-string occurrences in C++
In C++, do-while is used as the repeat-until structure; the program is almost the same as the standard Patricia look-up function.
int lookup_pattern(const STree* t, std::string s){
    Node* node = t->root;
    bool match(false);
    do{
        match = false;
        for(Node::Children::iterator it = node->children.begin();
            it!=node->children.end(); ++it){
            RefPair rp = it->second;
            if(rp.str().substr().find(s)==0){
                int res = rp.node()->children.size();
                return res == 0 ? 1 : res;
            }
            else if(s.find(rp.str().substr())==0){
                match = true;
                node = rp.node();
                s = s.substr(rp.str().substr().length());
                break;
            }
        }
    }while(match);
    return 0;
}
A utility class is defined; it supports lazy initialization to save the cost of suffix tree construction.
const char TERM1 = '$'; // the special terminator, as in the Python version

class STreeUtil{
public:
    STreeUtil():t(0){}
    ~STreeUtil(){ delete t; }

    int find_pattern(std::string s, std::string pattern){
        lazy(s);
        return lookup_pattern(t, pattern);
    }
private:
    void lazy(std::string s){
        if((!t) || t->str != s+TERM1){
            delete t;
            t = suffix_tree(s+TERM1);
        }
    }
    STree* t;
};
The same test cases can be fed to this C++ program.
class StrSTreeTest{
public:
    void test_find_pattern(){
        __test_pattern("banana", "ana");
        __test_pattern("banana", "an");
        __test_pattern("banana", "anan");
        __test_pattern("banana", "nana");
        __test_pattern("banana", "ananan");
    }
private:
    void __test_pattern(std::string s, std::string ptn){
        std::cout<<"find pattern "<<ptn<<" in "<<s<<": "
                 <<util.find_pattern(s, ptn)<<"\n";
    }
    STreeUtil util;
};
And the same result will be obtained as with the Python program.
Find the number of sub-string occurrences in Haskell
The Haskell program just turns the look-up into a recursive function.
lookupPattern :: Tr String Int
lookupPattern (Br lst) ptn = find lst where
find [] = 0
find ((s, t):xs)
| ptn isPrefixOf s = numberOfBranch t
| s isPrefixOf ptn = lookupPattern t (drop (length s) ptn)
| otherwise = find xs
numberOfBranch (Br ys) = length ys
224CHAPTER 6. SUFFIX TREE WITH FUNCTIONAL AND IMPERATIVE IMPLEMENTATION
numberOfBranch _ = 1
findPattern :: String String Int
findPattern s ptn = lookupPattern (suffixTree $ s++"$") ptn
To verify it, the test cases are fed to the program as the following.

testPattern = ["find pattern " ++ p ++ " in banana: " ++
               (show $ findPattern "banana" p)
              | p <- ["ana", "an", "anan", "nana", "anana"]]

Launching GHCi, evaluating the following expression outputs the same result as the above programs.
putStrLn $ unlines testPattern
Find the number of sub-string occurrence in Scheme/Lisp
Because the underlying data structure of the suffix tree is a list in the Scheme/Lisp program, we needn't define an inner find function as in the Haskell program.

(define (lookup-pattern t ptn)
  (define (number-of-branches node)
    (if (null? node) 1 (length node)))
  (if (null? t) 0
      (let ((s (edge (car t)))
            (tr (children (car t))))
        (cond ((string-prefix? ptn s) (number-of-branches tr))
              ((string-prefix? s ptn)
               (lookup-pattern tr (string-tail ptn (string-length s))))
              (else (lookup-pattern (cdr t) ptn))))))
The test cases are fed to this program via a list.
(define (test-pattern)
  (define (test-ptn t s)
    (cons (string-append "find pattern " s " in banana")
          (lookup-pattern t s)))
  (let ((t (suffix-tree "banana")))
    (map (lambda (x) (test-ptn t x)) '("ana" "an" "anan" "nana" "anana"))))

Evaluating this test function generates a result list as the following.
(test-pattern)
;Value 16: (("find pattern ana in banana" "ana") ("find pattern an in banana" "an") ("find pattern anan in banana" "anan") ("find pattern nana in banana" "nana") ("find pattern anana in banana" "anana"))
Complete pattern search
For searching patterns like "a**n" with a suffix tree, please refer to [13] and [14].
6.5.2 Find the longest repeated sub-string
If we go one step further from 6.5.1, the following result can be found. After adding a special terminator character to string S, the longest repeated sub-string can be found by searching for the deepest branch nodes in the suffix tree. Consider the example suffix tree shown in figure 6.11.
Figure 6.11: The suffix tree for mississippi$ (its three branch nodes are labeled A, B, and C).
There are 3 branch nodes, A, B, and C, whose depth is 3. However, A represents the longest repeated sub-string "issi", while B and C represent "si" and "ssi", which are both shorter than A.
This example tells us that the depth of a branch node should be measured by the number of characters traversed from the root, not by the number of edges.
Find the longest repeated sub-string in imperative approach
According to the above analysis, finding the longest repeated sub-string can be turned into a BFS (Breadth First Search) of the suffix tree.
1: function LONGEST-REPEATED-SUBSTRING(T)
2:     Q ← {(NIL, ROOT(T))}
3:     R ← NIL
4:     while Q is not empty do
5:         (s, node) ← POP(Q)
6:         for each ((l, r), node') in CHILDREN(node) do
7:             if node' is not leaf then
8:                 s' ← CONCATENATE(s, (l, r))
9:                 PUSH(Q, (s', node'))
10:                UPDATE(R, s')
11:    return R
where the algorithm UPDATE() compares the longest repeated sub-string candidates. If two candidates have the same length, one simple solution is to just take one as the final result; the other solution is to maintain a list containing all candidates of the same length.

1: function UPDATE(l, x)
2:     if l = NIL or LENGTH(l[1]) < LENGTH(x) then
3:         return [x]
4:     else if LENGTH(l[1]) = LENGTH(x) then
5:         return APPEND(l, x)
Note that the index of a list starts from 1 in this algorithm. The algorithm first initializes a queue with a pair of an empty string and the root node. Then it repeatedly pops from the queue and examines the candidate nodes until the queue is empty.
For each node, the algorithm expands all its children. If a child is a branch node (which is not a leaf), it is pushed back to the queue for future examination, and the sub-string represented by it is compared to see if it is a candidate for the longest repeated sub-string.
Find the longest repeated sub-string in Python
The above algorithm can be translated into the following Python program.

def lrs(t):
    queue = [("", t.root)]
    res = []
    while len(queue) > 0:
        (s, node) = queue.pop(0)
        for _, (str_ref, tr) in node.children.items():
            if len(tr.children) > 0:
                s1 = s + t.substr(str_ref)
                queue.append((s1, tr))
                res = update_max(res, s1)
    return res

def update_max(lst, x):
    if lst == [] or len(lst[0]) < len(x):
        return [x]
    elif len(lst[0]) == len(x):
        return lst + [x]
    else:
        return lst
In order to verify this program, some simple test cases are fed to it.

class StrSTreeTest:
    #...
    def run(self):
        #...
        self.test_lrs()

    def test_lrs(self):
        self.__test_lrs__("mississippi")
        self.__test_lrs__("banana")
        self.__test_lrs__("cacao")
        self.__test_lrs__("foofooxbarbar")

    def __test_lrs__(self, s):
        print "longest repeated substrings of", s, "=", self.util.find_lrs(s)

Running the test cases, a result like the below can be obtained.
longest repeated substrings of mississippi = [issi]
longest repeated substrings of banana = [ana]
longest repeated substrings of cacao = [ca]
longest repeated substrings of foofooxbarbar = [bar, foo]
Find the longest repeated sub-string in C++
With C++, we can utilize the STL queue in the implementation of the BFS (Breadth First Search).

typedef std::list<std::string> Strings;

Strings lrs(const STree* t){
    std::queue<std::pair<std::string, Node*> > q;
    Strings res;
    q.push(std::make_pair(std::string(""), t->root));
    while(!q.empty()){
        std::string s;
        Node* node;
        tie(s, node) = q.front();
        q.pop();
        for(Node::Children::iterator it = node->children.begin();
                it != node->children.end(); ++it){
            RefPair rp = it->second;
            if(!(rp.node()->children.empty())){
                std::string s1 = s + rp.str().substr();
                q.push(std::make_pair(s1, rp.node()));
                update_max(res, s1);
            }
        }
    }
    return res;
}
Firstly, the empty string and the root node are pushed to the queue as the initial value. Then the program repeatedly pops from the queue and examines the children of the node: any child which is not a leaf is pushed back to the queue, and we check whether the string it represents is the deepest one so far.
The function update_max() is implemented to record all the longest strings.

void update_max(Strings& res, std::string s){
    if(res.empty() || (*res.begin()).length() < s.length()){
        res.clear();
        res.push_back(s);
        return;
    }
    if((*res.begin()).length() == s.length())
        res.push_back(s);
}

Since the cost of constructing a suffix tree is significant (O(n) with Ukkonen's algorithm), a lazy initialization approach is used in the main entrance of the finding program.

const char TERM1 = '$';

class STreeUtil{
public:
    STreeUtil():t(0){}
    ~STreeUtil(){ delete t; }

    Strings find_lrs(std::string s){
        lazy(s);
        return lrs(t);
    }
private:
    void lazy(std::string s){
        if((!t) || t->str != s + TERM1){
            delete t;
            t = suffix_tree(s + TERM1);
        }
    }
    STree* t;
};
In order to verify the program, some test cases are provided. Output of a list of strings can be easily realized by overloading operator<<.

class StrSTreeTest{
public:
    StrSTreeTest(){
        std::cout<<"start string manipulation over suffix tree test\n";
    }

    void run(){
        test_lrs();
    }

    void test_lrs(){
        __test_lrs("mississippi");
        __test_lrs("banana");
        __test_lrs("cacao");
        __test_lrs("foofooxbarbar");
    }
private:
    void __test_lrs(std::string s){
        std::cout<<"longest repeated substring of "<<s<<"="
                 <<util.find_lrs(s)<<"\n";
    }
    STreeUtil util;
};

Running these test cases, we can obtain the following result.

start string manipulation over suffix tree test
longest repeated substring of mississippi=[issi, ]
longest repeated substring of banana=[ana, ]
longest repeated substring of cacao=[ca, ]
longest repeated substring of foofooxbarbar=[bar, foo, ]
Find the longest repeated sub-string in functional approach
Searching the deepest branch can also be realized in a functional way. If the tree is just a leaf node, the empty string is returned; otherwise the algorithm tries to find the longest repeated sub-string among the children of the tree.

1: function LONGEST-REPEATED-SUBSTRING(T)
2:     if T is leaf then
3:         return Empty
4:     else
5:         return PROC(CHILDREN(T))

1: function PROC(L)
2:     if L is empty then
3:         return Empty
4:     else
5:         (s, node) ← FIRST(L)
6:         x ← s + LONGEST-REPEATED-SUBSTRING(node)
7:         y ← PROC(REST(L))
8:         if LENGTH(x) > LENGTH(y) then
9:             return x
10:        else
11:            return y

In the PROC function, the first element, which is a pair of an edge string and a child node, is examined first. We recursively call the algorithm to find the longest repeated sub-string of the child node, and append it to the edge string. Then we compare this candidate sub-string with the result obtained from the rest of the children. The longer one is returned as the final result.
Note that in case x and y have the same length, it is easy to modify the program to return both of them.
Find the longest repeated sub-string in Haskell
We'll provide 2 versions of the Haskell implementation. One version just returns the first candidate in case there are multiple sub-strings of the same length as the longest one; the other version returns all possible candidates.

import Data.List (maximumBy, isInfixOf)
import Data.Function (on)

isLeaf :: Tr -> Bool
isLeaf Lf = True
isLeaf _ = False

lrs :: Tr -> String
lrs Lf = ""
lrs (Br lst) = find $ filter (not . isLeaf . snd) lst where
    find [] = ""
    find ((s, t):xs) = maximumBy (compare `on` length) [s++(lrs t), find xs]

In this version, we use the maximumBy function provided in the Data.List module. It only returns the first maximum value in a list. In order to return all maximum candidates, we need to provide a customized function.
maxBy :: (Ord a) => (a -> a -> Ordering) -> [a] -> [a]
maxBy _ [] = []
maxBy cmp (x:xs) = foldl maxBy' [x] xs where
    maxBy' lst y = case cmp (head lst) y of
        GT -> lst
        EQ -> lst ++ [y]
        LT -> [y]

lrs' :: Tr -> [String]
lrs' Lf = [""]
lrs' (Br lst) = find $ filter (not . isLeaf . snd) lst where
    find [] = [""]
    find ((s, t):xs) = maxBy (compare `on` length)
                             ((map (s++) (lrs' t)) ++ (find xs))

We can feed some simple test cases and compare the results of these 2 different programs to see their difference.

testLRS s = "LRS(" ++ s ++ ")=" ++ (show $ lrs' $ suffixTree (s++"$")) ++ "\n"
testLRS' s = "LRS(" ++ s ++ ")=" ++ (lrs $ suffixTree (s++"$")) ++ "\n"

test = concat [ f s | s <- ["mississippi", "banana", "cacao", "foofooxbarbar"],
                      f <- [testLRS, testLRS']]
Below are the results printed out.
LRS(mississippi)=["issi"]
LRS(mississippi)=issi
LRS(banana)=["ana"]
LRS(banana)=ana
LRS(cacao)=["ca"]
LRS(cacao)=ca
LRS(foofooxbarbar)=["bar","foo"]
LRS(foofooxbarbar)=foo
Find the longest repeated sub-string in Scheme/Lisp
Because the underlying data structure is a list in Scheme/Lisp, some helper functions are provided in order to access the suffix tree components easily.

(define (edge t)
  (car t))

(define (children t)
  (cdr t))

(define (leaf? t)
  (null? (children t)))

Similar to the Haskell program, a function which finds all the maximum values according to a given measurement rule is defined.

(define (compare-on func)
  (lambda (x y)
    (cond ((< (func x) (func y)) 'lt)
          ((> (func x) (func y)) 'gt)
          (else 'eq))))
(define (max-by comp lst)
  (define (update-max xs x)
    (case (comp (car xs) x)
      ((lt) (list x))
      ((gt) xs)
      (else (cons x xs))))
  (if (null? lst)
      '()
      (fold-left update-max (list (car lst)) (cdr lst))))
Then the main function for searching the longest repeated sub-strings can
be implemented as the following.
(define (lrs t)
  (define (find lst)
    (if (null? lst)
        '("")
        (let ((s (edge (car lst)))
              (tr (children (car lst))))
          (max-by (compare-on string-length)
                  (append
                   (map (lambda (x) (string-append s x)) (lrs tr))
                   (find (cdr lst)))))))
  (if (leaf? t)
      '("")
      (find (filter (lambda (x) (not (leaf? x))) t))))

(define (longest-repeated-substring s)
  (lrs (suffix-tree (string-append s TERM1))))

Where TERM1 is defined as the "$" string.
The same test cases can be used to verify the results.

(define (test-main)
  (let ((fs (list longest-repeated-substring))
        (ss '("mississippi" "banana" "cacao" "foofooxbarbar")))
    (map (lambda (f) (map f ss)) fs)))

This test program can be easily extended by adding new test functions as elements of the fs list. The result of the above function is as below.
(test-main)
;Value 16: ((("issi") ("ana") ("ca") ("bar" "foo")))
6.5.3 Find the longest common sub-string
The longest common sub-string of two strings can also be quickly found by using a suffix tree. A typical solution is to build a generalized suffix tree for the two strings. If the two strings are denoted as $txt_1$ and $txt_2$, a generalized suffix tree is $SuffixTree(txt_1 \$_1 txt_2 \$_2)$, where $\$_1$ is a special terminator character for $txt_1$, and $\$_2$ is another special terminator character for $txt_2$.
The longest common sub-string is indicated by the deepest branch node, with two forks corresponding to both $\ldots \$_1 \ldots$ and $\ldots \$_2$ (with no $\$_1$). The definition of the deepest node is the same as the one for the longest repeated sub-string: it is measured by the number of characters traversed from the root.
If a node has $\ldots \$_1 \ldots$ beneath it, then the node must represent a sub-string of $txt_1$, as $\$_1$ is the terminator of $txt_1$. On the other hand, since it also has a $\ldots \$_2$ (without $\$_1$) child, this node must represent a sub-string of $txt_2$ too. Because it is the deepest node satisfying both criteria, it indicates the longest common sub-string.
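As a worked illustration (using the test data that appears later in this section), take $txt_1$ = "ababa" and $txt_2$ = "baby". In the generalized suffix tree of "ababa#baby$", the suffixes starting with "bab" are "baba#baby$" and "baby$", so there is a branch node spelling "bab" with two leaf children: one edge contains the "#" terminator and the other does not. It is the deepest such fork, and "bab" is indeed the longest common sub-string.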
Find the longest common sub-string imperatively
Based on the above analysis, a BFS (Breadth First Search) algorithm can be used to find the longest common sub-string.

1: function LONGEST-COMMON-SUBSTRING(T)
2:     Q ← {(NIL, ROOT(T))}
3:     R ← NIL
4:     while Q is not empty do
5:         (s, node) ← POP(Q)
6:         if MATCH-FORK(node) then
7:             UPDATE(R, s)
8:         for each ((l, r), node') in CHILDREN(node) do
9:             if node' is not leaf then
10:                s' ← CONCATENATE(s, (l, r))
11:                PUSH(Q, (s', node'))
12:    return R

Most parts are the same as the algorithm for finding the longest repeated sub-string. The function MATCH-FORK() checks if the children of a node satisfy the common sub-string criteria.
Find the longest common sub-string in Python
By translating the imperative algorithm to Python, the following program can be obtained.

def lcs(t):
    queue = [("", t.root)]
    res = []
    while len(queue) > 0:
        (s, node) = queue.pop(0)
        if match_fork(t, node):
            res = update_max(res, s)
        for _, (str_ref, tr) in node.children.items():
            if len(tr.children) > 0:
                s1 = s + t.substr(str_ref)
                queue.append((s1, tr))
    return res

Where the function match_fork() is defined as below.

def is_leaf(node):
    return node.children == {}

def match_fork(t, node):
    if len(node.children) == 2:
        [(_, (str_ref1, tr1)), (_, (str_ref2, tr2))] = node.children.items()
        return (is_leaf(tr1) and is_leaf(tr2) and
                (t.substr(str_ref1).find(TERM2) != -1) !=
                (t.substr(str_ref2).find(TERM2) != -1))
    return False

This function checks if the two children of a node are both leaves, and whether exactly one of them contains the TERM2 character. This works because if a child node is a leaf, it always contains the TERM1 character according to the definition of the suffix tree.
Note that the main interface adds TERM2 between the first string and the second string, and the lazy construction appends TERM1 at the end.

class STreeUtil:
    def __init__(self):
        self.tree = None

    def __lazy__(self, str):
        if self.tree is None or self.tree.str != str + TERM1:
            self.tree = stree.suffix_tree(str + TERM1)

    def find_lcs(self, s1, s2):
        self.__lazy__(s1 + TERM2 + s2)
        return lcs(self.tree)
We can test this program like below:
util = STreeUtil()
print "longestcommonsubstringofababaandbaby=", util.find_lcs("ababa", "baby")
And the output will be something like:
longest common substring of ababa and baby = [bab]
Find the longest common sub-string in C++
In the C++ implementation, we first define the special terminator characters for the generalized suffix tree of two strings.

const char TERM1 = '$';
const char TERM2 = '#';

Since the program needs to frequently test if a node is a branch node or a leaf node, a helper function is provided.

bool is_leaf(Node* node){
    return node->children.empty();
}

The criteria for a candidate node is that it has two children, one in the pattern ...#..., the other in the pattern ...$.

bool match_fork(Node* node){
    if(node->children.size() == 2){
        RefPair rp1, rp2;
        Node::Children::iterator it = node->children.begin();
        rp1 = (it++)->second;
        rp2 = it->second;
        return (is_leaf(rp1.node()) && is_leaf(rp2.node())) &&
               (rp1.str().substr().find(TERM2) != std::string::npos) !=
               (rp2.str().substr().find(TERM2) != std::string::npos);
    }
    return false;
}
The main program, in the BFS (Breadth First Search) approach, is given as below.

Strings lcs(const STree* t){
    std::queue<std::pair<std::string, Node*> > q;
    Strings res;
    q.push(std::make_pair(std::string(""), t->root));
    while(!q.empty()){
        std::string s;
        Node* node;
        tie(s, node) = q.front();
        q.pop();
        if(match_fork(node))
            update_max(res, s);
        for(Node::Children::iterator it = node->children.begin();
                it != node->children.end(); ++it){
            RefPair rp = it->second;
            if(!is_leaf(rp.node())){
                std::string s1 = s + rp.str().substr();
                q.push(std::make_pair(s1, rp.node()));
            }
        }
    }
    return res;
}
After that, we can finalize the interface in a lazy way as the following.

class STreeUtil{
public:
    //...
    Strings find_lcs(std::string s1, std::string s2){
        lazy(s1 + TERM2 + s2);
        return lcs(t);
    }
    //...
};

This C++ program generates a similar result to the Python one if the same test cases are given.

longest common substring of ababa, baby =[bab, ]
Find the longest common sub-string recursively
The longest common sub-string finding algorithm can also be realized in a functional way.

1: function LONGEST-COMMON-SUBSTRING(T)
2:     if T is leaf then
3:         return Empty
4:     else
5:         return PROC(CHILDREN(T))

If the generalized suffix tree is just a leaf, the empty string is returned to indicate the trivial result. Otherwise, we need to process the children of the tree.
1: function PROC(L)
2:     if L is empty then
3:         return Empty
4:     else
5:         (s, node) ← FIRST(L)
6:         if MATCH-FORK(node) then
7:             x ← s
8:         else
9:             x ← LONGEST-COMMON-SUBSTRING(node)
10:            if x ≠ Empty then
11:                x ← s + x
12:        y ← PROC(REST(L))
13:        if LENGTH(x) > LENGTH(y) then
14:            return x
15:        else
16:            return y
If the children list is empty, the algorithm returns the empty string. Otherwise, the first element, a pair of an edge string and a child node, is picked. If this child node matches the fork criteria (one child in the pattern $\ldots \$_1 \ldots$, the other in the pattern $\ldots \$_2$ without $\$_1$), then the edge string is a candidate. The algorithm processes the rest of the children list and compares it with this candidate; the longer one is returned as the final result. If the node doesn't match the fork criteria, we go on finding the longest common sub-string from this child node recursively, and do the similar comparison afterward.
Find the longest common sub-string in Haskell
Similar to the longest repeated sub-string problem, there are two alternatives: one just returns the first longest common sub-string, the other returns all the candidates.

lcs :: Tr -> [String]
lcs Lf = []
lcs (Br lst) = find $ filter (not . isLeaf . snd) lst where
    find [] = []
    find ((s, t):xs) = maxBy (compare `on` length)
                             (if match t
                              then s:(find xs)
                              else (map (s++) (lcs t)) ++ (find xs))

Most of the program is the same as the one for finding the longest repeated sub-string. The match function is defined to check the fork criteria.

match (Br [(s1, Lf), (s2, Lf)]) = ("#" `isInfixOf` s1) /= ("#" `isInfixOf` s2)
match _ = False

If the function maximumBy defined in Data.List is used, only the first candidate will be found.
lcs' :: Tr -> String
lcs' Lf = ""
lcs' (Br lst) = find $ filter (not . isLeaf . snd) lst where
    find [] = ""
    find ((s, t):xs) = maximumBy (compare `on` length)
                                 (if match t then [s, find xs]
                                  else [tryAdd s (lcs' t), find xs])
    tryAdd x y = if y == "" then "" else x ++ y

We can test this program with some simple cases; below is a snippet of the result in GHCi.
lcs $ suffixTree "baby#ababa$"
["bab"]
Find the longest common sub-string in Scheme/Lisp
From the Haskell programs it can be found that the structures of lrs and lcs are very similar to each other. This hints that we can abstract them into a common search function.

(define (search-stree t match)
  (define (find lst)
    (if (null? lst)
        '()
        (let ((s (edge (car lst)))
              (tr (children (car lst))))
          (max-by (compare-on string-length)
                  (if (match tr)
                      (cons s (find (cdr lst)))
                      (append
                       (map (lambda (x) (string-append s x)) (search-stree tr match))
                       (find (cdr lst))))))))
  (if (leaf? t)
      '()
      (find (filter (lambda (x) (not (leaf? x))) t))))

This function takes a suffix tree and a function to test if a node matches a certain criteria. It filters out all leaf nodes first, then repeatedly checks whether each branch node matches the criteria. If a node matches, the function compares the edge string to see if it is the longest one; otherwise, it recursively checks the child node until it either fails or matches.
The longest common sub-string function can then be implemented with this search function.
(define (xor x y)
  (not (eq? x y)))

(define (longest-common-substring s1 s2)
  (define (match-fork t)
    (and (eq? 2 (length t))
         (and (leaf? (car t)) (leaf? (cadr t)))
         (xor (substring? TERM2 (edge (car t)))
              (substring? TERM2 (edge (cadr t))))))
  (search-stree (suffix-tree (string-append s1 TERM2 s2 TERM1)) match-fork))
We can test this function with some simple cases:
(longest-common-substring "xbaby" "ababa")
;Value 11: ("bab")
(longest-common-substring "ff" "bb")
;Value: ()
6.5.4 Find the longest palindrome in a string
A palindrome is a string S such that S = reverse(S). For instance, in English, "level", "rotator", and "civic" are all palindromes.
The longest palindrome in a string $s_1 s_2 \ldots s_n$ can be found in O(n) time with a suffix tree. The solution benefits from the longest common sub-string problem.
For a string S, if a sub-string w is a palindrome, then it must be a sub-string of reverse(S) too. For instance, "issi" is a palindrome and a sub-string of "mississippi"; when the string is reversed to "ippississim", we find that "issi" is still a sub-string.
Based on this fact, we can find the longest palindrome by finding the longest common sub-string of S and reverse(S).
The algorithm is straightforward in both the imperative and the functional approach.

function LONGEST-PALINDROME(S)
    return LONGEST-COMMON-SUBSTRING(SUFFIX-TREE(S + REVERSE(S)))
Find the longest palindrome in Python
In Python we can reverse a string s as s[::-1], which takes the whole string with a step of -1, i.e., traversed from the end back to the beginning.

class STreeUtil:
    #...
    def find_lpalindrome(self, s):
        return self.find_lcs(s, s[::-1]) # s[::-1] = reverse(s)

We can feed some simple test cases to check if the program finds the palindromes.

class StrSTreeTest:
    def test_lpalindrome(self):
        self.__test_lpalindrome__("mississippi")
        self.__test_lpalindrome__("banana")
        self.__test_lpalindrome__("cacao")
        self.__test_lpalindrome__("Woolloomooloo")

    def __test_lpalindrome__(self, s):
        print "longest palindrome of", s, "=", self.util.find_lpalindrome(s)
The result is something like the following.
longest palindrome of mississippi = [ississi]
longest palindrome of banana = [anana]
longest palindrome of cacao = [aca, cac]
longest palindrome of Woolloomooloo = [loomool]
Find the longest palindrome in C++
The C++ program just delegates the call to the longest common sub-string function.

Strings find_lpalindrome(std::string s){
    std::string s1(s);
    std::reverse(s1.begin(), s1.end());
    return find_lcs(s, s1);
}
The test cases are added as the following.
class StrSTreeTest{
public:
    //...
    void test_lpalindrome(){
        __test_lpalindrome("mississippi");
        __test_lpalindrome("banana");
        __test_lpalindrome("cacao");
        __test_lpalindrome("Woolloomooloo");
    }
private:
    //...
    void __test_lpalindrome(std::string s){
        std::cout<<"longest palindrome of "<<s<<" ="
                 <<util.find_lpalindrome(s)<<"\n";
    }
};
Running the test cases generates the same result.
longest palindrome of mississippi =[ississi, ]
longest palindrome of banana =[anana, ]
longest palindrome of cacao =[aca, cac, ]
longest palindrome of Woolloomooloo =[loomool, ]
Find the longest palindrome in Haskell
The Haskell program for finding the longest palindrome is implemented as below.
longestPalindromes s = lcs $ suffixTree (s++"#"++(reverse s)++"$")
If some strings are fed to the program, results like the following can be
obtained.
longest palindrome(mississippi)=["ississi"]
longest palindrome(banana)=["anana"]
longest palindrome(cacao)=["aca","cac"]
longest palindrome(foofooxbarbar)=["oofoo"]
Find the longest palindrome in Scheme/Lisp
The Scheme/Lisp program for finding the longest palindrome is realized as the following.

(define (longest-palindrome s)
  ;; the terminators are appended inside longest-common-substring
  (longest-common-substring s (reverse-string s)))
We can just add this function to the fs list in the test-main program, so that the test is done automatically.

(define (test-main)
  (let ((fs (list longest-repeated-substring longest-palindrome))
        (ss '("mississippi" "banana" "cacao" "foofooxbarbar")))
    (map (lambda (f) (map f ss)) fs)))

The relevant result snippet is as below.
(test-main)
;Value 12: (... (("ississi") ("anana") ("aca" "cac") ("oofoo")))
6.5.5 Others
Suffix trees can also be used in data compression, such as the Burrows-Wheeler transform, LZW compression (LZSS), etc. [2]
6.6 Notes and short summary
The suffix tree was first introduced by Weiner in 1973 [?]. In 1976, McCreight greatly simplified the construction algorithm; McCreight constructs the suffix tree from right to left. In 1995, Ukkonen gave the first on-line construction algorithm, which works from left to right. All three algorithms are linear time (O(n)), and some research shows the relationship among these 3 algorithms [7].
6.7 Appendix
All programs provided along with this article are free for download.
6.7.1 Prerequisite software
GNU Make is used for easy building of some of the programs. For C++ and ANSI C programs, GNU GCC and G++ 3.4.4 are used. For Haskell programs, GHC 6.10.4 is used for building. For Python programs, Python 2.5 is used for testing; for Scheme/Lisp programs, MIT Scheme 14.9 is used.
All source files are put in one folder. Invoking make or make all will build the C++ and Haskell programs.
Running make Haskell will build the Haskell program separately; the executable file is happ (with .exe on Windows-like OS). It is also possible to run the program in GHCi.
6.7.2 Tools
Besides these, I use graphviz to draw most of the figures in this post. In order to translate the Trie, Patricia and Suffix Tree output to dot language scripts, I wrote a Python program. It can be used like this.
st2dot -o filename.dot -t type "string"
Where filename.dot is the output file for the dot script; type can be either trie or tree, and the default value is tree. The program generates a suffix Trie/tree from the string input and turns the tree/Trie into a dot script.
This helper script can also be downloaded with this article.
Download link: http://sites.google.com/site/algoxy/stree/stree.zip
Bibliography
[1] Esko Ukkonen. "On-line construction of suffix trees". Algorithmica 14 (3): 249-260. doi:10.1007/BF01206331. http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
[2] Suffix Tree, Wikipedia. http://en.wikipedia.org/wiki/Suffix_tree
[3] Esko Ukkonen. "Suffix tree and suffix array techniques for pattern analysis in strings". http://www.cs.helsinki.fi/u/ukkonen/Erice2005.ppt
[4] Trie, Wikipedia. http://en.wikipedia.org/wiki/Trie
[5] Liu Xinyu. "Trie and Patricia, with Functional and imperative implementation". http://sites.google.com/site/algoxy/trie
[6] Suffix Tree (Java). http://en.literateprograms.org/Suffix_tree_(Java)
[7] Robert Giegerich and Stefan Kurtz. "From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction". Science of Computer Programming 25(2-3):187-218, 1995. http://citeseer.ist.psu.edu/giegerich95comparison.html
[8] Robert Giegerich and Stefan Kurtz. "A Comparison of Imperative and Purely Functional Suffix Tree Constructions". Algorithmica 19 (3): 331-353. doi:10.1007/PL00009177. http://www.zbh.uni-hamburg.de/pubs/pdf/GieKur1997.pdf
[9] Bryan O'Sullivan. "suffixtree: Efficient, lazy suffix tree implementation". http://hackage.haskell.org/package/suffixtree
[10] Danny. http://hkn.eecs.berkeley.edu/~dyoo/plt/suffixtree/
[11] Zhang Shaojie. "Lecture of Suffix Trees". http://www.cs.ucf.edu/~shzhang/Combio09/lec3.pdf
[12] Lloyd Allison. "Suffix Trees". http://www.allisons.org/ll/AlgDS/Tree/Suffix/
[13] Esko Ukkonen. "Suffix tree and suffix array techniques for pattern analysis in strings". http://www.cs.helsinki.fi/u/ukkonen/Erice2005.ppt
[14] Esko Ukkonen. "Approximate string-matching over suffix trees". Proc. CPM 93. Lecture Notes in Computer Science 684, pp. 228-242, Springer 1993. http://www.cs.helsinki.fi/u/ukkonen/cpm931.ps
Chapter 7
B-Trees with Functional and imperative implementation
7.1 Abstract
B-Tree is introduced in the Introduction to Algorithms book [2] as one of the advanced data structures. It is important to modern file systems: some of them are implemented based on the B+ tree, which is extended from the B-tree. It is also widely used in many database systems. This post provides some implementations of B-trees, both in the imperative way as described in [2], and in a functional way with a kind of modify-and-fix approach. There are multiple programming languages used, including C++, Haskell, Python and Scheme/Lisp.
There may be mistakes in the post; please feel free to point them out.
This post is generated by LaTeX2e, and provided under the GNU FDL (GNU Free Documentation License). Please refer to http://www.gnu.org/copyleft/fdl.html for details.
Keywords: B-Trees
7.2 Introduction
In the Introduction to Algorithms book, the B-tree is introduced with the problem of how to access a large block of data on magnetic disks or secondary storage devices [2]. B-trees are commonly used in databases and file systems.
It is also helpful to understand the B-tree as a generalization of the balanced binary search tree [2].
Referring to Figure 7.1, it is easy to find the differences and similarities of the B-tree with regard to the binary search tree.

Figure 7.1: An example of a B-Tree (root: M; second level: C G and P T W; leaves: A B, D E F, H I J K, N O, Q R S, U V, X Y Z)
Let's review the definition of the binary search tree [3].
A binary search tree is
either an empty node;
or a node containing 3 parts: a value, a left child which is a binary search tree, and a right child which is also a binary search tree.
And it satisfies the constraints that:
all the values in the left child tree are less than the value of this node;
the value of this node is less than any value in its right child tree.
The constraints can be represented as follows: for any node n, the equation below holds.

$\forall x \in LEFT(n), \forall y \in RIGHT(n) \Rightarrow VALUE(x) < VALUE(n) < VALUE(y)$   (7.1)
If we extend this definition to allow multiple keys and children, we get the definition below.
A B-tree is
either an empty node;
or a node containing n keys and n + 1 children, where each child is also a B-Tree. We denote these keys and children as $key_1, key_2, \ldots, key_n$ and $c_1, c_2, \ldots, c_n, c_{n+1}$.
Figure 7.2 illustrates a B-Tree node.
C[1] K[1] C[2] K[2] ... C[n] K[n] C[n+1]
Figure 7.2: A B-Tree node
The keys and children in a node satisfy the following order constraints.
Keys are stored in non-decreasing order, that is $key_1 \le key_2 \le \ldots \le key_n$;
for each $key_i$, all values stored in child $c_i$ are no bigger than $key_i$, while all values stored in child $c_{i+1}$ are no less than $key_i$.
The constraints can be represented as in equation (7.2) as well.

$\forall x_i \in c_i, i = 1, \ldots, n+1: x_1 \le key_1 \le x_2 \le key_2 \le \ldots \le x_n \le key_n \le x_{n+1}$   (7.2)
Finally, if we add some constraints to make the tree balanced, we get the complete definition of the B-tree.
All leaves have the same depth;
We define an integral number, t, as the minimum degree of a B-tree;
each node can have at most 2t - 1 keys;
each node can have at least t - 1 keys, except the root.
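To make these constraints concrete, below is a small property checker. It is my own illustrative sketch, not part of the original programs; it assumes the Python BTreeNode layout defined in section 7.3 (the fields keys, children and leaf), and it omits the key-range constraint between a key and its neighboring children for brevity.

def height(node):
    # all leaves share the same depth, so following the first child suffices
    h = 0
    while not node.leaf:
        node = node.children[0]
        h = h + 1
    return h

def is_valid(node, t, is_root=True):
    n = len(node.keys)
    if n > 2 * t - 1:                 # at most 2t - 1 keys
        return False
    if (not is_root) and n < t - 1:   # at least t - 1 keys, except the root
        return False
    if any(node.keys[i] > node.keys[i + 1] for i in range(n - 1)):
        return False                  # keys in non-decreasing order
    if node.leaf:
        return True
    if len(node.children) != n + 1:   # n keys come with n + 1 children
        return False
    if len(set(height(c) for c in node.children)) != 1:
        return False                  # all leaves at the same depth
    return all(is_valid(c, t, False) for c in node.children)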
In this post, I'll first introduce how to generate B-trees with the insertion algorithm. Two different methods will be explained: one is discussed in the book [2]; the other is a kind of modify-and-fix approach, quite similar to the algorithm Okasaki used for red-black trees [4]. This method is also discussed in Wikipedia [2]. After that, how to delete an element from a B-tree is explained. As the last part, the algorithm for searching in a B-tree is also provided.
This article provides example implementations in the C, C++, Haskell, Python, and Scheme/Lisp languages.
All source code can be downloaded in appendix 8.7; please refer to the appendix for detailed information about how to build and run.
7.3 Definition
Similar to the binary search tree, the B-tree can be defined recursively. Because there are multiple keys and children, a collection container can be used to store them.
Definition of B-tree in C++
ISO C++ supports using a constant integral number as a template parameter. This feature can be used to define B-trees of different minimum degrees as different types.

// t: minimum degree of B-tree
template<class K, int t>
struct BTree{
    typedef K key_type;
    typedef std::vector<K> Keys;
    typedef std::vector<BTree*> Children;

    BTree(){}
    ~BTree(){
        for(typename Children::iterator it = children.begin();
                it != children.end(); ++it)
            delete (*it);
    }

    bool full(){ return keys.size() == 2*t-1; }
    bool leaf(){ return children.empty(); }

    Keys keys;
    Children children;
};

In order to support random access to keys and children, the inner data structure uses the STL vector. The node recursively releases all its children when destroyed, and two simple auxiliary member functions, full and leaf, are provided to test if a node is full or is a leaf node.
Definition of B-tree in Python
If the minimum degree is 2, the B-tree is commonly called a 2-3-4 tree. For illustration purposes, I set the 2-3-4 tree as the default.

TREE_2_3_4 = 2 # by default, create 2-3-4 tree

class BTreeNode:
    def __init__(self, t=TREE_2_3_4, leaf=True):
        self.leaf = leaf
        self.t = t
        self.keys = []      # self.data = ...
        self.children = []

It's quite OK for a B-tree to store not only keys but also satellite data; however, satellite data is omitted in this post.
Also, some auxiliary member functions are defined.

class BTreeNode:
    #...
    def is_full(self):
        return len(self.keys) == 2 * self.t - 1
This member function is used to test if a node is full.
Definition of B-tree in Haskell
In Haskell, record syntax is used to define the BTree, so that keys and children can be accessed easily later on. Some auxiliary functions are also provided.

data BTree a = Node{ keys :: [a]
                   , children :: [BTree a]
                   , degree :: Int} deriving (Eq, Show)

-- Auxiliary functions
empty deg = Node [] [] deg

full :: BTree a -> Bool
full tr = (length $ keys tr) > 2*(degree tr)-1
Definition of B-tree in Scheme/Lisp
In Scheme/Lisp, because a list can contain both children and keys at the same time, we can organize a B-tree as a list with children and keys interspersed. For instance, the list below represents a B-tree whose root has one key "C" and two children: the left child is a leaf node with keys "A" and "B", while the right child is also a leaf with keys "D" and "E".
(("A" "B") "C" ("D" "E"))
However, this representation doesn't hold the information of the minimum degree t. The solution is to pass t as an argument to all operations.
Some auxiliary functions are provided so that we can access and test a B-tree easily.

(define (keys tr)
  (if (null? tr)
      '()
      (if (list? (car tr))
          (keys (cdr tr))
          (cons (car tr) (keys (cdr tr))))))

(define (children tr)
  (if (null? tr)
      '()
      (if (list? (car tr))
          (cons (car tr) (children (cdr tr)))
          (children (cdr tr)))))

(define (leaf? tr)
  (or (null? tr)
      (not (list? (car tr)))))

Here we assume a key is a simple value, such as a number or a string, but not a list. In case we find an element that is a list, it represents a child B-tree. All the above functions are defined based on this assumption.
7.4 Insertion
Insertion is the basic operation of a B-tree; a B-tree can be created by inserting keys repeatedly. The essential idea of insertion is similar to the binary search tree. If the key to be inserted is x, we examine the keys in a node to find a position where all the keys on the left are less than x, while all the keys on the right are greater than x. After that, we can recursively insert x into the child node at this position.
However, this basic idea needs to be fine tuned. The first thing is what the recursion termination criteria is. This problem can be easily solved by defining the rule that in case the node to be inserted into is a leaf node, we needn't insert recursively, because a leaf node doesn't have children at all. We can just put x between all the left hand keys and the right hand keys, which increases the number of keys of this leaf node by one.
The second thing is how to keep the balance properties of a B-tree when inserting. If a leaf already has 2t - 1 keys, inserting x into it breaks the rule that each node can have at most 2t - 1 keys. The sections below show 2 major methods to solve this problem.
7.4.1 Splitting
Regarding the problem of inserting a key into a node which already has 2t - 1 keys, one solution is to split the node before the insertion.
In this case, we can divide the node into 3 parts, as shown in Figure 7.3. The left part contains the first t - 1 keys and t children, while the right part contains the last t - 1 keys and t children. Both the left part and the right part are valid B-tree nodes. The middle part is just the t-th key. It is pushed up to its parent node (if the node is the root, then the t-th key, with its 2 children, becomes the new root).

Figure 7.3: Split node. a. Before the split, the node holds keys K[1] ... K[2t-1] and children C[1] ... C[2t]. b. After the split, the left node holds keys K[1] ... K[t-1] and children C[1] ... C[t]; the key K[t] is pushed up; the right node holds keys K[t+1] ... K[2t-1] and children C[t+1] ... C[2t].
Imperative splitting
If we skip the disk accessing part as explained in [2], the imperative splitting algorithm can be given as below.

1: procedure B-TREE-SPLIT-CHILD(node, i)
2:     x ← CHILDREN(node)[i]
3:     y ← CREATE-NODE()
4:     INSERT(KEYS(node), i, KEYS(x)[t])
5:     INSERT(CHILDREN(node), i + 1, y)
6:     KEYS(y) ← KEYS(x)[t + 1 ... 2t - 1]
7:     KEYS(x) ← KEYS(x)[1 ... t - 1]
8:     if y is not leaf then
9:         CHILDREN(y) ← CHILDREN(x)[t + 1 ... 2t]
10:        CHILDREN(x) ← CHILDREN(x)[1 ... t]

This algorithm takes 2 parameters: a B-tree node, and the index indicating which child of this node will be split.
Split implemented in C++
The algorithm can be implemented in C++ as a member function of the B-tree node.

template<class K, int t>
struct BTree{
    //...
    void split_child(int i){
        BTree<K, t>* x = children[i];
        BTree<K, t>* y = new BTree<K, t>();
        keys.insert(keys.begin()+i, x->keys[t-1]);
        children.insert(children.begin()+i+1, y);
        y->keys = Keys(x->keys.begin()+t, x->keys.end());
        x->keys = Keys(x->keys.begin(), x->keys.begin()+t-1);
        if(!x->leaf()){
            y->children = Children(x->children.begin()+t, x->children.end());
            x->children = Children(x->children.begin(), x->children.begin()+t);
        }
    }
};
Split implemented in Python
We can define the splitting operation as a member method of the B-tree node as the following.

class BTreeNode:
    #...
    def split_child(self, i):
        t = self.t
        x = self.children[i]
        y = BTreeNode(t, x.leaf)
        self.keys.insert(i, x.keys[t-1])
        self.children.insert(i+1, y)
        y.keys = x.keys[t:]
        x.keys = x.keys[:t-1]
        if not y.leaf:
            y.children = x.children[t:]
            x.children = x.children[:t]
Functional splitting
The functional splitting algorithm returns a tuple which contains the left part and the right part as B-trees, along with the key that is pushed up.

1: function B-TREE-SPLIT(node)
2:     ks ← KEYS(node)[1 ... t - 1]
3:     ks' ← KEYS(node)[t + 1 ... 2t - 1]
4:     if node is not leaf then
5:         cs ← CHILDREN(node)[1 ... t]
6:         cs' ← CHILDREN(node)[t + 1 ... 2t]
7:     return (CREATE-B-TREE(ks, cs), KEYS(node)[t], CREATE-B-TREE(ks', cs'))
Split implemented in Haskell
The Haskell prelude provides the take/drop functions to get part of a list. These functions just return an empty list if the list passed in is empty, so there is no need to test whether the node is a leaf.

split :: BTree a -> (BTree a, a, BTree a)
split (Node ks cs t) = (c1, k, c2) where
    c1 = Node (take (t-1) ks) (take t cs) t
    c2 = Node (drop t ks) (drop t cs) t
    k = head (drop (t-1) ks)
Split implemented in Scheme/Lisp
As mentioned previously, the minimum degree t is passed as an argument, and the splitting is performed according to t.

(define (split tr t)
  (if (leaf? tr)
      (list (list-head tr (- t 1))
            (list-ref tr (- t 1))
            (list-tail tr t))
      (list (list-head tr (- (* t 2) 1))
            (list-ref tr (- (* t 2) 1))
            (list-tail tr (* t 2)))))

When splitting a leaf node, because there are no children at all, the program simply takes the first t - 1 keys and the last t - 1 keys to form the two new children, and leaves the t-th key as the only key of the new node. It returns these 3 parts in a list. When splitting a branch node, the children must also be taken into account; in the interspersed list representation this means taking the first 2t - 1 and the last 2t - 1 elements.
7.4.2 Split before insert method
Note that the splitting solution pushes a key up to the parent node; it is possible that the parent node is already full with 2t - 1 keys.
Regarding this issue, [2] provides a solution which checks every node along the path from the root to the leaf, and applies splitting whenever a node on this path is full. Since the parent of such a node has already been examined (except for the root node), it is ensured that the parent has fewer than 2t - 1 keys, so pushing one key up won't make the parent full. This approach needs only a single pass down the tree, without any backtracking.
The main insert algorithm first checks if the root node needs splitting. If yes, it creates a new node, sets the old root as its only child, performs the splitting, and sets the new node as the root. After that, the algorithm inserts the key into the non-full tree.
1: function B-TREE-INSERT(T, k)
2:     r ← T
3:     if r is full then
4:         s ← CREATE-NODE()
5:         APPEND(CHILDREN(s), r)
6:         B-TREE-SPLIT-CHILD(s, 1)
7:         r ← s
8:     B-TREE-INSERT-NONFULL(r, k)
9:     return r

The algorithm B-TREE-INSERT-NONFULL asserts that the node passed in is not full. If it is a leaf node, the new key is just inserted at the proper position based on its order. If it is a branch node, the algorithm finds the proper child node to which the new key will be inserted. If this child node is full, splitting is performed first.
1: procedure B-TREE-INSERT-NONFULL(T, k)
2:     if T is leaf then
3:         i ← 1
4:         while i ≤ LENGTH(KEYS(T)) and k > KEYS(T)[i] do
5:             i ← i + 1
6:         INSERT(KEYS(T), i, k)
7:     else
8:         i ← LENGTH(KEYS(T))
9:         while i > 1 and k < KEYS(T)[i] do
10:            i ← i - 1
11:        if CHILDREN(T)[i] is full then
12:            B-TREE-SPLIT-CHILD(T, i)
13:            if k > KEYS(T)[i] then
14:                i ← i + 1
15:        B-TREE-INSERT-NONFULL(CHILDREN(T)[i], k)
Note that this algorithm is actually recursive. However, considering that a B-tree typically has a large minimum degree t (matching the magnetic disk block structure), and that it is a balanced tree, even a small depth can support a huge amount of data (with t = 10, a B-tree of height 10 can store more than 10 billion keys). Of course, it is easy to eliminate the recursive call to improve the algorithm.
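As a side note, this capacity claim can be backed by the standard height bound from [2] (the concrete numbers here are my own illustration). A B-tree of height $h$ with minimum degree $t \ge 2$ holding $n$ keys satisfies

$n \ge 2t^h - 1$, i.e. $h \le \log_t \frac{n+1}{2}$

because the root has at least 2 children, every other branch node has at least $t$ children, and each node carries at least $t - 1$ keys. With $t = 10$ and $h = 10$, even this lower bound is $2 \times 10^{10} - 1$ keys, comfortably above 10 billion.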
In the language specific implementations below, I'll eliminate the recursion in the C++ program, and show the recursive version in the Python program.
Insert implemented in C++
The main insert program in C++ examines if the root is full, and performs splitting accordingly. Then it calls insert_nonfull to do the further processing.

template<class K, int t>
BTree<K, t>* insert(BTree<K, t>* tr, K key){
    BTree<K, t>* root(tr);
    if(root->full()){
        BTree<K, t>* s = new BTree<K, t>();
        s->children.push_back(root);
        s->split_child(0);
        root = s;
    }
    return insert_nonfull(root, key);
}
The recursion is eliminated in the insert_nonfull function. If the current node is a leaf, it calls ordered_insert to insert the key at the correct position. If it is a branch node, the program finds the proper child tree and sets it as the current node of the next loop; splitting is performed if that child tree is full.

template<class K, int t>
BTree<K, t>* insert_nonfull(BTree<K, t>* tr, K key){
    typedef typename BTree<K, t>::Keys Keys;
    typedef typename BTree<K, t>::Children Children;
    BTree<K, t>* root(tr);
    while(!tr->leaf()){
        unsigned int i = 0;
        while(i < tr->keys.size() && tr->keys[i] < key)
            ++i;
        if(tr->children[i]->full()){
            tr->split_child(i);
            if(key > tr->keys[i])
                ++i;
        }
        tr = tr->children[i];
    }
    ordered_insert(tr->keys, key);
    return root;
}

Where ordered_insert is defined as the following.

template<class Coll>
void ordered_insert(Coll& coll, typename Coll::value_type x){
    typename Coll::iterator it = coll.begin();
    while(it != coll.end() && *it < x)
        ++it;
    coll.insert(it, x);
}
For convenience, I define auxiliary functions to convert a list of keys into a B-tree.

template<class T>
T* insert_key(T* t, typename T::key_type x){
    return insert(t, x);
}

template<class Iterator, class T>
T* list_to_btree(Iterator first, Iterator last, T* t){
    return std::accumulate(first, last, t,
                           std::ptr_fun(insert_key<T>));
}
In order to print the result as a human readable string, a recursive conversion function is provided.
template<class T>
std::string btree_to_str(T* tr){
    typename T::Keys::iterator k;
    typename T::Children::iterator c;
    std::ostringstream s;
    s<<"(";
    if(tr->leaf()){
        k = tr->keys.begin();
        s<<*k++;
        for(; k != tr->keys.end(); ++k)
            s<<", "<<*k;
    }
    else{
        for(k = tr->keys.begin(), c = tr->children.begin();
                k != tr->keys.end(); ++k, ++c)
            s<<btree_to_str(*c)<<", "<<*k<<", ";
        s<<btree_to_str(*c);
    }
    s<<")";
    return s.str();
}
With all the above programs defined, some simple test cases can be fed in for verification.

const char* ss[] = {"G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
                    "N", "O", "R", "S", "T", "U", "V", "Y", "Z"};

BTree<std::string, 2>* tr234 = list_to_btree(ss, ss + sizeof(ss)/sizeof(ss[0]),
                                             new BTree<std::string, 2>);
std::cout<<"2-3-4 tree of ";
std::copy(ss, ss + sizeof(ss)/sizeof(ss[0]),
          std::ostream_iterator<std::string>(std::cout, ", "));
std::cout<<"\n"<<btree_to_str(tr234)<<"\n";
delete tr234;

BTree<std::string, 3>* tr = list_to_btree(ss, ss + sizeof(ss)/sizeof(ss[0]),
                                          new BTree<std::string, 3>);
std::cout<<"B-tree with t=3 of ";
std::copy(ss, ss + sizeof(ss)/sizeof(ss[0]),
          std::ostream_iterator<std::string>(std::cout, ", "));
std::cout<<"\n"<<btree_to_str(tr)<<"\n";
delete tr;
Running these lines generates the following result:
2-3-4 tree of G, M, P, X, A, C, D, E, J, K, N, O, R, S, T, U, V, Y, Z,
(((A), C, (D)), E, ((G, J, K), M, (N, O)), P, ((R), S, (T), U, (V), X, (Y, Z)))
B-tree with t=3 of G, M, P, X, A, C, D, E, J, K, N, O, R, S, T, U, V, Y, Z,
((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))
Figure 7.4 shows the result.
Figure 7.4: Insert results. a. Insert result of a 2-3-4 tree; b. insert result of a B-tree with minimum degree of 3.
Insert implemented in Python
Implementing the above insertion algorithm in Python is straightforward; we only change the indices to start from 0 instead of 1.

def B_tree_insert(tr, key): # + data parameter
    root = tr
    if root.is_full():
        s = BTreeNode(root.t, False)
        s.children.insert(0, root)
        s.split_child(0)
        root = s
    B_tree_insert_nonfull(root, key)
    return root
And the insertion into a non-full node is implemented as the following.

def B_tree_insert_nonfull(tr, key):
    if tr.leaf:
        ordered_insert(tr.keys, key)
        #disk_write(tr)
    else:
        i = len(tr.keys)
        while i > 0 and key < tr.keys[i-1]:
            i = i - 1
        #disk_read(tr.children[i])
        if tr.children[i].is_full():
            tr.split_child(i)
            if key > tr.keys[i]:
                i = i + 1
        B_tree_insert_nonfull(tr.children[i], key)
Where the function ordered_insert is used to insert an element into an ordered list. Since the standard Python list doesn't maintain order by itself, the program is written as below.

def ordered_insert(lst, x):
    i = len(lst)
    lst.append(x)
    while i > 0 and lst[i] < lst[i-1]:
        (lst[i-1], lst[i]) = (lst[i], lst[i-1])
        i = i - 1
For an array based collection, appending at the tail is much more effective than inserting at any other position, because such an insertion takes O(n) time if the length of the collection is n. This program first appends the new element at the end of the existing collection, then iterates from the last element to the first one, checking if the current two adjacent elements are ordered. If not, these two elements are swapped.
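For example (my own illustration), inserting "B" into an already ordered list behaves like one pass of insertion sort:

lst = ["A", "C", "D"]
ordered_insert(lst, "B") # lst becomes ["A", "B", "C", "D"]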
To easily create a B-tree from a list of keys, we can write a simple helper function.

def list_to_B_tree(l, t=TREE_2_3_4):
    tr = BTreeNode(t)
    for x in l:
        tr = B_tree_insert(tr, x)
    return tr

By default, this function creates a 2-3-4 tree; the user can specify the minimum degree as the second parameter. The first parameter is a list of keys. The function repeatedly inserts every key into the B-tree, starting from an empty tree.
In order to print the B-tree out for verification, an auxiliary printing function is provided.

def B_tree_to_str(tr):
    res = "("
    if tr.leaf:
        res += ", ".join(tr.keys)
    else:
        for i in range(len(tr.keys)):
            res += B_tree_to_str(tr.children[i]) + ", " + tr.keys[i] + ", "
        res += B_tree_to_str(tr.children[len(tr.keys)])
    res += ")"
    return res
After that, some smoke test cases can be used to verify the insertion program.

class BTreeTest:
    def run(self):
        self.test_insert()

    def test_insert(self):
        lst = ["G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
               "N", "O", "R", "S", "T", "U", "V", "Y", "Z"]
        tr = list_to_B_tree(lst)
        print B_tree_to_str(tr)
        print B_tree_to_str(list_to_B_tree(lst, 3))

Running the test cases prints two different B-trees; they are identical to the C++ program outputs.
(((A), C, (D)), E, ((G, J, K), M, (N, O)), P, ((R), S, (T), U, (V), X, (Y, Z)))
((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))
7.4.3 Insert then fix method
Another approach to implementing the B-tree insertion algorithm is to just find the position for the new key and insert it. Since such an insertion may violate the B-tree properties, we can apply a fixing procedure afterwards. If a leaf contains too many keys, we split it into 2 leaves and push one key up to the parent branch node. Of course, this operation may cause the parent node to violate the B-tree properties in turn, so the algorithm needs to traverse from the leaf to the root to perform the fixing. By using a recursive implementation, this fixing can also be realized from top to bottom.

1: function B-TREE-INSERT(T, k)
2:     return FIX-ROOT(RECURSIVE-INSERT(T, k))

Where FIX-ROOT examines if the root node contains too many keys, and does splitting if necessary.

1: function FIX-ROOT(T)
2:     if FULL?(T) then
3:         T ← B-TREE-SPLIT(T)
4:     return T

And the inner function RECURSIVE-INSERT(T, k) first checks whether T is a leaf node or a branch node. It does direct insertion for a leaf, and recursive insertion for a branch.
1: function RECURSIVE-INSERT(T, k)
2:     if LEAF?(T) then
3:         INSERT(KEYS(T), k)
4:         return T
5:     else
6:         initialize empty arrays k', k'', c', c''
7:         i ← 1
8:         while i ≤ LENGTH(KEYS(T)) and KEYS(T)[i] < k do
9:             APPEND(k', KEYS(T)[i])
10:            APPEND(c', CHILDREN(T)[i])
11:            i ← i + 1
12:        k'' ← KEYS(T)[i ... LENGTH(KEYS(T))]
13:        c'' ← CHILDREN(T)[i + 1 ... LENGTH(CHILDREN(T))]
14:        c ← CHILDREN(T)[i]
15:        left ← (k', c')
16:        right ← (k'', c'')
17:        return MAKE-B-TREE(left, RECURSIVE-INSERT(c, k), right)
Figure 7.5 shows the branch case. The algorithm first locates the position: for a certain key $key_i$, if the new key k to be inserted satisfies $key_{i-1} < k < key_i$, then we need to recursively insert k into child $c_i$.
This position divides the node into 3 parts: the left part, the child $c_i$, and the right part.
The procedure MAKE-B-TREE takes 3 parameters, corresponding to the left part, the result of recursively inserting k into $c_i$, and the right part. It tries to merge these 3 parts into a new B-tree branch node.

Figure 7.5: Insert a key to a branch node. a. Locate the child $c_i$ to insert into, where $key_{i-1} < k < key_i$; b. recursively insert into $c_i$.

However, inserting a key into a child may make that child violate the B-tree property by exceeding the limit on the number of keys a node can have. MAKE-B-TREE detects such a situation and fixes the problem by splitting.
1: function MAKE-B-TREE(L, C, R)
2:     if FULL?(C) then
3:         return FIX-FULL(L, C, R)
4:     else
5:         T ← CREATE-NEW-NODE()
6:         KEYS(T) ← KEYS(L) + KEYS(R)
7:         CHILDREN(T) ← CHILDREN(L) + [C] + CHILDREN(R)
8:         return T

Where FIX-FULL just calls the splitting process.

1: function FIX-FULL(L, C, R)
2:     (C', k, C'') ← B-TREE-SPLIT(C)
3:     T ← CREATE-NEW-NODE()
4:     KEYS(T) ← KEYS(L) + [k] + KEYS(R)
5:     CHILDREN(T) ← CHILDREN(L) + [C', C''] + CHILDREN(R)
6:     return T

Note that splitting may push one extra key up to the parent node. However, even if the push-up causes a violation of the B-tree property in the parent, it will be recursively fixed.
Insert implemented in Haskell
Realizing the above recursive algorithm in Haskell yields this insert-and-fix program.
The main program is provided as the following.

insert :: (Ord a) => BTree a -> a -> BTree a
insert tr x = fixRoot $ ins tr x

It just calls an auxiliary function ins, then examines and fixes the root node if it contains too many keys.

import qualified Data.List as L
--...
ins :: (Ord a) => BTree a -> a -> BTree a
ins (Node ks [] t) x = Node (L.insert x ks) [] t
ins (Node ks cs t) x = make (ks', cs') (ins c x) (ks'', cs'')
    where
        (ks', ks'') = L.partition (<x) ks
        (cs', (c:cs'')) = L.splitAt (length ks') cs
The ins function uses pattern matching to handle the two different cases. If the node to be inserted into is a leaf, it calls the insert function defined in the Haskell standard library, which puts the new key x at the proper position to keep the keys ordered.
If the node is a branch node, the program recursively inserts the key into the child whose range of keys covers x. After that, it calls the make function to combine the results together into a new node; the examination and fixing are also performed by the make function.
The function fixRoot first checks if the root node contains too many keys; if it exceeds the limit, splitting is applied. The split results are used to make a new node, so the total height of the tree increases by one.

fixRoot :: BTree a -> BTree a
fixRoot (Node [] [tr] _) = tr -- shrink height
fixRoot tr = if full tr then Node [k] [c1, c2] (degree tr)
             else tr
    where
        (c1, k, c2) = split tr
The following is the implementation of the make function.

make :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
make (ks', cs') c (ks'', cs'')
    | full c = fixFull (ks', cs') c (ks'', cs'')
    | otherwise = Node (ks' ++ ks'') (cs' ++ [c] ++ cs'') (degree c)

While fixFull is given as below.

fixFull :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
fixFull (ks', cs') c (ks'', cs'') = Node (ks' ++ [k] ++ ks'')
                                         (cs' ++ [c1, c2] ++ cs'') (degree c)
    where
        (c1, k, c2) = split c
In order to print the B-tree content out, an auxiliary function toString is provided to convert a B-tree to a string.

toString :: (Show a) => BTree a -> String
toString (Node ks [] _) = "(" ++ (L.intercalate ", " (map show ks)) ++ ")"
toString tr = "(" ++ (toStr (keys tr) (children tr)) ++ ")" where
    toStr (k:ks) (c:cs) = (toString c) ++ ", " ++ (show k) ++ ", " ++ (toStr ks cs)
    toStr [] [c] = toString c

With all the above definitions, the insertion program can be verified with some simple test cases.

listToBTree :: (Ord a) => [a] -> Int -> BTree a
listToBTree lst t = foldl insert (empty t) lst

testInsert = do
    putStrLn $ toString $ listToBTree "GMPXACDEJKNORSTUVYZ" 3
    putStrLn $ toString $ listToBTree "GMPXACDEJKNORSTUVYZ" 2
Running testInsert generates the following result.
((A, C, D, E), G, (J, K), M, (N, O), P, (R, S),
T, (U, V, X, Y, Z))
(((A), C, (D)), E, ((G, J, K), M, (N)), O, ((P),
R, (S), T, (U), V, (X, Y, Z)))
Figure 7.6: Insert and fixing results. (a) Insert result of a 2-3-4 tree (insert-then-fixing method); (b) insert result of a B-tree with minimum degree of 3 (insert-then-fixing method).
Comparing the results output by the C++ or Python programs with this one, as
shown in figure 7.6, we can find that there are differences. However, the
B-tree built by the Haskell program is still valid, because all B-tree properties
are satisfied. The main reason for this difference is the change of approach.
Insert implemented in Scheme/Lisp
The main function for insertion in Scheme/Lisp is given as the following.
(define (btree-insert tr x t)
(define (ins tr x)
(if (leaf? tr)
(ordered-insert tr x) ;;leaf
(let ((res (partition-by tr x))
(left (car res))
(c (cadr res))
(right (caddr res)))
(make-btree left (ins c x) right t))))
(fix-root (ins tr x) t))
The program simply calls an internal function and performs fixing on the
result. The internal ins function examines whether the current node is a leaf.
If the node is a leaf, it only contains keys, so we can locate the position and
insert the new key there. Otherwise, we partition the node into 3 parts: the
left part, the child on which the recursive insertion will be performed, and the
right part. The program does the recursive insertion and then combines these
three parts into a new node. Fixing happens during the combination.
Function ordered-insert traverses an ordered list and inserts the new key at
the proper position, as below.
(define (ordered-insert lst x)
(define (insert-by less-p lst x)
(if (null? lst)
(list x)
(if (less-p x (car lst))
(cons x lst)
(cons (car lst) (insert-by less-p (cdr lst) x)))))
(if (string? x)
(insert-by string<? lst x)
(insert-by < lst x)))
In order to deal with B-trees with key types both as string and as number,
we abstract the less-than function as a parameter and pass it to an internal
function.
Function partition-by uses a similar approach.
(define (partition-by tr x)
(define (part-by pred tr x)
(if (= (length tr) 1)
(list '() (car tr) '())
(if (pred (cadr tr) x)
(let ((res (part-by pred (cddr tr) x))
(left (car res))
(c (cadr res))
(right (caddr res)))
(list (cons-pair (car tr) (cadr tr) left) c right))
(list '() (car tr) (cdr tr)))))
(if (string? x)
(part-by string<? tr x)
(part-by < tr x)))
Where cons-pair is a helper function that puts a child and a key in front
of a B-tree.
(define (cons-pair c k lst)
(cons c (cons k lst)))
In order to fix the root of a B-tree which contains too many keys, a
fix-root function is provided.
(define (full? tr t) ;; t: minimum degree
(> (length (keys tr))
(- (* 2 t) 1)))
(define (fix-root tr t)
(cond ((full? tr t) (split tr t))
(else tr)))
When we turn the recursive insertion result into a new node, we need to do
fixing if the result node contains too many keys.
(define (make-btree l c r t)
(cond ((full? c t) (fix-full l c r t))
(else (append l (cons c r)))))
(define (fix-full l c r t)
(append l (split c t) r))
With all the above facilities, we can test the program for verification.
In order to build a B-tree easily from a list of keys, some simple helper
functions are given.
(define (list->btree lst t)
  (fold-left (lambda (tr x) (btree-insert tr x t)) '() lst))

(define (str->slist s)
  (if (string-null? s)
      '()
      (cons (string-head s 1) (str->slist (string-tail s 1)))))
A simple test case similar to the Haskell one is fed to our program.
(define (test-insert)
  (list->btree (str->slist "GMPXACDEJKNORSTUVYZBFHIQW") 3))
Evaluating the test-insert function yields a B-tree.
((("A" "B") "C" ("D" "E" "F") "G" ("H" "I" "J" "K")) "M"
(("N" "O") "P" ("Q" "R" "S") "T" ("U" "V") "W" ("X" "Y" "Z")))
It is the same as the result output by the Haskell program.
7.5 Deletion
Deletion is another basic operation on a B-tree. Deleting a key from a B-tree
may cause a violation of the B-tree balance properties, namely that a node
can't contain too few keys (no less than t − 1 keys, where t is the minimum
degree).
Similar to the approaches for insertion, we can either do some preparation
so that the node from which the key will be deleted contains enough keys, or
do some fixing after the deletion if the node has too few keys.
7.5.1 Merge before delete method
In the textbook [2], the delete algorithm is given as a description, and the
pseudo code is left as an exercise. The description can be used as a good
reference when writing the pseudo code.
Merge before delete algorithm implemented imperatively
The first case is the trivial one: if the key k to be deleted can be located in
node x, and x is a leaf node, we can directly remove k from x.
Note that this is a terminal case. For most B-trees, which have more than
just a single leaf node as the root, the program will first examine non-leaf
nodes.
The second case states that the key k can be located in node x, but x
isn't a leaf node. In this case, there are 3 sub-cases.
• If the child node y which precedes k contains enough keys (at least t), we
replace k in node x with k', which is the predecessor of k in child y, and
recursively remove k' from y. The predecessor of k can be easily located
as the last key of child y.
• If y doesn't contain enough keys, while the child node z which follows k
contains at least t keys, we replace k in node x with k', which is the
successor of k in child z, and recursively remove k' from z. The successor
of k can be easily located as the first key of child z.
• Otherwise, if neither y nor z contains enough keys, we merge y, k and z
into one new node, so that this new node contains 2t − 1 keys. After
that, we can recursively do the removing. Note that after merging, if the
current node doesn't contain any keys, which means k was the only key
in x and y, z were the only two children of x, we need to shrink the tree
height by one.
Case 2 is illustrated in figures 7.7, 7.8, and 7.9.
Note that although we use a recursive way to delete keys in case 2, the
recursion can be turned into a pure imperative form. We'll show such a
program in the C++ implementation.
The last case states that if k can't be located in node x, the algorithm
needs to find a child node c_i of x, such that the sub-tree rooted at c_i may
contain k. Before the deletion is recursively applied in c_i, we need to be sure
that there are at least t keys in c_i. If there are not enough keys, we do the
following adjustment.
We check the two siblings of c_i, which are c_{i-1} and c_{i+1}. If either one of
them contains enough keys (at least t keys), we move one key from x down
to c_i, and move one key from the sibling up to x. We also need to move the
corresponding child from the sibling to c_i.
This operation makes c_i contain enough keys for deletion, so we can next
try to delete k from c_i recursively.
In case neither of the two siblings contains enough keys, we merge c_i, a
key from x, and either one of the siblings into a new node, and do the
deletion on this new node.
Figure 7.7: case 2a. Replace and delete from predecessor.
Figure 7.8: case 2b. Replace and delete from successor.
Figure 7.9: case 2c. Merge and delete.
Figure 7.10: case 3a. Borrow from left sibling.
Figure 7.11: case 3b. Merge and delete.
Case 3 is illustrated in figures 7.10 and 7.11.
By implementing the above 3 cases in pseudo code, the B-tree delete
algorithm can be given as the following.
First, there are some auxiliary functions to do simple tests and operations
on a B-tree.
1: function CAN-DEL(T)
2:   return number of keys of T ≥ t
Function CAN-DEL tests whether a B-tree node contains enough keys (no less
than t keys).
1: procedure MERGE-CHILDREN(T, i) ▷ Merge children i and i + 1
2:   x ← CHILDREN(T)[i]
3:   y ← CHILDREN(T)[i + 1]
4:   APPEND(KEYS(x), KEYS(T)[i])
5:   CONCAT(KEYS(x), KEYS(y))
6:   CONCAT(CHILDREN(x), CHILDREN(y))
7:   REMOVE(KEYS(T), i)
8:   REMOVE(CHILDREN(T), i + 1)
Procedure MERGE-CHILDREN merges the i-th child, the i-th key, and the
(i+1)-th child of node T into one new child, and removes the i-th key and the
(i+1)-th child after merging.
With these helper functions, the main algorithm of B-tree deletion is
described as below.
1: function B-TREE-DELETE(T, k)
2:   i ← 1
3:   while i ≤ LENGTH(KEYS(T)) do
4:     if k = KEYS(T)[i] then
5:       if T is leaf then ▷ case 1
6:         REMOVE(KEYS(T), k)
7:       else ▷ case 2
8:         if CAN-DEL(CHILDREN(T)[i]) then ▷ case 2a
9:           KEYS(T)[i] ← LAST-KEY(CHILDREN(T)[i])
10:          B-TREE-DELETE(CHILDREN(T)[i], KEYS(T)[i])
11:        else if CAN-DEL(CHILDREN(T)[i + 1]) then ▷ case 2b
12:          KEYS(T)[i] ← FIRST-KEY(CHILDREN(T)[i + 1])
13:          B-TREE-DELETE(CHILDREN(T)[i + 1], KEYS(T)[i])
14:        else ▷ case 2c
15:          MERGE-CHILDREN(T, i)
16:          B-TREE-DELETE(CHILDREN(T)[i], k)
17:          if KEYS(T) = NIL then
18:            T ← CHILDREN(T)[i] ▷ Shrinks height
19:      return T
20:    else if k < KEYS(T)[i] then
21:      BREAK
22:    else
23:      i ← i + 1
24:  if T is leaf then
25:    return T ▷ k doesn't exist in T at all
26:  if not CAN-DEL(CHILDREN(T)[i]) then ▷ case 3
27:    if i > 1 and CAN-DEL(CHILDREN(T)[i − 1]) then ▷ case 3a: left sibling
28:      INSERT(KEYS(CHILDREN(T)[i]), KEYS(T)[i − 1])
29:      KEYS(T)[i − 1] ← POP-BACK(KEYS(CHILDREN(T)[i − 1]))
30:      if CHILDREN(T)[i] isn't leaf then
31:        c ← POP-BACK(CHILDREN(CHILDREN(T)[i − 1]))
32:        INSERT(CHILDREN(CHILDREN(T)[i]), c)
33:    else if i < LENGTH(CHILDREN(T)) and CAN-DEL(CHILDREN(T)[i + 1]) then ▷ case 3a: right sibling
34:      APPEND(KEYS(CHILDREN(T)[i]), KEYS(T)[i])
35:      KEYS(T)[i] ← POP-FRONT(KEYS(CHILDREN(T)[i + 1]))
36:      if CHILDREN(T)[i] isn't leaf then
37:        c ← POP-FRONT(CHILDREN(CHILDREN(T)[i + 1]))
38:        APPEND(CHILDREN(CHILDREN(T)[i]), c)
39:    else ▷ case 3b
40:      if i > 1 then
41:        MERGE-CHILDREN(T, i − 1)
42:        i ← i − 1 ▷ the merged child is now at position i − 1
43:      else
44:        MERGE-CHILDREN(T, i)
45:  B-TREE-DELETE(CHILDREN(T)[i], k) ▷ recursive delete
46:  if KEYS(T) = NIL then ▷ Shrinks height
47:    T ← CHILDREN(T)[1]
48:  return T
Merge before deletion algorithm implemented in C++
The C++ implementation given here doesn't simply translate the above pseudo
code into C++: the recursion is eliminated into a pure imperative program.
In order to simplify some B-tree node operations, some auxiliary member
functions are added to the B-tree node class definition.
template<class K, int t>
struct BTree{
    //...
    // merge children[i], keys[i], and children[i+1] to one node
    void merge_children(int i){
        BTree<K, t>* x = children[i];
        BTree<K, t>* y = children[i+1];
        x->keys.push_back(keys[i]);
        concat(x->keys, y->keys);
        concat(x->children, y->children);
        keys.erase(keys.begin()+i);
        children.erase(children.begin()+i+1);
        y->children.clear();
        delete y;
    }

    key_type replace_key(int i, key_type key){
        keys[i] = key;
        return key;
    }

    bool can_remove(){ return keys.size() >= t; }
//...
Function replace_key updates the i-th key of a node with a new value.
Typically, this new value is pulled from a child node, as described in the
deletion algorithm. It returns the new value.
Function can_remove tests whether a node contains enough keys for further
deletion.
Function merge_children merges the i-th child, the i-th key, and the
(i+1)-th child into one node. This operation is the reverse of splitting; it can
double the keys of a node, so such an adjustment can ensure a node has
enough keys for further deletion.
Note that, unlike the other languages equipped with GC, in the C++
program the memory must be released after merging.
This function uses the concat function to concatenate two collections,
which is defined as the following.
template<class Coll>
void concat(Coll& x, Coll& y){
std::copy(y.begin(), y.end(),
std::insert_iterator<Coll>(x, x.end()));
}
With these helper functions, the main program of B-tree deletion is given as
below.
template<class T>
T* del(T* tr, typename T::key_type key){
    T* root(tr);
    while(!tr->leaf()){
        unsigned int i = 0;
        bool located(false);
        while(i < tr->keys.size()){
            if(key == tr->keys[i]){
                located = true;
                if(tr->children[i]->can_remove()){ // case 2a
                    // replace the key with its predecessor, then go on
                    // deleting the predecessor from the left child
                    key = tr->replace_key(i, tr->children[i]->keys.back());
                    tr = tr->children[i];
                }
                else if(tr->children[i+1]->can_remove()){ // case 2b
                    // replace the key with its successor, then go on
                    // deleting the successor from the right child
                    key = tr->replace_key(i, tr->children[i+1]->keys.front());
                    tr = tr->children[i+1];
                }
                else{ // case 2c
                    tr->merge_children(i);
                    if(tr->keys.empty()){ // shrinks height
                        T* temp = tr->children[0];
                        tr->children.clear();
                        if(tr == root)
                            root = temp; // keep root valid when shrinking at the root
                        delete tr;
                        tr = temp;
                    }
                }
                break;
            }
            else if(key > tr->keys[i])
                i++;
            else
                break;
        }
        if(located)
            continue;
        if(!tr->children[i]->can_remove()){ // case 3
            if(i > 0 && tr->children[i-1]->can_remove()){
                // case 3a: left sibling
                tr->children[i]->keys.insert(tr->children[i]->keys.begin(),
                                             tr->keys[i-1]);
                tr->keys[i-1] = tr->children[i-1]->keys.back();
                tr->children[i-1]->keys.pop_back();
                if(!tr->children[i]->leaf()){
                    tr->children[i]->children.insert(tr->children[i]->children.begin(),
                                                     tr->children[i-1]->children.back());
                    tr->children[i-1]->children.pop_back();
                }
            }
            else if(i+1 < tr->children.size() && tr->children[i+1]->can_remove()){
                // case 3a: right sibling
                tr->children[i]->keys.push_back(tr->keys[i]);
                tr->keys[i] = tr->children[i+1]->keys.front();
                tr->children[i+1]->keys.erase(tr->children[i+1]->keys.begin());
                if(!tr->children[i]->leaf()){
                    tr->children[i]->children.push_back(tr->children[i+1]->children.front());
                    tr->children[i+1]->children.erase(tr->children[i+1]->children.begin());
                }
            }
            else{
                if(i > 0){
                    tr->merge_children(i-1);
                    --i; // the merged child is now at index i-1
                }
                else
                    tr->merge_children(i);
            }
        }
        tr = tr->children[i];
    }
    tr->keys.erase(remove(tr->keys.begin(), tr->keys.end(), key),
                   tr->keys.end());
    if(root->keys.empty()){ // shrinks height
        T* temp = root->children[0];
        root->children.clear();
        delete root;
        root = temp;
    }
    return root;
}
Please note how the recursion is eliminated. The main loop terminates only
when the node being examined is a leaf. Otherwise, the program goes down
through the B-tree along the path which may contain the key to be deleted,
and does proper adjustment, including borrowing keys from other nodes or
merging, to make the candidate nodes along this path all have enough keys
to perform the deletion.
In order to verify this program, a quick and simple parsing function which
can turn a B-tree description string into a B-tree is provided. Error handling
in the parser is omitted for illustration purposes.
template<class T>
T* parse(std::string::iterator& first, std::string::iterator last){
    T* tr = new T;
    ++first; // skip '('
    while(first != last){
        if(*first == '('){ // child
            tr->children.push_back(parse<T>(first, last));
        }
        else if(*first == ',' || *first == ' ')
            ++first; // skip delimiter
        else if(*first == ')'){
            ++first;
            return tr;
        }
        else{ // key
            typename T::key_type key;
            while(*first != ',' && *first != ')')
                key += *first++;
            tr->keys.push_back(key);
        }
    }
    // should never run here
    return 0;
}

template<class T>
T* str_to_btree(std::string s){
    std::string::iterator first(s.begin());
    return parse<T>(first, s.end());
}
After that, the testing can be performed as below.
void test_delete(){
    std::cout<<"test delete...\n";
    const char* s = "(((A,B),C,(D,E,F),G,(J,K,L),M,(N,O)),"
                    "P,((Q,R,S),T,(U,V),X,(Y,Z)))";
    typedef BTree<std::string, 3> BTr;
    BTr* tr = str_to_btree<BTr>(s);
    std::cout<<"before delete:\n"<<btree_to_str(tr)<<"\n";
    const char* ks[] = {"F", "M", "G", "D", "B", "U"};
    for(unsigned int i=0; i<sizeof(ks)/sizeof(ks[0]); ++i)
        tr = __test_del__(tr, ks[i]);
    delete tr;
}

template<class T>
T* __test_del__(T* tr, typename T::key_type key){
    std::cout<<"delete "<<key<<"==>\n";
    tr = del(tr, key);
    std::cout<<btree_to_str(tr)<<"\n";
    return tr;
}
Running test_delete will generate the result below.
test delete...
before delete:
(((A, B), C, (D, E, F), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete F==>
(((A, B), C, (D, E), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete M==>
(((A, B), C, (D, E), G, (J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete G==>
(((A, B), C, (D, E, J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete D==>
((A, B), C, (E, J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete B==>
((A, C), E, (J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete U==>
((A, C), E, (J, K), L, (N, O), P, (Q, R), S, (T, V), X, (Y, Z))
Figures 7.12, 7.13, and 7.14 show this deletion test process step by step. The
modified nodes are shaded. The first 5 steps are the same as the example
shown in textbook [2], figure 18.8.
Figure 7.12: Result of the B-tree deleting program (1). (a) A B-tree before deleting; (b) after deleting key F, case 1.
Merge before deletion algorithm implemented in Python
In the Python implementation, detailed memory management is handled by the
GC. Similar to the C++ program, some auxiliary member functions are added
to the B-tree node definition.
class BTreeNode:
#...
def merge_children(self, i):
#merge children[i] and children[i+1] by pushing keys[i] down
self.children[i].keys += [self.keys[i]]+self.children[i+1].keys
self.children[i].children += self.children[i+1].children
self.keys.pop(i)
self.children.pop(i+1)
def replace_key(self, i, key):
self.keys[i] = key
return key
def can_remove(self):
return len(self.keys) >= self.t
The member function names are the same as in the C++ program, so the
meaning of each can be found in the previous sub-section.
Figure 7.13: Result of the B-tree deleting program (2). (c) After deleting key M, case 2a; (d) after deleting key G, case 2c.
Figure 7.14: Result of the B-tree deleting program (3). (e) After deleting key D, case 3b, and the height is shrunk; (f) after deleting key B, case 3a, borrowing from the right sibling; (g) after deleting key U, case 3a, borrowing from the left sibling.
In contrast to the C++ program, a recursion approach similar to the pseudo
code is used in this Python program.
def B_tree_delete(tr, key):
    i = len(tr.keys)
    while i > 0:
        if key == tr.keys[i-1]:
            if tr.leaf: # case 1 in CLRS
                tr.keys.remove(key)
                #disk_write(tr)
            else: # case 2 in CLRS
                if tr.children[i-1].can_remove(): # case 2a
                    key = tr.replace_key(i-1, tr.children[i-1].keys[-1])
                    B_tree_delete(tr.children[i-1], key)
                elif tr.children[i].can_remove(): # case 2b
                    key = tr.replace_key(i-1, tr.children[i].keys[0])
                    B_tree_delete(tr.children[i], key)
                else: # case 2c
                    tr.merge_children(i-1)
                    B_tree_delete(tr.children[i-1], key)
                    if tr.keys == []: # tree shrinks in height
                        tr = tr.children[i-1]
            return tr
        elif key > tr.keys[i-1]:
            break
        else:
            i = i-1
    # case 3
    if tr.leaf:
        return tr # key doesn't exist at all
    if not tr.children[i].can_remove():
        if i > 0 and tr.children[i-1].can_remove(): # left sibling
            tr.children[i].keys.insert(0, tr.keys[i-1])
            tr.keys[i-1] = tr.children[i-1].keys.pop()
            if not tr.children[i].leaf:
                tr.children[i].children.insert(0, tr.children[i-1].children.pop())
        elif i+1 < len(tr.children) and tr.children[i+1].can_remove(): # right sibling
            tr.children[i].keys.append(tr.keys[i])
            tr.keys[i] = tr.children[i+1].keys.pop(0)
            if not tr.children[i].leaf:
                tr.children[i].children.append(tr.children[i+1].children.pop(0))
        else: # case 3b
            if i > 0:
                tr.merge_children(i-1)
                i = i-1 # the merged child is now at position i-1
            else:
                tr.merge_children(i)
    B_tree_delete(tr.children[i], key)
    if tr.keys == []: # tree shrinks in height
        tr = tr.children[0]
    return tr
In order to verify the deletion program, similar test cases are fed to the
function.
def test_delete():
print "testdelete"
t = 3
tr = BTreeNode(t, False)
tr.keys=["P"]
tr.children=[BTreeNode(t, False), BTreeNode(t, False)]
tr.children[0].keys=["C", "G", "M"]
tr.children[0].children=[BTreeNode(t), BTreeNode(t), BTreeNode(t), BTreeNode(t)]
tr.children[0].children[0].keys=["A", "B"]
tr.children[0].children[1].keys=["D", "E", "F"]
tr.children[0].children[2].keys=["J", "K", "L"]
tr.children[0].children[3].keys=["N", "O"]
tr.children[1].keys=["T", "X"]
tr.children[1].children=[BTreeNode(t), BTreeNode(t), BTreeNode(t)]
tr.children[1].children[0].keys=["Q", "R", "S"]
tr.children[1].children[1].keys=["U", "V"]
tr.children[1].children[2].keys=["Y", "Z"]
print B_tree_to_str(tr)
lst = ["F", "M", "G", "D", "B", "U"]
reduce(__test_del__, lst, tr)
def __test_del__(tr, key):
print "delete", key
tr = B_tree_delete(tr, key)
print B_tree_to_str(tr)
return tr
In this test case, the B-tree is constructed manually. It is identical to the
B-tree built in the C++ deletion test case. Running the test function will
generate the following result.
test delete
(((A, B), C, (D, E, F), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete F
(((A, B), C, (D, E), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete M
(((A, B), C, (D, E), G, (J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete G
(((A, B), C, (D, E, J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete D
((A, B), C, (E, J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete B
((A, C), E, (J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete U
((A, C), E, (J, K), L, (N, O), P, (Q, R), S, (T, V), X, (Y, Z))
This result is the same as the one output by the C++ program.
7.5.2 Delete and fix method
From the previous sub-sections, we see how complex the deletion algorithm is:
there are several cases, and in each case, there are sub-cases to deal with.
Another approach to designing the deletion algorithm is a kind of
delete-then-fix way. It is similar to the insert-then-fix strategy.
When we need to delete a key from a B-tree, we first try to locate the node
which contains this key. This is a traversal process from the root node
towards the leaves. We start from the root node; if the key doesn't exist in
the current node, we traverse deeper and deeper until we reach a node that
contains it.
If this node is a leaf node, we can remove the key directly, and then
examine whether the deletion makes the node contain too few keys to
maintain the B-tree balance properties.
If it is a branch node, removing the key breaks the node into two parts,
which we need to merge together. The merging is a recursive process, which
is shown in figure 7.15.
Figure 7.15: Delete a key from a branch node. Removing k_i breaks the node
into 2 parts, the left part and the right part. Merging these 2 parts is a
recursive process. When the two parts are leaves, the merging terminates.
When doing the merging, if the two nodes are not leaves, we merge the keys
together, and recursively merge the last child of the left part and the first
child of the right part into one new child node, as sketched below. Otherwise,
if they are leaves, we merely put all keys together.
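The following Python sketch illustrates this recursive merging; it is not part of the original multi-language programs and reuses the bare Node class from the sketch in the insertion section. In the full algorithm the combination is done through MAKE-B-TREE, so the fixing described next also happens at this step.

def merge(left, right):
    if not left.children:          # two leaves: merely put the keys together
        return Node(left.keys + right.keys)
    # two branches: recursively merge the last child of the left part and
    # the first child of the right part into one new child node
    c = merge(left.children[-1], right.children[0])
    return Node(left.keys + right.keys,
                left.children[:-1] + [c] + right.children[1:])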
Up to now, we do the deletion in a straightforward way. However, deletion
decreases the number of keys of a node, and it may result in violating the
B-tree balance properties. The solution is to perform fixing along the path we
traversed from the root.
When we do the recursive deletion, the branch node is broken into 3 parts.
The left part contains all keys less than k, say k_1, k_2, ..., k_{i-1}, with
children c_1, c_2, ..., c_{i-1}; the right part contains all keys greater than k,
say k_i, k_{i+1}, ..., k_n, with children c_{i+1}, c_{i+2}, ..., c_{n+1}; the
child c_i, to which the recursive deletion is applied, becomes c'_i. We need to
make these 3 parts into a new node, as shown in figure 7.16.
Figure 7.16: Denote c'_i as the result of recursively deleting key k from child
c_i; we should do the fixing when making the left part, c'_i, and the right part
together into a new node.
At this point, we can examine whether c'_i contains enough keys. If the
number of keys is too small (less than t − 1, rather than t, in contrast to the
merge-before-delete approach), we can borrow a key-child pair from either the
left part or the right part, and do an inverse operation of splitting. Figure
7.17 shows an example of borrowing from the left part.
In case both the left part and the right part are empty, we can simply push
c'_i up.
Delete and fix algorithm implemented functionally
By summarizing all the above analysis, we can draft the delete-and-fix
algorithm.
1: function B-TREE-DELETE(T, k)
2:   return FIX-ROOT(DEL(T, k))
3: function DEL(T, k)
4:   if CHILDREN(T) = NIL then ▷ leaf node
5:     DELETE(KEYS(T), k)
6:     return T
7:   else ▷ branch node
8:     n ← LENGTH(KEYS(T))
9:     i ← LOWER-BOUND(KEYS(T), k)
10:    if KEYS(T)[i] = k then
11:      kl ← KEYS(T)[1, ..., i − 1]
12:      kr ← KEYS(T)[i + 1, ..., n]
13:      cl ← CHILDREN(T)[1, ..., i]
14:      cr ← CHILDREN(T)[i + 1, ..., n + 1]
15:      return MERGE(CREATE-B-TREE(kl, cl), CREATE-B-TREE(kr, cr))
16:    else
17:      kl ← KEYS(T)[1, ..., i − 1]
18:      kr ← KEYS(T)[i, ..., n]
19:      c ← CHILDREN(T)[i]
20:      cl ← CHILDREN(T)[1, ..., i − 1]
21:      cr ← CHILDREN(T)[i + 1, ..., n + 1]
22:      return MAKE-B-TREE((kl, cl), DEL(c, k), (kr, cr))
Figure 7.17: Borrow a key-child pair from the left part and un-split to a new child.
The main delete function calls an internal DEL function to perform the work;
after that, it applies FIX-ROOT to check whether the tree height needs to
shrink. So the FIX-ROOT function we defined in the insertion section should be
updated as the following.
1: function FIX-ROOT(T)
2:   if KEYS(T) = NIL then ▷ Single child, shrink the height
3:     T ← CHILDREN(T)[1]
4:   else if FULL?(T) then
5:     T ← B-TREE-SPLIT(T)
6:   return T
For the recursive merging, the algorithm is given as below. The left part and
the right part are passed as parameters. If they are leaves, we just put all
keys together. Otherwise, we recursively merge the last child of the left and
the first child of the right into one new child, and make this new merged
child together with the other two parts into a new node.
1: function MERGE(L, R)
2:   if L, R are leaves then
3:     T ← CREATE-NEW-NODE()
4:     KEYS(T) ← KEYS(L) + KEYS(R)
5:     return T
6:   else
7:     m ← LENGTH(KEYS(L))
8:     n ← LENGTH(KEYS(R))
9:     kl ← KEYS(L)
10:    kr ← KEYS(R)
11:    cl ← CHILDREN(L)[1, ..., m]
12:    cr ← CHILDREN(R)[2, ..., n + 1]
13:    c ← MERGE(CHILDREN(L)[m + 1], CHILDREN(R)[1])
14:    return MAKE-B-TREE((kl, cl), c, (kr, cr))
In order to make the three parts, the left L, the right R, and the child c'_i,
into a node, we need to examine whether c'_i contains enough keys. Together
with the process of ensuring it doesn't contain too many keys during
insertion, we update the algorithm as the following.
1: function MAKE-B-TREE(L, C, R)
2:   if FULL?(C) then
3:     return FIX-FULL(L, C, R)
4:   else if LOW?(C) then
5:     return FIX-LOW(L, C, R)
6:   else
7:     T ← CREATE-NEW-NODE()
8:     KEYS(T) ← KEYS(L) + KEYS(R)
9:     CHILDREN(T) ← CHILDREN(L) + [C] + CHILDREN(R)
10:    return T
Where FIX-LOW is defined as the following. In case the left part isn't empty,
it borrows a key-child pair from the left and does an un-split to make the
child contain enough keys, then recursively calls MAKE-B-TREE. If the left part
is empty, it tries to borrow a key-child pair from the right part; and if both
sides are empty, it returns the child node as the result, so that the height
shrinks.
1: function FIX-LOW(L, C, R)
2:   (kl, cl) ← L
3:   (kr, cr) ← R
4:   m ← LENGTH(kl)
5:   n ← LENGTH(kr)
6:   if kl ≠ NIL then
7:     kl' ← kl[1, ..., m − 1]
8:     cl' ← cl[1, ..., m − 1]
9:     C' ← UN-SPLIT(cl[m], kl[m], C)
10:    return MAKE-B-TREE((kl', cl'), C', R)
11:  else if kr ≠ NIL then
12:    kr' ← kr[2, ..., n]
13:    cr' ← cr[2, ..., n]
14:    C' ← UN-SPLIT(C, kr[1], cr[1])
15:    return MAKE-B-TREE(L, C', (kr', cr'))
16:  else
17:    return C
Function UN-SPLIT is defined as the inverse operation of splitting.
1: function UN-SPLIT(L, k, R)
2:   T ← CREATE-B-TREE-NODE()
3:   KEYS(T) ← KEYS(L) + [k] + KEYS(R)
4:   CHILDREN(T) ← CHILDREN(L) + CHILDREN(R)
5:   return T
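As a one-function Python rendering of this step (again a sketch, reusing the bare Node class from the earlier sketches rather than any of the chapter's real programs):

def un_split(left, k, right):
    # glue two nodes and a key back into one node: the inverse of splitting
    return Node(left.keys + [k] + right.keys,
                left.children + right.children)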
Delete and fix algorithm implemented in Haskell
Based on the analysis of the delete-then-fixing approach, a Haskell program
can be provided accordingly.
The core deleting function is simple: it just calls an internal removing
function, then examines the root node to see whether the height of the tree
can shrink.
import qualified Data.List as L

delete :: (Ord a) => BTree a -> a -> BTree a
delete tr x = fixRoot $ del tr x

del :: (Ord a) => BTree a -> a -> BTree a
del (Node ks [] t) x = Node (L.delete x ks) [] t
del (Node ks cs t) x =
    case L.elemIndex x ks of
      Just i  -> merge (Node (take i ks) (take (i+1) cs) t)
                       (Node (drop (i+1) ks) (drop (i+1) cs) t)
      Nothing -> make (ks', cs') (del c x) (ks'', cs'')
    where
      (ks', ks'') = L.partition (<x) ks
      (cs', (c:cs'')) = L.splitAt (length ks') cs
Let's focus on the del function. When deleting a key from a leaf node, it just
calls the delete function defined in the Data.List library. If the key doesn't
exist at all, the pre-defined delete function simply returns the list without
any modification. For the case of deleting a key from a branch node, it first
examines whether the key can be located in this node, and applies the
recursive merge after removing this key. Otherwise, it locates the proper
child and does the recursive delete-then-fixing on this child.
Note that the partition and splitAt functions defined in Data.List help to
split the keys and children lists at the position where all elements on the left
are less than the key while the ones in the right part are greater.
The recursive merge program has two patterns: merging two leaves and
merging two branches. It is given as the following.
merge :: BTree a -> BTree a -> BTree a
merge (Node ks [] t) (Node ks' [] _) = Node (ks++ks') [] t
merge (Node ks cs t) (Node ks' cs' _) = make (ks, init cs)
                                             (merge (last cs) (head cs'))
                                             (ks', tail cs')
Where the init, last, and tail functions, which are defined in the Haskell
prelude, are used to manipulate lists.
The fixing part of delete-then-fixing is defined inside the make function.
make :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
make (ks', cs') c (ks'', cs'')
    | full c = fixFull (ks', cs') c (ks'', cs'')
    | low c  = fixLow (ks', cs') c (ks'', cs'')
    | otherwise = Node (ks'++ks'') (cs'++[c]++cs'') (degree c)
Where function low is used to test if a node contains too few keys.
low :: BTree a -> Bool
low tr = (length $ keys tr) < (degree tr)-1
The real fixing is implemented by trying to borrow keys either from the left
sibling or the right sibling, as the following.
fixLow :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
fixLow (ks'@(_:_), cs') c (ks'', cs'') = make (init ks', init cs')
                                              (unsplit (last cs') (last ks') c)
                                              (ks'', cs'')
fixLow (ks', cs') c (ks''@(_:_), cs'') = make (ks', cs')
                                              (unsplit c (head ks'') (head cs''))
                                              (tail ks'', tail cs'')
fixLow _ c _ = c
Note that using a pattern like x@(_:_) helps to ensure that x is not empty.
The function unsplit used here does the inverse splitting operation, like
below.
unsplit :: BTree a -> a -> BTree a -> BTree a
unsplit c1 k c2 = Node ((keys c1)++[k]++(keys c2))
                       ((children c1)++(children c2)) (degree c1)
In order to verify the Haskell program, we can provide some simple test
cases.
import Control.Monad (foldM_, mapM_)

testDelete = foldM_ delShow (listToBTree "GMPXACDEJKNORSTUVYZBFHIQW" 3) "EGAMU"
    where
      delShow tr x = do
        let tr' = delete tr x
        putStrLn $ "delete " ++ (show x)
        putStrLn $ toString tr'
        return tr'
Where the functions listToBTree and toString are defined in the previous
section, where we explained the insertion algorithm.
Running this function will generate the following result.
delete E
(((A, B), C, (D, F), G, (H, I, J, K)), M,
((N, O), P, (Q, R, S), T, (U, V), W, (X, Y, Z)))
delete G
(((A, B), C, (D, F), H, (I, J, K)), M,
((N, O), P, (Q, R, S), T, (U, V), W, (X, Y, Z)))
delete A
((B, C, D, F), H, (I, J, K), M, (N, O),
P, (Q, R, S), T, (U, V), W, (X, Y, Z))
delete M
((B, C, D, F), H, (I, J, K, N, O), P,
(Q, R, S), T, (U, V), W, (X, Y, Z))
delete U
((B, C, D, F), H, (I, J, K, N, O), P,
(Q, R, S, T, V), W, (X, Y, Z))
If we try to delete the same keys from the same B-tree as in the
merge-and-fixing approach, we can find that the delete-then-fixing method
produces different results. Although the results are not the same as each
other, both satisfy the B-tree properties, so they are all correct.
Figure 7.18: Result of delete-then-fixing (1). (a) A B-tree before deleting; (b) after deleting key E.

Figure 7.19: Result of delete-then-fixing (2). (c) After deleting key G; (d) after deleting key A.

Figure 7.20: Result of delete-then-fixing (3). (e) After deleting key M; (f) after deleting key U.
Delete and fix algorithm implemented in Scheme/Lisp
In order to implement the delete program in Scheme/Lisp, we provide an
extra function to test whether a node contains too few keys after deletion.
(define (low? tr t) ;; t: minimum degree
(< (length (keys tr))
(- t 1)))
And some general purpose list manipulation functions are defined.
(define (rest lst k)
(list-tail lst (- (length lst) k)))
(define (except-rest lst k)
(list-head lst (- (length lst) k)))
(define (first lst)
  (if (null? lst) '() (car lst)))

(define (last lst)
  (if (null? lst) '() (car (last-pair lst))))

(define (inits lst)
  (if (null? lst) '() (except-last-pair lst)))
Function rest extracts the last k elements from a list, while except-rest
extracts everything except the last k elements. first can be treated as a safe
car: it returns the empty list instead of throwing an exception when the list
is empty. Function last returns the last element of a list; if the list is empty,
it returns an empty result. Function inits returns everything excluding the
last element.
And an inverse operation of splitting is provided.
(define (un-split lst)
(let ((c1 (car lst))
(k (cadr lst))
(c2 (caddr lst)))
(append c1 (list k) c2)))
The main function of deletion is defined as the following.
(define (btree-delete tr x t)
(define (del tr x)
(if (leaf? tr)
(delete x tr)
(let ((res (partition-by tr x))
(left (car res))
(c (cadr res))
(right (caddr res)))
(if (equal? (first right) x)
(merge-btree (append left (list c)) (cdr right) t)
(make-btree left (del c x) right t)))))
(fix-root (del tr x) t))
It is implemented in a similar way to the insertion: call an internally defined
del function, then apply the fixing process to its result. In the internal
deletion function, if the B-tree is a leaf node, the standard list deletion
function defined in the standard library is applied. If it is a branch node, we
call the partition-by function defined previously. This function divides the
node into 3 parts: all children and keys less than x as the left part, a child
node next, and all keys not less than (greater than or equal to) x together
with their children as the right part.
If the first key in the right part is equal to x, it means x can be located in
this node; we remove x from the right part and then call merge-btree to
merge left+c and right−x into one new node.
(define (merge-btree tr1 tr2 t)
  (if (leaf? tr1)
      (append tr1 tr2)
      (make-btree (inits tr1)
                  (merge-btree (last tr1) (car tr2) t)
                  (cdr tr2)
                  t)))
Otherwise, x may be located in c, so we need to recursively try to delete x
from c.
Function fix-root is updated to handle the cases for deletion as below.
(define (fix-root tr t)
  (cond ((null? tr) '()) ;; empty tree
        ((full? tr t) (split tr t))
        ((null? (keys tr)) (car tr)) ;; shrink height
        (else tr)))
One case is added in make-btree to handle a node which contains too few
keys after deleting.
(define (make-btree l c r t)
(cond ((full? c t) (fix-full l c r t))
((low? c t) (fix-low l c r t))
(else (append l (cons c r)))))
Where fix-low is defined to try to borrow a key and a child, either from the
left sibling or the right sibling.
(define (fix-low l c r t)
(cond ((not (null? (keys l)))
(make-btree (except-rest l 2)
(un-split (append (rest l 2) (list c)))
r t))
((not (null? (keys r)))
(make-btree l
(un-split (cons c (list-head r 2)))
(list-tail r 2) t))
(else c)))
In order to verify the deleting program, a simple test is fed to the above
defined function.
(define (test-delete)
  (define (del-and-show tr x)
    (let ((r (btree-delete tr x 3)))
      (begin (display r) (display "\n") r)))
  (fold-left del-and-show
             (list->btree (str->slist "GMPXACDEJKNORSTUVYZBFHIQW") 3)
             (str->slist "EGAMU")))
Running the test will generate the following result.
(((A B) C (D F) G (H I J K)) M ((N O) P (Q R S) T (U V) W (X Y Z)))
(((A B) C (D F) H (I J K)) M ((N O) P (Q R S) T (U V) W (X Y Z)))
((B C D F) H (I J K) M (N O) P (Q R S) T (U V) W (X Y Z))
((B C D F) H (I J K N O) P (Q R S) T (U V) W (X Y Z))
((B C D F) H (I J K N O) P (Q R S T V) W (X Y Z))
Comparing with the output of the Haskell program in the previous section, it
can be found that they are the same.
7.6 Searching
Although searching in a B-tree can be considered a generalized form of tree
search extended from the binary search tree, it's worth mentioning that in
the disk access case, instead of just returning the satellite data corresponding
to the key, it's more meaningful to return the whole node which contains the
key.
7.6.1 Imperative search algorithm
When searching in a binary tree, there are only 2 different directions, left and
right, in which to continue searching; in a B-tree, however, we need to extend
the search directions to cover the number of children in a node.
1: function B-TREE-SEARCH(T, k)
2:   loop
3:     i ← 1
4:     while i ≤ LENGTH(KEYS(T)) and k > KEYS(T)[i] do
5:       i ← i + 1
6:     if i ≤ LENGTH(KEYS(T)) and k = KEYS(T)[i] then
7:       return (T, i)
8:     if T is leaf then
9:       return NIL ▷ k doesn't exist at all
10:    else
11:      T ← CHILDREN(T)[i]
When doing the search, the program examines the keys in the current node,
traversing from the smallest towards the biggest one. In case it finds a
matched key, it returns the current node as well as the index of this key.
Otherwise, if it finds a position i such that k_i < k < k_{i+1}, the program
updates the node to be examined to the child node c_{i+1}. If it fails to find
the key in a leaf node, the empty value is returned to indicate the failure
case.
Note that in Introduction to Algorithms, this program is described with
recursion; here the recursion is eliminated.
Search program in C++
In the C++ implementation, we can use the pair provided in the STL library
as the return type.
template<class T>
std::pair<T*, unsigned int> search(T* t, typename T::key_type k){
    for(;;){
        unsigned int i(0);
        for(; i < t->keys.size() && k > t->keys[i]; ++i);
        if(i < t->keys.size() && k == t->keys[i])
            return std::make_pair(t, i);
        if(t->leaf())
            break;
        t = t->children[i];
    }
    return std::make_pair((T*)0, 0); // not found
}
And the test cases are given as below.
void test_search(){
    std::cout<<"test search...\n";
    const char* ss[] = {"G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
                        "N", "O", "R", "S", "T", "U", "V", "Y", "Z"};
    BTree<std::string, 3>* tr = list_to_btree(ss, ss+sizeof(ss)/sizeof(ss[0]),
                                              new BTree<std::string, 3>);
    std::cout<<"\n"<<btree_to_str(tr)<<"\n";
    for(unsigned int i=0; i<sizeof(ss)/sizeof(ss[0]); ++i)
        __test_search(tr, ss[i]);
    __test_search(tr, "W");
    delete tr;
}

template<class T>
void __test_search(T* t, typename T::key_type k){
    std::pair<T*, unsigned int> res = search(t, k);
    if(res.first)
        std::cout<<"found "<<res.first->keys[res.second]<<"\n";
    else
        std::cout<<"not found "<<k<<"\n";
}
Running the test_search function will generate the following result.
test search...
((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))
found G
found M
...
found Z
not found W
Here the program can find all the keys we inserted.
Search program in Python
Changing the above algorithm a bit in Python gives a program corresponding
to the pseudo code mentioned in the Introduction to Algorithms textbook.
def B_tree_search(tr, key):
    for i in range(len(tr.keys)):
        if key <= tr.keys[i]:
            break
    if key == tr.keys[i]:
        return (tr, i)
    if tr.leaf:
        return None
    else:
        if key > tr.keys[-1]:
            i = i + 1
        #disk_read
        return B_tree_search(tr.children[i], key)
There is a minor modification from the original pseudo code. We use a
for-loop to iterate over the keys; the boundary check is done by comparing
with the last key in the node and adjusting the index if necessary.
Let's feed some simple test cases to this program.
def test_search():
    lst = ["G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
           "N", "O", "R", "S", "T", "U", "V", "Y", "Z"]
    tr = list_to_B_tree(lst, 3)
    print "test search\n", B_tree_to_str(tr)
    for i in lst:
        __test_search__(tr, i)
    __test_search__(tr, "W")

def __test_search__(tr, k):
    res = B_tree_search(tr, k)
    if res is None:
        print k, "not found"
    else:
        (node, i) = res
        print "found", node.keys[i]
Running the function test_search will generate the following result.
found G
found M
...
found Z
W not found
7.6.2 Functional search algorithm
The imperative algorithm can be turned into a functional one by performing a
recursive search on a child in case the key can't be located in the current
node.
1: function B-TREE-SEARCH(T, k)
2:   i ← FIND-FIRST(λx . x ≥ k, KEYS(T))
3:   if i exists and k = KEYS(T)[i] then
4:     return (T, i)
5:   if T is leaf then
6:     return NIL ▷ k doesn't exist at all
7:   else
8:     return B-TREE-SEARCH(CHILDREN(T)[i], k)
Search program in Haskell
In the Haskell program, we first filter out all keys less than the key to be
searched, then check the first element of the result. If it matches, we return
the current node along with the index as a tuple, where the index starts
from 0. If it doesn't match, we do the recursive search until a leaf node is
reached.
search :: (Ord a) => BTree a -> a -> Maybe (BTree a, Int)
search tr@(Node ks cs _) k
    | matchFirst k $ drop len ks = Just (tr, len)
    | otherwise = if null cs then Nothing
                  else search (cs !! len) k
    where
      matchFirst x (y:_) = x == y
      matchFirst x _ = False
      len = length $ filter (<k) ks
The verification test cases are provided as the following.
testSearch = mapM_ (showSearch (listToBTree lst 3)) $ lst ++ "L"
    where
      showSearch tr x = do
        case search tr x of
          Just (_, i) -> putStrLn $ "found " ++ (show x)
          Nothing -> putStrLn $ "not found " ++ (show x)
      lst = "GMPXACDEJKNORSTUVYZBFHIQW"
Here we construct a B-tree from a string, then we check whether each
element in this string can be located. Finally, a non-existent element 'L' is
fed to verify the failure case.
Running this test function generates the following results.
found 'G'
found 'M'
...
found 'W'
not found 'L'
Search program in Scheme/Lisp
Because we intersperse children and keys in one list in the Scheme/Lisp
B-tree definition, the search function just moves one step ahead to locate the
key in a node.
(define (btree-search tr x)
  ;; find the smallest index where keys[i] >= x
  (define (find-index tr x)
    (let ((pred (if (string? x) string>=? >=)))
      (if (null? tr)
          0
          (if (and (not (list? (car tr))) (pred (car tr) x))
              0
              (+ 1 (find-index (cdr tr) x))))))
  (let ((i (find-index tr x)))
    (if (and (< i (length tr)) (equal? x (list-ref tr i)))
        (cons tr i)
        (if (leaf? tr) #f (btree-search (list-ref tr (- i 1)) x)))))
The program defines an inner function to find the index of the first element
which is greater than or equal to the key we are searching for.
If the key pointed to by this index matches, we are done. Otherwise, this
index points to a child which may contain the key. The program returns a
false result in case the current node is a leaf node.
We can run the below testing function to verify this searching program.
(define (test-search)
  (define (search-and-show tr x)
    (if (btree-search tr x)
        (display (list "found" x))
        (display (list "not found" x))))
  (let ((lst (str->slist "GMPXACDEJKNORSTUVYZBFHIQW"))
        (tr (list->btree lst 3)))
    (map (lambda (x) (search-and-show tr x)) (cons "L" lst))))
A non-existent key L is fed first, and then all elements which were used to
form the B-tree are looked up for verification.
(not found L)(found G)(found M) .. (found W)
7.7 Notes and short summary
In this post, we explained the B-tree data structure as a kind of extension of
the binary search tree. The background knowledge of magnetic disk access is
skipped; the reader can refer to [2] for details. For the three main operations,
insertion, deletion, and searching, both imperative and functional algorithms
are illustrated. The complexity isn't discussed here; however, since B-trees
are defined to maintain the balance properties, all operations mentioned here
perform in O(lg N) time, where N is the number of keys in the B-tree.
7.8 Appendix
All programs provided along with this article are free for downloading.
7.8.1 Prerequisite software
GNU Make is used to easily build some of the programs. For the C++ and
ANSI C programs, GNU GCC and G++ 3.4.4 are used. For the Haskell
programs, GHC 6.10.4 is used for building. For the Python programs, Python
2.5 is used for testing; for the Scheme/Lisp programs, MIT Scheme 14.9 is
used.
All source files are put in one folder. Invoking make or make all will build
the C++ and Haskell programs.
Running make Haskell will build the Haskell program separately; the
executable file is htest (with .exe on Windows-like OSes). It is also possible
to run the program in GHCi.
7.8.2 Tools
Besides these, I use graphviz to draw most of the figures in this post. In
order to translate the B-tree output to a dot script, a Haskell tool is
provided. It can be used like this.
bt2dot filename.dot "string"
Where filename.dot is the output file for the dot script. The tool can parse
the string which describes the B-tree content and translate it into a dot
script.
The source code of this tool is BTr2dot.hs; it can also be downloaded with
this article.
download position: http://sites.google.com/site/algoxy/btree/btree.zip
Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford
Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001.
ISBN: 0262032937.
[2] B-tree, Wikipedia. http://en.wikipedia.org/wiki/B-tree
[3] Liu Xinyu. Comparison of imperative and functional implementation of
binary search tree. http://sites.google.com/site/algoxy/bstree
[4] Chris Okasaki. FUNCTIONAL PEARLS Red-Black Trees in a Functional
Setting. J. Functional Programming. 1998
Part II

Heaps

Chapter 8
Binary Heaps with Functional and imperative implementation

Larry LIU Xinyu
Email: [email protected]

8.1 Abstract
Heap is one of the elementary data structures. It is widely used to solve
practical problems, such as sorting, prioritized scheduling, and graph
algorithms [2].
Most popular implementations of heaps use a kind of implicit binary heap
by array, which is described in the Introduction to Algorithms textbook [2].
Examples include the C++/STL heap and Python heapq.
However, heaps can be generalized and implemented with various other
data structures besides arrays. In this post, explicit binary trees are used to
realize heaps. This leads to Leftist heaps, Skew heaps, and Splay heaps,
which are suitable for pure functional implementation, as shown by
Okasaki [6].
There are multiple programming languages used, including C++, Haskell,
Python, and Scheme/Lisp.
There may be mistakes in the post; please feel free to point them out.
This post is generated by LaTeX2e, and provided with the GNU FDL (GNU Free
Documentation License). Please refer to http://www.gnu.org/copyleft/fdl.html
for detail.
Keywords: Binary Heaps, Leftist Heaps, Skew Heaps, Splay Heaps
8.2 Introduction
Heap is an important elementary data structure. Most algorithm textbooks
introduce heaps, especially binary heaps and heap sort.
Some popular implementations, such as the C++/STL heap and Python
heapq, are based on binary heaps (implicit binary heaps by array, more
precisely). And the fastest heap sort algorithm is also written with a binary
heap, as proposed by R. W. Floyd [3] [5].
In this post, we use a general definition of heap, so that various underlying
data structures can be used for the implementation. And binary heap is also
extended to a wider concept under this definition.
A heap is a data structure that satisfies the following heap property.
• The top operation always returns the minimum (maximum) element;
• The pop operation removes the top element from the heap, while the heap
property should be kept, so that the new top element is still the minimum
(maximum) one;
• Inserting a new element to the heap should keep the heap property, so that
the new top is still the minimum (maximum) element;
• Other operations, including merge etc., should all keep the heap property.
This is a kind of recursive definition, and it doesn't limit the underlying
data structure.
We call a heap whose top returns the minimum element a min-heap, while
if the top returns the maximum element, we call it a max-heap.
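As a quick illustration of these properties, here is a small usage example (not from the original text) with Python's built-in heapq module, a min-heap over a list, which this post mentions as a popular implicit-binary-heap implementation.

import heapq

h = []
for x in [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]:
    heapq.heappush(h, x)    # insert keeps the heap property

print h[0]                  # top always returns the minimum: 1
print heapq.heappop(h)      # pop removes the top and keeps the property: 1
print h[0]                  # the new top is the next minimum: 2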
In this post, I'll first give the definition of binary heap. Then I'll review the
traditional imperative way of implicit heaps by array. After that, by
considering explicit heaps by binary trees, I'll explain the Leftist heap, Skew
heap, and Splay heap, and provide pure functional implementations for them
based on Okasaki's work [6].
As the last part, the computational complexity will be mentioned, and I'll
show in which situations the Leftist heap performs badly.
I'll introduce some other heaps, including Binomial heaps, Fibonacci heaps,
and Pairing heaps, in a separate post.
This article provides example implementations in the C++, Haskell, Python,
and Scheme/Lisp languages.
All source code can be downloaded as described in appendix 8.7; please
refer to the appendix for detailed information about how to build and run the
programs.
8.3 Implicit binary heap by array
Considering the heap definition in the previous section, one option to
implement a heap is by using trees. A straightforward solution is to store the
minimum (maximum) element in the root node of the tree, so for the top
operation, we simply return the root as the result. And for the pop operation,
we can remove the root and rebuild the tree from the children.
If the tree used to implement the heap is a binary tree, we call it a binary
heap. There are three types of binary heap implementation explained in this
post. All of them are based on binary trees.
8.3.1 Definition
The first one is an implicit binary tree indeed. Consider the problem of how
to represent a complete binary tree with an array. (For example, try to
represent a complete binary tree in a programming language that doesn't
support structure or record data types, so that only arrays can be used.) One
solution is to pack all elements from the top level (root) down to the bottom
level (leaves).
Figure 8.1 shows a complete binary tree and its corresponding array
representation.
Figure 8.1: Mapping between a complete binary tree and an array. The tree with root 16 is packed level by level into the array [16, 14, 10, 8, 7, 9, 3, 2, 4, 1].
This mapping relationship between the tree and the array can be denoted as
the following equations (note that the array index starts from 1).
1: function PARENT(i)
2:   return ⌊i/2⌋
3: function LEFT(i)
4:   return 2i
5: function RIGHT(i)
6:   return 2i + 1
For a given tree node, represented as the i-th element in the array, since the
tree is complete, we can easily find its parent node as the ⌊i/2⌋-th element in
the array, its left child with index 2i, and its right child with index 2i + 1. If
the index of the child exceeds the length of the array, it simply means this
node doesn't have such a child.
This mapping calculation can be performed fast if bit-wise operations are
used.
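A quick sketch of the bit-wise version for 1-based indexing (the 0-based variants appear in the following sub-sections); this snippet is an illustration of the remark above, not part of the original listings.

def parent1(i):
    return i >> 1          # floor(i / 2)

def left1(i):
    return i << 1          # 2 * i

def right1(i):
    return (i << 1) | 1    # 2 * i + 1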
Definition of implicit binary heap by array in C++
In the C++ language, the array index starts from zero, not one. The mapping
from array to binary tree should be adjusted accordingly.
template<class T>
T parent(T i){ return ((i+1)>>1)-1; }
template<class T>
T left(T i){ return (i<<1)+1; }
template<class T>
T right(T i){ return (i+1)<<1; }
The type T must support bit-wise operation.
Definition of implicit binary heap by array in Python
Similar to C/C++, the array index in Python starts from 0, so we provide the
mapping functions as below.
def parent(i):
    return (i+1)//2-1

def left(i):
    return 2*i+1

def right(i):
    return 2*(i+1)
8.3.2 Heapify
The most important thing for a heap algorithm is to maintain the heap
property: the top element should be the minimum (maximum) one.
For the implicit binary heap by array, this means that for a given node,
represented by the i-th index, we must develop an algorithm to check whether
its two children conform to this property, and in case there is a violation, we
need to swap the parent and child to fix the problem.
In the Introduction to Algorithms book [2], this algorithm is given in a
recursive way; here we show a pure imperative solution. Let's take the
min-heap for example.
1: function HEAPIFY(A, i)
2:   n ← LENGTH(A)
3:   loop
4:     l ← LEFT(i)
5:     r ← RIGHT(i)
6:     smallest ← i
7:     if l ≤ n and A[l] < A[i] then
8:       smallest ← l
9:     if r ≤ n and A[r] < A[smallest] then
10:      smallest ← r
11:    if smallest ≠ i then
12:      exchange A[i] ↔ A[smallest]
13:      i ← smallest
14:    else
15:      return
This algorithm assumes that for a given node, the children all conform to the
heap property; however, we are not sure whether the value of this node is the
smallest compared to its two children.
For array A and a given index i, we need to check that neither its left
child nor its right child is smaller than A[i]. In case we find a violation, we
pick the smallest one and set it as the new value of A[i]; the previous value
of A[i] is then set as the new value of that child, and we go down along that
child's subtree to repeat this checking and fixing process until we either reach
a leaf node or there is no heap property violation.
Note that the HEAPIFY algorithm takes O(lg N) time.
Heapify in C++
For the C++ program, we need it to cover both the traditional C-compatible
array and the modern container abstraction. There are several options to
realize this requirement.
One method is to pass iterators as arguments to the heap algorithm. The
C++/STL implementation (at the time the author wrote this post) uses this
approach. The advantage of using iterators is that some random access
iterators are just implemented as C pointers, so they are well compatible with
C arrays.
However, this method needs us to change the algorithm from an
array-index style to a pointer-operation style. It's hard to reflect the above
pseudo code clearly in such a style. Because of this problem, we won't use
this approach here. Readers can refer to the STL source code for detailed
information.
Another option is to pass the array or container as well as the number of
elements as arguments. However, we need to abstract the comparison anyway,
so that the algorithm works for both max-heaps and min-heaps.
template<class T> struct MinHeap: public std::less<T>{};
template<class T> struct MaxHeap: public std::greater<T>{};
Here we define MinHeap and MaxHeap as a kind of alias of the less-than and
greater-than logic comparison functor templates.
The heapify algorithm can be implemented as the following.
template<class Array, class LessOp>
void heapify(Array& a, unsigned int i, unsigned int n, LessOp lt){
while(true){
unsigned int l=left(i);
unsigned int r=right(i);
unsigned int smallest=i;
if(l < n && lt(a[l], a[i]))
smallest = l;
if(r < n && lt(a[r], a[smallest]))
smallest = r;
if(smallest != i){
std::swap(a[i], a[smallest]);
i = smallest;
}
else
302CHAPTER 8. BINARY HEAPS WITH FUNCTIONAL AND IMPERATIVE IMPLEMENTATION
break;
}
}
The program accepts a reference to the array (both a reference to a container
and a reference to a pointer are OK); the index from where we want to
adjust, so that all children of it conform to the heap property; the number of
elements in the array; and a comparison functor. It checks from the node
indexed as i down to the leaves, until it finds a node where neither child is
'less' than the value of the node according to the comparison functor.
Otherwise, it locates the smallest one and swaps it with the node's value.
It is also possible to create a concept of range and realize the algorithm
with it. Some C++ libraries, such as Boost, already support ranges. Here we
can develop a lightweight range, only for random access containers.
template<class RIter> // random access iterator
struct Range{
typedef typename std::iterator_traits<RIter>::value_type value_type;
typedef typename std::iterator_traits<RIter>::difference_type size_t;
typedef typename std::iterator_traits<RIter>::reference reference;
typedef RIter iterator;
Range(RIter left, RIter right):first(left), last(right){}
reference operator[](size_t i){ return *(first+i); }
size_t size() const { return last-first; }
RIter first;
RIter last;
};
For a given left iterator l and right iterator r, a range represents [l, r), so it
is easy to construct a range from iterators as well as from the pointer to an
array and its length.
Two overloaded auxiliary function templates are provided to create range
easily.
template<class Iter>
Range<Iter> range(Iter left, Iter right){ return Range<Iter>(left, right); }
template<class Iter>
Range<Iter> range(Iter left, typename Range<Iter>::size_t n){
return Range<Iter>(left, left+n);
}
The above algorithm can be implemented with ranges as below.
template<class R, class LessOp>
void heapify(R a, typename R::size_t i, LessOp lt){
    typename R::size_t l, r, smallest;
    while(true){
        l = left(i);
        r = right(i);
        smallest = i;
        if(l < a.size() && lt(a[l], a[i]))
            smallest = l;
        if(r < a.size() && lt(a[r], a[smallest]))
            smallest = r;
        if(smallest != i){
            std::swap(a[i], a[smallest]);
            i = smallest;
        }
        else
            break;
    }
}
Almost everything is the same as in the former version, except that the number of elements is given by the size of the range.
In order to verify the program, the same test case as in figure 6.2 of [2] is fed to our function.
// test c-array
const int a[] = {16, 4, 10, 14, 7, 9, 3, 2, 8, 1};
const unsigned int n = sizeof(a)/sizeof(a[0]);
int x[n];
std::copy(a, a+n, x);
heapify(x, 1, n, MaxHeap<int>());
print_range(x, x+n);
// test random access container
std::vector<short> y(a, a+n);
heapify(y, 1, n, MaxHeap<short>());
print_range(y.begin(), y.end());
The same test case can also be applied to the range version of the program.
heapify(range(x, n), 1, MaxHeap<int>());
//...
heapify(range(y.begin(), y.end()), 1, MaxHeap<short>());
Here print_range is a helper function that outputs all elements in a container, a C array, or a range.
template<class Iter>
void print_range(Iter first, Iter last){
    for(; first != last; ++first)
        std::cout << *first << ", ";
    std::cout << "\n";
}

template<class R>
void print_range(R a){
    print_range(a.first, a.last);
}
The above test code can output a result as below:
16, 14, 10, 8, 7, 9, 3, 2, 4, 1,
16, 14, 10, 8, 7, 9, 3, 2, 4, 1,
Figure 8.2 shows how this algorithm works.
Figure 8.2: Heapify example, a max-heap case.
a. Step 1: 14 is the biggest element among 4, 14, and 7; swap 4 with the left child;
b. Step 2: 8 is the biggest element among 2, 4, and 8; swap 4 with the right child;
c. 4 is now a leaf node with no children; the process terminates.
8.3. IMPLICIT BINARY HEAP BY ARRAY 305
Heapify in Python
In order to cover both min-heap and max-heap, we abstract the comparison as
the following lambda expressions.
MIN_HEAP = lambda a, b: a < b
MAX_HEAP = lambda a, b: a > b
By passing the above defined comparison operation as an argument, the heapify algorithm is given as below.
def heapify(x, i, less_p = MIN_HEAP):
    n = len(x)
    while True:
        l = left(i)
        r = right(i)
        smallest = i
        if l < n and less_p(x[l], x[i]):
            smallest = l
        if r < n and less_p(x[r], x[smallest]):
            smallest = r
        if smallest != i:
            (x[i], x[smallest]) = (x[smallest], x[i])
            i = smallest
        else:
            break
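The index helper functions left, right, and parent used here (and in the C++ programs above) are defined earlier in the book; for 0-based arrays they would typically look like the following sketch:

def parent(i):
    return (i - 1) >> 1   # index of the parent node

def left(i):
    return (i << 1) + 1   # index of the left child

def right(i):
    return (i << 1) + 2   # index of the right child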
We can use the same test case as presented in figure 6.2 of [2].
l = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heapify(l, 1, MAX_HEAP)
print l
The result is something like this.
[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
This result is the same as the one presented in figure 8.2.
8.3.3 Build a heap
With the heapify algorithm, it is easy to build a heap from an arbitrary array. Observe that the number of nodes on each level of a complete binary tree forms the sequence: 1, 2, 4, 8, ..., 2^i, .... The only exception is the last level. Since the tree may not be full (note that a complete binary tree isn't necessarily a full binary tree), the last level contains at most 2^{p−1} nodes, where 2^{p−1} ≤ n < 2^p and n is the length of the array.
The heapify algorithm doesn't take any effect on a leaf node, which means we can skip applying heapify to all leaf nodes. In other words, all leaf nodes already satisfy the heap property. We only need to start checking and maintaining the heap property from the last branch node, whose index is no greater than ⌊n/2⌋.
Based on this fact, we can build a heap with the following algorithm. (Assume the heap is a min-heap.)
1: function BUILD-HEAP(A)
2:   n ← LENGTH(A)
3:   for i ← ⌊n/2⌋ downto 1 do
4:     HEAPIFY(A, i)
Although the complexity of HEAPIFY is O(lg N), the running time of BUILD-HEAP is not bound to O(N lg N) but to O(N), so this is a linear time algorithm. Please refer to [2] for the detailed proof.
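The intuition behind the linear bound: a heap of N elements contains at most ⌈N/2^{h+1}⌉ nodes of height h, and HEAPIFY starting from a node of height h costs O(h), so the total work is bounded by Σ_{h=0}^{⌊lg N⌋} ⌈N/2^{h+1}⌉ · O(h) = O(N · Σ_{h≥0} h/2^h) = O(N), since the series Σ h/2^h converges (its value is 2).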
Build a heap in C++
The only adjustment in the C++ program from the above algorithm is that the starting index changes from 1 to 0.
template<class Array, class LessOp>
void build_heap(Array& a, unsigned int n, LessOp lt){
    unsigned int i = (n - 1) >> 1;
    do {
        heapify(a, i, n, lt);
    } while (i--);
}
Note that since an unsigned type is used to represent the index, it can't go below zero, so we can't just use a for loop as below.
for(unsigned int i = (n-1)>>1; i>=0; --i) // wrong: i>=0 always holds
This program can be easily adjusted to the range concept.
template<class RangeType, class LessOp>
void build_heap(RangeType a, LessOp lt){
    typename RangeType::size_t i = (a.size() - 1) >> 1;
    do {
        heapify(a, i, lt);
    } while (i--);
}
We can test our program with the same data as in figure 6.3 of [2].
// test c-array
const int a[] = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7};
const unsigned int n = sizeof(a)/sizeof(a[0]);
int x[n];
std::copy(a, a+n, x);
build_heap(range(x, n), MaxHeap<int>());
print_range(x, x+n);
// test random access container
std::vector<int> y(a, a+n);
build_heap(range(y.begin(), y.end()), MaxHeap<int>());
print_range(y.begin(), y.end());
The running results are printed to the console as follows.
16, 14, 10, 8, 7, 9, 3, 2, 4, 1,
16, 14, 10, 8, 7, 9, 3, 2, 4, 1,
Figures 8.3 and 8.4 show the steps of building a heap from an arbitrary array. The node in black is the one to which we will apply the HEAPIFY algorithm; the nodes in gray are the ones swapped during HEAPIFY.
Figure 8.3: Build a heap from an arbitrary array. Gray nodes are changed in each step; the black node is the one to be processed in the next step.
a. An array in arbitrary order, 4 1 3 2 16 9 10 14 8 7, before the build-heap process;
b. Step 1: the array is mapped to a binary tree; the first branch node, which is 16, is to be examined;
c. Step 2: 16 is the largest element in the current sub tree; next is to check the node with value 2.
Build a heap in Python
Like the C++ heap building program, we check from the last non-leaf node, apply the HEAPIFY algorithm, and repeat the process back to the root node. However, we use explicit calculation (division by 2) instead of bit-wise shifting.
def build_heap(x, less_p = MIN_HEAP):
    n = len(x)
    for i in reversed(range(n // 2)):
        heapify(x, i, less_p)
We can feed a similar test case to this program as below:
l = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_heap(l, MAX_HEAP)
print l
It will output the same result as the C++ program.
[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

Figure 8.4: Build a heap from an arbitrary array (continued). Gray nodes are changed in each step; the black node is the one to be processed in the next step.
d. Step 3: 14 is the largest value in the sub-tree; swap 14 and 2; next is to check the node with value 3;
e. Step 4: 10 is the largest value in the sub-tree; swap 10 and 3; next is to check the node with value 1;
f. Step 5: 16 is the largest value under the current node; swap 16 and 1 first, then similarly swap 1 and 7; next is to check the root node with value 4;
g. Step 6: swap 4 and 16, then swap 4 and 14, and then swap 4 and 8; the whole build process finishes.
8.3.4 Basic heap operations
From the generic definition of heap (not necessarily binary heap), it's essential to provide basic operations so that users can access and modify the data. The most important operations include: accessing the top element (finding the minimum or maximum), popping one element from the heap (the minimum or the maximum one, depending on the type of the heap), finding the top N elements, decreasing a key (for a min-heap; it is increasing a key for a max-heap), and insertion.
For a binary heap, most of these operations are bound to O(lg N) in the worst case; some of them, such as top, take O(1) time.
Access the top element (minimum)
According to the definition of heap, there must be an operation to return the top element. For the implicit binary tree by array, it is the root node that stores the minimum (maximum) value.
1: function TOP(A)
2: return A[0]
This operation is trivial. It takes O(1) time.
Access the top element in C++
By translating the above algorithm directly into C++, we get the following program.
template<class T>
typename ValueType<T>::Result heap_top(T a){ return a[0]; }
There is a small trick to get the type of the elements stored in the array, no matter whether the array is an STL container or a plain C style array.
template<class T> struct ValueType{
    typedef typename T::value_type Result;
};

template<class T> struct ValueType<T*>{
    typedef T Result; // C pointer type
};

template<class T, unsigned int n> struct ValueType<T[n]>{
    typedef T Result; // C array type
};
Note that C++ template meta programming supports specializing for certain types.
Here we skip the error handling of the empty heap case; if the heap is empty, one option is simply to raise an exception.
Access the top element in Python
The Python version of this program is also simple; we omit the error handling for an empty heap as well.
def heap_top(x):
    return x[0]  # ignore the empty case
Heap Pop (delete minimum)
Different from the top operation, the pop operation is a bit more complex, because the heap property has to be maintained after the top element is removed.
The solution is to apply the HEAPIFY algorithm immediately on the element that replaces the removed root.
A straightforward but slow algorithm based on this idea may look like the following.
1: function POP-SLOW(A)
2:   x ← TOP(A)
3:   REMOVE(A, 1)
4:   if A is not empty then
5:     HEAPIFY(A, 1)
6:   return x
This algorithm first remembers the top element in x, then removes the first element from the array, reducing its size by one. After that, if the array isn't empty, HEAPIFY is applied to the modified array on the first element (previously the second element).
Removing an element from an array takes O(N) time, where N is the length of the array, since removing the first element requires shifting all the remaining values one by one. Because of this bottleneck, the whole algorithm is slowed to O(N).
In order to solve this problem, one alternative is to swap the first and the last elements in the array, then shrink the array size by one.
1: function POP(A)
2:   x ← TOP(A)
3:   SWAP(A[1], A[HEAP-SIZE(A)])
4:   REMOVE(A, HEAP-SIZE(A))
5:   if A is not empty then
6:     HEAPIFY(A, 1)
7:   return x
Note that removing the last element from the array takes only O(1) time, and HEAPIFY is bound to O(lg N), so the whole algorithm is bound to O(lg N) time.
Pop in C++
In the C++ program, we abstract the min-heap and max-heap behavior as a comparison template parameter, and pass it explicitly.
First is the reference + size approach.
template<class T, class LessOp>
typename ValueType<T>::Result heap_pop(T& a, unsigned int& n, LessOp lt){
    typename ValueType<T>::Result top = heap_top(a);
    a[0] = a[n-1];
    heapify(a, 0, --n, lt);
    return top;
}
And it can be adapted to the range abstraction as well.
template<class R, class LessOp>
typename R::value_type heap_pop(R& a, LessOp lt){
    typename R::value_type top = heap_top(a);
    std::swap(a[0], a[a.size()-1]);
    --a.last;
    heapify(a, 0, lt);
    return top;
}
Pop in Python
Python provides the pop() function to remove the last element, so the program can be developed as below.
def heap_pop(x, less_p = MIN_HEAP):
    top = heap_top(x)
    x[0] = x[-1]  # this is faster than top = x.pop(0)
    x.pop()
    if x != []:
        heapify(x, 0, less_p)
    return top
Find the first K biggest (smallest) elements
With the pop operation, it is easy to implement an algorithm to find the top K elements. In order to find the biggest K values from an array, we can build a max-heap, then perform the pop operation K times.
1: function TOP-K(A, k)
2:   BUILD-HEAP(A)
3:   for i ← 1 to MIN(k, LENGTH(A)) do
4:     APPEND(Result, POP(A))
5:   return Result
Note that if K is bigger than the length of the array, we need to return the whole array as the result. That's why the algorithm uses the MIN function.
Find the first K biggest (smallest) elements in C++
In the C++ program, we can pass an iterator for output, so the reference + size version looks like below.
template<class Iter, class Array, class LessOp>
void heap_top_k(Iter res, unsigned int k,
                Array& a, unsigned int& n, LessOp lt){
    build_heap(a, n, lt);
    unsigned int count = std::min(k, n);
    for(unsigned int i = 0; i < count; ++i)
        *res++ = heap_pop(a, n, lt);
}
When we adapt to the range concept, it is possible to manipulate the data in place, so that we can put the top K elements in the first K positions of the array.
template<class R, class LessOp>
void heap_top_k(R a, typename R::size_t k, LessOp lt){
    typename R::size_t count = std::min(k, a.size());
    build_heap(a, lt);
    while(count--){
        ++a.first;
        heapify(a, 0, lt);
    }
}
This algorithm doesn't utilize the pop function. Instead, after the heap is built, the first element is the top one; the algorithm then advances the range by one position and applies heapify to the new range. This process is repeated K times, so the first K elements are the result.
A simple test case can be fed to the program for verification.
const int a[] = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7};
unsigned int n = sizeof(a)/sizeof(a[0]);
std::vector<int> x(a, a+n);
heap_top_k(range(x.begin(), x.end()), 3, MaxHeap<int>());
print_range(range(x.begin(), 3));
The result is printed in console like below.
16, 14, 10,
Find the first K biggest (smallest) elements in Python
In Python we can put the pop function into a list comprehension to get the top K elements, like the following.
def top_k(x, k, less_p = MIN_HEAP):
    build_heap(x, less_p)
    return [heap_pop(x, less_p) for i in range(min(k, len(x)))]
The testing and result are shown as the following.
l = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
res = top_k(l, 3, MAX_HEAP)
print res
Evaluating the code leads to the line below.
[16, 14, 10]
Modification: Decrease key
Heaps can be used to implement priority queues; because of this, it is important to be able to modify a key stored in the heap. One typical operation is to increase the priority of a task so that it can be performed earlier.
Here we present the decrease key operation for a min-heap; the corresponding operation is increase key for a max-heap.
Once we modify a key by decreasing it in a min-heap, the node may conflict with the heap property: the key may become less than some values among its ancestors. In order to maintain the invariant, an auxiliary algorithm is provided to fix the heap property.
1: function HEAP-FIX(A, i)
2:   while i > 1 and A[i] < A[PARENT(i)] do
3:     Exchange A[i] ↔ A[PARENT(i)]
4:     i ← PARENT(i)
This algorithm repeatedly compares the key of the parent node with the key of the current node, and swaps them when the parent contains the bigger key. This process is performed from the current node towards the root, until it finds that the parent node holds the smaller key.
With this auxiliary algorithm, decrease key can be realized easily.
1: function DECREASE-KEY(A, i, k)
2:   if k < A[i] then
3:     A[i] ← k
4:     HEAP-FIX(A, i)
Note that the algorithm only takes effect when the new key is less than the original key.
Decrease key in C++
In order to support both min-heap and max-heap, the comparison function object is passed as an argument in the C++ implementation.
template<class Array, class LessOp>
void heap_fix(Array& a, unsigned int i, LessOp lt){
    while(i > 0 && lt(a[i], a[parent(i)])){
        std::swap(a[i], a[parent(i)]);
        i = parent(i);
    }
}

template<class Array, class LessOp>
void heap_decrease_key(Array& a,
                       unsigned int i,
                       typename ValueType<Array>::Result key,
                       LessOp lt){
    if(lt(key, a[i])){
        a[i] = key;
        heap_fix(a, i, lt);
    }
}
A very simple verification case can be fed to the program. Here we use the example presented in figure 6.5 of [2].
const int a[] = {16, 14, 10, 8, 7, 9, 3, 2, 4, 1};
const unsigned int n = sizeof(a)/sizeof(a[0]);
int x[n];
std::copy(a, a+n, x);
heap_decrease_key(x, 8, 15, MaxHeap<int>());
print_range(x, x+n);
Running the above lines will generate the following output.
16, 15, 10, 14, 7, 9, 3, 2, 8, 1,
In this max-heap example, we try to increase the key of the 9-th node from 4 to 15, as shown in figure 8.5.
Decrease key in Python
The Python version of the decrease key program is similar. It first checks whether the new key is less than the original one; if so, it modifies the value and performs the fixing process.
def heap_decrease_key(x, i, key, less_p = MIN_HEAP):
    if less_p(key, x[i]):
        x[i] = key
        heap_fix(x, i, less_p)

def heap_fix(x, i, less_p = MIN_HEAP):
    while i > 0 and less_p(x[i], x[parent(i)]):
        (x[parent(i)], x[i]) = (x[i], x[parent(i)])
        i = parent(i)
We can use the same test case to verify the program.
l = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
heap_decrease_key(l, 8, 15, MAX_HEAP)
print l
It will output the same result as the C++ program.
[16, 15, 10, 14, 7, 9, 3, 2, 8, 1]
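As a toy illustration of the priority-queue use mentioned above (this snippet is not from the book; the task names are made up), the keys can be (priority, task) pairs, and raising a task's priority is just a decrease-key on a min-heap:

# Raise the priority of a task by decreasing its key in a min-heap.
tasks = [(5, 'backup'), (3, 'compile'), (9, 'cleanup')]
build_heap(tasks)                  # min-heap ordered by priority
i = tasks.index((9, 'cleanup'))    # locate the task inside the heap
heap_decrease_key(tasks, i, (1, 'cleanup'))
print heap_pop(tasks)              # prints (1, 'cleanup')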
Insertion
In [2], insertion is implemented by using DECREASE-KEY. The approach is to first insert a node with an infinity key; according to the min-heap property, this node should be the last element in the underlying array. After that, the key is decreased to the value to be inserted, so that decrease-key finishes the process.
Instead of reusing DECREASE-KEY, we can reuse HEAP-FIX to implement insertion. The new key is directly appended at the end of the array, and HEAP-FIX is applied on this new node.
1: function HEAP-PUSH(A, k)
2:   APPEND(A, k)
3:   HEAP-FIX(A, SIZE(A))
Insertion by decreasing key method in C++
In C++, a traditional C array is statically sized, so appending a new element to the array has to be managed properly. In order to simplify the problem, we assume the necessary memory has already been allocated (by the client program).
First is the reference + size version of the program.
template<class Array, class LessOp>
void heap_push(Array& a,
               unsigned int& n,
               typename ValueType<Array>::Result key,
               LessOp lt){
    a[n] = key;
    heap_fix(a, n, lt);
    ++n;
}

Note that the size is explicitly increased, so after calling this function, the element count has grown by one.

Figure 8.5: Example process when increasing a key in a max-heap.
a. The 9-th node, with key 4, will be modified;
b. The key is modified to 15, which is greater than its parent;
c. According to the max-heap property, 8 and 15 are swapped;
d. Since 15 is greater than 14, the key of its parent node, 15 and 14 are swapped; because 15 is less than 16, the algorithm terminates.
It is also possible to provide an equivalent program using ranges.
template<class R, class LessOp>
void heap_push(R a, typename R::value_type key, LessOp lt){
    *a.last++ = key;
    heap_fix(a, a.size() - 1, lt);
}
We can test the insert program with a very simple case.
const int a[] = {16, 14, 10, 8, 7, 9, 3, 2, 4, 1};
unsigned int n = sizeof(a)/sizeof(a[0]);
std::vector<int> x(a, a+n);
x.push_back(0);
heap_push(range(x.begin(), n), 17, MaxHeap<int>());
print_range(x.begin(), x.end());
Note that in the client program (this test program), we reserved the memory in advance; otherwise it would cause an access violation. Running these lines will output the following result.
17, 16, 10, 8, 14, 9, 3, 2, 4, 1, 7,
We can see that the new element 17 is inserted at the proper position in the heap.
Insertion directly in Python
In a Python program, appending a new element to a list is supported natively, so the client program doesn't need to take care of the problem described in the C++ implementation.
def heap_insert(x, key, less_p = MIN_HEAP):
    i = len(x)
    x.append(key)
    heap_fix(x, i, less_p)
If the same test case is fed to the above function, we can get the output like
the following.
l = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
heap_insert(l, 17, MAX_HEAP)
print l
[17, 16, 10, 8, 14, 9, 3, 2, 4, 1, 7]
8.3.5 Heap sort
The heap sort algorithm is an interesting application of heaps. According to the heap property, the minimum (maximum) element can be easily accessed from the top of the heap. So a straightforward way to sort an arbitrary collection of values is to first build a heap from them, then continuously pop the smallest element from the heap until the heap is empty.
The algorithm based on this strategy is something like below.
1: function HEAP-SORT(A)
2:   R ← NIL
3:   BUILD-HEAP(A)
4:   while A ≠ NIL do
5:     APPEND(R, HEAP-POP(A))
6:   return R
Robert W. Floyd found a very fast implementation of heap sort. The idea is to build a max-heap instead of a min-heap, so the first element is the biggest one. This biggest element is then swapped with the last element in the array, so that it is in the right position after sorting. The previous last element now becomes the top of the heap and may violate the heap property, so we perform HEAPIFY on it with the heap size shrunk by one. This process is repeated until only one element is left in the heap.
1: function HEAP-SORT-FAST(A)
2:   BUILD-MAX-HEAP(A)
3:   while SIZE(A) > 1 do
4:     Exchange A[1] ↔ A[SIZE(A)]
5:     SIZE(A) ← SIZE(A) − 1
6:     HEAPIFY(A, 1)
Note that this is an in-place algorithm; it's the fastest heap sort implementation so far.
In terms of complexity, BUILD-HEAP is bound to O(N); since HEAPIFY is O(lg N) and it is called O(N) times, both of the above algorithms take O(N lg N) time to run.
Floyd's heap sort algorithm in C++
Only Floyd's algorithm is given in C++ in this post.
template<class Array, class GreaterOp>
void heap_sort(Array& a, unsigned int n, GreaterOp gt){
    for(build_heap(a, n, gt); n > 1; --n){
        std::swap(a[0], a[n-1]);
        heapify(a, 0, n - 1, gt);
    }
}
A very simple test case is used for verication.
const int a[] = {16, 14, 10, 8, 7, 9, 3, 2, 4, 1};
std::vector<int> y(a, a+n);
heap_sort(range(y.begin(), y.end()), MaxHeap<int>());
print_range(y.begin(), y.end());
The result is output as we expect.
1, 2, 3, 4, 7, 8, 9, 10, 14, 16,
General heap sort algorithm in Python
Instead of Floyd's method, we'll show the straightforward pop-N-times algorithm in Python. Please refer to [5] for Floyd's algorithm in Python.
def heap_sort(x, less_p = MIN_HEAP):
    res = []
    build_heap(x, less_p)
    while x != []:
        res.append(heap_pop(x, less_p))
    return res
And the test case is the same as the one we used in the C++ program.
l = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
res = heap_sort(l)
print res
The result is output as the following.
[1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
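For reference, Floyd's in-place method can also be sketched in Python on top of the functions above; this is only an illustrative sketch (the book defers the full version to [5]), and heapify_to is a hypothetical variant of heapify that only considers the first n elements:

def heapify_to(x, i, n, less_p = MIN_HEAP):
    # Same adjustment as heapify above, restricted to x[0..n-1].
    while True:
        l, r, smallest = left(i), right(i), i
        if l < n and less_p(x[l], x[smallest]):
            smallest = l
        if r < n and less_p(x[r], x[smallest]):
            smallest = r
        if smallest == i:
            break
        (x[i], x[smallest]) = (x[smallest], x[i])
        i = smallest

def heap_sort_floyd(x, greater_p = MAX_HEAP):
    build_heap(x, greater_p)            # max-heap: the maximum is on top
    for n in range(len(x) - 1, 0, -1):
        (x[0], x[n]) = (x[n], x[0])     # move the current maximum to the end
        heapify_to(x, 0, n, greater_p)  # restore the heap on x[0..n-1]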
8.4 Leftist heap and Skew heap, explicit binary
heaps
Instead of using an implicit binary tree by array, it is natural to ask why we can't use an explicit binary tree to realize the heap. There are some problems that must be solved if we turn to an explicit binary tree as the underlying data structure for a heap.
The first problem is about the HEAP-POP or DELETE-MIN operation. Suppose the explicit binary tree is represented in the form (left value right), as shown in figure 8.6: if k is the top element, all values in the left and right children are bigger than k. After k is popped, only the left and right children are left, and they have to be merged into a new tree. Since the heap property should be maintained after the merge, the new root element must be the smallest remaining one.

Figure 8.6: A binary tree; all values in the left child and the right child are bigger than k.
Since both the left child and the right child are heaps, two trivial cases can be handled immediately.
1: function MERGE(L, R)
2:   if L = NIL then
3:     return R
4:   else if R = NIL then
5:     return L
6:   else
7:     ...
If neither the left child nor the right child is an empty tree, then since both satisfy the heap property, their top elements are their respective minimum values. One solution is to compare the root values of the left and right children, select the smaller one as the new root of the merged heap, and recursively merge the other tree into one of the children of the smaller one. For instance, if L = (A x B) and R = (A′ y B′), where A, A′, B, B′ are all sub trees and x < y, there are two candidate results according to this strategy:

(MERGE(A, R) x B)

(A x MERGE(B, R))

Both are correct results. One simplified solution is to always merge on the right sub tree. The Leftist tree provides a systematic approach based on this idea.
8.4.1 Definition
A heap implemented with a Leftist tree is called a Leftist heap. The Leftist tree was first introduced by C. A. Crane in 1972 [6].
Rank (S-value)
In a Leftist tree, a rank value (or S-value) is defined for each node. The rank is the distance to the nearest external node, where an external node is the NIL concept extended from a leaf node.
For example, in figure 8.7, the rank of NIL is defined as 0. Consider the root node 4: the nearest external node is a child of node 8, so the rank of the root is 2. Because node 6 and node 8 contain only NIL children, their rank values are 1. Although node 5 has a non-NIL left child, its right child is NIL, so its rank, the minimum distance to an external node, is still 1.
Figure 8.7: rank(4) = 2, rank(6) = rank(8) = rank(5) = 1.
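Stated directly in code, the rank can be computed recursively. Below is an illustrative Python sketch (not from the book), where a node is assumed to be a (key, left, right) tuple and None stands for NIL:

def s_value(t):
    # rank (S-value): distance to the nearest external (NIL) node
    if t is None:
        return 0
    return 1 + min(s_value(t[1]), s_value(t[2]))

For the tree in figure 8.7 this yields 2 at the root and 1 at nodes 5, 6, and 8.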
Leftist property
With rank defined, we can create a strategy for merging:

Every time when merging, we always merge onto the right child; denote the rank of the new right sub tree as r_r.

Compare the ranks of the left and right children: if the rank of the left sub tree is r_l and r_l < r_r, we swap the left and right children.

We call this the Leftist property. Generally speaking, a Leftist tree always has the shortest path to an external node on the right.
A Leftist tree tends to be very unbalanced; however, it ensures an important property, as specified in the following theorem: if a Leftist tree T contains N internal nodes, the path from the root to the rightmost external node contains at most log₂(N + 1) nodes.
For the proof of this theorem, please refer to [7] and [1].
With this theorem, algorithms that operate along this path are all bound to O(lg N).
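A sketch of the argument: the Leftist property makes the rightmost path a shortest path to an external node, so its length equals the rank of the root. A root of rank r means every path from the root to an external node has length at least r, hence the top r levels of the tree are completely filled and N ≥ 2^r − 1, which gives r ≤ log₂(N + 1).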
Definition of Leftist heap in Haskell
In Haskell the definition of the Leftist tree is almost the same as that of the binary search tree, except that a rank field is added.
data LHeap a = E -- Empty
| Node Int a (LHeap a) (LHeap a) -- rank, element, left, right
deriving (Eq, Show)
In order to access the rank field, a helper function is provided.
rank :: LHeap a -> Int
rank E = 0
rank (Node r _ _ _) = r
Definition of Leftist heap in Scheme/Lisp
In Scheme/Lisp, a list is used to represent the Leftist tree. In order to output the tree easily in in-order format, a node is arranged as (left rank element right).
Some auxiliary functions are defined to access the fields of a node.
(define (left t)
  (if (null? t) '() (car t)))

(define (rank t)
  (if (null? t) 0 (cadr t)))

(define (elem t)
  (if (null? t) '() (caddr t)))

(define (right t)
  (if (null? t) '() (cadddr t)))
And a construction function is provided so that a node can be built explicitly.
(define (make-tree l s x r) ;; l: left, s: rank, x: elem, r: right
(list l s x r))
8.4.2 Merge
In order to realize merge, an auxiliary algorithm is given to compare the ranks
and do swapping if necessary.
1: function LEFTIFY(T)
2:   l ← LEFT(T), r ← RIGHT(T)
3:   if RANK(l) < RANK(r) then
4:     RANK(T) ← RANK(l) + 1
5:     Exchange l ↔ r
6:   else
7:     RANK(T) ← RANK(r) + 1
The algorithm compares the ranks of the left and right sub trees, picks the smaller one plus one as the rank of the modified node, and swaps the left and right children if the rank of the left side is smaller.
The rank needs to be increased by one because a new key has been added on top of the tree, which extends the shortest path to an external node by one.
With LEFTIFY defined, the merge algorithm can be provided as follows.
1: function MERGE(L, R)
2:   if L = NIL then
3:     return R
4:   else if R = NIL then
5:     return L
6:   else
7:     T ← CREATE-NEW-NODE()
8:     if KEY(L) < KEY(R) then
9:       KEY(T) ← KEY(L)
10:      LEFT(T) ← LEFT(L)
11:      RIGHT(T) ← MERGE(RIGHT(L), R)
12:    else
13:      KEY(T) ← KEY(R)
14:      LEFT(T) ← LEFT(R)
15:      RIGHT(T) ← MERGE(L, RIGHT(R))
16:    LEFTIFY(T)
17:    return T
Note that the MERGE algorithm always operates on the right side and calls LEFTIFY to ensure the Leftist property, so this algorithm is bound to O(lg N).
Merge in Haskell
Translating the algorithm to Haskell leads to the following program. Here we modify the pseudo code into a pure functional style.
merge :: (Ord a) => LHeap a -> LHeap a -> LHeap a
merge E h = h
merge h E = h
merge h1@(Node _ x l r) h2@(Node _ y l' r') =
    if x < y then makeNode x l (merge r h2)
    else makeNode y l' (merge h1 r')

makeNode :: a -> LHeap a -> LHeap a -> LHeap a
makeNode x a b = if rank a < rank b then Node (rank a + 1) x b a
                 else Node (rank b + 1) x a b
Merge in Scheme/Lisp
In Scheme/Lisp, the LEFTIFY algorithm can be defined as an inner function inside MERGE.
(define (merge t1 t2)
  (define (make-node x a b)
    (if (< (rank a) (rank b))
        (make-tree b (+ (rank a) 1) x a)
        (make-tree a (+ (rank b) 1) x b)))
  (cond ((null? t1) t2)
        ((null? t2) t1)
        ((< (elem t1) (elem t2))
         (make-node (elem t1) (left t1) (merge (right t1) t2)))
        (else (make-node (elem t2) (left t2) (merge t1 (right t2))))))
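The book gives imperative Python programs for the array-based heap earlier; for symmetry, here is a rough Python sketch of the Leftist heap as well (illustrative only, not part of the book's source bundle). A node is modeled as the tuple (rank, key, left, right), with None as the empty tree:

def rank(t):
    return t[0] if t else 0

def make_node(key, a, b):
    # keep the child with the larger rank on the left (Leftist property)
    if rank(a) < rank(b):
        return (rank(a) + 1, key, b, a)
    return (rank(b) + 1, key, a, b)

def merge(t1, t2):
    if not t1: return t2
    if not t2: return t1
    if t1[1] < t2[1]:                 # compare the two root keys
        return make_node(t1[1], t1[2], merge(t1[3], t2))
    return make_node(t2[1], t2[2], merge(t1, t2[3]))

def insert(t, x):
    return merge((1, x, None, None), t)

def find_min(t):
    return t[1]

def delete_min(t):
    return merge(t[2], t[3])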
Merge operation in implicit binary heap by array
In most cases the implicit binary heap by array performs very fast, and it fits well with the cache technology of modern computers. However, its merge algorithm is bound to O(N) time: the best we can do is to concatenate the two arrays and build a heap from the result [13].
1: function MERGE-HEAP(A, B)
2:   C ← CONCAT(A, B)
3:   BUILD-HEAP(C)
We omit the implementations of this algorithm in C++ and Python because they are trivial.
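A possible Python rendering of the omitted program is indeed a few lines on top of the build_heap defined earlier (a sketch):

def merge_heap(a, b, less_p = MIN_HEAP):
    c = a + b              # O(N) concatenation of the two arrays
    build_heap(c, less_p)  # O(N) rebuild of the heap property
    return c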
8.4.3 Basic heap operations
Most of the basic heap operations can be implemented easily with the MERGE algorithm defined above.
Find minimum (top) and delete minimum (pop)
Since we keep the smallest element in the root node, finding the minimum value (top element) is trivial; it's an O(1) operation.
1: function TOP(T)
2:   return KEY(T)
When the top element is popped, the left and right children are merged, and the heap is updated.
1: function POP(T)
2:   return MERGE(LEFT(T), RIGHT(T))
Note that the pop operation on a Leftist heap takes O(lg N) time.
Find minimum (top) and delete minimum in Haskell
We skip the error handling of operation on an empty Leftist heap.
findMin :: LHeap a -> a
findMin (Node _ x _ _) = x

deleteMin :: (Ord a) => LHeap a -> LHeap a
deleteMin (Node _ _ l r) = merge l r
Find minimum (top) and delete minimum in Scheme/Lisp
With the merge function defined, these operations are trivial to implement in Scheme/Lisp.
(define (find-min t)
  (elem t))

(define (delete-min t)
  (merge (left t) (right t)))
Insertion
To insert a new key to the heap, one solution is to create a single leaf node from
the key, and perform merge with this leaf node and the Leftist tree.
1: function INSERT(T, k)
2:   x ← CREATE-NEW-NODE()
3:   KEY(x) ← k
4:   RANK(x) ← 1
5:   LEFT(x), RIGHT(x) ← NIL
6:   return MERGE(x, T)
Since insert calls merge internally, the algorithm is bound to O(lg N) time.
Insertion in Haskell
Translating the above algorithm to Haskell is trivial.
insert :: (Ord a) => LHeap a -> a -> LHeap a
insert h x = merge (Node 1 x E E) h
In order to provide a convenient way to build a Leftist heap from a list, an
auxiliary function is given as the following.
fromList :: (Ord a) => [a] -> LHeap a
fromList = foldl insert E
This function can be used like this.
fromList [9, 4, 16, 7, 10, 2, 14, 3, 8, 1]
It will create a Leftist heap as below.
Node 1 1 (Node 3 2 (Node 2 4 (Node 2 7 (Node 1 16 E E)
(Node 1 10 E E)) (Node 1 9 E E)) (Node 2 3 (Node 1 14 E E)
(Node 1 8 E E))) E
Figure 8.8 shows the result.

Figure 8.8: A Leftist tree.
Insertion in Scheme/Lisp
Based on the algorithm, when inserting a new element, a leaf node is created and merged into the heap.

(define (insert t x)
  (merge (make-tree '() 1 x '()) t))
This function can be verified by continuously inserting all the elements from a list, so that a Leftist heap is built as a result.

(define (from-list lst)
  (fold-left insert '() lst))
One example test case which is equivalent to the Haskell program is shown
like the following.
(define (test-from-list)
  (from-list '(16 14 10 8 7 9 3 2 4 1)))
Evaluating this function yields a Leftist tree, which can be output in in-order as below.
((((((((() 1 16 ()) 1 14 ()) 1 10 ()) 1 8 ()) 2 7 (() 1 9 ())) 1 3
()) 2 2 (() 1 4 ())) 1 1 ())
8.4.4 Heap sort by Leftist Heap
With all the basic operations defined, it's straightforward to implement heap sort in the N-popping way. When we need to sort a list of elements, we first build a Leftist heap from the list, then repeatedly pop the minimum element until the heap is empty.
First is the algorithm to build the Leftist heap by inserting all elements into an empty heap.
1: function BUILD-HEAP(A)
2:   T ← NIL
3:   for each x in A do
4:     T ← INSERT(T, x)
5:   return T
And the heap sort algorithm is the same as the generic one presented in section 8.3.5. Note this algorithm is bound to O(N lg N) time.
Heap sort in Haskell
In the Haskell program, since we have already defined the fromList auxiliary function to build a Leftist heap from a list, the heap sort algorithm can utilize it.
heapSort :: (Ord a) => [a] -> [a]
heapSort = hsort . fromList where
    hsort E = []
    hsort h = (findMin h) : (hsort $ deleteMin h)
Here is the example case used in the previous C++ and Python programs.
heapSort [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
It will output the same result as the following.
[1,2,3,4,7,8,9,10,14,16]
Heap sort in Scheme/Lisp
In the Scheme/Lisp program, we first use the from-list function to turn a list of elements into a Leftist heap, then repeatedly pop the smallest one onto a result list.
(define (heap-sort lst)
  (define (hsort t)
    (if (null? t) '() (cons (find-min t) (hsort (delete-min t)))))
  (hsort (from-list lst)))
Here is a simple test case.
(heap-sort '(16 14 10 8 7 9 3 2 4 1))
Evaluating this case outputs an ordered list.
(1 2 3 4 7 8 9 10 14 16)
8.4.5 Skew heaps
The problem with the Leftist heap is that it performs badly in some cases. For example, if we examine the Leftist heap behind the above heap sort test case¹, it is a very unbalanced binary tree, as shown in figure 8.9.

Figure 8.9: A very unbalanced Leftist tree built from the list [16, 14, 10, 8, 7, 9, 3, 2, 4, 1].
The binary tree has almost turned into a linked list. The worst case is feeding an ordered list when building the Leftist tree: since the tree degenerates to a linked list, the time bound degrades from O(lg N) to O(N).
Skew heap (or self-adjusting heap) is a further simplified Leftist heap [9] [10].
¹ Running the Haskell Leftist tree function fromList [16, 14, 10, 8, 7, 9, 3, 2, 4, 1] generates the result: Node 1 1 (Node 2 2 (Node 1 3 (Node 2 7 (Node 1 8 (Node 1 10 (Node 1 14 (Node 1 16 E E) E) E) E) (Node 1 9 E E)) E) (Node 1 4 E E)) E
Recall that in the Leftist heap, we swap the left and right children during merge when the rank of the left side is less than that of the right side. This comparison strategy doesn't work well when one of the sub trees has only one child, because in such a case the rank of the sub tree is always 1, no matter how big it is. A brute-force approach is to swap the left and right children every time we merge. This idea leads to the Skew heap.
Definition of Skew heap
A Skew heap is a heap implemented with a Skew tree. A Skew tree is a special binary tree in which the minimum element is stored in the root node and every sub tree is also a Skew tree.
Based on the above discussion, there is no need to keep the rank (or S-value) field, so from the programming language point of view the Skew heap definition is the same as that of a plain binary tree.
Definition of Skew heap in Haskell
After removing the rank from the Leftist heap definition, we get the Skew heap one.
data SHeap a = E -- Empty
| Node a (SHeap a) (SHeap a) -- element, left, right
deriving (Eq, Show)
Definition of Skew heap in Scheme/Lisp
Since the Skew heap is just a special kind of binary tree, the definition is the same as for binary trees. In Scheme/Lisp, the internal data structure is the list; we organize it in in-order for easy output.
Some auxiliary access functions and a simple constructor are provided.
(define (left t)
  (if (null? t) '() (car t)))

(define (elem t)
  (if (null? t) '() (cadr t)))

(define (right t)
  (if (null? t) '() (caddr t)))

;; constructor
(define (make-tree l x r) ;; l: left, x: element, r: right
  (list l x r))
Merge
The merge algorithm tends to be very simple. When we merge two Skew trees, we compare the root elements of the two trees and pick the smaller one as the new root; we then merge the tree containing the bigger element onto the right sub tree, and swap the left and right children.
1: function MERGE(L, R)
2:   if L = NIL then
3:     return R
4:   else if R = NIL then
5:     return L
6:   else
7:     T ← CREATE-EMPTY-NODE()
8:     if KEY(L) < KEY(R) then
9:       KEY(T) ← KEY(L)
10:      LEFT(T) ← MERGE(R, RIGHT(L))
11:      RIGHT(T) ← LEFT(L)
12:    else
13:      KEY(T) ← KEY(R)
14:      LEFT(T) ← MERGE(L, RIGHT(R))
15:      RIGHT(T) ← LEFT(R)
16:    return T
Skew heap in Haskell
Translating the above algorithm into Haskell gives a simple merge program.
merge :: (Ord a) => SHeap a -> SHeap a -> SHeap a
merge E h = h
merge h E = h
merge h1@(Node x l r) h2@(Node y l' r') =
    if x < y then Node x (merge r h2) l
    else Node y (merge h1 r') l'
All the remaining programs are the same as for the Leftist heap, except that we needn't provide a rank value when constructing a node.
insert :: (Ord a) => SHeap a -> a -> SHeap a
insert h x = merge (Node x E E) h

findMin :: SHeap a -> a
findMin (Node x _ _) = x

deleteMin :: (Ord a) => SHeap a -> SHeap a
deleteMin (Node _ l r) = merge l r
If we feed a completely ordered list to the Skew heap, it results in a fairly balanced binary tree, as shown in figure 8.10.
SkewHeap>fromList [1..10]
Node 1 (Node 2 (Node 6 (Node 10 E E) E) (Node 4 (Node 8 E E) E))
(Node 3 (Node 5 (Node 9 E E) E) (Node 7 E E))
Skew heap in Scheme/Lisp
In the merge program, if either of the trees to be merged is empty, it just returns the other one. For the non-trivial case, the program selects the smaller root as the new root element, merges the tree containing the bigger element into the right child, then swaps the two children.
(define (merge t1 t2)
  (cond ((null? t1) t2)
        ((null? t2) t1)
        ((< (elem t1) (elem t2))
         (make-tree (merge (right t1) t2)
                    (elem t1)
                    (left t1)))
        (else
         (make-tree (merge t1 (right t2))
                    (elem t2)
                    (left t2)))))

Figure 8.10: The Skew tree is still balanced even when the input is an ordered list.
With the merge function defined, insert can be treated as just a special merge case, where one tree is a leaf containing the value to be inserted.
(define (insert t x)
  (merge (make-tree '() x '()) t))
The find minimum, delete minimum, and heap sort functions are all the same as in the Leftist heap programs, so they can be put into a generic module.
We only show the test results in this section.
(load "skewheap.scm")
;Loading "skewheap.scm"... done
;Value: insert
(test-from-list)
;Value 13: (((() 4 ()) 2 (((() 9 ()) 7 ((((() 16 ()) 14 ()) 10 ())
8 ())) 3 ())) 1 ())
(test-sort)
;Value 14: (1 2 3 4 7 8 9 10 14 16)
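For comparison with the Leftist heap sketch given earlier, the skew merge admits an even shorter Python sketch (again illustrative; a node is a (key, left, right) tuple and None is the empty tree), since there is no rank to maintain and the children are swapped unconditionally:

def skew_merge(t1, t2):
    # pick the smaller root; merge the other heap into the right child,
    # then swap the children by building the node as (key, new-right-merge, old-left)
    if not t1: return t2
    if not t2: return t1
    if t1[0] < t2[0]:
        return (t1[0], skew_merge(t2, t1[2]), t1[1])
    return (t2[0], skew_merge(t1, t2[2]), t2[1])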
8.5 Splay heap, another explicit binary heap
The Leftist heap shows that it's quite possible to implement a heap data structure with an explicit binary tree, and the Skew heap shows one method to solve the balance problem. The Splay heap, on the other hand, shows another balancing approach.
Although the Leftist heap and Skew heap use binary trees, they are not binary search trees (BST). If we turn the underlying data structure into a binary search tree, the minimum (maximum) element isn't located at the root node; it takes O(lg N) time to find the minimum (maximum) element.
A binary search tree becomes inefficient if it isn't well balanced; operations degrade to O(N) in the worst case. Although it's quite OK to use a red-black tree to implement a binary heap, the Splay tree provides a lightweight implementation with an acceptable dynamic balancing result.
8.5.1 Definition
The Splay tree uses a cache-like approach: it keeps rotating the currently accessed node close to the top, so that the node can be accessed fast next time. It defines this kind of operation as splaying. For an unbalanced binary search tree, after several splay operations the tree tends to become more and more balanced. Most basic operations of the Splay tree have amortized O(lg N) time. The Splay tree was invented by Daniel Dominic Sleator and Robert Endre Tarjan in 1985 [11] [12].
Splaying
There are two splaying methods. The first one is a bit complex to describe; however, it can be implemented fairly simply with pattern matching. The second one is simple to describe, but its implementation is a bit complex.
Denote the currently accessed node as X, its parent node as P, and its grandparent node (if any) as G. There are 3 kinds of splaying steps, each containing 2 symmetric cases. For illustration purposes, only one case is shown for each step.
Zig-zig step. As shown in figure 8.11, in this case both X and its parent P are left children, or both are right children. By rotating twice, X becomes the new root.

Zig-zag step. As shown in figure 8.12, in this case X is the right child of its parent while P is the left child of G, or X is the left child of P while P is the right child of G. After rotation, X becomes the new root, and P and G become siblings.

Zig step. As shown in figure 8.13, in this case P is the root; we perform one rotation, so that X becomes the new root. Note this is the last step in a splay operation.
Figure 8.11: Zig-zig case.

Figure 8.12: Zig-zag case.

Figure 8.13: Zig case.

Okasaki found a simple rule for splaying [6]: every time we follow two left branches in a row, or two right branches in a row, we rotate those two nodes.
Based on this rule, splaying can be realized in the following way. When we access a node for a key x (during the process of inserting a node, looking up a node, or deleting a node), if we find that we have traversed two left branches (or two right branches) in a row, we partition the tree into two parts L and R, where L contains all nodes smaller than x and R contains all nodes bigger than x. We can then create a new tree (for instance in insertion) with x as the root, L as the left child, and R as the right child. Note that the partition process is recursive, because it does splaying inside.
1: function PARTITION(T, pivot)
2:   if T = NIL then
3:     return (NIL, NIL)
4:   x ← KEY(T)
5:   l ← LEFT(T)
6:   r ← RIGHT(T)
7:   L ← NIL
8:   R ← NIL
9:   if x < pivot then
10:    if r = NIL then
11:      L ← T
12:    else
13:      x′ ← KEY(r)
14:      l′ ← LEFT(r)
15:      r′ ← RIGHT(r)
16:      if x′ < pivot then
17:        (small, big) ← PARTITION(r′, pivot)
18:        L ← CREATE-NODE(CREATE-NODE(l, x, l′), x′, small)
19:        R ← big
20:      else
21:        (small, big) ← PARTITION(l′, pivot)
22:        L ← CREATE-NODE(l, x, small)
23:        R ← CREATE-NODE(big, x′, r′)
24:  else
25:    if l = NIL then
26:      R ← T
27:    else
28:      x′ ← KEY(l)
29:      l′ ← LEFT(l)
30:      r′ ← RIGHT(l)
31:      if x′ > pivot then
32:        (small, big) ← PARTITION(l′, pivot)
33:        L ← small
34:        R ← CREATE-NODE(big, x′, CREATE-NODE(r′, x, r))
35:      else
36:        (small, big) ← PARTITION(r′, pivot)
37:        L ← CREATE-NODE(l′, x′, small)
38:        R ← CREATE-NODE(big, x, r)
39:  return (L, R)

1: function CREATE-NODE(l, x, r)
2:   T ← CREATE-NEW-NODE()
3:   KEY(T) ← x
4:   LEFT(T) ← l
5:   RIGHT(T) ← r
6:   return T
Definition of Splay heap in Haskell
Since the Splay tree is a special binary search tree, their definitions are the same.
data STree a = E -- Empty
| Node (STree a) a (STree a) -- left, element, right
deriving (Eq, Show)
Translating the above algorithm into Haskell gives the following partition program.
partition :: (Ord a) => STree a -> a -> (STree a, STree a)
partition E _ = (E, E)
partition t@(Node l x r) y
    | x < y =
        case r of
          E -> (t, E)
          Node l' x' r' ->
              if x' < y
              then let (small, big) = partition r' y
                   in (Node (Node l x l') x' small, big)
              else let (small, big) = partition l' y
                   in (Node l x small, Node big x' r')
    | otherwise =
        case l of
          E -> (E, t)
          Node l' x' r' ->
              if y < x'
              then let (small, big) = partition l' y
                   in (small, Node big x' (Node r' x r))
              else let (small, big) = partition r' y
                   in (Node l' x' small, Node big x r)
In a language which supports pattern matching, such as Haskell, splay can be implemented in a very simple and straightforward style by translating figures 8.11, 8.12, and 8.13 directly into patterns. Note that for each step there are two left-right symmetric cases.
-- splay by pattern matching
splay :: (Eq a) => STree a -> a -> STree a
-- zig-zig
splay t@(Node (Node (Node a x b) p c) g d) y =
    if x == y then Node a x (Node b p (Node c g d)) else t
splay t@(Node a g (Node b p (Node c x d))) y =
    if x == y then Node (Node (Node a g b) p c) x d else t
-- zig-zag
splay t@(Node (Node a p (Node b x c)) g d) y =
    if x == y then Node (Node a p b) x (Node c g d) else t
splay t@(Node a g (Node (Node b x c) p d)) y =
    if x == y then Node (Node a g b) x (Node c p d) else t
-- zig
splay t@(Node (Node a x b) p c) y = if x == y then Node a x (Node b p c) else t
splay t@(Node a p (Node b x c)) y = if x == y then Node (Node a p b) x c else t
-- otherwise
splay t _ = t
Definition of Splay heap in Scheme/Lisp
Since the Splay heap is essentially a binary search tree, the definition is the same. In order to output the tree in in-order, we arrange it in (left element right) format.
The auxiliary functions to access the left child, element, and right child, and the constructor, are defined the same as in the Skew heap Scheme/Lisp definition above.
The function which partitions the tree into 2 parts according to a pivot value, based on the PARTITION algorithm, is defined as the following. It's a bit complex, but not hard: we compare the pivot with the element and traverse the tree based on the binary search tree property. In case two left (or right) children are traversed in a row, the tree is rotated by splaying. This makes the tree more and more balanced.
(define (partition t pivot)
  (if (null? t)
      (cons '() '())
      (let ((l (left t))
            (x (elem t))
            (r (right t)))
        (if (< x pivot)
            (if (null? r)
                (cons t '())
                (let ((l1 (left r))
                      (x1 (elem r))
                      (r1 (right r)))
                  (if (< x1 pivot)
                      (let* ((p (partition r1 pivot))
                             (small (car p))
                             (big (cdr p)))
                        (cons (make-tree (make-tree l x l1) x1 small) big))
                      (let* ((p (partition l1 pivot))
                             (small (car p))
                             (big (cdr p)))
                        (cons (make-tree l x small) (make-tree big x1 r1))))))
            (if (null? l)
                (cons '() t)
                (let ((l1 (left l))
                      (x1 (elem l))
                      (r1 (right l)))
                  (if (> x1 pivot)
                      (let* ((p (partition l1 pivot))
                             (small (car p))
                             (big (cdr p)))
                        (cons small (make-tree big x1 (make-tree r1 x r))))
                      (let* ((p (partition r1 pivot))
                             (small (car p))
                             (big (cdr p)))
                        (cons (make-tree l1 x1 small) (make-tree big x r)))))))))))
Although there is no direct pattern matching feature in Scheme/Lisp, it is possible to provide a splay function with guard clauses. Compared to the Haskell one, it is a bit more complex.
(define (splay l x r y)
  (cond ((eq? y (elem (left l))) ;; zig-zig
         (make-tree (left (left l))
                    (elem (left l))
                    (make-tree (right (left l))
                               (elem l)
                               (make-tree (right l) x r))))
        ((eq? y (elem (right r))) ;; zig-zig
         (make-tree (make-tree (make-tree l x (left r))
                               (elem r)
                               (left (right r)))
                    (elem (right r))
                    (right (right r))))
        ((eq? y (elem (right l))) ;; zig-zag
         (make-tree (make-tree (left l) (elem l) (left (right l)))
                    (elem (right l))
                    (make-tree (right (right l)) x r)))
        ((eq? y (elem (left r))) ;; zig-zag
         (make-tree (make-tree l x (left (left r)))
                    (elem (left r))
                    (make-tree (right (left r)) (elem r) (right r))))
        ((eq? y (elem l)) ;; zig
         (make-tree (left l) (elem l) (make-tree (right l) x r)))
        ((eq? y (elem r)) ;; zig
         (make-tree (make-tree l x (left r)) (elem r) (right r)))
        (else (make-tree l x r))))
8.5.2 Basic heap operations
There are two methods to implement the basic heap operations for a Splay heap. One is to use the PARTITION algorithm we defined; the other is to utilize a SPLAY process, which can be realized in a pattern matching way in languages equipped with this feature.
Insertion
If using the PARTITION algorithm, when we want to insert a new element x into a heap T, we first partition the heap into two trees L and R, where L contains all nodes smaller than x and R contains all bigger ones. We then construct a new node with x as the root and L, R as children.
1: function INSERT(T, x)
2:   (L, R) ← PARTITION(T, x)
3:   return CREATE-NODE(L, x, R)
If a SPLAY algorithm is defined, insert can be done recursively as follows.
1: function INSERT(T, x)
2:   if T = NIL then
3:     return CREATE-NODE(NIL, x, NIL)
4:   if x < KEY(T) then
5:     LEFT(T) ← INSERT(LEFT(T), x)
6:   else
7:     RIGHT(T) ← INSERT(RIGHT(T), x)
8:   return SPLAY(T, x)
Insertion in Haskell
It's easy to translate the above algorithms into Haskell.
insert :: (Ord a) => STree a -> a -> STree a
insert t x = Node small x big where (small, big) = partition t x
And the pattern matching one works in a recursive manner, as below.
insert :: (Ord a) => STree a -> a -> STree a
insert E y = Node E y E
insert (Node l x r) y
    | x > y = splay (Node (insert l y) x r) y
    | otherwise = splay (Node l x (insert r y)) y
Insertion in Scheme/Lisp
By using the partition method, when inserting a new element into the Splay heap, we use this element as the pivot to partition the tree. After that we set this new element as the new root, with the sub tree containing all elements smaller than the root as the left child, and the others as the right child. Note that we intentionally do not handle the duplicated elements case, because it is quite possible for a Splay heap to contain duplicates.
(define (insert t x)
  (let* ((p (partition t x))
         (small (car p))
         (big (cdr p)))
    (make-tree small x big)))
And if we use the splay function, insert can be implemented straightforwardly, like for a binary search tree, except that splaying is applied after the recursive insertion.
(define (insert-splay t x)
  (cond ((null? t) (make-tree '() x '()))
        ((> (elem t) x)
         (splay (insert-splay (left t) x) (elem t) (right t) x))
        (else
         (splay (left t) (elem t) (insert-splay (right t) x) x))))
Verify how splaying improves the balance
In order to show how splaying improves the balance of the binary search tree, we first insert an ordered list of elements into the tree, and then perform a large number of arbitrary node accesses.
A look-up algorithm is provided with the splaying operation inside.
1: function LOOKUP(T, x)
2:   if KEY(T) = x then
3:     return T
4:   else if KEY(T) > x then
5:     LEFT(T) ← LOOKUP(LEFT(T), x)
6:   else
7:     RIGHT(T) ← LOOKUP(RIGHT(T), x)
8:   return SPLAY(T, x)
Translating this algorithm into Haskell yields the following program².
lookup :: (Ord a) => STree a -> a -> STree a
lookup E _ = E
lookup t@(Node l x r) y
    | x == y = t
    | x > y = splay (Node (lookup l y) x r) y
    | otherwise = splay (Node l x (lookup r y)) y
Next we create a Splay heap by inserting the sequence of numbers from 1 to 10. After that, we randomly select numbers from this range and perform 1000 look-ups, as below.
-- randomRIO comes from System.Random
testSplay = do
    xs <- sequence (replicate 1000 (randomRIO (1, 10)))
    putStrLn $ show (foldl lookup t xs)
  where
    t = foldl insert (E :: STree Int) [1..10]
² The LOOKUP algorithm in other languages is skipped.
Running this test makes the tree quite balanced. Below is an example result; figure 8.14 shows the tree after splaying.
Node (Node (Node (Node E 1 E) 2 E) 3 E) 4 (Node (Node E 5 E) 6 (Node
(Node (Node E 7 E) 8 E) 9 (Node E 10 E)))
Figure 8.14: Splaying helps improve the balance.
Find minimum (top) and delete minimum (pop)
Since the Splay tree is just a special binary search tree, the minimum element is stored in the leftmost node. We keep traversing the left child to find it.
1: function FIND-MIN(T)
2:   if LEFT(T) = NIL then
3:     return KEY(T)
4:   else
5:     return FIND-MIN(LEFT(T))
And for the pop operation, the algorithm keeps traversing to the left and also removes the minimum element from the tree. In case two left nodes are traversed in a row, a splaying operation is performed.
1: function DEL-MIN(T)
2:   if LEFT(T) = NIL then
3:     return RIGHT(T)
4:   else if LEFT(LEFT(T)) = NIL then
5:     LEFT(T) ← RIGHT(LEFT(T))
6:     return T
7:   else      ▷ splaying
8:     l ← LEFT(T)
9:     r ← CREATE-NEW-NODE()
10:    LEFT(r) ← RIGHT(l)
11:    KEY(r) ← KEY(T)
12:    RIGHT(r) ← RIGHT(T)
13:    T′ ← CREATE-NEW-NODE()
14:    LEFT(T′) ← DEL-MIN(LEFT(l))
15:    KEY(T′) ← KEY(l)
16:    RIGHT(T′) ← r
17:    return T′
Note that the find minimum and delete minimum algorithms are both bound to O(lg N).
Find minimum (top) and delete minimum in Haskell
When translating the above algorithms into Haskell, one option is to use pattern matching.
findMin :: STree a -> a
findMin (Node E x _) = x
findMin (Node l x _) = findMin l

deleteMin :: STree a -> STree a
deleteMin (Node E x r) = r
deleteMin (Node (Node E x' r') x r) = Node r' x r
deleteMin (Node (Node l' x' r') x r) = Node (deleteMin l') x' (Node r' x r)
Find minimum (top) and delete minimum in Scheme/Lisp
In Scheme/Lisp, finding the minimum element means traversing to the left repeatedly; for a max-heap, we would change the program to traverse to the right.
(define (find-min t)
  (if (null? (left t))
      (elem t)
      (find-min (left t))))
And in the delete minimum program, except for the trivial cases, splaying also needs to be performed when we traverse left twice.
(define (delete-min t)
  (cond ((null? (left t)) (right t))
        ((null? (left (left t)))
         (make-tree (right (left t)) (elem t) (right t)))
        (else (make-tree (delete-min (left (left t)))
                         (elem (left t))
                         (make-tree (right (left t)) (elem t) (right t))))))
Merge
Merge is another basic operation for heaps, as it is widely used in graph algorithms. By using the PARTITION algorithm, merge can be realized in O(lg N) time.
When merging two Splay trees, in the non-trivial case we take the root element of the first tree as the new root element, then partition the second tree with this new root as the pivot value. After that, we recursively merge the children of the first tree with the partition results. This algorithm is shown as the following.
1: function MERGE(T1, T2)
2:   if T1 = NIL then
3:     return T2
4:   else
5:     L ← LEFT(T1)
6:     R ← RIGHT(T1)
7:     k ← KEY(T1)
8:     (L′, R′) ← PARTITION(T2, k)
9:     return CREATE-NODE(MERGE(L, L′), k, MERGE(R, R′))
Merge two Splay heaps in Haskell
In Haskell, we can handle the trivial and non-trivial case by pattern matching.
merge :: (Ord a) => STree a -> STree a -> STree a
merge E t = t
merge (Node l x r) t = Node (merge l l') x (merge r r')
    where (l', r') = partition t x
Merge two Splay heaps in Scheme/Lisp
In Scheme/Lisp, we translate the above algorithm strictly as the following.
(define (merge t1 t2)
  (if (null? t1)
      t2
      (let* ((p (partition t2 (elem t1)))
             (small (car p))
             (big (cdr p)))
        (make-tree (merge (left t1) small)
                   (elem t1)
                   (merge (right t1) big)))))
8.5.3 Heap sort
Since the internal implementation of the Splay heap is completely transparent behind the heap interface, the heap sort algorithm can be reused directly. This means that the heap sort algorithm is generic, no matter what the underlying data structure is.
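To make this genericity concrete, here is a minimal Python sketch of the N-popping heap sort that works with any heap offering insert, find_min, and delete_min, for instance the hypothetical Leftist heap sketch given earlier in this chapter:

def heap_sort_generic(xs):
    t = None                  # start from the empty heap
    for x in xs:
        t = insert(t, x)
    res = []
    while t:
        res.append(find_min(t))
        t = delete_min(t)
    return res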
8.6 Notes and short summary
In this post, we reviewed the definition of the binary heap and adjusted it a bit, so that as long as the heap property is maintained, any binary tree representation can be used to implement a binary heap.
This frees us from being limited to the popular implicit binary heap by array, and extends to explicit binary heaps, including the Leftist heap, Skew heap, and Splay heap. Note that the implicit binary heap by array is particularly convenient for imperative implementation, because it intensively uses random index access, which can be mapped to a complete binary tree; it's hard to find a direct functional counterpart in this way.
However, by using an explicit binary tree, a functional implementation can be easily achieved. Most of them have O(lg N) worst case performance, and some of them even reach O(1) amortized time. Okasaki in [4] gives a good analysis of these data structures.
In this post, only the purely functional realizations of the Leftist heap, Skew heap, and Splay heap are explained; they can all be implemented in an imperative way as well. I skipped them only for the purpose of presenting functional algorithms comparable to the implicit binary heap by array.
It's very natural to extend the concept from binary trees to K-ary (K-way) trees, which leads to other useful heaps such as Binomial heaps, Fibonacci heaps and pairing heaps. I'll introduce them in a separate post later.
8.7 Appendix
All programs provided along with this article are free for downloading.
8.7.1 Prerequisite software
GNU Make is used to easily build some of the programs. For the C++ and ANSI C programs, GNU GCC and G++ 3.4.4 are used. For the Haskell programs, GHC 6.10.4 is used for building. For the Python programs, Python 2.5 is used for testing; for the Scheme/Lisp programs, MIT Scheme 14.9 is used.
All source files are put in one folder. Invoking make or make all will build the C++ program.
There is no separate Haskell main program module; however, it is possible to run the programs in GHCi.
bheap.hpp. This is the C++ source file containing the binary heap definition and functions. There are two types of approach: one is the reference + size way, the other is the range representation.
test.cpp. This is the main C++ program to test the bheap.hpp module.
bheap.py. This is the Python source file for the binary heap implementation. It's a self-contained program with test cases embedded; running it directly performs all test cases. It is also possible to import it as a module.
LeftistHeap.hs. This is the Haskell program for the Leftist heap, with some simple test cases as well. It can be loaded into GHCi directly.
SkewHeap.hs. This is the Haskell program for the Skew heap definition. Some very simple test cases are provided.
SplayHeap.hs. This is the Haskell program for the Splay heap definition.
leftist.scm. This is the Scheme/Lisp program for the Leftist heap.
skewheap.scm. This is the Scheme/Lisp program for the Skew heap. Same as leftist.scm, some generic functions are reused.
genheap.scm. This is the Scheme/Lisp general heap function utilities, which are all the same for the various heaps. It can be overridden afterwards.
splayheap.scm. This is the Scheme/Lisp program for the Splay heap definition.
Download location: http://sites.google.com/site/algoxy/btree/bheap.zip
Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937.
[2] Heap (data structure), Wikipedia. http://en.wikipedia.org/wiki/Heap (data structure)
[3] Heapsort, Wikipedia. http://en.wikipedia.org/wiki/Heapsort
[4] Chris Okasaki. Purely Functional Data Structures. Cambridge university
press, (July 1, 1999), ISBN-13: 978-0521663502
[5] Sorting algorithms/Heapsort. Rosetta Code.
http://rosettacode.org/wiki/Sorting algorithms/Heapsort
[6] Leftist Tree, Wikipedia. http://en.wikipedia.org/wiki/Leftist tree
[7] Bruno R. Preiss. Data Structures and Algorithms with Object-Oriented De-
sign Patterns in Java. http://www.brpreiss.com/books/opus5/index.html
[8] Donald E. Knuth. The Art of Computer Programming. Volume 3: Sorting
and Searching.. Addison-Wesley Professional; 2nd Edition (October 15,
1998). ISBN-13: 978-0201485417. Section 5.2.3 and 6.2.3
[9] Skew heap, Wikipedia. http://en.wikipedia.org/wiki/Skew heap
[10] Sleator, Daniel Dominic; Tarjan, Robert Endre. Self-adjusting heaps. SIAM Journal on Computing 15(1):52-69. doi:10.1137/0215004. ISSN 0097-5397 (1986)
[11] Splay tree, Wikipedia. http://en.wikipedia.org/wiki/Splay tree
[12] Sleator, Daniel D.; Tarjan, Robert E. (1985), Self-Adjusting Binary Search
Trees, Journal of the ACM 32(3):652 - 686, doi: 10.1145/3828.3835
[13] NIST, binary heap. http://xw2k.nist.gov/dads//HTML/binaryheap.html
Chapter 9
From grape to the world
cup, the evolution of
selection sort
9.1 Introduction
We have introduced the 'hello world' sorting algorithm, insertion sort. In this short chapter, we explain another straightforward sorting method, selection sort. The basic version of selection sort doesn't perform as well as the divide and conquer methods, e.g. quick sort and merge sort. We'll use the same approach as in the chapter about insertion sort: analyze why it's slow, and try to improve it through various attempts until we reach the best bound of comparison-based sorting, O(N lg N), by evolving it into heap sort.
The idea of selection sort can be illustrated by a real life story. Consider a kid eating a bunch of grapes. There are two types of children according to my observation. One is the optimistic type: the kid always eats the biggest grape he/she can ever find; the other is pessimistic: he/she always eats the smallest one.
The first type of kids actually eat the grapes in an order such that the size decreases monotonically; the others eat in increasing order. The kid in fact sorts the grapes by size, and the method used here is selection sort.
Based on this idea, the algorithm of selection sort can be directly described as the following.
In order to sort a series of elements:
The trivial case: if the series is empty, then we are done; the result is also empty;
Otherwise, we find the smallest element, append it to the tail of the result, and recursively sort the rest.
Note that this algorithm sorts the elements in ascending order; it's easy to sort in descending order by picking the biggest element instead. We'll introduce passing a comparator as a parameter later on.
Figure 9.1: Always picking the smallest grape.
This description can be formalized as an equation:

sort(A) = \begin{cases} \emptyset & : A = \emptyset \\ \{m\} \cup sort(A') & : \text{otherwise} \end{cases} \qquad (9.1)

where m is the minimum element among the collection A, and A' contains the rest of the elements except for m:

m = min(A), \quad A' = A - \{m\}
We don't limit the data structure of the collection here. Typically, A is an array in an imperative environment, and a list (a singly linked-list particularly) in a functional environment; it can even be some other data structure which will be introduced later.
The algorithm can also be given in an imperative manner.

function Sort(A)
    X ← ∅
    while A ≠ ∅ do
        x ← Min(A)
        A ← Del(A, x)
        X ← Append(X, x)
    return X
Figure 9.2 depicts the process of this algorithm.
Figure 9.2: The left part is sorted data, continuously pick the minimum element
in the rest and append it to the result.
We just translated the very original idea of eating grapes line by line, without considering any expense of time and space. This realization stores the result in X, and when a selected element is appended to X, we delete the same element from A. This indicates that we can change it to in-place sorting to reuse the space in A.
The idea is to store the minimum element in the first cell in A (we use the term 'cell' if A is an array, and say 'node' if A is a list); then store the second minimum element in the next cell, then the third cell, ...
One solution to realize this sorting strategy is swapping. When we select
the i-th minimum element, we swap it with the element in the i-th cell:
function Sort(A)
    for i ← 1 to |A| do
        m ← Min(A[i...])
        Exchange A[i] ↔ m
Denote $A = \{a_1, a_2, ..., a_N\}$. At any time, when we process the i-th element, all elements before i, namely $\{a_1, a_2, ..., a_{i-1}\}$, have already been sorted. We locate the minimum element among $\{a_i, a_{i+1}, ..., a_N\}$, and exchange it with $a_i$, so that the i-th cell contains the right value. The process is repeatedly executed until we arrive at the last element.
This idea can be illustrated by figure 9.3.
Figure 9.3: The left part is sorted data; continuously pick the minimum element in the rest and put it in the right position.
9.2 Finding the minimum
We haven't completely realized the selection sort, because we took the operation of finding the minimum (or the maximum) element as a black box. It's a puzzle how a kid locates the biggest or the smallest grape, and this is an interesting topic for computer algorithms.
The easiest, but not so fast, way to find the minimum in a collection is to perform a scan. There are several ways to interpret this scan process. Consider that we want to pick the biggest grape. We start from any grape, compare it with another one, and pick the bigger one; then we take a next grape and compare it with the one we selected so far, pick the bigger one and go on with this take-and-compare process, until there are no grapes we haven't compared.
It's easy to get lost in real practice if we don't mark which grapes have been compared. There are two ways to solve this problem, which are suitable for different data structures respectively.
9.2.1 Labeling
Method 1 is to label each grape with a number: {1, 2, ..., N}, and systematically perform the comparison in the order of this sequence of labels. We first compare grape number 1 and grape number 2 and pick the bigger one; then we take grape number 3 and do the comparison, ... We repeat this process until we arrive at grape number N. This is quite suitable for elements stored in an array.
function Min(A)
    m ← A[1]
    for i ← 2 to |A| do
        if A[i] < m then
            m ← A[i]
    return m
With Min defined, we can complete the basic version of selection sort (the naive version without any optimization in terms of time and space).
However, this algorithm returns the value of the minimum element instead of its location (the label of the grape), which needs a bit of tweaking for the in-place version. Some languages, such as ISO C++, support returning a reference as the result, so that the swap can be achieved directly, as below.
template<typename T>
T& min(T* from, T* to) {
    T* m;
    for (m = from++; from != to; ++from)
        if (*from < *m)
            m = from;
    return *m;
}

template<typename T>
void ssort(T* xs, int n) {
    int i;
    for (i = 0; i < n; ++i)
        std::swap(xs[i], min(xs + i, xs + n));
}
In environments without reference semantics, the solution is to return the
location of the minimum element instead of the value:
function Min-At(A)
    m ← First-Index(A)
    for i ← m + 1 to |A| do
        if A[i] < A[m] then
            m ← i
    return m
Note that since we pass A[i...] to Min-At as the argument, we assume the first element A[i] to be the smallest one, and examine all elements A[i + 1], A[i + 2], ... one by one. Function First-Index() is used to retrieve i from the input parameter.
The following Python example program completes the basic in-place selection sort algorithm based on this idea. It explicitly passes the range information to the function finding the minimum location.
def ssort(xs):
    n = len(xs)
    for i in range(n):
        m = min_at(xs, i, n)
        (xs[i], xs[m]) = (xs[m], xs[i])
    return xs

def min_at(xs, i, n):
    m = i
    for j in range(i + 1, n):
        if xs[j] < xs[m]:
            m = j
    return m
9.2.2 Grouping
Another method is to group all grapes into two parts: the group we have examined, and the rest we haven't. We denote these two groups as A and B, and all the elements (grapes) as L. At the beginning, we haven't examined any grapes at all; thus A is empty (∅), and B contains all grapes. We can select two arbitrary grapes from B, compare them, and put the loser (the smaller one, for example) into A. After that, we repeat this process by continuously picking arbitrary grapes from B and comparing them with the winner of the previous round, until B becomes empty. At this point, the final winner is the minimum element, and A turns out to be L − {min(L)}, which can be used for the next round of minimum finding.
There is an invariant of this method: at any time, we have L = A ∪ {m} ∪ B, where m is the winner held so far.
This approach doesn't need the collection of grapes to be indexed (as being labeled in method 1). It's suitable for any traversable data structure, including linked-lists etc. Suppose $b_1$ is an arbitrary element in B if B isn't empty, and $B'$ contains the rest of the elements with $b_1$ removed. This method can be formalized as the below auxiliary function.

min'(A, m, B) = \begin{cases} (m, A) & : B = \emptyset \\ min'(A \cup \{m\}, b_1, B') & : b_1 < m \\ min'(A \cup \{b_1\}, m, B') & : \text{otherwise} \end{cases} \qquad (9.2)
In order to pick the minimum element, we call this auxiliary function by passing an empty A, and using an arbitrary element (for instance, the first one) to initialize m:

extractMin(L) = min'(\emptyset, l_1, L') \qquad (9.3)

where $L'$ contains all elements in L except for the first one, $l_1$. The algorithm extractMin not only finds the minimum element, but also returns the updated collection, which doesn't contain this minimum. Plugging this minimum extracting algorithm into the basic selection sort definition, we can create a complete functional sorting program, for example as this Haskell code snippet.
sort [] = []
sort xs = x : sort xs' where
    (x, xs') = extractMin xs

extractMin (x:xs) = min' [] x xs where
    min' ys m [] = (m, ys)
    min' ys m (x:xs) = if m < x then min' (x:ys) m xs else min' (m:ys) x xs
The first line handles the trivial edge case: the sorting result for an empty list is obviously empty. The second clause ensures that there is at least one element; that's why the extractMin function needn't any other pattern matching.
One may think the second clause of the min' function should be written like below, or it will produce the updated list in reverse order:

min' ys m (x:xs) = if m < x then min' (ys ++ [x]) m xs
                   else min' (ys ++ [m]) x xs

Actually, it's necessary to use cons instead of appending here. This is because appending is a linear operation, proportional to the length of part A, while cons is a constant O(1) time operation. In fact, we needn't keep the relative order of the list to be sorted, as it will be re-arranged anyway during sorting.
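For instance (a hypothetical GHCi session, assuming the definitions above):

extractMin [3, 1, 2, 4]    -- (1, [4,2,3]): the minimum is extracted, but the rest comes back re-arranged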
It's quite possible to keep the relative order during sorting, while ensuring that the performance of finding the minimum element doesn't degrade to quadratic. The following equation defines a solution.

extractMin(L) = \begin{cases} (l_1, \emptyset) & : |L| = 1 \\ (l_1, L') & : l_1 < m, (m, L'') = extractMin(L') \\ (m, \{l_1\} \cup L'') & : \text{otherwise} \end{cases} \qquad (9.4)
If L is a singleton, the minimum is the only element it contains. Otherwise, denote $l_1$ as the first element in L, and let $L'$ contain the rest of the elements except for $l_1$, that is $L' = \{l_2, l_3, ...\}$. The algorithm recursively finds the minimum element in $L'$, which yields the intermediate result $(m, L'')$, where m is the minimum element in $L'$ and $L''$ contains all the rest of the elements except for m. Comparing $l_1$ with m, we can determine which of them is the final minimum result.
The following Haskell program implements this version of selection sort.
sort [] = []
sort xs = x : sort xs' where
    (x, xs') = extractMin xs

extractMin [x] = (x, [])
extractMin (x:xs) = if x < m then (x, xs) else (m, x:xs') where
    (m, xs') = extractMin xs
Note that only the cons operation is used; we needn't append at all, because the algorithm actually examines the list from right to left. However, it's not free, as this program needs to book-keep the context (typically via the call stack). The relative order is ensured by the nature of the recursion. Please refer to the appendix about tail recursion calls for a detailed discussion.
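For comparison (again a hypothetical GHCi session, assuming the definitions just above):

extractMin [3, 1, 2, 4]    -- (1, [3,2,4]): the remaining elements keep their relative order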
9.2.3 Performance of the basic selection sort
Both the labeling method and the grouping method need to examine all the elements to pick the minimum in every round, and we pick the minimum N times in total. Thus the performance is around $N + (N-1) + (N-2) + ... + 1 = \frac{N(N+1)}{2}$. Selection sort is a quadratic algorithm, bound to $O(N^2)$ time.
Compared to insertion sort, which we introduced previously, selection sort performs the same in its best, worst and average cases, while insertion sort performs well in its best case (when the list is reverse ordered and stored in a linked-list), as O(N), and its worst case performance is $O(N^2)$.
In the next sections, we'll examine why selection sort performs poorly, and try to improve it step by step.
Exercise 9.1
Implement the basic imperative selection sort algorithm (the non-in-place version) in your favorite programming language. Compare it with the in-place version, and analyze the time and space effectiveness.
9.3 Minor Improvement
9.3.1 Parameterize the comparator
Before any improvement in terms of performance, let's make the selection sort algorithm general enough to handle different sorting criteria.
We've seen two opposite examples so far: one may need to sort the elements in ascending order or in descending order. For the former case, we repeatedly find the minimum, while for the latter, we find the maximum instead. These are just two special cases. In real world practice, one may want to sort things by various criteria, e.g. by size, weight, age, ...
One solution to handle them all is to pass the criterion as a compare function to the basic selection sort algorithm. For example:
sort(c, L) = \begin{cases} \emptyset & : L = \emptyset \\ \{m\} \cup sort(c, L') & : \text{otherwise}, (m, L') = extract(c, L) \end{cases} \qquad (9.5)
The algorithm extract(c, L) is defined as below.

extract(c, L) = \begin{cases} (l_1, \emptyset) & : |L| = 1 \\ (l_1, L') & : c(l_1, m), (m, L'') = extract(c, L') \\ (m, \{l_1\} \cup L'') & : \text{otherwise} \end{cases} \qquad (9.6)

where $L'$ contains the rest of the elements in L except for the first element $l_1$.
Here c is a comparator function: it takes two elements, compares them, and returns the result of which one precedes the other. Passing the 'less than' operator (<) turns this algorithm into the version we introduced in the previous section.
Some environments require a total ordering comparator to be passed, which returns one of the results 'less than', 'equal', and 'greater than'. We needn't such a strong condition here: c only tests whether 'less than' is satisfied. However, as the minimum requirement, the comparator should meet the strict weak ordering conditions below [3]:
Irreflexivity: for all x, it's not the case that x < x;
Asymmetry: for all x and y, if x < y, then it's not the case that y < x;
Transitivity: for all x, y, and z, if x < y and y < z, then x < z.
The following Scheme/Lisp program translates this generic selection sorting algorithm. The reason why we choose Scheme/Lisp here is that its lexical scope simplifies the need to pass the 'less than' comparator in every function call.
(define (sel-sort-by ltp? lst)
(define (ssort lst)
(if (null? lst)
lst
(let ((p (extract-min lst)))
(cons (car p) (ssort (cdr p))))))
(define (extract-min lst)
(if (null? (cdr lst))
lst
(let ((p (extract-min (cdr lst))))
(if (ltp? (car lst) (car p))
lst
(cons (car p) (cons (car lst) (cdr p)))))))
(ssort lst))
Note that both ssort and extract-min are inner functions, so that the 'less than' comparator ltp? is available to them. Passing < to this function yields the normal sorting in ascending order:

(sel-sort-by < '(3 1 2 4 5 10 9))
;Value 16: (1 2 3 4 5 9 10)
It's possible to pass various comparators to the imperative selection sort as well. This is left as an exercise to the reader.
For the sake of brevity, we only consider sorting elements in ascending order in the rest of this chapter, and we'll not pass a comparator as a parameter unless it's necessary.
9.3.2 Trivial fine tune
The basic in-place imperative selection sort algorithm iterates over all elements, picking the minimum by traversing as well. It can be written in a compact way by inlining the minimum-finding part as an inner loop.
procedure Sort(A)
    for i ← 1 to |A| do
        m ← i
        for j ← i + 1 to |A| do
            if A[j] < A[m] then
                m ← j
        Exchange A[i] ↔ A[m]
Observe that when we sort N elements, after the first N − 1 minimum ones are selected, the only element left is definitely the N-th biggest one, so we need NOT find the minimum if there is only one element in the list. This indicates that the outer loop can iterate to N − 1 instead of N.
Another place we can fine tune is that we needn't swap the elements if the i-th minimum one is already A[i]. The algorithm can be modified accordingly as below:
procedure Sort(A)
    for i ← 1 to |A| − 1 do
        m ← i
        for j ← i + 1 to |A| do
            if A[j] < A[m] then
                m ← j
        if m ≠ i then
            Exchange A[i] ↔ A[m]
Definitely, these modifications won't affect the performance in terms of big-O.
9.3.3 Cock-tail sort
Knuth gave an alternative realization of selection sort in [1]. Instead of selecting the minimum each time, we can select the maximum element and put it at the last position. This method can be illustrated by the following algorithm.
procedure Sort(A)
    for i ← |A| down-to 2 do
        m ← i
        for j ← 1 to i − 1 do
            if A[m] < A[j] then
                m ← j
        Exchange A[i] ↔ A[m]
As shown in figure 9.4, at any time the elements on the rightmost side are sorted. The algorithm scans all the unsorted ones and locates the maximum; then it puts it at the tail of the unsorted range by swapping.
Figure 9.4: Select the maximum every time and put it to the end.
This version reveals the fact that selecting the maximum element can sort the elements in ascending order as well. What's more, we can find both the minimum and the maximum elements in one pass of traversing, putting the minimum at the first location and the maximum at the last position. This approach can speed up the sorting slightly (halving the number of outer loop iterations).
procedure Sort(A)
    for i ← 1 to ⌊|A|/2⌋ do
        min ← i
        max ← |A| + 1 − i
        if A[max] < A[min] then
            Exchange A[min] ↔ A[max]
        for j ← i + 1 to |A| − i do
            if A[j] < A[min] then
                min ← j
            if A[max] < A[j] then
                max ← j
        Exchange A[i] ↔ A[min]
        Exchange A[|A| + 1 − i] ↔ A[max]
This algorithm can be illustrated as in figure 9.5: at any time, the leftmost and rightmost parts contain the elements sorted so far; the smaller sorted ones are on the left, while the bigger sorted ones are on the right. The algorithm scans the unsorted range, locates both the minimum and the maximum positions, then puts them at the head and the tail of the unsorted range by swapping.
Figure 9.5: Select both the minimum and maximum in one pass, and put them to the proper positions.
Note that it's necessary to swap the leftmost and rightmost elements before the inner loop if they are not in the correct order. This is because we scan the range excluding these two elements. Another method is to initialize the first element of the unsorted range as both the maximum and the minimum before the inner loop. However, since we need two swapping operations after the scan, it's possible that the first swap moves the maximum or the minimum away from the position we just found, which makes the second swap malfunction. How to solve this problem is left as an exercise to the reader.
The following Python example program implements this cock-tail sort algorithm.
def cocktail_sort(xs):
    n = len(xs)
    for i in range(n // 2):
        (mi, ma) = (i, n - 1 - i)
        if xs[ma] < xs[mi]:
            (xs[mi], xs[ma]) = (xs[ma], xs[mi])
        for j in range(i + 1, n - 1 - i):
            if xs[j] < xs[mi]:
                mi = j
            if xs[ma] < xs[j]:
                ma = j
        (xs[i], xs[mi]) = (xs[mi], xs[i])
        (xs[n - 1 - i], xs[ma]) = (xs[ma], xs[n - 1 - i])
    return xs
It's possible to realize cock-tail sort in a functional approach as well. An intuitive recursive description can be given like this:
Trivial edge case: if the list is empty, or there is only one element in the list, the sorted result is obviously the original list;
Otherwise, we select the minimum and the maximum, put them in the head and tail positions, then recursively sort the rest of the elements.
sort(L) =
_
L : |L| 1
{l
min
} sort(L

) {l
max
} : otherwise
(9.7)
Where the minimum and the maximum are extracted from L by a function
select(L).
(l
min
, L

, l
max
) = select(L)
Note that the minimum is actually linked to the front of the recursive sort result; semantically this is a constant O(1) time 'cons' (refer to the appendix of this book for details), while the maximum is appended to the tail, which is typically an expensive linear O(N) time operation. We'll optimize it later.
Function select(L) scans the whole list to find both the minimum and the maximum. It can be defined as below:
select(L) = \begin{cases} (min(l_1, l_2), \emptyset, max(l_1, l_2)) & : L = \{l_1, l_2\} \\ (l_1, \{l_{min}\} \cup L'', l_{max}) & : l_1 < l_{min} \\ (l_{min}, \{l_{max}\} \cup L'', l_1) & : l_{max} < l_1 \\ (l_{min}, \{l_1\} \cup L'', l_{max}) & : \text{otherwise} \end{cases} \qquad (9.8)
where $(l_{min}, L'', l_{max}) = select(L')$, and $L'$ is the rest of the list except for the first element $l_1$. If there are only two elements in the list, we pick the smaller as the minimum, and the bigger as the maximum. After extracting them, the list becomes empty. This is the trivial edge case. Otherwise, we take the first element $l_1$ out, then recursively perform selection on the rest of the list. After that, we check whether $l_1$ is less than the minimum candidate or greater than the maximum candidate, so that we can finalize the result.
Note that in all the cases, there is no appending operation to form the result. However, since the selection must scan all the elements to determine the minimum and the maximum, it is bound to O(N) linear time.
The complete example Haskell program is given as the following.
csort [] = []
csort [x] = [x]
csort xs = mi : csort xs' ++ [ma] where
    (mi, xs', ma) = extractMinMax xs

extractMinMax [x, y] = (min x y, [], max x y)
extractMinMax (x:xs) | x < mi = (x, mi:xs', ma)
                     | ma < x = (mi, ma:xs', x)
                     | otherwise = (mi, x:xs', ma)
    where (mi, xs', ma) = extractMinMax xs
We mentioned that the appending operation is expensive in this intuitive version. It can be improved, and this can be achieved in two steps. The first step is to convert the cock-tail sort into tail-recursive form. Denote the sorted small ones as A and the sorted big ones as B in figure 9.5; we use A and B as accumulators. The new cock-tail sort is defined as the following.
sort'(A, L, B) = \begin{cases} A \cup L \cup B & : L = \emptyset \lor |L| = 1 \\ sort'(A \cup \{l_{min}\}, L', \{l_{max}\} \cup B) & : \text{otherwise} \end{cases} \qquad (9.9)

where $l_{min}$, $l_{max}$ and $L'$ are defined the same as before. And we start sorting by passing empty A and B: $sort(L) = sort'(\emptyset, L, \emptyset)$.
Besides the edge case, observe that the appending operation only happens on $A \cup \{l_{min}\}$, while $l_{max}$ is only linked to the head of B. This appending occurs in every recursive call. To eliminate it, we can store A in reverse order as $\overleftarrow{A}$, so that $l_{min}$ can be linked to the head ('cons') instead of appended. Denote $cons(x, L) = \{x\} \cup L$ and $append(L, x) = L \cup \{x\}$; we have the below equation.

append(L, x) = reverse(cons(x, reverse(L))) = reverse(cons(x, \overleftarrow{L})) \qquad (9.10)

Finally, we perform a reverse to turn $\overleftarrow{A}$ back to A. Based on this idea, the algorithm can be improved one more step as the following.

sort'(A, L, B) = \begin{cases} reverse(A) \cup B & : L = \emptyset \\ reverse(\{l_1\} \cup A) \cup B & : |L| = 1 \\ sort'(\{l_{min}\} \cup A, L', \{l_{max}\} \cup B) & : \text{otherwise} \end{cases} \qquad (9.11)
This algorithm can be implemented by Haskell as below.
csort xs = cocktail [] xs [] where
    cocktail as [] bs = reverse as ++ bs
    cocktail as [x] bs = reverse (x:as) ++ bs
    cocktail as xs bs = let (mi, xs', ma) = extractMinMax xs
                        in cocktail (mi:as) xs' (ma:bs)
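A quick check of this version (a hypothetical GHCi session, reusing extractMinMax from above):

csort [3, 1, 4, 1, 5, 9, 2, 6]    -- [1,1,2,3,4,5,6,9]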
Exercise 9.2
Realize the imperative basic selection sort algorithm, which can take a comparator as a parameter. Please try both a dynamically typed language and a statically typed language. How can we annotate the type of the comparator as generally as possible in a statically typed language?
Implement Knuth's version of selection sort in your favorite programming language.
An alternative way to realize cock-tail sort is to assume the i-th element both the minimum and the maximum; after the inner loop, the minimum and maximum are found, then we can swap the minimum to the i-th position, and the maximum to position |A| + 1 − i. Implement this solution in your favorite imperative language. Please note that there are several special edge cases that should be handled correctly:
A = {max, min, ...};
A = {..., max, min};
A = {max, ..., min}.
Please don't refer to the example source code along with this chapter before you try to solve this problem.
9.4 Major improvement
Although cock-tail sort halves the number of loops, the performance is still bound to quadratic time. It means that the methods we have developed so far handle big data poorly compared to other divide and conquer sorting solutions.
To improve selection based sort essentially, we must analyze where the bottleneck is. In order to sort the elements by comparison, we must examine all the elements for ordering. Thus the outer loop of selection sort is necessary. However, must it scan all the elements every time to select the minimum? Note that when we pick the smallest one the first time, we actually traverse the whole collection, so we partially know which ones are relatively big and which ones are relatively small.
The problem is that, when we select the further minimum elements, instead of re-using the ordering information we obtained previously, we drop it all and blindly start a new traversal.
So the key point to improve selection based sort is to re-use the previous results. There are several approaches; we'll adopt an intuitive idea inspired by the football match in this chapter.
9.4.1 Tournament knock out
The football world cup is held every four years. There are 32 teams from different continents playing the final games. Before 1982, there were 16 teams competing in the tournament finals [4].
For simplification purposes, let's go back to 1978 and imagine a way to determine the champion: in the first round, the teams are grouped into 8 pairs to play games; after that, there will be 8 winners, and 8 teams will be out. Then in the second round, these 8 teams are grouped into 4 pairs. There will be 4 winners after the second round of games; then the top 4 teams are divided into 2 pairs, so that there will be only two teams left for the final game.
The champion is determined after 4 rounds of games in total, and there are actually 8 + 4 + 2 + 1 = 15 games. Now we have the world cup champion; however, the world cup won't finish at this stage, as we need to determine the silver medal team.
Readers may argue: isn't the team beaten by the champion in the final game the second best? This is true according to the real world cup rules. However, it isn't fair enough in some sense.
We have often heard about the so-called 'group of death'. Let's suppose that the Brazil team is grouped with the German team at the very beginning. Although both teams are quite strong, one of them must be knocked out. It's quite possible that the team which loses that game could beat all the other teams except for the champion. Figure 9.6 illustrates such a case.
Figure 9.6: The element 15 is knocked out in the first round.
Imagine that every team has a number. The bigger the number, the stronger the team. Suppose that the stronger team always beats the team with the smaller number, although this is not true in the real world. But this simplification is fair enough for us to develop the tournament knock out solution. The maximum number, which represents the champion, is 16. Definitely, the team with number 14 isn't the second best according to our rules. It should be 15, which is knocked out in the first round of comparison.
The key question here is to find an effective way to locate the second maximum number in this tournament tree. After that, we just apply the same method to select the third, the fourth, ..., to accomplish the selection based sort.
One idea is to assign the champion a very small number (for instance, $-\infty$), so that it won't be selected next time, and the second best one becomes the new champion. However, suppose there are $2^m$ teams for some natural number m; it still takes $2^{m-1} + 2^{m-2} + ... + 2 + 1 = 2^m - 1$ comparisons to determine the new champion, which is as slow as the first time.
Actually, we needn't perform a bottom-up comparison at all, since the tournament tree stores plenty of ordering information. Observe that the second best team must have been beaten by the champion at some time, or it would be the final winner. So we can track the path from the root of the tournament tree to the leaf of the champion, and examine all the teams along this path to find the second best team.
In figure 9.6, this path is marked in gray color; the elements to be examined are {14, 13, 7, 15}. Based on this idea, we refine the algorithm like below.
1. Build a tournament tree from the elements to be sorted, so that the champion (the maximum) becomes the root;
2. Extract the root from the tree, perform a top-down pass and replace the maximum with $-\infty$;
3. Perform a bottom-up back-track along the path, determine the new champion and make it the new root;
4. Repeat step 2 until all elements have been extracted.
Figure 9.7, 9.8, and 9.9 show the steps of applying this strategy.
Figure 9.7: Extract 16, replace it with −∞, 15 sifts up to root.
Figure 9.8: Extract 15, replace it with −∞, 14 sifts up to root.
Figure 9.9: Extract 14, replace it with −∞, 13 sifts up to root.
We can reuse the binary tree definition given in the first chapter of this book to represent the tournament tree. In order to back-track from a leaf to the root, every node should hold a reference to its parent (the concept of a pointer in some environments such as ANSI C):
struct Node {
    Key key;
    struct Node *left, *right, *parent;
};
To build a tournament tree from a list of elements (suppose the number of elements is $2^m$ for some m), we can first wrap each element as a leaf, so that we obtain a list of binary trees. We take every two trees from this list, compare their keys, and form a new binary tree with the bigger key as the root; the two trees are set as the left and right children of this new binary tree. Repeat this operation to build a new list of trees. The height of each tree is increased by 1. Note that the size of the tree list halves after such a pass, so we can keep reducing the list until there is only one tree left. This tree is the finally built tournament tree.
function Build-Tree(A)
    T ← ∅
    for each x ∈ A do
        t ← Create-Node
        Key(t) ← x
        Append(T, t)
    while |T| > 1 do
        T′ ← ∅
        for every pair t1, t2 ∈ T do
            t ← Create-Node
            Key(t) ← Max(Key(t1), Key(t2))
            Left(t) ← t1
            Right(t) ← t2
            Parent(t1) ← t
            Parent(t2) ← t
            Append(T′, t)
        T ← T′
    return T[1]
Suppose the length of the list A is N. This algorithm first traverses the list to build the leaves, which takes time linear in N. Then it repeatedly compares pairs, which loops proportionally to $N + \frac{N}{2} + \frac{N}{4} + ... + 2 = 2N$. So the total performance is bound to O(N) time.
The following ANSI C program implements this tournament tree building algorithm.
struct Node* build(const Key* xs, int n) {
    int i;
    struct Node *t, **ts = (struct Node**) malloc(sizeof(struct Node*) * n);
    for (i = 0; i < n; ++i)
        ts[i] = leaf(xs[i]);
    for (; n > 1; n /= 2)
        for (i = 0; i < n; i += 2)
            ts[i/2] = branch(max(ts[i]->key, ts[i+1]->key), ts[i], ts[i+1]);
    t = ts[0];
    free(ts);
    return t;
}
The type of the key can be defined somewhere, for example:
typedef int Key;
Function leaf(x) creates a leaf node with value x as its key, and sets all its fields, left, right and parent, to NIL. Function branch(key, left, right) creates a branch node, and links the newly created node as the parent of its two children if they are not empty. For the sake of brevity, we skip their details. They are left as an exercise to the reader, and the complete program can be downloaded along with this book.
Some programming environments, such as Python, provide a tool to iterate over every two elements at a time, for example:

for x, y in zip(*[iter(ts)] * 2):

We skip such language specific features; readers can refer to the Python example program along with this book for details.
When the maximum element is extracted from the tournament tree, we replace it with $-\infty$, and repeatedly replace all these values from the root to the leaf. Next, we back-track to the root through the parent field, and determine the new maximum element.
function Extract-Max(T)
    m ← Key(T)
    Key(T) ← −∞
    while ¬ Leaf?(T) do    ▷ The top-down pass
        if Key(Left(T)) = m then
            T ← Left(T)
        else
            T ← Right(T)
        Key(T) ← −∞
    while Parent(T) ≠ NIL do    ▷ The bottom-up pass
        T ← Parent(T)
        Key(T) ← Max(Key(Left(T)), Key(Right(T)))
    return m
This algorithm returns the extracted maximum element, and modifies the tournament tree in-place. Because we can't represent $-\infty$ in a real program with a limited word length, one approach is to define a relatively big negative number, which is less than all the elements in the tournament tree. For example, suppose all the elements are greater than -65535; we can define negative infinity as below:
#define N_INF -65535
We can implement this algorithm as the following ANSI C example program.
Key pop(struct Node* t) {
    Key x = t->key;
    t->key = N_INF;
    while (!isleaf(t)) {
        t = t->left->key == x ? t->left : t->right;
        t->key = N_INF;
    }
    while (t->parent) {
        t = t->parent;
        t->key = max(t->left->key, t->right->key);
    }
    return x;
}
The behavior of Extract-Max is quite similar to the pop operation of some data structures, such as the queue and the heap; thus we name it pop in this code snippet.
Algorithm Extract-Max processes the tree in two passes: one top-down, then a bottom-up pass along the path on which the champion team won the world cup. Because the tournament tree is well balanced, the length of this path, which is the height of the tree, is bound to O(lg N), where N is the number of the elements to be sorted (equal to the number of leaves). Thus the performance of this algorithm is O(lg N).
It's possible to realize the tournament knock out sort now. We build a tournament tree from the elements to be sorted, then continuously extract the maximum. If we want to sort in monotonically increasing order, we put the first extracted one at the rightmost position, then insert the further extracted elements one by one, moving left; otherwise, if we want to sort in decreasing order, we can just append the extracted elements to the result. Below is the algorithm that sorts elements in ascending order.

procedure Sort(A)
    T ← Build-Tree(A)
    for i ← |A| down to 1 do
        A[i] ← Extract-Max(T)
Translating it to an ANSI C example program is straightforward.
void tsort(Key* xs, int n) {
    struct Node* t = build(xs, n);
    while (n)
        xs[--n] = pop(t);
    release(t);
}
This algorithm first takes O(N) time to build the tournament tree, then performs N pops to select the maximum of the elements left in the tree so far. Since each pop operation is bound to O(lg N), the total performance of tournament knock out sorting is O(N lg N).
Refine the tournament knock out
It's possible to design the tournament knock out algorithm in a purely functional approach. We'll see that the two passes in the pop operation (first top-down, replacing the champion with $-\infty$; then bottom-up, determining the new champion) can be combined in a recursive manner, so that we don't need the parent field any more. We can re-use the functional binary tree definition as in the following example Haskell code.
data Tr a = Empty | Br (Tr a) a (Tr a)
Thus a binary tree is either empty or a branch node containing a key, a left sub-tree and a right sub-tree, where both children are again binary trees.
We've used a hard coded big negative number to represent $-\infty$. However, this solution is ad hoc, and it forces all elements to be sorted to be greater than this pre-defined magic number. Some programming environments support algebraic data types, so that we can define negative infinity explicitly. For instance, the below Haskell program sets up the concept of infinity (see the note after equation (9.14)).
data Infinite a = NegInf | Only a | Inf deriving (Eq, Ord)
From now on, we switch back to using the min() function to determine the winner, so that the tournament selects the minimum instead of the maximum as the champion.
Denote by key(T) the function that returns the key of the tree rooted at T. Function wrap(x) wraps the element x into a leaf node. Function tree(l, k, r) creates a branch node, with k as the key and l and r as the two children respectively.
The knock out process can be represented as comparing two trees, picking the smaller key as the new key, and setting these two trees as children:

branch(T_1, T_2) = tree(T_1, min(key(T_1), key(T_2)), T_2) \qquad (9.12)
This can be implemented in Haskell word by word:
branch t1 t2 = Br t1 (min (key t1) (key t2)) t2
There is a limitation in our tournament sorting algorithm so far: it only accepts a collection of elements whose size is $2^m$, or we can't build a complete binary tree. This can actually be solved in the tree building process. Recall that we pick two trees every time, compare them and pick the winner. This is perfect if there is always an even number of trees. Consider a case in a football match where one team is absent for some reason (severe flight delay or whatever), so that one team is left without a challenger. One option is to make this team the winner, so that it will attend the further games. We can use a similar approach here.
To build the tournament tree from a list of elements, we wrap every element into a leaf, then start the building process.

build(L) = build'(\{wrap(x) \mid x \in L\}) \qquad (9.13)
The build'(T) function terminates when there is only one tree left in T, which is the champion. This is the trivial edge case. Otherwise, it groups every two trees in a pair to determine the winners. When there is an odd number of trees, it just makes the last tree the winner, which attends the next level of the tournament, and recursively repeats the building process.

build'(T) = \begin{cases} T & : |T| \leq 1 \\ build'(pair(T)) & : \text{otherwise} \end{cases} \qquad (9.14)
(Note: the order of the definitions of NegInf, regular numbers, and Inf is significant if we want to derive the default, correct comparing behavior of Ord. It's also possible to specify the detailed ordering by making the type an instance of Ord; however, this is a language specific feature which is out of the scope of this book. Please refer to other textbooks about Haskell.)
Note that this algorithm actually handles another special case: the list to be sorted is empty. The result is then obviously empty.
Denote $T = \{T_1, T_2, ...\}$ if there are at least two trees, and let $T'$ represent the remaining trees after removing the first two. Function pair(T) is defined as the following.

pair(T) = \begin{cases} \{branch(T_1, T_2)\} \cup pair(T') & : |T| \geq 2 \\ T & : \text{otherwise} \end{cases} \qquad (9.15)
The complete tournament tree building algorithm can be implemented as
the below example Haskell program.
fromList :: (Ord a) => [a] -> Tr (Infinite a)
fromList = build . map wrap where
    build [] = Empty
    build [t] = t
    build ts = build $ pair ts
    pair (t1:t2:ts) = (branch t1 t2) : pair ts
    pair ts = ts
When extracting the champion (the minimum) from the tournament tree, we need to examine whether the left child sub-tree or the right one has the same key as the root, and recursively extract on that tree until we arrive at a leaf node. Denote the left sub-tree of T as L, the right sub-tree as R, and K as its key. We can define this popping algorithm as the following.
pop(T) = \begin{cases} tree(\emptyset, \infty, \emptyset) & : L = \emptyset \land R = \emptyset \\ tree(L', min(key(L'), key(R)), R) & : K = key(L), L' = pop(L) \\ tree(L, min(key(L), key(R')), R') & : K = key(R), R' = pop(R) \end{cases} \qquad (9.16)
It's straightforward to translate this algorithm into example Haskell code.

pop (Br Empty _ Empty) = Br Empty Inf Empty
pop (Br l k r) | k == key l = let l' = pop l in Br l' (min (key l') (key r)) r
               | k == key r = let r' = pop r in Br l (min (key l) (key r')) r'
Note that this algorithm only removes the current champion without returning it, so it's necessary to define a function to get the champion at the root node.

top(T) = key(T) \qquad (9.17)

With these functions defined, the tournament knock out sorting can be formalized by using them.
sort(L) = sort'(build(L)) \qquad (9.18)

where sort'(T) continuously pops the minimum element to form the result list:

sort'(T) = \begin{cases} \emptyset & : T = \emptyset \lor key(T) = \infty \\ \{top(T)\} \cup sort'(pop(T)) & : \text{otherwise} \end{cases} \qquad (9.19)
The rest of the Haskell code is given below to complete the implementation.
top = only . key

tsort :: (Ord a) => [a] -> [a]
tsort = sort' . fromList where
    sort' Empty = []
    sort' (Br _ Inf _) = []
    sort' t = (top t) : (sort' $ pop t)
The auxiliary functions only, key, and wrap, accomplished with explicit infinity support, are listed as the following.
only (Only x) = x
key (Br _ k _ ) = k
wrap x = Br Empty (Only x) Empty
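Using some of the figure 9.6 elements as a quick check (a hypothetical GHCi session, assuming the definitions above):

tsort [7, 6, 15, 16, 8, 4, 13, 3]    -- [3,4,6,7,8,13,15,16]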
Exercise 9.3
Implement the helper functions leaf(), branch(), max(), isleaf(), and release() to complete the imperative tournament tree program.
Implement the imperative tournament tree in a programming language that supports GC (garbage collection).
Why can our tournament tree knock out sort algorithm handle duplicated elements (elements with the same value)? We say a sorting algorithm is stable if it keeps the original order of elements with the same value. Is the tournament tree knock out sorting stable?
Design an imperative tournament tree knock out sort algorithm which satisfies the following:
Can handle an arbitrary number of elements;
Without using hard coded negative infinity, so that it can take elements with any value.
Compare the tournament tree knock out sort algorithm and the binary tree sort algorithm; analyze their efficiency both in time and space.
Compare the heap sort algorithm and the binary tree sort algorithm, and do the same analysis for them.
9.4.2 Final improvement by using heap sort
We managed to improve the performance of selection based sorting to O(N lg N) by using tournament knock out. This is the limit of comparison based sorting according to [1]. However, there is still room for improvement. After sorting, we are left with a complete binary tree whose leaves and branches all hold useless infinite values. This isn't space efficient at all. Can we release the nodes while popping?
Another observation is that if there are N elements to be sorted, we actually allocate about 2N tree nodes: N for the leaves and N for the branches. Is there a better way, halving the space usage?
The final sorting structure described in equation (9.19) can easily be unified into a more general one, if we treat a tree whose root holds $\infty$ as the key as an empty tree:

sort'(T) = \begin{cases} \emptyset & : T = \emptyset \\ \{top(T)\} \cup sort'(pop(T)) & : \text{otherwise} \end{cases} \qquad (9.20)
This is exactly the same as the heap sort we gave in the previous chapter. A heap always keeps the minimum (or the maximum) on top, and provides a fast pop operation. The binary heap by implicit array encodes the tree structure in array indices, so there isn't any extra space allocated except for the N array cells. The functional heaps, such as the leftist heap and the splay heap, allocate N nodes as well. We'll introduce more heaps in the next chapter, which perform well in many aspects.
9.5 Short summary
In this chapter, we presented the evolution process of selection based sort. Selection sort is easy and is commonly used as an example to teach students about nested looping. It has a simple and straightforward structure, but its performance is quadratic. In this chapter, we saw that there exist ways to improve it, not only by some fine tuning, but also by fundamentally changing the data structure, which leads to tournament knock out and heap sort.
Bibliography
[1] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting
and Searching (2nd Edition). Addison-Wesley Professional; 2 edition (May
4, 1998) ISBN-10: 0201896850 ISBN-13: 978-0201896855
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. ISBN: 0262032937. The MIT Press. 2001
[3] Wikipedia. Strict weak order. http://en.wikipedia.org/wiki/Strict weak order
[4] Wikipedia. FIFA world cup. http://en.wikipedia.org/wiki/FIFA World Cup
Chapter 10
Binomial heap, Fibonacci
heap, and pairing heap
10.1 Introduction
In the previous chapter, we mentioned that heaps can be generalized and implemented with various data structures. However, so far we have only focused on binary heaps, whether as explicit binary trees or implicit arrays.
It's quite natural to extend the binary tree to a K-ary [1] tree. In this chapter, we first show Binomial heaps, which actually consist of a forest of K-ary trees. Binomial heaps bound all operations to O(lg N) time, while keeping the finding of the minimum element at O(1).
If we delay some operations in Binomial heaps by using a lazy strategy, they turn into Fibonacci heaps.
All the binary heaps we have shown perform no better than O(lg N) for merging; we'll show that it's possible to improve this to O(1) with the Fibonacci heap, which is quite helpful for graph algorithms. Actually, the Fibonacci heap achieves a good amortized time bound of O(1) for almost all operations, leaving only the heap pop at O(lg N).
Finally, we'll introduce the pairing heap. It has the best performance in practice, although the proof of this is still a conjecture for the time being.
10.2 Binomial Heaps
10.2.1 Definition
A Binomial heap is more complex than most of the binary heaps. However, it has excellent merge performance, bound to O(lg N) time. A binomial heap consists of a list of binomial trees.
Binomial tree
In order to explain why the tree is named 'binomial', let's review the famous Pascal's triangle (also known as the Jia Xian triangle, in memory of the Chinese mathematician Jia Xian (1010-1070)) [4].
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
...
In each row, the numbers are all binomial coefficients. There are many ways to generate a series of binomial coefficients. One of them is by recursive composition. Binomial trees, as well, can be defined in this way as the following.
A binomial tree of rank 0 has only a node as the root;
A binomial tree of rank N consists of two rank N − 1 binomial trees; among these 2 sub-trees, the one that has the bigger root element is linked as the leftmost child of the other.
We denote a binomial tree of rank 0 as $B_0$, and a binomial tree of rank n as $B_n$. Figure 10.1 shows a $B_0$ tree and how two $B_{n-1}$ trees are linked into a $B_n$ tree.
(a) A $B_0$ tree. (b) Linking two $B_{n-1}$ trees yields a $B_n$ tree.
Figure 10.1: Recursive definition of binomial trees
With this recursive definition, it is easy to draw the forms of binomial trees of rank 0, 1, 2, ..., as shown in figure 10.2.
Observing the binomial trees reveals some interesting properties. For each rank N binomial tree, if we count the number of nodes in each row, we find that they are the binomial numbers.
For instance, for the rank 4 binomial tree, there is 1 node as the root; in the second level next to the root there are 4 nodes; in the 3rd level there are 6 nodes; in the 4th level there are 4 nodes; and in the 5th level there is 1 node.
Figure 10.2: Forms of binomial trees with rank = 0, 1, 2, 3, 4, ...
These counts are exactly 1, 4, 6, 4, 1, which is the 5th row in Pascal's triangle. That's why we call it a binomial tree.
Another interesting property is that the total number of nodes of a binomial tree with rank N is $2^N$. This can be proved either by binomial theory or by the recursive definition directly.
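For instance, from the recursive definition (a $B_n$ tree is two linked $B_{n-1}$ trees), the node count satisfies a simple recurrence:

$$|B_0| = 1, \qquad |B_n| = 2\,|B_{n-1}| \;\Longrightarrow\; |B_n| = 2^n$$

Equivalently, summing the row counts gives $\sum_{i=0}^{n} \binom{n}{i} = 2^n$ by the binomial theorem.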
Binomial heap
With the binomial tree defined, we can introduce the definition of the binomial heap. A binomial heap is a set of binomial trees (a forest of binomial trees) that satisfies the following properties.
Each binomial tree in the heap conforms to the heap property: the key of a node is equal to or greater than the key of its parent. Here the heap is actually a min-heap; for a max-heap, this changes to 'equal to or less than'. In this chapter, we only discuss the min-heap; the max-heap treatment can be obtained by changing the comparison condition.
There is at most one binomial tree which has rank r. In other words, no two binomial trees have the same rank.
This definition leads to an important result: for a binomial heap that contains N elements, converting N to binary format yields $a_0, a_1, a_2, ..., a_m$, where $a_0$ is the LSB and $a_m$ is the MSB; then for each $0 \leq i \leq m$, if $a_i = 0$ there is no binomial tree of rank i, and if $a_i = 1$ there must be a binomial tree of rank i.
For example, if a binomial heap contains 5 elements, as 5 is (LSB)101(MSB), there are 2 binomial trees in this heap: one tree has rank 0, the other has rank 2.
Figure 10.3 shows a binomial heap which has 19 nodes; as 19 is (LSB)11001(MSB) in binary format, there is a $B_0$ tree, a $B_1$ tree and a $B_4$ tree.
Figure 10.3: A binomial heap with 19 elements
Data layout
There are two ways to define K-ary trees imperatively. One is the left-child, right-sibling approach [2]. It is compatible with the typical binary tree structure: each node has two fields, a left field and a right field. We use the left field to point to the first child of the node, and the right field to point to the sibling of the node. All siblings are represented as a singly linked list. Figure 10.4 shows an example tree represented in this way.
linked list. Figure 10.4 shows an example tree represented in this way.
NIL R
C1 C2 ... Cn
C1 C2 ... Cm
Figure 10.4: Example tree represented in left-child, right-sibling way. R is the
root node, it has no sibling, so it right side is pointed to NIL. C
1
, C
2
, ..., C
n
are children of R. C
1
is linked from the left side of R, other siblings of C
1
are
linked one next to each other on the right side of C
1
. C

2
, ..., C

m
are children of
C
1
.
The other way is to use a collection container defined in the library, such as an array or a list, to represent all children of a node.
Since the rank of a tree plays a very important role, we also define it as a field.
For the left-child, right-sibling method, we define the binomial tree as the following (C programs are also provided along with this book).
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.child = None
        self.sibling = None
When initializing a tree with a key, we create a leaf node: its rank is set to zero, and all other fields are set to NIL.
It's quite natural to utilize a pre-defined list to represent multiple children, as below.
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.children = []
For purely functional settings, such as in the Haskell language, binomial trees are defined as the following.
data BiTree a = Node { rank :: Int
                     , root :: a
                     , children :: [BiTree a]}
A binomial heap is then defined as a list of binomial trees (a forest) with ranks in monotonically increasing order. As another implicit constraint, no two binomial trees have the same rank.
type BiHeap a = [BiTree a]
10.2.2 Basic heap operations
Linking trees
Before diving into the basic heap operations such as pop and insert, we'll first see how to link two binomial trees of the same rank into a bigger one. According to the definition of the binomial tree, and the heap property that the root always contains the minimum key, we first compare the two root values, select the smaller one as the new root, and insert the other tree as the first child in front of all other children. Suppose the functions Key(T), Children(T), and Rank(T) access the key, children and rank of a binomial tree respectively.
link(T_1, T_2) = \begin{cases} node(r + 1, x, \{T_2\} \cup C_1) & : x < y \\ node(r + 1, y, \{T_1\} \cup C_2) & : \text{otherwise} \end{cases} \qquad (10.1)

where

x = Key(T_1), \quad y = Key(T_2), \quad r = Rank(T_1) = Rank(T_2), \quad C_1 = Children(T_1), \quad C_2 = Children(T_2)
Figure 10.5: Suppose x < y, insert y as the first child of x.
Note that the link operation is bound to O(1) time if ∪ is a constant time operation. It's easy to translate the link function to a Haskell program as follows.
10.2. BINOMIAL HEAPS 377
link :: (Ord a) => BiTree a -> BiTree a -> BiTree a
link t1@(Node r x c1) t2@(Node _ y c2) =
    if x < y then Node (r+1) x (t2:c1)
    else Node (r+1) y (t1:c2)
It's possible to realize the link operation in an imperative way. If we use the left-child, right-sibling approach, we just link the tree which has the bigger key to the left side of the other, and link its original children to the right side as siblings. Figure 10.6 shows the result of one case.
1: function Link(T_1, T_2)
2:     if Key(T_2) < Key(T_1) then
3:         Exchange T_1 ↔ T_2
4:     Sibling(T_2) ← Child(T_1)
5:     Child(T_1) ← T_2
6:     Parent(T_2) ← T_1
7:     Rank(T_1) ← Rank(T_1) + 1
8:     return T_1
Figure 10.6: Suppose x < y, link y to the left side of x and link the original children of x to the right side of y.
If we use a container to manage all children of a node, the algorithm is as below.
1: function Link(T_1, T_2)
2:     if Key(T_2) < Key(T_1) then
3:         Exchange T_1 ↔ T_2
4:     Parent(T_2) ← T_1
5:     Insert-Before(Children(T_1), T_2)
6:     Rank(T_1) ← Rank(T_1) + 1
7:     return T_1
It's easy to translate both algorithms to real programs. Here we only show the Python program of Link for illustration purposes. (The C and C++ programs are also available along with this book.)
def link(t1, t2):
if t2.key < t1.key:
(t1, t2) = (t2, t1)
t2.parent = t1
t1.children.insert(0, t2)
t1.rank = t1.rank + 1
return t1
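As a small sanity check of link (a hypothetical usage example of ours, not from the book's source), linking two rank-0 trees yields a rank-1 tree whose root holds the smaller key:

t1, t2 = BinomialTree(3), BinomialTree(5)
t = link(t1, t2)
print(t.key, t.rank, [c.key for c in t.children])  # prints: 3 1 [5]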
Exercise 10.1
Implement the tree-linking program in your favorite language with left-child,
right-sibling method.
We mentioned that linking is a constant time algorithm, and this is true when using the left-child, right-sibling approach. However, if we use a container to manage the children, the performance depends on the concrete implementation of the container. If it is a plain array, the linking time will be proportional to the number of children. In this chapter, we assume the time is constant; this is true if the container is implemented as a linked-list.
Insert a new element to the heap (push)
As the ranks of the binomial trees in a forest are monotonically increasing, by using the link function defined above it's possible to define an auxiliary function, so that we can insert a new tree, with rank no bigger than the smallest one, to the heap (which is actually a forest).
Denote the non-empty heap as H = {T_1, T_2, ..., T_n}; we define

insertT(H, T) =
    {T} : H = ∅
    {T} ∪ H : Rank(T) < Rank(T_1)
    insertT(H', link(T, T_1)) : otherwise        (10.2)

where

H' = {T_2, T_3, ..., T_n}
The idea is that for the empty heap, we set the new tree as the only element to create a singleton forest. Otherwise, we compare the ranks of the new tree and the first tree in the forest; if they are the same, we link them together and recursively insert the linked result (a tree with rank increased by one) to the rest of the forest. If they are not the same, since the pre-condition constrains the rank of the new tree, it must be the smallest, so we put this new tree in front of all the other trees in the forest.
From the binomial properties mentioned above, there are at most O(lg N) binomial trees in the forest, where N is the total number of nodes. Thus function insertT performs at most O(lg N) linkings, which are all constant time operations, so the performance of insertT is O(lg N). (There is an interesting observation obtained by comparing this operation with adding two binary numbers, which leads to the topic of numeric representation[6].)
The corresponding Haskell program is given below.
insertTree :: (Ord a) => BiHeap a -> BiTree a -> BiHeap a
insertTree [] t = [t]
insertTree ts@(t':ts') t = if rank t < rank t' then t:ts
                           else insertTree ts' (link t t')
With this auxiliary function, it's easy to realize the insertion. We wrap the new element to be inserted as the only leaf of a tree, then insert this tree into the binomial heap.

insert(H, x) = insertT(H, node(0, x, ∅))        (10.3)
We can build a heap from a series of elements by folding. For example, the following Haskell code defines a helper function fromList.
fromList = foldl insert []
Since wrapping an element as a singleton tree takes O(1) time, the real work is done in insertT; the performance of binomial heap insertion is bound to O(lg N).
The insertion algorithm can also be realized with an imperative approach.
Algorithm 4 Insert a tree with the left-child, right-sibling method.
1: function Insert-Tree(H, T)
2:     while H ≠ ∅ ∧ Rank(Head(H)) = Rank(T) do
3:         (T_1, H) ← Extract-Head(H)
4:         T ← Link(T, T_1)
5:     Sibling(T) ← H
6:     return T
Algorithm 4 continuously links the first tree in the heap with the new tree to be inserted as long as they have the same rank. After that, it sets the linked-list of the remaining trees as the sibling, and returns the new linked-list.
If using a container to manage the children of a node, the algorithm can be
given in Algorithm 5.
Algorithm 5 Insert a tree with children managed by a container.
1: function Insert-Tree(H, T)
2:     while H ≠ ∅ ∧ Rank(H[0]) = Rank(T) do
3:         T_1 ← Pop(H)
4:         T ← Link(T, T_1)
5:     Head-Insert(H, T)
6:     return H
In this algorithm, function Pop removes the first tree T_1 = H[0] from the forest, and function Head-Insert inserts a new tree before any other trees in the heap, so that it becomes the first element in the forest.
With either version of Insert-Tree defined, realizing the binomial heap insertion is trivial.
Algorithm 6 Imperative insert algorithm
1: function Insert(H, x)
2:     return Insert-Tree(H, Node(0, x, ∅))
The following Python program implements the insert algorithm, using a container to manage sub-trees. The left-child, right-sibling version is left as an exercise.
def insert_tree(ts, t):
while ts !=[] and t.rank == ts[0].rank:
t = link(t, ts.pop(0))
ts.insert(0, t)
return ts
def insert(h, x):
return insert_tree(h, BinomialTree(x))
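For a quick illustration (our own usage example), building a heap from five elements produces a forest whose ranks mirror 5 = (LSB)101(MSB), i.e. one rank-0 tree and one rank-2 tree:

h = []
for x in [4, 1, 3, 2, 5]:
    h = insert(h, x)
print([t.rank for t in h])  # [0, 2]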
Exercise 10.2
Write the insertion program in your favorite imperative programming language using the left-child, right-sibling approach.
Merge two heaps
When merging two binomial heaps, we actually try to merge two forests of binomial trees. According to the definition, there can't be two trees with the same rank, and the ranks are in monotonically increasing order. Our strategy is very similar to merge sort: in every iteration, we take the first tree from each forest, compare their ranks, and pick the smaller one to the result heap; if the ranks are equal, we perform linking to get a new tree, and recursively insert this new tree to the result of merging the rest of the trees.
Figure 10.7 illustrates the idea of this algorithm. This method is different from the one given in [2].
We can formalize this idea with a function. For non-empty cases, we denote the two heaps as H_1 = {T_1, T_2, ...} and H_2 = {T'_1, T'_2, ...}. Let H'_1 = {T_2, T_3, ...} and H'_2 = {T'_2, T'_3, ...}.
merge(H_1, H_2) =
    H_1 : H_2 = ∅
    H_2 : H_1 = ∅
    {T_1} ∪ merge(H'_1, H_2) : Rank(T_1) < Rank(T'_1)
    {T'_1} ∪ merge(H_1, H'_2) : Rank(T_1) > Rank(T'_1)
    insertT(merge(H'_1, H'_2), link(T_1, T'_1)) : otherwise        (10.4)
To analyze the performance of merge, suppose there are m_1 trees in H_1 and m_2 trees in H_2. There are at most m_1 + m_2 trees in the merged result. If no two trees have the same rank, the merge operation is bound to O(m_1 + m_2) time. If linking is needed for trees with the same rank, insertT performs at most O(m_1 + m_2) time. Considering that m_1 = 1 + ⌊lg N_1⌋ and m_2 = 1 + ⌊lg N_2⌋, where N_1 and N_2 are the numbers of nodes in each heap, and ⌊lg N_1⌋ + ⌊lg N_2⌋ ≤ 2⌊lg N⌋, where N = N_1 + N_2 is the total number of nodes, the final performance of merging is O(lg N).
Translating this algorithm to Haskell yields the following program.
merge :: (Ord a) => BiHeap a -> BiHeap a -> BiHeap a
merge ts1 [] = ts1
merge [] ts2 = ts2
merge ts1@(t1:ts1') ts2@(t2:ts2')
    | rank t1 < rank t2 = t1:(merge ts1' ts2)
    | rank t1 > rank t2 = t2:(merge ts1 ts2')
    | otherwise = insertTree (merge ts1' ts2') (link t1 t2)
Figure 10.7: Merge two heaps. (a) Pick the tree with the smaller rank to the result. (b) If two trees have the same rank, link them to a new tree, and recursively insert it to the merge result of the rest.
The merge algorithm can also be described in an imperative way, as shown in Algorithm 7.
Algorithm 7 Imperative merge of two binomial heaps
1: function Merge(H_1, H_2)
2:     if H_1 = ∅ then
3:         return H_2
4:     if H_2 = ∅ then
5:         return H_1
6:     H ← ∅
7:     while H_1 ≠ ∅ ∧ H_2 ≠ ∅ do
8:         T ← ∅
9:         if Rank(H_1) < Rank(H_2) then
10:            (T, H_1) ← Extract-Head(H_1)
11:        else if Rank(H_2) < Rank(H_1) then
12:            (T, H_2) ← Extract-Head(H_2)
13:        else    ▷ Equal rank
14:            (T_1, H_1) ← Extract-Head(H_1)
15:            (T_2, H_2) ← Extract-Head(H_2)
16:            T ← Link(T_1, T_2)
17:        Append-Tree(H, T)
18:    if H_1 ≠ ∅ then
19:        Append-Trees(H, H_1)
20:    if H_2 ≠ ∅ then
21:        Append-Trees(H, H_2)
22:    return H
Both heaps contain binomial trees with ranks in monotonically increasing order. In each iteration, we pick the tree with the smallest rank and append it to the result heap; if both trees have the same rank, we perform linking first. Consider the Append-Tree algorithm: the rank of the new tree to be appended can't be less than that of any other tree in the result heap according to our merge strategy; however, it might be equal to the rank of the last tree in the result heap. This can happen if the last tree appended is the result of linking, which increases the rank by one. In this case, we must link the new tree to be appended with the last tree. In the algorithm below, function Last(H) refers to the last tree in a heap, and Append(H, T) just appends a new tree at the end of a forest.
1: function Append-Tree(H, T)
2:     if H ≠ ∅ ∧ Rank(T) = Rank(Last(H)) then
3:         Last(H) ← Link(T, Last(H))
4:     else
5:         Append(H, T)
Function Append-Trees repeatedly calls this function, so that it appends all trees of one heap to the other.

1: function Append-Trees(H_1, H_2)
2:     for each T ∈ H_2 do
3:         H_1 ← Append-Tree(H_1, T)
The following Python program translates the merge algorithm.
def append_tree(ts, t):
if ts != [] and ts[-1].rank == t.rank:
ts[-1] = link(ts[-1], t)
else:
ts.append(t)
return ts
from functools import reduce # reduce is a builtin in Python 2; Python 3 needs this import

def append_trees(ts1, ts2):
    return reduce(append_tree, ts2, ts1)
def merge(ts1, ts2):
if ts1 == []:
return ts2
if ts2 == []:
return ts1
ts = []
while ts1 != [] and ts2 != []:
t = None
if ts1[0].rank < ts2[0].rank:
t = ts1.pop(0)
elif ts2[0].rank < ts1[0].rank:
t = ts2.pop(0)
else:
t = link(ts1.pop(0), ts2.pop(0))
ts = append_tree(ts, t)
ts = append_trees(ts, ts1)
ts = append_trees(ts, ts2)
return ts
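A small usage sketch of the Python merge above (our own example): merging a 3-element heap with a 2-element heap gives a 5-element forest with ranks 0 and 2, just as inserting five elements one by one would.

h1, h2 = [], []
for x in [7, 2, 9]:
    h1 = insert(h1, x)
for x in [1, 8]:
    h2 = insert(h2, x)
h = merge(h1, h2)
print([(t.rank, t.key) for t in h])  # [(0, 9), (2, 1)]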
Exercise 10.3
The program given above uses a container to manage sub-trees. Implement
the merge algorithm in your favorite imperative programming language with
left-child, right-sibling approach.
Pop
Among the forest which forms the binomial heap, each binomial tree conforms to the heap property that the root contains the minimum element of that tree. However, the order relationship of these roots can be arbitrary. To find the minimum element in the heap, we can select the smallest root among these trees. Since there are O(lg N) binomial trees, this approach takes O(lg N) time.
However, after we locate the minimum element (which is also known as the top element of the heap), we need to remove it from the heap and keep the binomial property to accomplish the heap-pop operation. Suppose the forest forming the binomial heap consists of trees B_i, B_j, ..., B_p, ..., B_m, where B_k is a binomial tree of rank k, and the minimum element is the root of B_p. If we delete it, there will be p children left, which are all binomial trees with ranks p-1, p-2, ..., 0.
One tool at hand is the O(lg N) merge function we have defined. A possible approach is to reverse the p children, so that their ranks change to monotonically increasing order, forming a binomial heap H_p. The rest of the trees is still a binomial heap, which we represent as H' = H - B_p. Merging H_p and H' gives the final result of pop. Figure 10.8 illustrates this idea.
Figure 10.8: Pop the minimum element from a binomial heap.
In order to realize this algorithm, we first need to define an auxiliary function, which extracts the tree containing the minimum element at its root from the forest.
extractMin(H) =
    (T, ∅) : H is a singleton {T}
    (T_1, H') : Root(T_1) < Root(T')
    (T', {T_1} ∪ H'') : otherwise        (10.5)

where

H = {T_1, T_2, ...} for the non-empty forest case;
H' = {T_2, T_3, ...} is the forest without the first tree;
(T', H'') = extractMin(H')
The result of this function is a tuple: the first part is the tree with the minimum element at its root; the second part is the rest of the trees after removing the first part from the forest.
This function examines each of the trees in the forest, and thus is bound to O(lg N) time.
The corresponding Haskell program is given below.
extractMin :: (Ord a) => BiHeap a -> (BiTree a, BiHeap a)
extractMin [t] = (t, [])
extractMin (t:ts) = if root t < root t' then (t, ts)
                    else (t', t:ts')
    where
        (t', ts') = extractMin ts
With this function defined, returning the minimum element is trivial.

findMin :: (Ord a) => BiHeap a -> a
findMin = root . fst . extractMin
Of course, it's possible to just traverse the forest and pick the minimum root without removing any tree. The imperative algorithm below describes it with the left-child, right-sibling approach.
1: function Find-Minimum(H)
2:     T ← Head(H)
3:     min ← ∞
4:     while T ≠ ∅ do
5:         if Key(T) < min then
6:             min ← Key(T)
7:         T ← Sibling(T)
8:     return min
If we manage the children with a collection container, the linked-list traversal is abstracted as finding the minimum element of the list. The following Python program shows this situation.
def find_min(ts):
min_t = min(ts, key=lambda t: t.key)
return min_t.key
Next we define the function to delete the minimum element from the heap by using extractMin.

deleteMin(H) = merge(reverse(Children(T)), H')        (10.6)

where

(T, H') = extractMin(H)
Translating the formula to a Haskell program is trivial, so we skip it here.
Realizing the algorithm in a procedural way takes extra effort, including list reversing etc. We leave these details as exercises to the reader. The following pseudo code illustrates the imperative pop algorithm.
1: function Extract-Min(H)
2:     (T_min, H) ← Extract-Min-Tree(H)
3:     H ← Merge(H, Reverse(Children(T_min)))
4:     return (Key(T_min), H)
With the pop operation defined, we can realize heap sort by creating a binomial heap from a series of numbers, then keep popping the smallest number from the heap till it becomes empty.

sort(xs) = heapSort(fromList(xs))        (10.7)
And the real work is done in function heapSort.

heapSort(H) =
    ∅ : H = ∅
    {findMin(H)} ∪ heapSort(deleteMin(H)) : otherwise        (10.8)
Translating to Haskell yields the following program.

heapSort :: (Ord a) => [a] -> [a]
heapSort = hsort . fromList where
    hsort [] = []
    hsort h = (findMin h):(hsort $ deleteMin h)
Function fromList can be defined by folding, as shown earlier. Heap sort can also be expressed procedurally; please refer to the previous chapter about binary heap for details.
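To make the Python version of this chapter runnable end to end, here is a minimal sketch of the missing imperative pieces (extract_min_tree, delete_min and heap_sort are our own fill-ins, overlapping with the exercise below):

def extract_min_tree(ts):
    t = min(ts, key=lambda x: x.key)
    ts.remove(t)
    return (t, ts)

def delete_min(ts):
    (t, rest) = extract_min_tree(ts)
    # children are stored with the highest rank first, so reversing
    # them yields a valid binomial heap of increasing ranks
    return merge(list(reversed(t.children)), rest)

def heap_sort(xs):
    h = []
    for x in xs:
        h = insert(h, x)
    res = []
    while h != []:
        res.append(find_min(h))
        h = delete_min(h)
    return res

print(heap_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]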
Exercise 10.4
- Write the program to return the minimum element of a binomial heap in your favorite imperative programming language with the left-child, right-sibling approach.
- Realize the Extract-Min-Tree algorithm.
- For the left-child, right-sibling approach, reversing all children of a tree is actually reversing a singly linked-list. Write a program to reverse such a linked-list in your favorite imperative programming language.
More words about binomial heap
As we have shown, insertion and merge are bound to O(lg N) time. These results are all ensured for the worst case. The amortized performance is O(1); we skip the proof of this fact.
10.3 Fibonacci Heaps
It's interesting why this data structure is named Fibonacci heap. In fact, there is no direct connection from the structure's design to the Fibonacci series. The inventors of the Fibonacci heap, Michael L. Fredman and Robert E. Tarjan, utilized properties of the Fibonacci series to prove the performance time bound, so they decided to use Fibonacci to name this data structure.[2]
10.3.1 Definition
Fibonacci heap is essentially a lazily evaluated binomial heap. Note that this doesn't mean implementing binomial heap in a lazy evaluation setting, for instance Haskell, brings a Fibonacci heap automatically. However, a lazy evaluation setting does help in the realization; for example, [5] presents an elegant implementation.
Fibonacci heap has excellent performance theoretically. All operations except pop are bound to amortized O(1) time. In this section, we'll give an algorithm different from some popular textbooks[2]. Most of the ideas presented here are based on Okasaki's work[6].
Let's review and compare the performance of binomial heap and Fibonacci heap (more precisely, the performance goal of Fibonacci heap).
operation    Binomial heap    Fibonacci heap
insertion    O(lg N)          O(1)
merge        O(lg N)          O(1)
top          O(lg N)          O(1)
pop          O(lg N)          amortized O(lg N)
Consider where the bottleneck of inserting a new element x to a binomial heap is. We actually wrap x as a singleton leaf and insert this tree into the heap, which is actually a forest.
During this operation, we insert the tree in monotonically increasing order of rank, and whenever two ranks are equal, recursive linking and inserting happen, which leads to the O(lg N) time.
As a lazy strategy, we can postpone the ordered-rank insertion and merging operations; instead, we just put the singleton leaf into the forest. The problem is that when we try to find the minimum element, for example in the top operation, the performance will be bad, because we need to check all trees in the forest, and there aren't only O(lg N) trees any more.
In order to locate the top element in constant time, we must remember which tree contains the minimum element at its root.
Based on this idea, we can reuse the definition of binomial tree and give the definition of Fibonacci heap, for example as the following Haskell program.
data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]}
The Fibonacci heap is either empty or a forest of binomial trees with the
minimum element stored in a special one explicitly.
data FibHeap a = E | FH { size :: Int
, minTree :: BiTree a
, trees :: [BiTree a]}
For convenience, we also add a size field to record how many elements there are in the heap.
The data layout can also be defined imperatively, as in the following ANSI C code.
struct node{
    Key key;
    struct node *next, *prev, *parent, *children;
    int degree; /* also known as rank */
    int mark;
};

struct FibHeap{
    struct node* roots;
    struct node* minTr;
    int n; /* number of nodes */
};
For generality, Key can be a customized type; we use integer for illustration purposes.
typedef int Key;
In this chapter, we use a circular doubly linked-list in imperative settings to realize the Fibonacci heap, as described in [2]. It makes many operations easy and fast. Note that two extra fields are added: the degree, also known as rank, of a node is the number of children of that node; the flag mark is used only in the decreasing key operation, and will be explained in detail in a later section.
10.3.2 Basic heap operations
As we mentioned, Fibonacci heap is essentially a binomial heap implemented with a lazy evaluation strategy, so we'll reuse many algorithms defined for binomial heap.

Insert a new element to the heap

Recall the insertion algorithm for binomial trees. It can be treated as a special case of the merge operation, where one heap contains only a singleton tree, so the insertion algorithm can be defined by means of merging.
insert(H, x) = merge(H, singleton(x)) (10.9)
where singleton is an auxiliary function that wraps an element into a one-leaf-tree.

singleton(x) = FibHeap(1, node(1, x, ∅), ∅)

Note that function FibHeap() accepts three parameters: a size value, which is 1 for this one-leaf-tree; a special tree which contains the minimum element as its root; and a list of the other binomial trees in the forest. The meaning of function node() is the same as before: it creates a binomial tree from a rank, an element, and a list of children.
Insertion can also be realized directly by appending the new node to the forest and updating the record of the tree which contains the minimum element.

1: function Insert(H, k)
2:     x ← Singleton(k)    ▷ Wrap k into a node
3:     append x to the root list of H
4:     if T_min(H) = NIL ∨ k < Key(T_min(H)) then
5:         T_min(H) ← x
6:     n(H) ← n(H) + 1
Where function T_min() returns the tree which contains the minimum element at its root.
The following C source snippet is a translation for this algorithm.
struct FibHeap* insert_node(struct FibHeap* h, struct node* x){
    h = add_tree(h, x);
    if(h->minTr == NULL || x->key < h->minTr->key)
        h->minTr = x;
    h->n++;
    return h;
}
Exercise 10.5
Implement the insert algorithm completely in your favorite imperative programming language. This is also an exercise in circular doubly linked-list manipulation.
Merge two heaps
Different from the merging algorithm of binomial heap, we postpone the linking operations. The idea is to just put all binomial trees from each heap together, and choose one special tree which records the minimum element for the result heap.
merge(H_1, H_2) =
    H_1 : H_2 = ∅
    H_2 : H_1 = ∅
    FibHeap(s_1 + s_2, T_1min, {T_2min} ∪ 𝕋_1 ∪ 𝕋_2) : root(T_1min) < root(T_2min)
    FibHeap(s_1 + s_2, T_2min, {T_1min} ∪ 𝕋_1 ∪ 𝕋_2) : otherwise        (10.10)
where s_1 and s_2 are the sizes of H_1 and H_2; T_1min and T_2min are the special trees with the minimum element at the root in H_1 and H_2 respectively; 𝕋_1 = {T_11, T_12, ...} is a forest containing all the other binomial trees in H_1; and 𝕋_2 has the same meaning for H_2. Function root(T) returns the root element of a binomial tree.
Note that as long as the ∪ operation takes constant time, this merge algorithm is bound to O(1). The following Haskell program is the translation of this algorithm.
merge :: (Ord a) => FibHeap a -> FibHeap a -> FibHeap a
merge h E = h
merge E h = h
merge h1@(FH sz1 minTr1 ts1) h2@(FH sz2 minTr2 ts2)
    | root minTr1 < root minTr2 = FH (sz1+sz2) minTr1 (minTr2:ts2++ts1)
    | otherwise = FH (sz1+sz2) minTr2 (minTr1:ts1++ts2)
The merge algorithm can also be realized imperatively, by concatenating the root lists of the two heaps.
1: function Merge(H_1, H_2)
2:     H ← ∅
3:     Root(H) ← Concat(Root(H_1), Root(H_2))
4:     if Key(T_min(H_1)) < Key(T_min(H_2)) then
5:         T_min(H) ← T_min(H_1)
6:     else
7:         T_min(H) ← T_min(H_2)
8:     n(H) ← n(H_1) + n(H_2)
9:     release H_1 and H_2
10:    return H
This function assumes neither H_1 nor H_2 is empty; it's easy to add handling for these special cases, as in the following ANSI C program.
struct FibHeap* merge(struct FibHeap* h1, struct FibHeap* h2){
    struct FibHeap* h;
    if(is_empty(h1))
        return h2;
    if(is_empty(h2))
        return h1;
    h = empty();
    h->roots = concat(h1->roots, h2->roots);
    if(h1->minTr->key < h2->minTr->key)
        h->minTr = h1->minTr;
    else
        h->minTr = h2->minTr;
    h->n = h1->n + h2->n;
    free(h1);
    free(h2);
    return h;
}
With the merge function defined, the O(1) insertion algorithm is realized as well. And we can also give the O(1) time top function as below.

top(H) = root(T_min(H))        (10.11)
Exercise 10.6
Implement the circular doubly linked list concatenation function in your
favorite imperative programming language.
Extract the minimum element from the heap (pop)
The pop operation (deleting the minimum element) is the most complex one in Fibonacci heap. Since we postponed the tree consolidation in the merge algorithm, we have to compensate for it somewhere. Pop is the only place left, as we have already defined insert, merge, and top.
There is an elegant procedural algorithm to do the tree consolidation by using an auxiliary array[2]; we'll show it later in the imperative approach section.
In order to realize the purely functional consolidation algorithm, let's first consider a similar number puzzle.
Given a list of numbers, such as {2, 1, 1, 4, 8, 1, 1, 2, 4}, we want to add any two values if they are the same, and repeat this procedure till all numbers are unique. The result of the example list is {8, 16}.
One solution to this problem is as follows.
consolidate(L) = fold(meld, ∅, L)        (10.12)

Here the fold() function iterates over all elements of a list, applying a specified function to the intermediate result and each element. It is sometimes called reducing; please refer to the chapter on binary search tree for it.
L = {x_1, x_2, ..., x_n} denotes a list of numbers, and we'll use L' = {x_2, x_3, ..., x_n} to represent the rest of the list with the first element removed. Function meld() is defined as below.
Table 10.1: Steps of consolidating numbers

number    intermediate result    result
2         2                      2
1         1, 2                   1, 2
1         (1+1), 2               4
4         (4+4)                  8
8         (8+8)                  16
1         1, 16                  1, 16
1         (1+1), 16              2, 16
2         (2+2), 16              4, 16
4         (4+4), 16              8, 16
meld(L, x) =
    {x} : L = ∅
    meld(L', x + x_1) : x = x_1
    {x} ∪ L : x < x_1
    {x_1} ∪ meld(L', x) : otherwise        (10.13)
The consolidate() function works as follows. It maintains an ordered result list L, containing only unique numbers, which is initialized as an empty list ∅. Each time it processes an element x, it first checks whether the first element in L is equal to x; if so, it adds them together (which yields 2x), and repeatedly checks whether 2x is equal to the next element in L. This process doesn't stop until either the element to be melded is not equal to the head element in the rest of the list, or the list becomes empty. Table 10.1 illustrates the process of consolidating the number sequence {2, 1, 1, 4, 8, 1, 1, 2, 4}. Column one lists the numbers scanned one by one; column two shows the intermediate result, where the newly scanned number is compared with the first number in the result list, and if they are equal, they are enclosed in a pair of parentheses; the last column is the result of meld, which is used as the input to the next step.
The Haskell program can be given accordingly.

consolidate = foldl meld [] where
    meld [] x = [x]
    meld (x':xs) x | x == x' = meld xs (x+x')
                   | x < x' = x:x':xs
                   | otherwise = x': meld xs x
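A Python translation of the same puzzle (our own sketch, mirroring the Haskell meld above) can be checked against Table 10.1:

def meld(lst, x):
    if lst == []:
        return [x]
    if x == lst[0]:
        return meld(lst[1:], x + lst[0])  # add equal values together
    if x < lst[0]:
        return [x] + lst
    return [lst[0]] + meld(lst[1:], x)

def consolidate(nums):
    r = []
    for x in nums:
        r = meld(r, x)
    return r

print(consolidate([2, 1, 1, 4, 8, 1, 1, 2, 4]))  # [8, 16]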
We'll analyze the performance of consolidation as part of the pop operation in a later section.
The tree consolidation is very similar to this algorithm, except that it works on ranks. The only thing we need to do is to modify the meld() function a bit, so that it compares ranks and links instead of adding.
meld(L, x) =
    {x} : L = ∅
    meld(L', link(x, x_1)) : rank(x) = rank(x_1)
    {x} ∪ L : rank(x) < rank(x_1)
    {x_1} ∪ meld(L', x) : otherwise        (10.14)
The nal consolidate Haskell program changes to the below version.
consolidate :: (Ord a) => [BiTree a] -> [BiTree a]
consolidate = foldl meld [] where
    meld [] t = [t]
    meld (t':ts) t | rank t == rank t' = meld ts (link t t')
                   | rank t < rank t' = t:t':ts
                   | otherwise = t' : meld ts t
Figures 10.9 and 10.10 show the steps of consolidation when processing a Fibonacci heap containing trees of different ranks. Comparing with Table 10.1 reveals the similarity.
Figure 10.9: Steps of consolidation. (a) Before consolidation. (b) Steps 1, 2. (c) Step 3: d is first linked to c, then repeatedly linked to a. (d) Step 4.
After we merge all binomial trees in a Fibonacci heap, including the special tree that records the minimum element, the heap becomes a binomial heap, and we lose the special tree which gave us the ability to return the top element in O(1) time.
It's necessary to perform an O(lg N) time search to restore the special tree. We can reuse the function extractMin() defined for binomial heap.
It's time to give the final pop function for Fibonacci heap, as all the sub problems have been solved. Let T_min denote the special tree in the heap recording the minimum element at its root; let 𝕋 denote the forest containing all the other trees except the special tree; let s represent the size of the heap; and let function children() return all sub trees except the root of a binomial tree.
Figure 10.10: Steps of consolidation. (a) Step 5. (b) Step 6. (c) Steps 7, 8: r is first linked to q, then s is linked to q.
deleteMin(H) =
    ∅ : 𝕋 = ∅ ∧ children(T_min) = ∅
    FibHeap(s - 1, T'_min, 𝕋') : otherwise        (10.15)

where

(T'_min, 𝕋') = extractMin(consolidate(children(T_min) ∪ 𝕋))
Translating to Haskell yields the program below.

deleteMin :: (Ord a) => FibHeap a -> FibHeap a
deleteMin (FH _ (Node _ x []) []) = E
deleteMin h@(FH sz minTr ts) = FH (sz-1) minTr' ts' where
    (minTr', ts') = extractMin $ consolidate (children minTr ++ ts)
The main part of the imperative realization is similar. We cut all children of T_min and append them to the root list, then perform consolidation to merge all trees of the same rank until every tree has a unique rank.
1: function Delete-Min(H)
2:     x ← T_min(H)
3:     if x ≠ NIL then
4:         for each y ∈ Children(x) do
5:             append y to the root list of H
6:             Parent(y) ← NIL
7:         remove x from the root list of H
8:         n(H) ← n(H) - 1
9:         Consolidate(H)
10:    return x
Algorithm Consolidate utilizes an auxiliary array A to do the merge job. Element A[i] stores the tree with rank (degree) i. During the traversal of the root list, if we meet another tree of rank i, we link them together to get a new tree of rank i + 1; then we clean A[i] and check whether A[i + 1] is empty, performing further linking if necessary. After we finish traversing all roots, array A stores all the result trees and we can re-construct the heap from it.
1: function Consolidate(H)
2:     D ← Max-Degree(n(H))
3:     for i ← 0 to D do
4:         A[i] ← NIL
5:     for each x ∈ root list of H do
6:         remove x from the root list of H
7:         d ← Degree(x)
8:         while A[d] ≠ NIL do
9:             y ← A[d]
10:            x ← Link(x, y)
11:            A[d] ← NIL
12:            d ← d + 1
13:        A[d] ← x
14:    T_min(H) ← NIL    ▷ the root list is NIL at this time
15:    for i ← 0 to D do
16:        if A[i] ≠ NIL then
17:            append A[i] to the root list of H
18:            if T_min(H) = NIL ∨ Key(A[i]) < Key(T_min(H)) then
19:                T_min(H) ← A[i]
The only unclear sub algorithm is Max-Degree, which determines the upper bound of the degree of any node in an N-node Fibonacci heap. We'll delay its realization to the last sub section.
Feeding the Fibonacci heap shown in Figure 10.9 to the above algorithm, Figures 10.11, 10.12 and 10.13 show the result trees stored in the auxiliary array A at every step.
Figure 10.11: Steps of consolidation. (a) Steps 1, 2. (b) Step 3: since A[0] ≠ NIL, d is first linked to c, and A[0] is cleared to NIL; again, as A[1] ≠ NIL, c is linked to a and the new tree is stored in A[2]. (c) Step 4.
Translating the above algorithm to ANSI C yields the program below.

void consolidate(struct FibHeap* h){
    if(!h->roots)
        return;
    int D = max_degree(h->n)+1;
    struct node *x, *y;
    struct node** a = (struct node**)malloc(sizeof(struct node*)*(D+1));
    int i, d;
    for(i=0; i<=D; ++i)
        a[i] = NULL;
    while(h->roots){
        x = h->roots;
        h->roots = remove_node(h->roots, x);
        d = x->degree;
        while(a[d]){
            y = a[d]; /* another node has the same degree as x */
            x = link(x, y);
            a[d++] = NULL;
        }
        a[d] = x;
    }
    h->minTr = h->roots = NULL;
    for(i=0; i<=D; ++i)
        if(a[i]){
            h->roots = append(h->roots, a[i]);
            if(h->minTr == NULL || a[i]->key < h->minTr->key)
                h->minTr = a[i];
        }
    free(a);
}

Figure 10.12: Steps of consolidation. (a) Step 5. (b) Step 6.

Figure 10.13: Steps of consolidation. (a) Steps 7, 8: since A[0] ≠ NIL, r is first linked to q, and the new tree is stored in A[1] (A[0] is cleared); then s is linked to q and stored in A[2] (A[1] is cleared).
Exercise 10.7
Implement the remove function for circular doubly linked list in your favorite
imperative programming language.
10.3.3 Running time of pop
In order to analyze the amortized performance of pop, we adopt the potential method. Readers can refer to [2] for a formal definition; in this chapter, we only give an intuitive illustration.
Recall gravitational potential energy, which is defined as

E = M g h
Suppose there is a complex process which moves an object with mass M up and down, and finally the object stops at height h'. If there exists friction resistance W_f, we say the process performs the following amount of work.

W = M g (h' - h) + W_f

Figure 10.14: Gravity potential energy.

Figure 10.14 illustrates this concept.
We treat the Fibonacci heap pop operation in a similar way. In order to evaluate the cost, we first define the potential Φ(H) before extracting the minimum element. This potential is accumulated by the insertion and merge operations executed so far. After tree consolidation we get the result H', and we then calculate the new potential Φ(H'). The difference between Φ(H') and Φ(H), plus the contribution of the consolidate algorithm, indicates the amortized performance of pop.
For the pop operation analysis, the potential can be defined as

Φ(H) = t(H)        (10.16)

where t(H) is the number of trees in the Fibonacci heap forest. We have t(H) = 1 + length(𝕋) for any non-empty heap.
For an N-node Fibonacci heap, suppose there is an upper bound D(N) on the ranks of all trees. After consolidation, the number of trees in the heap forest is at most D(N) + 1.
Before consolidation, we actually did another important thing which also contributes to the running time: we removed the root of the minimum tree and concatenated all the remaining children to the forest. So the consolidate operation processes at most D(N) + t(H) - 1 trees.
Summarizing all the above factors, we deduce the amortized cost as below.

T = T_consolidation + Φ(H') - Φ(H)
  = O(D(N) + t(H) - 1) + (D(N) + 1) - t(H)
  = O(D(N))        (10.17)
If only the insertion, merge, and pop functions are applied to a Fibonacci heap, we can ensure that all trees are binomial trees, and it is easy to see the upper limit D(N) is O(lg N) (consider the extreme case that all nodes are in only one binomial tree).
However, we'll show in the next sub section that there is an operation which can violate the binomial tree assumption.
10.3.4 Decreasing key
There is one special heap operation left; it only makes sense in imperative settings. It's about decreasing the key of a certain node. Decreasing key plays an important role in some graph algorithms such as the minimum spanning tree algorithm and Dijkstra's algorithm [2]. In those cases we hope decreasing key takes amortized O(1) time.
However, we can't define a function like Decrease(H, k, k'), which first locates a node with key k, then decreases k to k' by replacement, and then resumes the heap properties. This is because the locating phase is bound to O(N) time, since we don't have a pointer to the target node.
In an imperative setting, we can define the algorithm as Decrease-Key(H, x, k). Here x is a node in heap H whose key we want to decrease to k. We needn't perform a search, as we have x at hand. It's possible to give an amortized O(1) solution.
When we decrease the key of a node, if it's not a root, this operation may violate the binomial tree property that the key of the parent is less than all keys of its children. So we need to compare the decreased key with that of the parent node, and if the property is violated, we can cut this node off and append it to the root list. (Recall the recursive swapping solution for binary heap, which leads to O(lg N).)
Figure 10.15: x < y; cut tree x from its parent, and add x to the root list.
Figure 10.15 illustrates this situation. After decreasing the key of node x, it is less than that of y; we cut x off its parent y, and paste the whole tree rooted at x to the root list.
Although we recover the property that the parent is less than all children, the tree is no longer a binomial tree after it loses some sub trees. If a tree loses too many of its children because of cutting, we can't ensure the performance of the mergeable heap operations. Fibonacci heap adds another constraint to avoid this problem:
If a node loses its second child, it is immediately cut from its parent and added to the root list.
The final Decrease-Key algorithm is given below.

1: function Decrease-Key(H, x, k)
2:     Key(x) ← k
3:     p ← Parent(x)
4:     if p ≠ NIL ∧ k < Key(p) then
5:         Cut(H, x)
6:         Cascading-Cut(H, p)
7:     if k < Key(T_min(H)) then
8:         T_min(H) ← x
Where function Cascading-Cut uses the mark to determine whether the node is losing its second child. A node is marked after it loses its first child, and the mark is cleared in the Cut function.
1: function Cut(H, x)
2:     p ← Parent(x)
3:     remove x from p
4:     Degree(p) ← Degree(p) - 1
5:     add x to the root list of H
6:     Parent(x) ← NIL
7:     Mark(x) ← FALSE
During the cascading cut process, if x is marked, it means it has already lost one child. We recursively perform cut and cascading cut on its parent till we reach the root.
1: function Cascading-Cut(H, x)
2:     p ← Parent(x)
3:     if p ≠ NIL then
4:         if Mark(x) = FALSE then
5:             Mark(x) ← TRUE
6:         else
7:             Cut(H, x)
8:             Cascading-Cut(H, p)
The relevant ANSI C decreasing key program is given as the following.

void decrease_key(struct FibHeap* h, struct node* x, Key k){
    struct node* p = x->parent;
    x->key = k;
    if(p && k < p->key){
        cut(h, x);
        cascading_cut(h, p);
    }
    if(k < h->minTr->key)
        h->minTr = x;
}

void cut(struct FibHeap* h, struct node* x){
    struct node* p = x->parent;
    p->children = remove_node(p->children, x);
    p->degree--;
    h->roots = append(h->roots, x);
    x->parent = NULL;
    x->mark = 0;
}

void cascading_cut(struct FibHeap* h, struct node* x){
    struct node* p = x->parent;
    if(p){
        if(!x->mark)
            x->mark = 1;
        else{
            cut(h, x);
            cascading_cut(h, p);
        }
    }
}
Exercise 10.8
Prove that Decrease-Key algorithm is amortized O(1) time.
10.3.5 The name of Fibonacci Heap
It's time to reveal why the data structure is named Fibonacci heap.
There is only one undefined algorithm so far, Max-Degree(N), which determines the upper bound of the degree of any node in an N-node Fibonacci heap. We'll give the proof by using the Fibonacci series, and finally realize the Max-Degree algorithm.
Lemma 10.3.1. For any node x in a Fibonacci heap, denote k = degree(x) and |x| = size(x); then

|x| ≥ F_{k+2}        (10.18)

where F_k is the Fibonacci series defined as follows.

F_k =
    0 : k = 0
    1 : k = 1
    F_{k-1} + F_{k-2} : k ≥ 2
Proof. Consider all k children of node x. We denote them as y_1, y_2, ..., y_k in the order of the time when they were linked to x, where y_1 is the oldest and y_k is the youngest.
Obviously, degree(y_i) ≥ 0. When we linked y_i to x, the children y_1, y_2, ..., y_{i-1} had already been there, and algorithm Link only links nodes with the same degree; this indicates that at that time we had

degree(y_i) = degree(x) = i - 1
After that, node y_i can have lost at most 1 child (due to the decreasing key operation); otherwise it would have been immediately cut off and appended to the root list after the second child loss. Thus we conclude

degree(y_i) ≥ i - 2

for any i = 2, 3, ..., k.
Let s_k be the minimum possible size of node x, where degree(x) = k. For the trivial cases, s_0 = 1, s_1 = 2, and we have

|x| ≥ s_k = 2 + Σ_{i=2}^{k} s_{degree(y_i)}
          ≥ 2 + Σ_{i=2}^{k} s_{i-2}
We next show that s_k ≥ F_{k+2}. This can be proved by induction. For the trivial cases, we have s_0 = 1 ≥ F_2 = 1 and s_1 = 2 ≥ F_3 = 2. For the induction case k ≥ 2, we have

|x| ≥ s_k ≥ 2 + Σ_{i=2}^{k} s_{i-2}
          ≥ 2 + Σ_{i=2}^{k} F_i
          = 1 + Σ_{i=0}^{k} F_i
At this point, we need to prove that

F_{k+2} = 1 + Σ_{i=0}^{k} F_i        (10.19)
This can also be proved by induction:

Trivial case: F_2 = 1 + F_0 = 1.

Induction case:

F_{k+2} = F_{k+1} + F_k
        = (1 + Σ_{i=0}^{k-1} F_i) + F_k
        = 1 + Σ_{i=0}^{k} F_i
Summarizing all the above, we have the final result.

N ≥ |x| ≥ F_{k+2}        (10.20)
Recall the result from the AVL tree chapter that F_{k+2} ≥ φ^k, where φ = (1 + √5)/2 is the golden ratio. This also proves that the pop operation is an amortized O(lg N) algorithm.
Based on this result, we can define the function MaxDegree as the following.

MaxDegree(N) = 1 + ⌊log_φ N⌋        (10.21)
The imperative Max-Degree algorithm can also be realized by using the Fibonacci sequence.

1: function Max-Degree(N)
2:     F_0 ← 0
3:     F_1 ← 1
4:     k ← 2
5:     repeat
6:         F_k ← F_{k-1} + F_{k-2}
7:         k ← k + 1
8:     until F_{k-1} ≥ N
9:     return k - 2
Translating the algorithm to ANSI C gives the following program.
int max_degree(int n){
int k, F;
int F2 = 0;
int F1 = 1;
for(F=F1+F2, k=2; F<n; ++k){
F2 = F1;
F1 = F;
F = F1 + F2;
}
return k-2;
}
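The same computation in Python (a cross-checking sketch of ours): max_degree(10) returns 5, matching 1 + ⌊log_φ 10⌋ = 5.

def max_degree(n):
    f2, f1, k = 0, 1, 2
    f = f1 + f2
    while f < n:            # advance Fibonacci numbers until f >= n
        f2, f1 = f1, f
        f = f1 + f2
        k += 1
    return k - 2

print(max_degree(10))  # 5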
10.4 Pairing Heaps
Although Fibonacci heaps provide excellent performance theoretically, they are complex to realize, and people find that the constant factor behind the big-O is large. Actually, Fibonacci heap is more significant in theory than in practice.
In this section, we'll introduce another solution, the pairing heap, which is one of the best heaps ever known in terms of practical performance. Most operations, including insertion, finding the minimum element (top), and merging, are bound to O(1) time, while deleting the minimum element (pop) is conjectured to take amortized O(lg N) time [7] [6]. Note that this has remained a conjecture for 15 years by the time I write this chapter; nobody has proven it, although there is much experimental data supporting the O(lg N) amortized result.
Besides that, the pairing heap is simple; there exist both elegant imperative and functional implementations.
10.4.1 Definition
Both binomial heaps and Fibonacci heaps are realized with forests, while a pairing heap is essentially a K-ary tree. The minimum element is stored at the root; all other elements are stored in sub trees.
The following Haskell program defines the pairing heap.

data PHeap a = E | Node a [PHeap a]

This is a recursive definition: a pairing heap is either empty or a K-ary tree, which consists of a root node and a list of sub trees.
Pairing heap can also be defined in procedural languages, for example ANSI C as below. For illustration purposes, all heaps we mention later are min-heaps, and we assume the type of the key is integer. (We could parameterize the key type with a C++ template, but this is beyond our scope; please refer to the example programs along with this book.) We use the same linked-list based left-child, right-sibling approach (a.k.a. the binary tree representation[2]).
typedef int Key;

struct node{
    Key key;
    struct node *next, *children, *parent;
};
Note that the parent field only makes sense for the decreasing key operation, which will be explained later on; we can omit it for the time being.
10.4.2 Basic heap operations
In this section, we first give the merging operation for pairing heap, which can be used to realize insertion. Merging, insertion, and finding the minimum element are relatively trivial compared to the extract-minimum operation.
Merge, insert, and find the minimum element (top)

The idea of merging is similar to the linking algorithm we showed previously for binomial heap. When we merge two pairing heaps, there are two cases.
In the trivial case, one heap is empty; we simply return the other heap as the result. Otherwise, we compare the root elements of the two heaps, and make the heap with the bigger root element a new child of the other.
Let H_1 and H_2 denote the two heaps, and x and y be the root elements of H_1 and H_2 respectively. Function Children() returns the children of a K-ary tree. Function Node() constructs a K-ary tree from a root element and a list of children.
merge(H_1, H_2) =
    H_1 : H_2 = ∅
    H_2 : H_1 = ∅
    Node(x, {H_2} ∪ Children(H_1)) : x < y
    Node(y, {H_1} ∪ Children(H_2)) : otherwise        (10.22)
where

x = Root(H_1)
y = Root(H_2)
It's obvious that the merging algorithm is bound to O(1) time, assuming ∪ is a constant time operation. (This is true in linked-list settings, including the cons-like operation in functional programming languages.) The merge equation can be translated to the following Haskell program.
merge :: (Ord a) => PHeap a -> PHeap a -> PHeap a
merge h E = h
merge E h = h
merge h1@(Node x hs1) h2@(Node y hs2) =
    if x < y then Node x (h2:hs1) else Node y (h1:hs2)
Merge can also be realized imperatively. With the left-child, right-sibling approach, we can just link the heap (which is in fact a K-ary tree) with the larger key as the first new child of the other. This is a constant time operation, as described below.
1: function Merge(H_1, H_2)
2:     if H_1 = NIL then
3:         return H_2
4:     if H_2 = NIL then
5:         return H_1
6:     if Key(H_2) < Key(H_1) then
7:         Exchange(H_1 ↔ H_2)
8:     insert H_2 in front of Children(H_1)
9:     Parent(H_2) ← H_1
10:    return H_1
Note that we also update the parent field accordingly. The ANSI C example program is given as the following.
struct node* merge(struct node* h1, struct node* h2){
    if(h1 == NULL)
        return h2;
    if(h2 == NULL)
        return h1;
    if(h2->key < h1->key)
        swap(&h1, &h2);
    h2->next = h1->children;
    h1->children = h2;
    h2->parent = h1;
    h1->next = NULL; /* break the previous link if any */
    return h1;
}
Where function swap() is defined in a similar way as for Fibonacci heap.
With merge defined, insertion can be realized exactly as for Fibonacci heap in Equation 10.9; it's definitely an O(1) time operation. As the minimum element is always stored at the root, finding it is trivial.

top(H) = Root(H)        (10.23)
Like the two operations above, it's bound to O(1) time.
Exercise 10.9
Implement the insertion and top operation in your favorite programming
language.
Decrease key of a node
There is another relatively simple operation: decreasing the key of a given node, which only makes sense in imperative settings, as we explained in the Fibonacci heap section.
The solution is simple: we cut the node with the new smaller key from its parent, along with all its children, then merge it back to the heap. The only special case is when the given node is the root; then we can directly set the new key without doing anything else.
The following algorithm describes this procedure for a given node x with new key k.
1: function Decrease-Key(H, x, k)
2:     Key(x) ← k
3:     if Parent(x) ≠ NIL then
4:         remove x from Children(Parent(x))
5:         Parent(x) ← NIL
6:         return Merge(H, x)
7:     return H
The following ANSI C program translates this algorithm.

struct node* decrease_key(struct node* h, struct node* x, Key key){
    x->key = key; /* assume key <= x->key */
    if(x->parent){
        x->parent->children = remove_node(x->parent->children, x);
        x->parent = NULL;
        return merge(h, x);
    }
    return h;
}
Exercise 10.10
Implement the program of removing a node from the children of its parent in your favorite imperative programming language. Consider how we can ensure the overall performance of decreasing key is O(1) time. Is the left-child, right-sibling approach enough?
Delete the minimum element from the heap (pop)
Since the minimum element is always stored at the root, after deleting it during popping, the remaining parts are all sub trees. These trees can be merged into one big tree.

pop(H) = mergePairs(Children(H))        (10.24)

Pairing heap uses a special approach: it merges every two sub trees from left to right in pairs, then merges these paired results from right to left, which forms the final result tree. The name of pairing heap comes from this characteristic pair-merging.
Figure 10.16: Remove the root element, and merge children in pairs. (a) A pairing heap before pop. (b) After root element 2 is removed, there are 9 sub trees left. (c) Merge every two trees in pairs; note that there is an odd number of trees, so the last one needn't merge.
Figure 10.17: Steps of merge from right to left. (a) Merge the tree with root 9 and the tree with root 6. (b) Merge the tree with root 7 to the result. (c) Merge the tree with root 3 to the result. (d) Merge the tree with root 4 to the result.
Figures 10.16 and 10.17 illustrate the procedure of pair-merging.
The recursive pair-merging solution is quite similar to bottom-up merge sort[6]. Denote the children of a pairing heap as A, which is a list of trees {T_1, T_2, T_3, ..., T_m} for example. The mergePairs() function can be given as below.
mergePairs(A) =
    ∅ : A = ∅
    T_1 : A = {T_1}
    merge(merge(T_1, T_2), mergePairs(A')) : otherwise        (10.25)

where

A' = {T_3, T_4, ..., T_m}

is the rest of the children without the first two trees.
The corresponding Haskell program for popping is given as the following.
deleteMin :: (Ord a) => PHeap a -> PHeap a
deleteMin (Node _ hs) = mergePairs hs where
    mergePairs [] = E
    mergePairs [h] = h
    mergePairs (h1:h2:hs) = merge (merge h1 h2) (mergePairs hs)
The popping operation can also be explained in the following procedural
algorithm.
1: function Pop(H)
2:     L ← NIL
3:     for every 2 trees T_x, T_y ∈ Children(H) from left to right do
4:         extract T_x and T_y from Children(H)
5:         T ← Merge(T_x, T_y)
6:         insert T at the beginning of L
7:     H ← Children(H)    ▷ H is either NIL or one tree
8:     for T ∈ L from left to right do
9:         H ← Merge(H, T)
10:    return H
Note that L is initialized as an empty linked-list. The algorithm iterates over the children of the K-ary tree two trees at a time, from left to right, and performs merging; each result is inserted at the beginning of L. Because we insert at the front end, when we traverse L later on we actually process from right to left. There may be an odd number of sub trees in H; in that case, one tree is left over after pair-merging. We handle it by starting the right-to-left merging from this leftover tree.
Below is the ANSI C program for this algorithm.

struct node* pop(struct node* h){
    struct node *x, *y, *lst = NULL;
    while((x = h->children) != NULL){
        if((h->children = y = x->next) != NULL)
            h->children = h->children->next;
        lst = push_front(lst, merge(x, y));
    }
    x = NULL;
    while((y = lst) != NULL){
        lst = lst->next;
        x = merge(x, y);
    }
    free(h);
    return x;
}
The pairing heap pop operation is conjectured to be amortized O(lg N) time
[7].
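To summarize pairing heap in one place, here is a compact Python sketch of merge, insert, and pop (our own translation of the Haskell programs above; a heap is represented as None or a pair of a key and a list of sub heaps, which differs from the C representation):

def ph_merge(h1, h2):
    if h1 is None: return h2
    if h2 is None: return h1
    if h2[0] < h1[0]:
        h1, h2 = h2, h1
    h1[1].insert(0, h2)  # the smaller root adopts the other as first child
    return h1

def ph_insert(h, x):
    return ph_merge(h, (x, []))

def ph_pop(h):
    subs = h[1]
    # pass 1: merge every two sub trees from left to right
    paired = [ph_merge(subs[i], subs[i+1] if i+1 < len(subs) else None)
              for i in range(0, len(subs), 2)]
    # pass 2: merge the paired results from right to left
    r = None
    for t in reversed(paired):
        r = ph_merge(r, t)
    return (h[0], r)

h = None
for x in [2, 5, 4, 3, 12, 7]:
    h = ph_insert(h, x)
(m, h) = ph_pop(h)
print(m, h[0])  # 2 3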
Exercise 10.11
Write a program to insert a tree at the beginning of a linked-list in your
favorite imperative programming language.
Delete a node
We didn't mention delete for binomial heap or Fibonacci heap. Deletion can be realized by first decreasing the key to minus infinity (-∞), then performing pop. In this section, we present another solution for deleting a node.
The algorithm is to define the function delete(H, x), where x is a node in a pairing heap H. (Here the semantics of x is a reference to a node.)
If x is the root, we can just perform a pop operation. Otherwise, we can cut x from H, perform a pop on x, and then merge the pop result back to H. This can be described as the following.

delete(H, x) =
    pop(H) : x is the root of H
    merge(cut(H, x), pop(x)) : otherwise        (10.26)
As the delete algorithm uses pop, the performance is conjectured to be amortized O(lg N) time as well.
Exercise 10.12
- Write procedural pseudo code for the delete algorithm.
- Write the delete operation in your favorite imperative programming language.
- Consider how to realize delete in a purely functional setting.
10.5 Notes and short summary
In this chapter, we extend the heap implementation from binary trees to a more generic approach. Binomial heap and Fibonacci heap use a forest of K-ary trees as the underlying data structure, while pairing heap uses a single K-ary tree to represent the heap. It's a good idea to postpone some expensive operations, so that the overall amortized performance is ensured. Although Fibonacci heap gives good performance in theory, the implementation is a bit complex; it has been removed from some recent textbooks. We also presented the pairing heap, which is easy to realize and has good performance in practice.
The elementary tree based data structures have all been introduced in this book. There are still many tree based data structures which we can't cover, so we skip them here; we encourage the reader to refer to other textbooks about them. From the next chapter, we'll introduce generic sequence data structures: arrays and queues.
Bibliography

[1] K-ary tree, Wikipedia. http://en.wikipedia.org/wiki/K-ary_tree
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937.
[3] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, (July 1, 1999), ISBN-13: 978-0521663502
[4] Wikipedia, Pascal's triangle. http://en.wikipedia.org/wiki/Pascal's_triangle
[5] Hackage. An alternate implementation of a priority queue based on a Fibonacci heap. http://hackage.haskell.org/packages/archive/pqueue-mtl/1.0.7/doc/html/src/Data-Queue-FibQueue.html
[6] Chris Okasaki. Fibonacci Heaps. http://darcs.haskell.org/nofib/gc/fibheaps/orig
[7] Michael L. Fredman, Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan. The Pairing Heap: A New Form of Self-Adjusting Heap. Algorithmica (1986) 1: 111-129.
Part III
Queues and Sequences
Chapter 11
Queue, not so simple as it was thought
11.1 Introduction
It seems that queues are relatively simple. A queue provides FIFO (first-in, first-out) data manipulation support. There are many options to realize a queue, including singly linked-list, doubly linked-list, circular buffer etc. However, we'll show that it's not so easy to realize a queue in a purely functional setting if it must satisfy the abstract queue properties.
In this chapter, we'll present several different approaches to implement a queue; in the next chapter, we'll explain how to realize sequences.
A queue is a FIFO data structure satisfying the following performance constraints:
- An element can be added to the tail of the queue in O(1) constant time;
- An element can be removed from the head of the queue in O(1) constant time.
These two properties must be satisfied, and it's common to add some extra goals, such as dynamic memory allocation etc.
Of course such an abstract queue interface can be implemented trivially with a doubly-linked list, but this is an overkill solution; we can even implement an imperative queue with a singly linked-list or a plain array. However, our main question here is how to realize a purely functional queue as well.
We'll first review the typical queue solutions realized with a singly linked-list and with a circular buffer in the first section; then we give a simple and straightforward functional solution in the second section. While the performance is ensured in terms of amortized constant time, we need to find a real-time solution (or worst-case solution) for some special cases. Such solutions will be described in the third and fourth sections. Finally, we'll show a very simple real-time queue which depends on lazy evaluation.
Most of the functional content is based on Chris Okasaki's great work in [6]. There are more than 16 different types of purely functional queue given in that material.
11.2 Queue by linked-list and circular buffer
11.2.1 Singly linked-list solution
A queue can be implemented with a singly linked-list. It's easy to add and remove an element at the front end of a linked-list in O(1) time. However, in order to keep the FIFO order, if we execute one operation on the head, we must perform the inverse operation on the tail.
For a plain singly linked-list, we must traverse the whole list before adding or removing. Traversal is bound to O(N) time, where N is the length of the list. This doesn't match the abstract queue properties.
The solution is to use an extra record to store the tail of the linked-list. A sentinel node is often used to simplify boundary handling. The following ANSI C code defines a queue realized by a singly linked-list. (It's possible to parameterize the type of the key with a C++ template; ANSI C is used here for illustration purposes.)
typedef int Key;

struct Node{
    Key key;
    struct Node* next;
};

struct Queue{
    struct Node *head, *tail;
};
Figure 11.1 illustrates an empty list: both head and tail point to the sentinel NIL node.

Figure 11.1: The empty queue; both head and tail point to the sentinel node.
We summarize the abstract queue interface as the following.

- function Empty: create an empty queue
- function Empty?(Q): test if Q is empty
- function Enqueue(Q, x): add a new element x to queue Q
- function Dequeue(Q): remove an element from queue Q
- function Head(Q): get the next element in queue Q in FIFO order
Note the difference between Dequeue and Head: Head only retrieves the next element in FIFO order without removing it, while Dequeue performs the removal.

In some programming languages, such as Haskell, and in most object-oriented languages, the above abstract queue interface can be ensured by a type definition. For example, the following Haskell code specifies the abstract queue.
class Queue q where
    empty :: q a
    isEmpty :: q a -> Bool
    push :: q a -> a -> q a -- aka snoc, append, or push_back
    pop :: q a -> q a       -- aka tail or pop_front
    front :: q a -> a       -- aka head
To ensure constant time Enqueue and Dequeue, we add the new element at the tail and remove elements from the head. (It is possible instead to add new elements at the head and remove from the tail, but removing from the tail of a singly linked-list is more complex, as it requires traversal.)
function Enqueue(Q, x)
    p ← Create-New-Node
    Key(p) ← x
    Next(p) ← NIL
    Next(Tail(Q)) ← p
    Tail(Q) ← p
Note that, as we use the sentinel node, there is at least one node, the sentinel, in the queue. That's why we needn't validate the tail before we append the newly created node p to it.
function Dequeue(Q)
    x ← Head(Q)
    Next(Head(Q)) ← Next(x)
    if x = Tail(Q) then    ▷ Q becomes empty
        Tail(Q) ← Head(Q)
    return Key(x)
As we always put the sentinel node in front of all the other nodes, function Head actually returns the node next to the sentinel.

Figure 11.2 illustrates the Enqueue and Dequeue processes with a sentinel node. Translating the pseudo code to an ANSI C program yields the below code.
struct Queue* enqueue(struct Queue* q, Key x) {
    struct Node* p = (struct Node*) malloc(sizeof(struct Node));
    p->key = x;
    p->next = NULL;
    q->tail->next = p;
    q->tail = p;
    return q;
}
Key dequeue(struct Queue* q) {
    struct Node* p = head(q); /* gets the node next to the sentinel */
    Key x = key(p);
    q->head->next = p->next;
    if (q->tail == p) /* the queue becomes empty */
        q->tail = q->head;
    free(p);
    return x;
}
Figure 11.2: Enqueue and Dequeue on a linked-list queue: (a) before enqueuing x; (b) after enqueuing x; (c) before dequeuing; (d) after dequeuing.
This solution is simple and robust. It's easy to extend this solution even to concurrent environments (e.g. multicores). We can assign one lock to the head and another lock to the tail. The sentinel helps prevent deadlock in the empty-queue case [1] [2].
Exercise 11.1

- Realize the Empty? and Head algorithms for the linked-list queue.
- Implement the singly linked-list queue in your favorite imperative programming language. Note that you need to provide functions to initialize and destroy the queue.
11.2.2 Circular buffer solution

Another typical solution to realize a queue is to use a plain array as a circular buffer (also known as a ring buffer). As opposed to a linked-list, an array supports appending to the tail in constant O(1) time if there is still space. Of course we need to re-allocate space if the array is fully occupied. However, an array performs poorly, in O(N) time, when removing an element from the head and packing the space, because we need to shift all the rest elements one cell ahead. The idea of a circular buffer is to reuse the free cells before the first valid element after we remove elements from the head.

The idea of the circular buffer is described in figures 11.3 and 11.4.
If we set a maximum size of the buffer instead of dynamically allocating memory, the queue can be defined with the below ANSI C code.

struct Queue {
    Key* buf;
    int head, tail, size;
};
When initializing the queue, we are explicitly asked to provide the maximum size as an argument.

struct Queue* createQ(int max) {
    struct Queue* q = (struct Queue*) malloc(sizeof(struct Queue));
    q->buf = (Key*) malloc(sizeof(Key) * max);
    q->size = max;
    q->head = q->tail = 0;
    return q;
}
Testing if a queue is empty is trivial.

function Empty?(Q)
    return Head(Q) = Tail(Q)
One brute-force implementation of Enqueue and Dequeue is to calculate the modulus of the index blindly, as the following.
Figure 11.3: A queue realized with a ring buffer: (a) continuously add some elements; (b) after removing some elements from the head, there are free cells; (c) go on adding elements till the boundary of the array; (d) the next element is added to the first free cell at the head; (e) all cells are occupied; the queue is full.
Figure 11.4: The circular buffer.
function Enqueue(Q, x)
    if ¬ Full?(Q) then
        Tail(Q) ← (Tail(Q) + 1) mod Size(Q)
        Buffer(Q)[Tail(Q)] ← x

function Head(Q)
    if ¬ Empty?(Q) then
        return Buffer(Q)[Head(Q)]

function Dequeue(Q)
    if ¬ Empty?(Q) then
        Head(Q) ← (Head(Q) + 1) mod Size(Q)
However, the modulus operation is expensive and slow in some settings, so one may replace it by an adjustment, as in the below ANSI C program.
void enQ(struct Queue* q, Key x) {
    if (!fullQ(q)) {
        q->buf[q->tail++] = x;
        q->tail -= q->tail < q->size ? 0 : q->size;
    }
}

Key headQ(struct Queue* q) {
    return q->buf[q->head]; /* assume the queue isn't empty */
}

Key deQ(struct Queue* q) {
    Key x = headQ(q);
    q->head++;
    q->head -= q->head < q->size ? 0 : q->size;
    return x;
}
Exercise 11.2

As the circular buffer is allocated with a maximum size parameter, please write a function to test if a queue is full, to avoid overflow. Note there are two cases: one where the head is in front of the tail, the other where it is behind.
11.3 Purely functional solution

11.3.1 Paired-list queue

We can't simply use a list to implement a queue, or we can't satisfy the abstract queue properties. This is because a singly linked-list, which is the back-end data structure in most functional settings, performs well at the head in constant O(1) time, while it performs in linear O(N) time at the tail, where N is the length of the list. Either dequeue or enqueue would perform proportionally to the number of elements stored in the list, as shown in figure 11.5.
Figure 11.5: DeQueue and EnQueue can't both perform in constant O(1) time with a list: (a) DeQueue performs poorly; (b) EnQueue performs poorly.
Nor can we add a pointer to record the tail position of the list, as we did in the imperative settings in the ANSI C program, because of the purely functional nature.

Chris Okasaki mentioned a simple and straightforward functional solution in [4]. The idea is to maintain two linked-lists as a queue, and concatenate these two lists in a tail-to-tail manner. The shape of the queue looks like a horseshoe magnet, as shown in figure 11.6.

With this setup, we push a new element to the head of the rear list, which is ensured to be O(1) constant time; on the other hand, we pop an element from the head of the front list, which is also O(1) constant time. In this way the abstract queue properties can be satisfied.
The definition of such a paired-list queue can be expressed in the following Haskell code.

type Queue a = ([a], [a])

empty = ([], [])

Suppose functions front(Q) and rear(Q) return the front and rear lists in such a setup, and Queue(F, R) creates a paired-list queue from two lists F and R.
Figure 11.6: A queue with front and rear lists, shaped like a horseshoe magnet: (a) a horseshoe magnet; (b) the two lists concatenated tail-to-tail, with DeQueue in O(1) at the head of the front list and EnQueue in O(1) at the head of the rear list.
The EnQueue (push) and DeQueue (pop) operations can be easily realized based on this setup.

push(Q, x) = Queue(front(Q), {x} ∪ rear(Q))    (11.1)

pop(Q) = Queue(tail(front(Q)), rear(Q))    (11.2)

Where, if a list X = {x₁, x₂, ..., xₙ}, function tail(X) = {x₂, x₃, ..., xₙ} returns the rest of the list without the first element.
However, we must next solve the problem that, after several pop operations, the front list becomes empty while there are still elements in the rear list. One method is to rebuild the queue by reversing the rear list and using it to replace the front list.

Hence a balance operation is executed after popping. Let's denote the front and rear lists of a queue Q as F = front(Q) and R = rear(Q).
balance(F, R) =
    Queue(reverse(R), ∅) : F = ∅
    Q : otherwise
(11.3)
Thus if the front list isn't empty, we do nothing; when the front list becomes empty, we use the reversed rear list as the new front list, and the new rear list is empty.
The new enqueue and dequeue algorithms are updated as below.

push(Q, x) = balance(F, {x} ∪ R)    (11.4)

pop(Q) = balance(tail(F), R)    (11.5)
Summing up the above algorithms and translating them to Haskell yields the following program.

balance :: Queue a -> Queue a
balance ([], r) = (reverse r, [])
balance q = q

push :: Queue a -> a -> Queue a
push (f, r) x = balance (f, x:r)

pop :: Queue a -> Queue a
pop ([], _) = error "Empty"
pop (_:f, r) = balance (f, r)
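As a quick sanity check, the following sketch builds a queue by pushing 1, 2, 3 and reads the elements back in FIFO order. The helper peek is ours, not part of the text; it simply reads the head of the front list.

peek :: Queue a -> a
peek (x:_, _) = x            -- the next element in FIFO order
peek ([], _) = error "Empty"

test :: [Int]
test = [peek q, peek (pop q), peek (pop (pop q))]
  where q = foldl push empty [1, 2, 3]
-- test evaluates to [1, 2, 3]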
However, although we only touch the heads of the front and rear lists, the overall performance can't always be kept at O(1). Actually, the performance of this algorithm is amortized O(1). This is because the reverse operation takes time proportional to the length of the rear list; it's bound to O(N) time, where N = |R|. We leave the proof of the amortized performance as an exercise to the reader.
11.3.2 Paired-array queue - a symmetric implementation

There is an interesting implementation which is symmetric to the paired-list queue. In some old programming languages, such as legacy versions of BASIC, arrays are supported, but there are no pointers or records to represent a linked-list. Although we can use an extra array to store indexes, so as to represent a linked-list with an implicit array, there is another option to realize an amortized O(1) queue.

Compare the performance of array and linked-list. The table below reveals some facts (suppose both contain N elements).
operation Array Linked-list
insert on head O(N) O(1)
insert on tail O(1) O(N)
remove on head O(N) O(1)
remove on tail O(1) O(N)
Note that a linked-list performs in constant time at the head, but in linear time at the tail; while an array performs in constant time at the tail (supposing there is enough memory space, and omitting memory reallocation for simplicity), but in linear time at the head. This is because we need to shift elements when preparing or eliminating a cell at the head of an array (see the chapter about the evolution of insertion sort for details).
The above table shows an interesting characteristic that we can exploit to provide a solution that mimics the paired-list queue: we concatenate two arrays, head-to-head, to make a horseshoe-shaped queue like in figure 11.7.
Figure 11.7: A queue with front and rear arrays, shaped like a horseshoe magnet: (a) a horseshoe magnet; (b) two arrays concatenated head-to-head, with DeQueue and EnQueue both in O(1) at the array tails.
We can define such a paired-array queue like the following Python code. (Legacy BASIC code is not presented here. We actually use a list rather than an array in Python to illustrate the idea. ANSI C and ISO C++ programs are provided along with this chapter; they show the idea in a purely array manner.)

class Queue:
    def __init__(self):
        self.front = []
        self.rear = []
def is_empty(q):
    return q.front == [] and q.rear == []
The corresponding Push() and Pop() algorithms only manipulate the tails of the arrays.

function Push(Q, x)
    Append(Rear(Q), x)
Here we assume that the Append() algorithm appends element x to the end of the array, and handles the necessary memory allocation etc. Actually, there are multiple memory handling approaches. For example, besides dynamic re-allocation, we can initialize the array with enough space and just report an error if it's full.
function Pop(Q)
    if Front(Q) = ∅ then
        Front(Q) ← Reverse(Rear(Q))
        Rear(Q) ← ∅
    N ← Length(Front(Q))
    x ← Front(Q)[N]
    Length(Front(Q)) ← N - 1
    return x
For simplification and pure illustration purposes, the array isn't shrunk explicitly after elements are removed, so testing if the front array is empty (∅) can be realized as checking if the length of the array is zero. We omit all these details here.
The enqueue and dequeue algorithms can be translated to Python programs
straightforwardly.
def push(q, x):
    q.rear.append(x)

def pop(q):
    if q.front == []:
        q.rear.reverse()
        (q.front, q.rear) = (q.rear, [])
    return q.front.pop()
Similar to the paired-list queue, the performance is amortized O(1) because
the reverse procedure takes linear time.
Exercise 11.3

- Prove that the amortized performance of the paired-list queue is O(1).
- Prove that the amortized performance of the paired-array queue is O(1).
11.4 A small improvement, Balanced Queue

Although the paired-list queue is amortized O(1) for popping and pushing, the solution we proposed in the previous section performs poorly in the worst case. For example, suppose there is one element in the front list, and we push N elements continuously to the queue, where N is a big number. After that, executing a pop operation will trigger the worst case.

According to the strategy we used so far, all the N elements are added to the rear list. The front list becomes empty after a pop operation, so the algorithm starts to reverse the rear list. This reversing procedure is bound to O(N) time, which is proportional to the length of the rear list. Sometimes this is unacceptable for a very big N.

The reason why this worst case happens is that the front and rear lists are extremely unbalanced. We can improve our paired-list queue design by making them more balanced. One option is to add a balancing constraint.
|R| ≤ |F|    (11.6)
Where R = Rear(Q), F = Front(Q), and |L| is the length of list L. This constraint ensures that the length of the rear list is not greater than the length of the front list, so the reverse procedure will be executed as soon as the rear list grows longer than the front list.

Here we need to frequently access the length information of a list. However, calculating the length takes linear time for a singly linked-list. We can record the length in a variable and update it as elements are added and removed. This approach enables us to get the length information in constant time.
The below example shows the modified paired-list queue definition, which is augmented with length fields.

data BalanceQueue a = BQ [a] Int [a] Int
As we keep the invariant as specified in (11.6), we can easily tell if a queue is empty by testing the length of the front list.

F = ∅ ⟺ |F| = 0    (11.7)
In the rest of this section, we suppose the length of a list L can be retrieved as |L| in constant time.

Push and pop are almost the same as before, except that we check the balance invariant by passing the length information, and perform reversing accordingly.

push(Q, x) = balance(F, |F|, {x} ∪ R, |R| + 1)    (11.8)

pop(Q) = balance(tail(F), |F| - 1, R, |R|)    (11.9)
Where function balance() is defined as the following.

balance(F, |F|, R, |R|) =
    Queue(F, |F|, R, |R|) : |R| ≤ |F|
    Queue(F ∪ reverse(R), |F| + |R|, ∅, 0) : otherwise
(11.10)
Note that the function Queue() takes four parameters: the front list along with its (recorded) length, and the rear list along with its length, and forms a paired-list queue augmented with length fields.
We can easily translate the equations to a Haskell program. And we can enforce the abstract queue interface by making the implementation an instance of the Queue type class.
instance Queue BalanceQueue where
    empty = BQ [] 0 [] 0

    isEmpty (BQ _ lenf _ _) = lenf == 0

    -- Amortized O(1) time push
    push (BQ f lenf r lenr) x = balance f lenf (x:r) (lenr + 1)

    -- Amortized O(1) time pop
    pop (BQ (_:f) lenf r lenr) = balance f (lenf - 1) r lenr

    front (BQ (x:_) _ _ _) = x

balance f lenf r lenr
    | lenr <= lenf = BQ f lenf r lenr
    | otherwise = BQ (f ++ (reverse r)) (lenf + lenr) [] 0
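To see the invariant at work, here is a small sketch (assuming the definitions above are in scope): pushing four elements keeps |R| ≤ |F| by reversing the rear onto the front whenever the rear would grow longer.

lengths :: (Int, Int)
lengths = let BQ _ lenf _ lenr = foldl push (empty :: BalanceQueue Int) [1..4]
          in (lenf, lenr)
-- lengths evaluates to (3, 1): the front holds [1,2,3] and the rear holds [4]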
Exercise 11.4

- Write the symmetric balance improvement solution for the paired-array queue in your favorite imperative programming language.
11.5 One more step improvement, Real-time Queue

Although the extreme worst case can be avoided by improving the balancing as presented in the previous section, the performance of reversing the rear list is still bound to O(N), where N = |R|. So if the rear list is very long, the instant performance is still unacceptably poor even though the amortized time is O(1). It is particularly important in some real-time systems to ensure the worst case performance.

As we have analyzed, the bottleneck is the computation of F ∪ reverse(R). This happens when |R| > |F|. Considering that |F| and |R| are both integers, this computation happens when

|R| = |F| + 1    (11.11)

Both F and the result of reverse(R) are singly linked-lists. It takes O(|F|) time to concatenate them together, and it takes extra O(|R|) time to reverse the rear list, so the total computation is bound to O(N), where N = |F| + |R|, which is proportional to the total number of elements in the queue.
In order to realize a real-time queue, we can't compute F ∪ reverse(R) monolithically. Our strategy is to distribute this expensive computation over every pop and push operation. Thus although each pop and push gets a bit slower, we avoid the extremely slow worst-case pop or push.

Incremental reverse

Let's examine how the functional reverse algorithm is typically implemented.
reverse(X) =
    ∅ : X = ∅
    reverse(X') ∪ {x₁} : otherwise
(11.12)

Where X' = tail(X) = {x₂, x₃, ...}.
This is a typical recursive algorithm: if the list to be reversed is empty, the result is just an empty list; this is the edge case. Otherwise, we take the first element x₁ from the list, reverse the rest {x₂, x₃, ..., xₙ} to {xₙ, xₙ₋₁, ..., x₃, x₂}, and append x₁ after it.

However, this algorithm performs poorly, as appending an element to the end of a list is proportional to the length of the list. So it's O(N²), not a linear time reverse algorithm.
There exists another implementation which utilizes an accumulator A, like below.

reverse(X) = reverse'(X, ∅)    (11.13)

Where

reverse'(X, A) =
    A : X = ∅
    reverse'(X', {x₁} ∪ A) : otherwise
(11.14)
We call A the accumulator because it accumulates intermediate reverse results at any time. Every time we call reverse'(X, A), list X contains the rest of the elements waiting to be reversed, and A holds all the elements reversed so far. For instance, when we call reverse'() for the i-th time, X and A contain the following elements:

X = {xᵢ, xᵢ₊₁, ..., xₙ}    A = {xᵢ₋₁, xᵢ₋₂, ..., x₁}

In every non-trivial case, we take the first element from X in O(1) time, then put it in front of the accumulator A, which is again O(1) constant time. We repeat this N times, so it is a linear time (O(N)) algorithm.
The latter version of reverse is obviously tail-recursive; see [5] and [6] for details. This characteristic makes it easy to change the algorithm from a monolithic one to an incremental one.

The solution is state transferring. We can use a state machine containing two types of state: a reversing state Sᵣ to indicate that the reverse is still on-going (not finished), and a finish state S_f to indicate the reverse has been done (finished). In Haskell, it can be defined as a type.
data State a = Reverse [a] [a]
             | Done [a]
And we can schedule (slow down) the above reverse'(X, A) function with these two types of state.

step(S, X, A) =
    (S_f, A) : S = Sᵣ ∧ X = ∅
    (Sᵣ, X', {x₁} ∪ A) : S = Sᵣ ∧ X ≠ ∅
(11.15)
At each step, we examine the state type first. If the current state is Sᵣ (on-going) and the rest of the elements to be reversed in X is empty, we turn the machine to the finish state S_f; otherwise, we take the first element from X and put it in front of A just as above, but we do NOT perform recursion; instead, we just finish this step. We store the current state as well as the resulting X and A; the reverse can be continued at any time in the future by calling the step function again with the stored state, X and A passed in.
Here is an example of this step-by-step reverse algorithm.
step(Sᵣ, "hello", ∅) = (Sᵣ, "ello", "h")
step(Sᵣ, "ello", "h") = (Sᵣ, "llo", "eh")
...
step(Sᵣ, "o", "lleh") = (Sᵣ, ∅, "olleh")
step(Sᵣ, ∅, "olleh") = (S_f, "olleh")
In Haskell, the example looks like the following.
step $ Reverse "hello" [] = Reverse "ello" "h"
step $ Reverse "ello" "h" = Reverse "llo" "eh"
...
step $ Reverse "o" "lleh" = Reverse [] "olleh"
step $ Reverse [] "olleh" = Done "olleh"
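To make this concrete, below is a sketch of the step function from equation (11.15), together with a driver of ours that runs the machine to completion (in a real queue the steps would instead be spread across pop and push operations).

-- One step of the incremental reverse, as in equation (11.15)
step :: State a -> State a
step (Reverse [] acc) = Done acc
step (Reverse (x:xs) acc) = Reverse xs (x:acc)
step s = s

-- Hypothetical driver: keep stepping until the Done state is reached
runToDone :: State a -> [a]
runToDone (Done acc) = acc
runToDone s = runToDone (step s)

-- runToDone (Reverse "hello" []) evaluates to "olleh"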
Now we can distribute the reverse into steps in every pop and push operation. However, the problem is only half solved. We want to break down F ∪ reverse(R); we have broken reverse(R) into steps, and we next need to schedule (slow down) the list concatenation part F ∪ ..., which is bound to O(|F|), in an incremental manner, so that we can distribute it over pop and push operations as well.
Incremental concatenate

It's a bit more challenging to implement incremental list concatenation than list reversing. However, it's possible to re-use the result we gained from incremental reverse by a small trick: in order to realize X ∪ Y, we can first reverse X to get ←X (the reversed X), then take elements one by one from ←X and put them in front of Y, just as what we have done in reverse'.

X ∪ Y ≡ reverse(reverse(X)) ∪ Y
      ≡ reverse'(reverse(X), ∅) ∪ Y
      ≡ reverse'(reverse(X), Y)
      ≡ reverse'(←X, Y)
(11.16)
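This identity is easy to sanity check in Haskell. The sketch below (the names are ours) defines the accumulator version of reverse from (11.14) and compares both sides of (11.16):

-- reverse with an accumulator, equation (11.14)
rev' :: [a] -> [a] -> [a]
rev' [] a = a
rev' (x:xs) a = rev' xs (x:a)

-- the identity (11.16): X ++ Y equals reverse'(reverse(X), Y)
propConcat :: [Int] -> [Int] -> Bool
propConcat x y = x ++ y == rev' (reverse x) y
-- holds for all lists, e.g. propConcat [1,2] [3,4,5]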
This fact suggests that we can use an extra state to instruct the step() function to continue concatenating ←F after R is reversed.

The strategy is to do the total work in two phases:

1. Reverse both F and R in parallel to get ←F = reverse(F) and ←R = reverse(R) incrementally;
2. Incrementally take elements from ←F and put them in front of ←R.
So we define three types of state: Sᵣ represents reversing; S_c represents concatenating; and S_f represents finish.

In Haskell, these types of state are defined as the following.
data State a = Reverse [a] [a] [a] [a]
             | Concat [a] [a]
             | Done [a]
Because we reverse F and R simultaneously, the reversing state takes two pairs of lists and accumulators.

The state transferring is defined according to the two-phase strategy described previously. Denote F = {f₁, f₂, ...}, F' = tail(F) = {f₂, f₃, ...}, R = {r₁, r₂, ...}, R' = tail(R) = {r₂, r₃, ...}. A state S contains its type, which has a value among Sᵣ, S_c, and S_f. Note that S also contains necessary parameters such as F, ←F, X, A etc. as intermediate results. These parameters vary according to the different states.
next(S) =
    (Sᵣ, F', {f₁} ∪ ←F, R', {r₁} ∪ ←R) : S = Sᵣ ∧ F ≠ ∅ ∧ R ≠ ∅
    (S_c, ←F, {r₁} ∪ ←R) : S = Sᵣ ∧ F = ∅ ∧ R = {r₁}
    (S_f, A) : S = S_c ∧ X = ∅
    (S_c, X', {x₁} ∪ A) : S = S_c ∧ X ≠ ∅
(11.17)
The corresponding Haskell program is listed below.

next (Reverse (x:f) f' (y:r) r') = Reverse f (x:f') r (y:r')
next (Reverse [] f' [y] r') = Concat f' (y:r')
next (Concat [] acc) = Done acc
next (Concat (x:f') acc) = Concat f' (x:acc)
All that is left is to distribute these incremental steps over every pop and push operation, to implement a real-time O(1) purely functional queue.

Sum up

Before we dive into the final real-time queue implementation, let's analyze how many incremental steps are taken to achieve the result of F ∪ reverse(R). According to the balance invariant we used previously, |R| = |F| + 1. Let's denote M = |F|.
Once the queue gets unbalanced due to some push or pop operation, we start this incremental computation of F ∪ reverse(R). It needs M + 1 steps to reverse R, and at the same time we finish reversing the list F within these steps. After that, we need extra M + 1 steps to execute the concatenation. So there are 2M + 2 steps in total.

It seems that distributing one step inside each pop or push operation is the natural solution. However, there is a critical question to answer: is it possible that before we finish these 2M + 2 steps, the queue gets unbalanced again due to a series of pushes and pops?
There are two facts about this question; one is good news and the other is bad news.

Let's first show the good news: luckily, continuous pushing can't make the queue unbalanced again before we finish these 2M + 2 steps to achieve F ∪ reverse(R). This is because once we start re-balancing, we get a new front list F' = F ∪ reverse(R) after 2M + 2 steps, while the next unbalance is triggered when

|R'| = |F'| + 1
     = |F| + |R| + 1
     = 2M + 2
(11.18)
That is to say, even if we continuously push as many elements as possible after the last unbalance, when the queue gets unbalanced again, the 2M + 2 steps have exactly finished at that point, which means the new front list F' has been calculated. We can safely go on to compute F' ∪ reverse(R'), thanks to the balance invariant designed in the previous section.

front copy | on-going computation | new rear
{fᵢ, fᵢ₊₁, ..., f_M} | (Sᵣ, ←F, ..., ←R, ...) | {...}
(first i-1 elements popped) | (intermediate ←F and ←R) | (newly pushed elements)

Table 11.1: Intermediate state of a queue before the first M steps finish.
But the bad news is that a pop operation can happen at any time before these 2M + 2 steps finish. The situation is that once we want to extract an element from the front list, the new front list F' = F ∪ reverse(R) isn't ready yet; we don't have a valid front list at hand.

One solution to this problem is to keep a copy of the original front list F during the time we are calculating reverse(F), as described in phase 1 of our incremental computing strategy. Then we are still safe even if the user continuously performs the first M pop operations. So the queue looks like table 11.1 at some time after we start the incremental computation and before phase 1 (reversing F and R simultaneously) ends. (One may wonder whether copying a list takes time linear in its length; if so, the whole solution would make no sense. Actually, this linear time copying won't happen at all, because of the purely functional nature: the front list won't be mutated either by popping or by reversing. However, when realizing a symmetric solution with paired arrays and mutating the arrays in-place, this issue must be addressed: we can perform lazy copying, so that the real copying work doesn't execute immediately; instead, we copy one element per step of the incremental reversing. The detailed implementation is left as an exercise.)
After these M pop operations, the copy of F is exhausted, and we just start the incremental concatenation phase at that time. What if the user goes on popping? The fact is that since F is exhausted (it becomes ∅), we needn't do the concatenation at all, since ∅ ∪ ←R = ←R.

This tells us that when doing the concatenation, we only need to concatenate those elements that haven't been popped, which are still left in F. As the user pops elements one by one continuously from the head of the front list F, one method is to use a counter to record how many elements remain in F. The counter is initialized as 0 when we start computing F ∪ reverse(R); it's increased by one when we reverse one element in F, which means we need to concatenate this element in the future; and it's decreased by one every time pop is performed, which means we can concatenate one element less; of course we also need to decrease this counter at every step of the concatenation. Once this counter becomes zero, we needn't do any more concatenation.
We can now give the realization of the purely functional real-time queue according to the above analysis.

We first add an idle state S₀ to simplify the state transferring. The below Haskell program shows this modified state definition.
data State a = Empty
             | Reverse Int [a] [a] [a] [a] -- n, f, acc_f, r, acc_r
             | Append Int [a] [a]          -- n, rev_f, acc
             | Done [a]                    -- result: f ++ reverse r
The data structure is defined with three parts: the front list (augmented with its length); the on-going state of computing F ∪ reverse(R); and the rear list (augmented with its length).

Here is the Haskell definition of the real-time queue.
data RealtimeQueue a = RTQ [a] Int (State a) [a] Int
The empty queue is composed of empty front and rear lists together with the idle state S₀, as Queue(∅, 0, S₀, ∅, 0). We can test if a queue is empty by checking if |F| = 0, according to the balance invariant defined before. Push and pop are changed accordingly.

push(Q, x) = balance(F, |F|, S, {x} ∪ R, |R| + 1)    (11.19)

pop(Q) = balance(F', |F| - 1, abort(S), R, |R|)    (11.20)
The major difference is the abort() function. Based on the above analysis, when popping we need to decrease the counter, so that we can concatenate one element less. We define this as aborting. The details will be given after the balance() function.
The relative Haskell code for push and pop are listed like this.
push (RTQ f lenf s r lenr) x = balance f lenf s (x:r) (lenr + 1)
pop (RTQ (_:f) lenf s r lenr) = balance f (lenf - 1) (abort s) r lenr
The balance() function first checks the balance invariant; if it's violated, we start re-balancing by beginning the incremental computation of F ∪ reverse(R); otherwise we just execute one step of the unfinished incremental computation.

balance(F, |F|, S, R, |R|) =
    step(F, |F|, S, R, |R|) : |R| ≤ |F|
    step(F, |F| + |R|, (Sᵣ, 0, F, ∅, R, ∅), ∅, 0) : otherwise
(11.21)
The corresponding Haskell code is given below.

balance f lenf s r lenr
    | lenr <= lenf = step f lenf s r lenr
    | otherwise = step f (lenf + lenr) (Reverse 0 f [] r []) [] 0
The step() function transfers the state machine one state ahead, and it turns the state to idle (S₀) when the incremental computation finishes.

step(F, |F|, S, R, |R|) =
    Queue(F', |F|, S₀, R, |R|) : S' = (S_f, F')
    Queue(F, |F|, S', R, |R|) : otherwise
(11.22)
Where S' = next(S) is the next state transferred, and F' = F ∪ reverse(R) is the final new front list resulting from the incremental computation. The real state transferring is implemented in the next() function as the following. It differs from the previous version by the counter field n, which records how many elements are left to concatenate.
next(S) =
    (Sᵣ, n + 1, F', {f₁} ∪ ←F, R', {r₁} ∪ ←R) : S = Sᵣ ∧ F ≠ ∅
    (S_c, n, ←F, {r₁} ∪ ←R) : S = Sᵣ ∧ F = ∅
    (S_f, A) : S = S_c ∧ n = 0
    (S_c, n - 1, X', {x₁} ∪ A) : S = S_c ∧ n ≠ 0
    S : otherwise
(11.23)
And the corresponding Haskell code is like this.
next (Reverse n (x:f) f' (y:r) r') = Reverse (n+1) f (x:f') r (y:r')
next (Reverse n [] f' [y] r') = Concat n f' (y:r')
next (Concat 0 _ acc) = Done acc
next (Concat n (x:f') acc) = Concat (n-1) f' (x:acc)
next s = s
The abort() function is used to tell the state machine that we can concatenate one element less, since it has been popped.

abort(S) =
    (S_f, A') : S = S_c ∧ n = 0
    (S_c, n - 1, X, A) : S = S_c ∧ n ≠ 0
    (Sᵣ, n - 1, F, ←F, R, ←R) : S = Sᵣ
    S : otherwise
(11.24)
Note that when n = 0 we actually roll back one concatenated element by returning A' = tail(A) as the result, not A. (Why? This is left as an exercise.)
The Haskell code for the abort function is like the following.

abort (Concat 0 _ (_:acc)) = Done acc -- Note! we roll back 1 element
abort (Concat n f' acc) = Concat (n-1) f' acc
abort (Reverse n f f' r r') = Reverse (n-1) f f' r r'
abort s = s
It seems that we are done; however, there is still one tricky issue hidden behind us. If we push an element x to an empty queue, the resulting queue will be:

Queue(∅, 1, (S_c, 0, ∅, {x}), ∅, 0)

If we perform pop immediately, we'll get an error! We find that the front list is empty, although the previous computation of F ∪ reverse(R) has finished. This is because it takes one more extra step to transfer from the state (S_c, 0, ∅, A) to (S_f, A). It's necessary to refine the S' in the step() function a bit.
S' =
    next(next(S)) : F = ∅
    next(S) : otherwise
(11.25)
The modification is reflected in the below Haskell code:

step f lenf s r lenr =
    case s' of
      Done f'' -> RTQ f'' lenf Empty r lenr
      s'' -> RTQ f lenf s'' r lenr
    where s' = if null f then next $ next s else next s
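Putting the above pieces together (the data definitions, push, pop, balance, step, next and abort), the queue can be exercised as follows. The helpers emptyQ, frontQ, isEmptyQ and toList are ours, not from the text; they follow the balance invariant, under which the queue is empty exactly when the recorded front length is zero.

emptyQ :: RealtimeQueue a
emptyQ = RTQ [] 0 Empty [] 0

isEmptyQ :: RealtimeQueue a -> Bool
isEmptyQ (RTQ _ lenf _ _ _) = lenf == 0

frontQ :: RealtimeQueue a -> a
frontQ (RTQ (x:_) _ _ _ _) = x

-- Drain the queue to a list in FIFO order
toList :: RealtimeQueue a -> [a]
toList q | isEmptyQ q = []
         | otherwise = frontQ q : toList (pop q)

-- toList (foldl push emptyQ [1..100]) evaluates to [1..100]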
Note that this algorithm differs from the one given by Chris Okasaki in [4]. Okasaki's algorithm executes two steps per pop and push, while the one presented in this chapter executes only one per pop and push, which leads to more evenly distributed performance.
Exercise 11.5

- Why do we need to roll back one element when n = 0 in the abort() function?
- Realize the real-time queue with the symmetric paired-array queue solution in your favorite imperative programming language.
- In the footnote, we mentioned that when we start incremental reversing with the in-place paired-array solution, copying the array can't be done monolithically, or it will lead to a linear time operation. Implement the lazy copying, so that we copy one element per step along with the reversing.
11.6 Lazy real-time queue

The key to realizing a real-time queue is to break down the expensive F ∪ reverse(R) to avoid monolithic computation. Lazy evaluation is particularly helpful in such a case. In this section, we'll explore whether there is a more elegant solution by exploiting laziness.

Suppose there exists a function rotate() which can compute F ∪ reverse(R) incrementally; that's to say, with some accumulator A, the following two functions are equivalent.

rotate(X, Y, A) ≡ X ∪ reverse(Y) ∪ A    (11.26)

Where we initialize X as the front list F, Y as the rear list R, and the accumulator A as empty ∅.
The trigger of rotation is still the same as before: when |F| + 1 = |R|. Let's keep this constraint as an invariant during the whole rotation process, so that |X| + 1 = |Y| always holds.

It's obvious to deduce the trivial case:

rotate(∅, {y₁}, A) = {y₁} ∪ A    (11.27)
Denote X = {x₁, x₂, ...}, Y = {y₁, y₂, ...}, and let X' = {x₂, x₃, ...}, Y' = {y₂, y₃, ...} be the rests of the lists without the first elements of X and Y respectively. The recursive case is ruled out as the following.

rotate(X, Y, A) ≡ X ∪ reverse(Y) ∪ A                      (definition of (11.26))
              ≡ {x₁} ∪ (X' ∪ reverse(Y) ∪ A)              (associativity of ∪)
              ≡ {x₁} ∪ (X' ∪ reverse(Y') ∪ ({y₁} ∪ A))    (nature of reverse and associativity of ∪)
              ≡ {x₁} ∪ rotate(X', Y', {y₁} ∪ A)           (definition of (11.26))
(11.28)
Summarizing the above two cases yields the final incremental rotate algorithm.

rotate(X, Y, A) =
    {y₁} ∪ A : X = ∅
    {x₁} ∪ rotate(X', Y', {y₁} ∪ A) : otherwise
(11.29)
If we execute ∪ lazily instead of strictly, that is, execute it once a pop or push operation is performed, the computation of rotate can be distributed to push and pop naturally.

Based on this idea, we modify the paired-list queue definition so that the front list becomes a lazy list, and we augment it with a computation stream [7]. When the queue triggers the re-balance constraint by some pop/push, that is |F| + 1 = |R|, the algorithm creates a lazy rotation computation, then uses this lazy rotation as the new front list F'; the new rear list becomes ∅, and a copy of F' is maintained as a stream.
After that, every time we perform a push or pop, we consume the stream by forcing one evaluation. This advances one step along the stream, {x} ∪ F'', where F'' = tail(F'). We can discard x, and replace the stream F' with F''.

Once all of the stream is exhausted, we can start another rotation.
In order to illustrate this idea clearly, we turn to the Scheme/Lisp programming language for the example code, because it gives us explicit control of laziness. In Scheme/Lisp, we have the following three tools to deal with lazy streams. (Note that, for the delay to take effect, cons-stream must be a macro rather than a procedure, otherwise its second argument would be evaluated eagerly; and since the delayed part is the cdr, stream-cdr forces it.)

(define-syntax cons-stream
  (syntax-rules ()
    ((_ a b) (cons a (delay b)))))

(define stream-car car)

(define (stream-cdr s) (force (cdr s)))
So cons-stream constructs a lazy list from an element x and an existing list L without really evaluating the value of L; the evaluation is actually delayed to stream-cdr, where the computation is forced. Delaying can be realized by lambda calculus; please refer to [7] for details.
The lazy paired-list queue is defined as the following.

(define (make-queue f r s)
  (list f r s))

;; Auxiliary functions
(define (front-lst q) (car q))
(define (rear-lst q) (cadr q))
(define (rots q) (caddr q))

A queue consists of three parts: a front list, a rear list, and a stream which represents the computation of F ∪ reverse(R). Creating an empty queue is trivial: all three parts are null.

(define empty (make-queue '() '() '()))
Note that the front list is actually a lazy stream, so we need to use the stream related functions to manipulate it. For example, the following function tests if the queue is empty by checking the front lazy stream.
(define (empty? q) (stream-null? (front-lst q)))
The push function is almost the same as the one given in the previous section: we put the new element in front of the rear list, and then examine the balance invariant and do the necessary balancing work.

push(Q, x) = balance(F, {x} ∪ R, Rₛ)    (11.30)

Where F represents the lazy stream of the front list, and Rₛ is the stream of the rotation computation. The relative Scheme/Lisp code is given below.

(define (push q x)
  (balance (front-lst q) (cons x (rear-lst q)) (rots q)))
Pop is a bit different: because the front list is actually a lazy stream, we need to force an evaluation. All the others are the same as before.

pop(Q) = balance(F', R, Rₛ)    (11.31)

Here F' forces one evaluation of F; the Scheme/Lisp code for this equation is as the following.

(define (pop q)
  (balance (stream-cdr (front-lst q)) (rear-lst q) (rots q)))
empty queue etc) here.
And one can access the top element in the queue by extract from the front
list stream.
(define (front q) (stream-car (front-lst q)))
The balance function first checks if the computation stream is completely exhausted, and starts a new rotation accordingly; otherwise, it just consumes one evaluation by forcing the lazy stream.

balance(Q) =
    Queue(F', ∅, F') : Rₛ = ∅
    Queue(F, R, Rₛ') : otherwise
(11.32)

Here F' is defined to start a new rotation.

F' = rotate(F, R, ∅)    (11.33)
The relative Scheme/Lisp program is listed accordingly.

(define (balance f r s)
  (if (stream-null? s)
      (let ((newf (rotate f r '())))
        (make-queue newf '() newf))
      (make-queue f r (stream-cdr s))))
The implementation of the incremental rotate function is just the same as what we analyzed above.
(define (rotate xs ys acc)
  (if (stream-null? xs)
      (cons-stream (car ys) acc)
      (cons-stream (stream-car xs)
                   (rotate (stream-cdr xs) (cdr ys)
                           (cons-stream (car ys) acc)))))
We used explicit lazy evaluation in Scheme/Lisp. This program can be very short in a lazy programming language such as Haskell.
data LazyRTQueue a = LQ [a] [a] [a] -- front, rear, f ++ reverse r

instance Queue LazyRTQueue where
    empty = LQ [] [] []

    isEmpty (LQ f _ _) = null f

    -- O(1) time push
    push (LQ f r rot) x = balance f (x:r) rot

    -- O(1) time pop
    pop (LQ (_:f) r rot) = balance f r rot

    front (LQ (x:_) _ _) = x

balance f r [] = let f' = rotate f r [] in LQ f' [] f'
balance f r (_:rot) = LQ f r rot

rotate [] [y] acc = y:acc
rotate (x:xs) (y:ys) acc = x : rotate xs ys (y:acc)
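Assuming the Queue type class given earlier in this chapter, the lazy queue can be exercised like this (drain is our own helper, not from the text):

drain :: LazyRTQueue a -> [a]
drain q | isEmpty q = []
        | otherwise = front q : drain (pop q)

-- drain (foldl push (empty :: LazyRTQueue Int) [1..10]) evaluates to [1..10]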
11.7 Notes and short summary

Just as mentioned in the beginning of this book, in the first chapter, a queue isn't as simple as it seems. We've tried to explain algorithms and data structures both in imperative and in functional approaches; sometimes this gives the impression that the functional way is simpler and more expressive most of the time. However, there are still plenty of areas where more study and work are needed to give equivalent functional solutions. Queue is such an important topic that it links to many fundamental purely functional data structures.

That's why Chris Okasaki studied it intensively and discussed it at length in [4]. With the purely functional queue solved, we can easily implement a deque with the similar approach revealed in this chapter. As we can handle elements effectively at both head and tail, we can advance one step further to realize sequence data structures, which support fast concatenation; and finally we can realize random access data structures to mimic the array in imperative settings. The details will be explained in later chapters.

Note that, although we haven't mentioned the priority queue, it's quite possible to realize it with heaps. We have covered the topic of heaps in several previous chapters.
Exercise 11.6

- Realize a deque, which supports adding and removing elements on both sides in constant O(1) time, in a purely functional way.
- Realize a deque in a symmetric solution only with arrays in your favorite imperative programming language.
Bibliography

[1] Maged M. Michael and Michael L. Scott. Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html

[2] Herb Sutter. Writing a Generalized Concurrent Queue. Dr. Dobb's, Oct 29, 2008. http://drdobbs.com/cpp/211601363?pgno=1

[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937.

[4] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press (July 1, 1999). ISBN-13: 978-0521663502

[5] Wikipedia. Tail call. http://en.wikipedia.org/wiki/Tail_call

[6] Wikipedia. Recursion (computer science). http://en.wikipedia.org/wiki/Recursion_(computer_science)#Tail-recursive_functions

[7] Harold Abelson, Gerald Jay Sussman, Julie Sussman. Structure and Interpretation of Computer Programs, 2nd Edition. MIT Press, 1996. ISBN 0-262-51087-1
Chapter 12
Sequences, The last brick
12.1 Introduction

In the first chapter of this book, which introduced the binary search tree as the 'hello world' data structure, we mentioned that neither queue nor array is simple to realize, not only in the imperative way, but also in the functional approach. In the previous chapter, we explained the functional queue, which achieves similar performance to its imperative counterpart. In this chapter, we'll dive into the topic of array-like data structures.

We have introduced several data structures in this book so far, and it seems that functional approaches typically bring more expressive and elegant solutions. However, there are some areas where people haven't found competitive purely functional solutions which can match the imperative ones. For instance, the Ukkonen linear time suffix tree construction algorithm; another example is the hash table. Array is also among them.

Array is trivial in imperative settings; it enables random access of any element by index in constant O(1) time. However, this performance target can't be achieved directly in purely functional settings, as only the list is available.
In this chapter, we are going to abstract the concept of array to sequences, which support the following features:

- An element can be inserted to or removed from the head of the sequence quickly, in O(1) time;
- An element can be inserted to or removed from the tail of the sequence quickly, in O(1) time;
- Support concatenating two sequences quickly (faster than linear time);
- Support random access and update of any element quickly;
- Support splitting at any position quickly.

We call these features the abstract sequence properties. It is easy to see that even array (here meaning plain array) in imperative settings can't meet them all at the same time.
We'll provide three solutions in this chapter. Firstly, we'll introduce a solution based on binary tree forests and numeric representation; secondly, we'll show a concatenate-able list solution; finally, we'll give the finger tree solution. Most of the results are based on Chris Okasaki's work in [6].
12.2 Binary random access list

12.2.1 Review of plain-array and list

Let's review the performance of plain array and singly linked-list, so that we know how they perform in different cases.

operation                 | Array        | Linked-list
operation on head         | O(N)         | O(1)
operation on tail         | O(1)         | O(N)
access at random position | O(1)         | average O(N)
remove at given position  | average O(N) | O(1)
concatenate               | O(N₂)        | O(N₁)
Because we hold the head of a linked-list, operations on the head such as insert and remove perform in constant time, while we need to traverse to the end to perform removing or appending on the tail. Given a position i, we need to traverse i elements to access it; once we are at that position, removing the element from there is bound to constant time by modifying some pointers. In order to concatenate two linked-lists, we need to traverse to the end of the first one and link it to the second one, which is bound to the length of the first linked-list.

On the other hand, for an array we must prepare a free cell when inserting a new element at the head, and we need to release the first cell after the first element is removed; both operations are achieved by shifting all the rest elements backward or forward, which costs linear time. The operations on the tail of an array are trivially constant time. An array also supports accessing a random position i by nature; however, removing the element at that position causes shifting of all elements after it one position ahead. In order to concatenate two arrays, we need to copy all elements from the second one to the end of the first one (ignoring the memory re-allocation details), which is proportional to the length of the second array.
In the chapter about binomial heaps, we explained the idea of using a forest, which is a list of trees. It brings us the merit that, for any given number N, by representing it in binary we know how many binomial trees are needed to hold N elements: each bit of 1 represents a binomial tree of the rank of that bit. We can go one step further: if we have an N-node binomial heap, for any given index 1 < i < N, we can quickly determine which binomial tree in the heap holds the i-th node.
12.2.2 Represent sequence by trees

One solution to realize a random-access sequence is to manage the sequence with a forest of complete binary trees. Figure 12.1 shows how we attach such trees to a sequence of numbers.
Here two trees, t₁ and t₂, are used to represent the sequence {x₁, x₂, x₃, x₄, x₅, x₆}. The size of binary tree t₁ is 2; the first two elements {x₁, x₂} are leaves of t₁. The size of binary tree t₂ is 4; the next four elements {x₃, x₄, x₅, x₆} are leaves of t₂.

Figure 12.1: A sequence of 6 elements can be represented in a forest.
For a complete binary tree, we define the depth as 0 if the tree has only a leaf. A tree is denoted as tᵢ if its depth is i + 1. It's obvious that there are 2ⁱ leaves in tᵢ.
Any sequence containing N elements can be turned into a forest of complete binary trees in this manner. First we represent N in binary like below.

N = 2⁰e₀ + 2¹e₁ + ... + 2^M e_M    (12.1)
Where eᵢ is either 1 or 0, so N = (e_M e_{M-1} ... e₁ e₀)₂. If eᵢ = 1, we need a complete binary tree of size 2ⁱ. For example in figure 12.1, as the length of the sequence is 6, which is (110)₂ in binary: the lowest bit is 0, so we needn't a tree of size 1; the second bit is 1, so we need a tree of size 2, which has depth 2; the highest bit is also 1, thus we need a tree of size 4, which has depth 3.
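This correspondence between set bits and trees can be sketched directly in Haskell (a small helper of ours, not part of the binary random access list itself):

-- Sizes of the complete binary trees needed for a sequence of length n:
-- one tree of size 2^i for every bit e_i = 1.
treeSizes :: Int -> [Int]
treeSizes n = [w | (w, b) <- zip (iterate (*2) 1) (bits n), b == 1]
  where bits 0 = []
        bits m = m `mod` 2 : bits (m `div` 2)
-- treeSizes 6 evaluates to [2, 4], matching figure 12.1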
This method represents the sequence {x₁, x₂, ..., x_N} as a list of trees {t₀, t₁, ..., t_M}, where tᵢ is either empty if eᵢ = 0 or a complete binary tree if eᵢ = 1. We call this representation a Binary Random Access List [6].
We can reuse the definition of the binary tree. For example, the following Haskell program defines the tree and the binary random access list.

data Tree a = Leaf a
            | Node Int (Tree a) (Tree a) -- size, left, right

type BRAList a = [Tree a]
The only difference from the typical binary tree is that we augment the size information to the tree. This enables us to get the size without calculation every time. For instance:
size (Leaf _) = 1
size (Node sz _ _) = sz
12.2.3 Insertion to the head of the sequence

The new, forest-based representation of sequences enables many operations to be done effectively. For example, the operation of inserting a new element y in front of the sequence can be realized as the following.
1. Create a tree t', with y as its only leaf;

2. Examine the first tree in the forest and compare its size with the size of t'; if its size is greater than the size of t', we just let t' be the new head of the forest. Since the forest is a linked-list of trees, inserting t' at its head is a trivial operation, which is bound to constant O(1) time;

3. Otherwise, if the size of the first tree in the forest is equal to the size of t', let's denote this tree in the forest as tᵢ; we can construct a new binary tree t'ᵢ₊₁ by linking tᵢ and t' as its left and right children. After that, we recursively try to insert t'ᵢ₊₁ into the forest.

Figures 12.2 and 12.3 illustrate the steps of inserting elements x₁, x₂, ..., x₆ into an empty list.
Figure 12.2: Steps of inserting elements to an empty list (1): (a) a singleton leaf of x₁; (b) insert x₂, which causes linking and results in a tree of height 1; (c) insert x₃, the result is two trees, t₁ and t₂; (d) insert x₄, which first causes linking of two leaves into a binary tree, then performs linking again, resulting in a final tree of height 2.
As there are at most M trees in the forest, and M is bound to O(lg N), the insertion-to-head algorithm is ensured to perform in O(lg N) time even in the worst case. We'll prove that the amortized performance is O(1) later.

Let's formalize the algorithm into equations. We define the function of inserting an element in front of a sequence as insert(S, x).

insert(S, x) = insertTree(S, leaf(x))    (12.2)
Figure 12.3: Steps of inserting elements to an empty list (2): (a) insert x₅, the forest is a leaf (t₀) and t₂; (b) insert x₆, which links the two leaves to t₁.
This function just wraps element x into a singleton tree with a leaf, and calls insertTree to insert this tree into the forest. Suppose the forest F = {t₁, t₂, ...} if it's not empty, and F' = {t₂, t₃, ...} is the rest of the trees without the first one.

insertTree(F, t) =
    {t} : F = ∅
    {t} ∪ F : size(t) < size(t₁)
    insertTree(F', link(t, t₁)) : otherwise
(12.3)
Where function link(t₁, t₂) creates a new tree from two smaller trees of the same size. Suppose function tree(s, t₁, t₂) creates a tree, sets its size to s, and makes t₁ the left child and t₂ the right child; linking can then be realized as below.

link(t₁, t₂) = tree(size(t₁) + size(t₂), t₁, t₂)    (12.4)
The relative Haskell programs can be given by translating these equations.

cons :: a -> BRAList a -> BRAList a
cons x ts = insertTree ts (Leaf x)

insertTree :: BRAList a -> Tree a -> BRAList a
insertTree [] t = [t]
insertTree (t':ts) t = if size t < size t' then t : t' : ts
                       else insertTree ts (link t t')

-- Precondition: rank t1 = rank t2
link :: Tree a -> Tree a -> Tree a
link t1 t2 = Node (size t1 + size t2) t1 t2
Here we use the Lisp tradition of naming the function that inserts an element before a list as cons.
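For example, the following sketch (assuming the definitions above) inserts six elements one by one; the resulting forest consists of a tree of size 2 and a tree of size 4, matching 6 = (110)₂ and figure 12.1:

forest :: BRAList Int
forest = foldl (flip cons) [] [1..6]

-- map size forest evaluates to [2, 4]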
Remove the element from the head of the sequence

It's not complex to realize the inverse operation of cons, which removes an element from the head of the sequence.

- If the first tree in the forest is a singleton leaf, remove this tree from the forest;
- otherwise, halve the first tree by unlinking its two children, so that the first tree in the forest becomes two trees; recursively halve the first tree until it turns into a leaf.
Figure 12.4 illustrates the steps of removing elements from the head of the sequence.

Figure 12.4: Steps of removing elements from the head: (a) a sequence of 5 elements; (b) the result of removing x₅, where the leaf is removed; (c) the result of removing x₄: as there is no leaf tree, the tree is first divided into two sub-trees of size 2, and the first tree is divided again into two leaves, after which the first leaf, which contains x₄, is removed; what is left in the forest is a leaf tree of x₃ and a tree of size 2 with elements x₂, x₁.
If we assume the sequence isn't empty, so that we can skip the error handling such as trying to remove an element from an empty sequence, this can be expressed with the following equation. We denote the forest F = {t₁, t₂, ...} and the trees without the first one as F' = {t₂, t₃, ...}.

extractTree(F) =
    (t₁, F') : t₁ is a leaf
    extractTree({t_l, t_r} ∪ F') : otherwise
(12.5)

where {t_l, t_r} = unlink(t₁) are the two children of t₁.
It can be translated to Haskell programs like below.
extractTree (t@(Leaf x):ts) = (t, ts)
extractTree (t@(Node _ t1 t2):ts) = extractTree (t1:t2:ts)
With this function defined, it's convenient to give the head() and tail() functions; the former returns the first element in the sequence, the latter returns the rest.

head(S) = key(first(extractTree(S)))    (12.6)

tail(S) = second(extractTree(S))    (12.7)

Where function first() returns the first element in a pair (also known as a tuple), and second() returns the second element respectively. Function key() is used to access the element inside a leaf. Below are the Haskell programs corresponding to these two equations.

head' ts = x where (Leaf x, _) = extractTree ts
tail' = snd . extractTree

Note that as the head and tail functions have already been defined in Haskell's standard library, we give them apostrophes to make them distinct. (Another option is to hide the standard ones with an import; we skip the details as they are language specific.)
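Continuing the sketch from the insertion section, head' returns the element most recently inserted with cons, and tail' removes it:

headTailExample :: (Int, Int)
headTailExample = let s = foldl (flip cons) [] [1..6]
                  in (head' s, head' (tail' s))
-- evaluates to (6, 5)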
Random access the element in binary random access list

As the trees in the forest help to manage the elements in blocks, given an arbitrary index, it's easy to locate which tree the element is stored in; after that, performing a search in the tree yields the result. As all trees are binary (more accurately, complete binary trees), the search is essentially a binary search, which is bound to the logarithm of the tree size. This brings us faster random access than the linear search in the linked-list setting.

Given an index i and a sequence S, which is actually a forest of trees, the algorithm is executed as the following. (We follow the tradition that the index i starts from 1 in the algorithm description, while it starts from 0 in most programming languages.)

1. Compare i with the size of the first tree T₁ in the forest; if i is less than or equal to the size, the element exists in T₁, so perform the lookup in T₁;

2. Otherwise, decrease i by the size of T₁, and repeat the previous step for the rest of the trees in the forest.
This algorithm can be represented as the below equation.

get(S, i) =
    lookupTree(T₁, i) : i ≤ |T₁|
    get(S', i - |T₁|) : otherwise
(12.8)

Where |T| = size(T), and S' = {T₂, T₃, ...} is the rest of the trees in the forest without the first one. Note that we don't handle the out-of-bound error case; this is left as an exercise to the reader.
Function lookupTree() is just a binary search algorithm. If the index i is 1, we just return the root of the tree; otherwise, we halve the tree by unlinking: if i is less than or equal to the size of the halved tree, we recursively look up the left sub-tree; otherwise, we look up the right sub-tree.

lookupTree(T, i) =
    root(T) : i = 1
    lookupTree(left(T), i) : i ≤ ⌊|T|/2⌋
    lookupTree(right(T), i - ⌊|T|/2⌋) : otherwise
(12.9)

Where function left() returns the left sub-tree T_l of T, while right() returns T_r. The corresponding Haskell program is given as below.
getAt (t:ts) i = if i < size t then lookupTree t i
                 else getAt ts (i - size t)

lookupTree (Leaf x) 0 = x
lookupTree (Node sz t1 t2) i = if i < sz `div` 2 then lookupTree t1 i
                               else lookupTree t2 (i - sz `div` 2)
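For instance, indexing every position of the six-element sequence built earlier recovers the elements in head-to-tail order (a sketch; indices are 0-based in the Haskell code):

indexAll :: [Int]
indexAll = let s = foldl (flip cons) [] [1..6]
           in map (getAt s) [0..5]
-- evaluates to [6,5,4,3,2,1]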
Figure 12.5 illustrates the steps of looking up the 4-th element in a sequence of size 6. It first examines the first tree; since its size is 2, which is smaller than 4, it goes on looking up in the second tree with the updated index i' = 4 - 2, which is the 2nd element in the rest of the forest. As the size of the next tree is 4, which is greater than 2, the element to be searched must be located in this tree. It then examines the left sub-tree since the new index 2 is not greater than the half size 4/2 = 2; the process next visits the right grand-child, and the final result is returned.
[Figure 12.5: Steps of locating the 4-th element in a sequence. (a) getAt(S, 4), 4 > size(t_1) = 2; (b) getAt(S', 4 - 2), which becomes lookupTree(t_2, 2); (c) 2 <= size(t_2)/2, which becomes lookupTree(left(t_2), 2); (d) lookupTree(right(left(t_2)), 1), x_3 is returned.]
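To make the lookup concrete, below is a minimal hand-built sketch of the sequence in Figure 12.5; the binding s and the element names are for illustration only, assuming the Tree definition with cached sizes used by the programs above.

-- The forest for [x6, x5, ..., x1]: one tree of size 2 and one of size 4.
-- Indices are 0-based in the Haskell code, so the 4-th element of the
-- figure is index 3 here.
s :: [Tree String]
s = [ Node 2 (Leaf "x6") (Leaf "x5")
    , Node 4 (Node 2 (Leaf "x4") (Leaf "x3"))
             (Node 2 (Leaf "x2") (Leaf "x1")) ]

-- getAt s 3 evaluates to "x3", matching the figure.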
By using a similar idea, we can update the element at any arbitrary position i. We first compare the size of the first tree T_1 in the forest with i. If the size is less than i, the element to be updated doesn't exist in the first tree, and we recursively examine the next tree in the forest, comparing against i - |T_1|, where |T_1| represents the size of the first tree. Otherwise, the element is in this tree, and we halve the tree recursively until we reach a leaf; at this stage, we can replace the element of this leaf with the new one.
set(S, i, x) = \begin{cases}
\{updateTree(T_1, i, x)\} \cup S' & : i < |T_1| \\
\{T_1\} \cup set(S', i - |T_1|, x) & : \text{otherwise}
\end{cases} \qquad (12.10)

Where $S' = \{T_2, T_3, ...\}$ is the rest of the trees in the forest without the first one.
Function updateTree(T, i, x) performs a tree search and replaces the i-th element with the given value x.

updateTree(T, i, x) = \begin{cases}
leaf(x) & : i = 0 \wedge |T| = 1 \\
tree(|T|, updateTree(T_l, i, x), T_r) & : i < \lfloor \frac{|T|}{2} \rfloor \\
tree(|T|, T_l, updateTree(T_r, i - \lfloor \frac{|T|}{2} \rfloor, x)) & : \text{otherwise}
\end{cases} \qquad (12.11)

Where T_l and T_r are the left and right sub-trees of T respectively. The following Haskell program translates the equation accordingly.
Haskell program translates the equation accordingly.
setAt :: BRAList a Int a BRAList a
setAt (t:ts) i x = if i < size t then (updateTree t i x):ts
else t:setAt ts (i-size t) x
updateTree :: Tree a Int a Tree a
updateTree (Leaf _) 0 x = Leaf x
updateTree (Node sz t1 t2) i x =
if i < sz div 2 then Node sz (updateTree t1 i x) t2
else Node sz t1 (updateTree t2 (i - sz div 2) x)
Due to the nature of complete binary trees, for a sequence with N elements represented by a binary random access list, the number of trees in the forest is bound to O(lg N). Thus it takes O(lg N) time to locate the tree containing the element at an arbitrary index i. The subsequent tree search is bound to the height of the tree, which is O(lg N) as well. So the total performance of random access is O(lg N).
Exercise 12.1

1. The random access algorithm given in this section doesn't handle errors such as out-of-bound indices at all. Modify the algorithm to handle this case, and implement it in your favorite programming language.

2. It's quite possible to realize the binary random access list in imperative settings, which benefits from fast operations on the head of the sequence. The random access can be realized in two steps: first locate the tree, then use the capability of constant-time random access of the array. Write a program to implement it in your favorite imperative programming language.
12.3 Numeric representation for binary random access list

In the previous section, we mentioned that for any sequence with N elements, we can represent N in binary format so that $N = 2^0 e_0 + 2^1 e_1 + \ldots + 2^M e_M$, where $e_i$ is the i-th bit, which can be either 0 or 1. If $e_i \neq 0$ it means that there is a complete binary tree with size $2^i$.
This fact indicates that there is an explicit relationship between the binary form of N and the forest. Inserting a new element on the head can be simulated by increasing the binary number by one, while removing an element from the head mimics decreasing the corresponding binary number by one. This is known as numeric representation [6].
In order to represent the binary random access list with a binary number, we can define two states for a bit: Zero means there is no tree with the size corresponding to that bit, while One means such a tree exists in the forest; in the latter case we attach the tree to the state. The following Haskell program, for instance, defines such states.
data Digit a = Zero
| One (Tree a)
type RAList a = [Digit a]
Here we reuse the definition of the complete binary tree and attach it to the state One. Note that we cache the size information in the tree as well.
With digits defined, the forest can be treated as a list of digits. Let's see how inserting a new element can be realized as binary number increment. Suppose function one(t) creates a One state and attaches tree t to it, and function getTree(s) gets the tree attached to the One state s. The sequence S is a list of digit states, $S = \{s_1, s_2, ...\}$, and $S'$ is the rest of the digits with the first one removed.
insertTree(S, t) = \begin{cases}
\{one(t)\} & : S = \emptyset \\
\{one(t)\} \cup S' & : s_1 = Zero \\
\{Zero\} \cup insertTree(S', link(t, getTree(s_1))) & : \text{otherwise}
\end{cases} \qquad (12.12)
When we insert a new tree t to a forest S of binary digits, if the forest is empty, we just create a One state, attach the tree to it, and make this state the only digit of the binary number. This is just like 0 + 1 = 1.

Otherwise, if the forest isn't empty, we need to examine the first digit of the binary number. If the first digit is Zero, we just create a One state, attach the tree, and replace the Zero state with the newly created One state. This is just like $(\ldots digits \ldots 0)_2 + 1 = (\ldots digits \ldots 1)_2$. For example $6 + 1 = (110)_2 + 1 = (111)_2 = 7$.
The last case is that the first digit is One. Here we make the assumption that the tree t to be inserted has the same size as the tree attached to this One state. This can be ensured by calling this function from inserting a leaf, so that the size of the tree to be inserted grows in the series 1, 2, 4, ..., $2^i$, .... In such a case, we need to link these two trees (one is t, the other is the tree attached to the One state), and recursively insert the linked result to the rest of the digits. Note that the previous One state has to be replaced with a Zero state. This is just like $(\ldots digits \ldots 1)_2 + 1 = (\ldots digits' \ldots 0)_2$, where $(\ldots digits' \ldots)_2 = (\ldots digits \ldots)_2 + 1$. For example $7 + 1 = (111)_2 + 1 = (1000)_2 = 8$.
Translating this algorithm to Haskell yields the following program.
insertTree :: RAList a -> Tree a -> RAList a
insertTree [] t = [One t]
insertTree (Zero:ts) t = One t : ts
insertTree (One t':ts) t = Zero : insertTree ts (link t t')
All the other functions, including link(), cons() etc., are the same as before.
Next let's see how removing an element from a sequence can be represented as binary number decrement. If the sequence is a singleton One state attached with a leaf, it becomes empty after removal. This is just like 1 - 1 = 0.

Otherwise, we examine the first digit. If it is a One state, it is replaced with a Zero state, to indicate that this tree no longer exists in the forest, as it has been removed. This is just like $(\ldots digits \ldots 1)_2 - 1 = (\ldots digits \ldots 0)_2$. For example $7 - 1 = (111)_2 - 1 = (110)_2 = 6$.

If the first digit in the sequence is a Zero state, we have to borrow from the further digits for removal. We recursively extract a tree from the rest of the digits, and halve the extracted tree into its two children. Then the Zero state is replaced with a One state attached with the right child, while the left child is removed. This is like $(\ldots digits \ldots 0)_2 - 1 = (\ldots digits' \ldots 1)_2$, where $(\ldots digits' \ldots)_2 = (\ldots digits \ldots)_2 - 1$. For example $4 - 1 = (100)_2 - 1 = (11)_2 = 3$. The following equation illustrates this algorithm.
extractTree(S) = \begin{cases}
(t, \emptyset) & : S = \{one(t)\} \\
(t, \{Zero\} \cup S') & : s_1 = one(t) \\
(t_l, \{one(t_r)\} \cup S'') & : \text{otherwise}
\end{cases} \qquad (12.13)

Where $(t', S'') = extractTree(S')$, and $t_l$, $t_r$ are the left and right sub-trees of $t'$. All other functions, including head() and tail(), are the same as before.
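The borrowing equation can also be translated to Haskell directly. Below is a minimal sketch under the digit representation, reusing the name extractTree as we did for insertTree; the One/Zero digits and the size-cached Node constructor are the ones defined above.

extractTree :: RAList a -> (Tree a, RAList a)
extractTree [One t] = (t, [])                      -- like 1 - 1 = 0
extractTree (One t : ds) = (t, Zero : ds)          -- flip One to Zero
extractTree (Zero : ds) = (t1, One t2 : ds') where -- borrow and halve
    (Node _ t1 t2, ds') = extractTree ds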
Numeric representation doesn't change the performance of the binary random access list; readers can refer to [2] for a detailed discussion. As an example, let's analyze the average (amortized) performance of the insertion-on-head algorithm by using aggregation analysis.
Consider the process of inserting N = 2^m elements to an empty binary random access list. The numeric representation of the forest can be listed as the following.

i        | forest (MSB ... LSB)
---------+---------------------
0        | 0, 0, ..., 0, 0
1        | 0, 0, ..., 0, 1
2        | 0, 0, ..., 1, 0
3        | 0, 0, ..., 1, 1
...      | ...
2^m - 1  | 1, 1, ..., 1, 1
2^m      | 1, 0, 0, ..., 0, 0

bits changed (MSB ... LSB): 1, 1, 2, ..., 2^{m-1}, 2^m
The LSB of the forest changes every time a new element is inserted; it costs 2^m units of computation. The next bit changes every two insertions due to a linking operation, so it costs 2^{m-1} units. The bit next to the MSB of the forest changes only one time, linking all previous trees into a big tree as the only one in the forest; this happens at the halfway point of the insertion process. After the last element is inserted, the MSB flips to 1.

Summing these costs up yields the total cost $T = 1 + 1 + 2 + 4 + \ldots + 2^{m-1} + 2^m = 2^{m+1}$. So the average cost for one insertion is

O(T/N) = O\left(\frac{2^{m+1}}{2^m}\right) = O(1) \qquad (12.14)

which proves that the insertion algorithm performs in amortized O(1) constant time. The proof for deletion is left as an exercise to the reader.
12.3.1 Imperative binary access list

It's trivial to implement the binary access list using binary trees, and the recursion can be eliminated by updating the focused tree in loops. This is left as an exercise to the reader. In this section, we'll show a different imperative implementation that uses the properties of numeric representation.
Recall the chapter about the binary heap: a binary heap can be represented by an implicit array. We can use a similar approach here: use an array of 1 element to represent a leaf, an array of 2 elements to represent a binary tree of height 1, and an array of 2^m elements to represent a complete binary tree of height m.

This brings us the capability of accessing any element by index directly, instead of a divide and conquer tree search. However, the price is that the tree linking operation has to be implemented as array copying.
The following ANSI C code defines such a forest.

#define M (sizeof(int) * 8)

typedef int Key;

struct List {
    int n;
    Key* tree[M];
};
Where n is the number of elements stored in this forest. Of course we can avoid limiting the maximum number of trees by using dynamic arrays, for example as in the following ISO C++ code.

template<typename Key>
struct List {
    int n;
    vector<vector<Key> > tree;
};
For illustration purpose only, we use ANSI C here; readers can find the complete ISO C++ example programs along with this book.

Let's review the insertion process. If the first tree is empty (a Zero digit), we simply set the first tree to a leaf holding the new element; otherwise, the insertion will cause tree linking, and such linking may be recursive until it reaches a position (digit) where the corresponding tree is empty. The numeric representation reveals an important fact: if the 1st, 2nd, ..., (i - 1)-th trees all exist, and the i-th tree is empty, the result is a tree of size 2^i, and all the existing elements together with the new element are stored in this newly created tree. What's more, all trees after position i are kept unchanged.
Is there a good method to locate this position i? As we can use a binary number to represent the forest of N elements, after a new element is inserted, N increases to N + 1. Comparing the binary forms of N and N + 1, we find that all bits below i change from 1 to 0, the i-th bit flips from 0 to 1, and all the bits above i keep unchanged. So we can use bit-wise exclusive or ($\oplus$) to detect this bit. For example, when $N = 3 = (011)_2$, we have $N \oplus (N + 1) = (011)_2 \oplus (100)_2 = (111)_2$, whose highest bit is at position 2, so a new tree of size $2^2 = 4$ is created. Here is the algorithm.

function Number-Of-Bits(N)
    i ← 0
    while ⌊N/2⌋ ≠ 0 do
        N ← ⌊N/2⌋
        i ← i + 1
    return i

i ← Number-Of-Bits(N ⊕ (N + 1))
It can be easily implemented with bit shifting, for example in the below ANSI C code.
int nbits(int n) {
int i=0;
while(n >>= 1)
++i;
return i;
}
So the imperative insertion algorithm can be realized by first locating the bit which flips from 0 to 1, then creating a new array of size 2^i to represent a complete binary tree, and moving the contents of all trees before this bit into this array, as well as the new element to be inserted.

function Insert(L, x)
    i ← Number-Of-Bits(N ⊕ (N + 1))
    Tree(L)[i + 1] ← Create-Array(2^i)
    l ← 1
    Tree(L)[i + 1][l] ← x
    for j ∈ [1, i] do
        for k ∈ [1, 2^{j-1}] do
            l ← l + 1
            Tree(L)[i + 1][l] ← Tree(L)[j][k]
        Tree(L)[j] ← NIL
    Size(L) ← Size(L) + 1
    return L
The corresponding ANSI C program is given as the following.

struct List insert(struct List a, Key x) {
    int i, j, sz;
    Key* xs;
    i = nbits((a.n + 1) ^ a.n);
    xs = a.tree[i] = (Key*)malloc(sizeof(Key) * (1 << i));
    for (j = 0, *xs++ = x, sz = 1; j < i; ++j, sz <<= 1) {
        memcpy((void*)xs, (void*)a.tree[j], sizeof(Key) * sz);
        xs += sz;
        free(a.tree[j]);
        a.tree[j] = NULL;
    }
    ++a.n;
    return a;
}
However, the performance in theory isn't as good as before. This is because the linking operation downgrades from O(1) constant time to linear array copying.
We can again calculate the average (amortized) performance using aggregation analysis. When inserting N = 2^m elements to an empty list represented by implicit binary trees in arrays, the numeric representation of the forest of arrays is the same as before, except for the cost of bit flipping.
i        | forest (MSB ... LSB)
---------+---------------------
0        | 0, 0, ..., 0, 0
1        | 0, 0, ..., 0, 1
2        | 0, 0, ..., 1, 0
3        | 0, 0, ..., 1, 1
...      | ...
2^m - 1  | 1, 1, ..., 1, 1
2^m      | 1, 0, 0, ..., 0, 0

bit change cost (MSB ... LSB): $1 \cdot 2^m,\ 1 \cdot 2^{m-1},\ 2 \cdot 2^{m-2},\ \ldots,\ 2^{m-2} \cdot 2,\ 2^{m-1} \cdot 1$
The LSB of the forest changes every time a new element is inserted; however, it creates a leaf tree and performs copying only when it changes from 0 to 1, so the cost is half of N units, which is 2^{m-1}. The next bit flips half as often as the LSB; each time this bit flips to 1, it copies the first tree as well as the new element to the second tree, so the cost of flipping it to 1 is 2 units, not 1. The MSB only flips to 1 at the last insertion, but the cost of flipping it is copying all the previous trees, to fill the array of size 2^m.
Summing all these costs and distributing them over the N insertions yields the amortized performance as below.

O(T/N) = O\left(\frac{1 \cdot 2^m + 1 \cdot 2^{m-1} + 2 \cdot 2^{m-2} + \ldots + 2^{m-1} \cdot 1}{2^m}\right) = O\left(1 + \frac{m}{2}\right) = O(m) \qquad (12.15)
As m = O(lg N), the amortized performance downgrades from constant time to logarithmic, although it is still faster than normal array insertion, which is O(N) on average.

Random access gets a bit faster, because we can use array indexing instead of tree search.
function Get(L, i)
    for each t ∈ Trees(L) do
        if t ≠ NIL then
            if i ≤ Size(t) then
                return t[i]
            else
                i ← i - Size(t)
Here we skip the error handling such as out-of-bound indexing etc. The ANSI C program for this algorithm is like the following.

Key get(struct List a, int i) {
    int j, sz;
    for (j = 0, sz = 1; j < M; ++j, sz <<= 1)
        if (a.tree[j]) {
            if (i < sz)
                break;
            i -= sz;
        }
    return a.tree[j][i];
}
The imperative removal and random mutating algorithms are left as exercises to the reader.

Exercise 12.2

1. Please implement the random access algorithms, including looking up and updating, for the binary random access list with numeric representation in your favorite programming language.

2. Prove that the amortized performance of deletion is O(1) constant time by using aggregation analysis.

3. Design and implement the binary random access list by implicit array in your favorite imperative programming language.
12.4 Imperative paired-array list

12.4.1 Definition

In the previous chapter about queues, a symmetric solution of paired arrays was presented. It is capable of operating on both ends of the list. Because arrays support fast random access by nature, they can also be used to realize a fast random access sequence in an imperative setting.

x[n] ... x[2] x[1] y[1] y[2] ... y[m]

Figure 12.6: A paired-array list, which consists of 2 arrays linked in head-head manner.

Figure 12.6 shows the design of the paired-array list. Two arrays are linked in head-head manner. To insert a new element on the head of the sequence, the element is appended at the end of the front array; to append a new element on the tail of the sequence, the element is appended at the end of the rear array.
Here is an ISO C++ code snippet defining this data structure.
template<typename Key>
struct List {
int n, m;
vector<Key> front;
vector<Key> rear;
List() : n(0), m(0) {}
int size() { return n + m; }
};
Here we use the vector provided in the standard library to handle the dynamic memory management issues, so that we can concentrate on the algorithm design.
12.4.2 Insertion and appending

Suppose function Front(L) returns the front array, while Rear(L) returns the rear array. For illustration purposes, we assume the arrays are dynamically allocated. Insertion and appending can then be realized as follows.

function Insert(L, x)
    F ← Front(L)
    Size(F) ← Size(F) + 1
    F[Size(F)] ← x

function Append(L, x)
    R ← Rear(L)
    Size(R) ← Size(R) + 1
    R[Size(R)] ← x
As all the above operations manipulate the front and rear arrays at their tails only, they are all constant O(1) time. The following are the corresponding ISO C++ programs.
template<typename Key>
void insert(List<Key>& xs, Key x) {
++xs.n;
xs.front.push_back(x);
}
template<typename Key>
void append(List<Key>& xs, Key x) {
++xs.m;
xs.rear.push_back(x);
}
12.4.3 Random access

As the inner data structure is an array (a dynamic array as vector), which supports random access by nature, it's trivial to implement a constant time indexing algorithm.

function Get(L, i)
    F ← Front(L)
    N ← Size(F)
    if i ≤ N then
        return F[N - i + 1]
    else
        return Rear(L)[i - N]

Here the index i ∈ [1, |L|] starts from 1. If it is not greater than the size of the front array, the element is stored there. However, as the front and rear arrays are connected head-to-head, the elements in the front array are in reverse order, so we locate the element at offset N - i + 1. If the index i is greater than the size of the front array, the element is stored in the rear array. Since elements are stored in normal order there, we just subtract the front array's size N from the index i.
Here is the ISO C++ program implementing this algorithm.
template<typename Key>
Key get(List<Key>& xs, int i) {
if( i < xs.n )
return xs.front[xs.n-i-1];
else
return xs.rear[i-xs.n];
}
The random mutating algorithm is left as an exercise to the reader.
12.4.4 Removing and balancing

Removing isn't as simple as insertion and appending, because we must handle the condition that one array (either front or rear) becomes empty due to removal while the other still contains elements. In the extreme case, the list becomes quite unbalanced, so we must fix it to restore the balance.

One idea is to trigger this fixing whenever either the front or the rear array becomes empty: we just cut the other array in half, and reverse the first half to form the new pair. The algorithm is described as the following.
function Balance(L)
    F ← Front(L), R ← Rear(L)
    N ← Size(F), M ← Size(R)
    if F = ∅ then
        F ← Reverse(R[1 ... ⌊M/2⌋])
        R ← R[⌊M/2⌋ + 1 ... M]
    else if R = ∅ then
        R ← Reverse(F[1 ... ⌊N/2⌋])
        F ← F[⌊N/2⌋ + 1 ... N]
Actually, the operations are symmetric for the case that the front is empty and the case that the rear is empty. Another approach is to swap the front and rear arrays for one of the symmetric cases, recursively restore the balance, then swap the front and rear back. For example, the below ISO C++ program uses this method.
template<typename Key>
void balance(List<Key>& xs) {
if(xs.n == 0) {
back_insert_iterator<vector<Key> > i(xs.front);
reverse_copy(xs.rear.begin(), xs.rear.begin() + xs.m/2, i);
xs.rear.erase(xs.rear.begin(), xs.rear.begin() +xs.m/2);
xs.n = xs.m/2;
xs.m -= xs.n;
}
else if(xs.m == 0) {
swap(xs.front, xs.rear);
swap(xs.n, xs.m);
balance(xs);
swap(xs.front, xs.rear);
swap(xs.n, xs.m);
}
}
With the Balance algorithm defined, it's trivial to implement the removal algorithms both on head and on tail.
function Remove-Head(L)
    Balance(L)
    F ← Front(L)
    if F = ∅ then
        Remove-Tail(L)
    else
        Size(F) ← Size(F) - 1

function Remove-Tail(L)
    Balance(L)
    R ← Rear(L)
    if R = ∅ then
        Remove-Head(L)
    else
        Size(R) ← Size(R) - 1
There is an edge case for each: even after balancing, the array targeted for removal may still be empty. This happens when there is only one element stored in the paired-array list. The solution is to just remove this last remaining element, so that the overall list becomes empty. Below is the ISO C++ program implementing this algorithm.
template<typename Key>
void remove_head(List<Key>& xs) {
balance(xs);
if(xs.front.empty())
remove_tail(xs); //remove the singleton elem in rear
else {
xs.front.pop_back();
--xs.n;
}
}
template<typename Key>
void remove_tail(List<Key>& xs) {
balance(xs);
if(xs.rear.empty())
remove_head(xs); //remove the singleton elem in front
else {
xs.rear.pop_back();
--xs.m;
}
}
It's obvious that the worst case performance is O(N), where N is the number of elements stored in the paired-array list. This happens when balancing is triggered, as both reversing and shifting are linear operations. However, the amortized performance of removal is still O(1); the proof is left as an exercise to the reader.
Exercise 12.3

1. Implement the random mutating algorithm in your favorite imperative programming language.

2. We utilized the vector provided in the standard library to manage memory dynamically. Try to realize a version using a plain array and manage the memory allocation manually. Compare it with this version and consider how this affects the performance.

3. Prove that the amortized performance of removal is O(1) for the paired-array list.
12.5 Concatenate-able list

By using the binary random access list, we realized a sequence data structure which supports O(lg N) time insertion and removal on head, as well as random access of the element at a given index.

However, it's not easy to concatenate two such lists. As both lists are forests of complete binary trees, we can't merely merge them (since forests are essentially lists of trees, and for any size there is at most one tree of that size; even concatenating the forests directly is not fast). One solution is to push the elements of the first sequence one by one onto a stack, then pop them and insert them at the head of the second sequence using the cons function. Of course the stack can be used implicitly in a recursive manner, for instance:
concat(s_1, s_2) = \begin{cases}
s_2 & : s_1 = \emptyset \\
cons(head(s_1), concat(tail(s_1), s_2)) & : \text{otherwise}
\end{cases} \qquad (12.16)

Where functions cons(), head() and tail() are defined in the previous section. If the lengths of the two sequences are N and M, this method takes O(N lg N) time to repeatedly push all elements of the first sequence onto the stack, and then takes $\Theta(N \lg(N + M))$ to insert the elements in front of the second sequence. Note that $\Theta$ means the upper limit here; there is a detailed definition for it in [2].
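For reference, equation (12.16) is only a few lines of Haskell. A sketch, assuming cons, head' and tail' for the binary random access list from the previous sections; the name concatBRA is hypothetical.

-- Naive concatenation of two binary random access lists: pop each
-- element off the first forest and cons it onto the result recursively.
concatBRA :: [Tree a] -> [Tree a] -> [Tree a]
concatBRA [] s2 = s2
concatBRA s1 s2 = cons (head' s1) (concatBRA (tail' s1) s2)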
We have already implemented the real-time queue in a previous chapter. It supports O(1) time pop and push. If we can turn the sequence concatenation into a kind of queue pushing operation, the performance will be improved to O(1) as well. Okasaki gave such a realization in [6], which can concatenate lists in constant time.

To represent a concatenate-able list, the data structure designed by Okasaki is essentially a K-ary tree. The root of the tree stores the first element in the list, so that we can access it in constant O(1) time. The sub-trees, or children, are all smaller concatenate-able lists, which are managed by real-time queues. Concatenating another list to the end is just adding it as the last child, which is in turn a queue pushing operation. Appending a new element can be realized by first wrapping the element into a singleton tree, which is a leaf with no children, and then concatenating this singleton to finalize the appending.
Figure 12.7 illustrates this data structure.
Such a recursively designed data structure can be defined in the following Haskell code.
data CList a = Empty | CList a (Queue (CList a)) deriving (Show, Eq)
[Figure 12.7: Data structure for concatenate-able list. (a) The data structure for list {x_1, x_2, ..., x_n}: the root holds x_1, with children c_1, c_2, ..., c_n holding the sub-lists x_2...x_i, x_{i+1}...x_j, ..., x_k...x_n. (b) The result after concatenating with list {y_1, y_2, ..., y_m}: a new last child c_{n+1} holding y_1...y_m is added.]
It means that a concatenate-able list is either empty, or a K-ary tree, which again consists of a root element and a queue of concatenate-able sub-lists. Here we reuse the realization of the real-time queue mentioned in the previous chapter.

Suppose function clist(x, Q) constructs a concatenate-able list from an element x and a queue of sub-lists Q, function root(s) returns the root element of such a K-ary tree implemented list, and function queue(s) returns the queue of sub-lists respectively. We can implement the algorithm to concatenate two lists like this.
concat(s_1, s_2) = \begin{cases}
s_1 & : s_2 = \emptyset \\
s_2 & : s_1 = \emptyset \\
clist(x, push(Q, s_2)) & : \text{otherwise}
\end{cases} \qquad (12.17)
Where $x = root(s_1)$ and $Q = queue(s_1)$. The idea of concatenation is that if either one of the lists to be concatenated is empty, the result is just the other list; otherwise, we push the second list as the last child onto the queue of the first list.

Since the push operation is O(1) constant time for a well realized real-time queue, the performance of concatenation is bound to O(1).

The concat() function can be translated to the below Haskell program.
The concat() function can be translated to the below Haskell program.
concat x Empty = x
concat Empty y = y
concat (CList x q) y = CList x (push q y)
Besides the good performance of concatenation, this design also brings nice features for adding an element both on head and on tail.

cons(x, s) = concat(clist(x, ∅), s)  (12.18)

append(s, x) = concat(s, clist(x, ∅))  (12.19)
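Both equations translate directly to Haskell. A sketch, assuming empty constructs an empty real-time queue as in the previous chapter:

-- Adding on head or tail is concatenation with a singleton tree.
cons :: a -> CList a -> CList a
cons x xs = concat (CList x empty) xs

append :: CList a -> a -> CList a
append xs x = concat xs (CList x empty)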
It's a bit more complex to realize the algorithm that removes the first element from a concatenate-able list. This is because after the root, which is the first element in the sequence, is removed, we have to re-construct the remaining parts, a queue of sub-lists, into a K-ary tree.

Before diving into the re-construction, let's solve the trivial part first. Getting the first element is just returning the root of the K-ary tree.

head(s) = root(s)  (12.20)
As we mentioned above, after the root is removed, we are left with all the children of the K-ary tree. Note that all of them are also concatenate-able lists, so one natural solution is to concatenate them all together into a big list.

concatAll(Q) = \begin{cases}
\emptyset & : Q = \emptyset \\
concat(front(Q), concatAll(pop(Q))) & : \text{otherwise}
\end{cases} \qquad (12.21)
If the queue is empty, it means that there is no children at all, so the result is
also an empty list; Otherwise, we pop the rst child, which is a concatenate-able
list, from the queue, and recursively concatenate all the rest children to a list;
nally, we concatenate this list behind the already popped rst children.
With concatAll() dened, we can then implement the algorithm of removing
the rst element from a list as below.
tail(s) = linkAll(queue(s)) (12.22)
The corresponding Haskell program is given as follows.
head (CList x _) = x
tail (CList _ q) = linkAll q
linkAll q | isEmptyQ q = Empty
| otherwise = link (front q) (linkAll (pop q))
Function isEmptyQ is used to test whether a queue is empty. It is trivial and we omit its definition; readers can refer to the source code along with this book.
The linkAll() algorithm actually traverses the queue data structure and reduces it to a final result. This reminds us of the folding mentioned in the chapter of binary search tree; readers can refer to the appendix of this book for a detailed description of folding. It's quite possible to define a folding algorithm for queues instead of lists^2 [8].
foldQ(f, e, Q) = \begin{cases}
e & : Q = \emptyset \\
f(front(Q), foldQ(f, e, pop(Q))) & : \text{otherwise}
\end{cases} \qquad (12.23)
Function foldQ() takes three parameters: a function f, which is used for reducing, an initial value e, and the queue Q to be traversed.

Here are some examples to illustrate folding on a queue. Suppose a queue Q contains the elements {1, 2, 3, 4, 5} from head to tail.

foldQ(+, 0, Q) = 1 + (2 + (3 + (4 + (5 + 0)))) = 15
foldQ(×, 1, Q) = 1 × (2 × (3 × (4 × (5 × 1)))) = 120
foldQ(×, 0, Q) = 1 × (2 × (3 × (4 × (5 × 0)))) = 0
Function linkAll can be rewritten using foldQ accordingly.

linkAll(Q) = foldQ(link, ∅, Q)  (12.24)
The Haskell program can be modified as well.

linkAll = foldQ link Empty

foldQ :: (a -> b -> b) -> b -> Queue a -> b
foldQ f z q | isEmptyQ q = z
            | otherwise = (front q) `f` foldQ f z (pop q)
However, the performance of removal can't be ensured in all cases. The worst case is that the user keeps appending N elements to an empty list, and then immediately performs removal. At this time, the K-ary tree has the first element stored in the root, with N - 1 children, all of which are leaves. So the linkAll() algorithm downgrades to O(N), which is linear time.

The average case is amortized O(1), if the add, append, concatenate and remove operations are randomly performed. The proof is left as an exercise to the reader.
Exercise 12.4

1. Can you figure out a solution to append an element to the end of a binary random access list?

2. Prove that the amortized performance of the removal operation is O(1). Hint: use the banker's method.

3. Implement the concatenate-able list in your favorite imperative language.
^2 Some functional programming languages, such as Haskell, define type classes (related to the concept of monoid), so that it's easy to support folding on a customized data structure.
12.6 Finger tree

We haven't yet been able to meet all the performance targets listed at the beginning of this chapter.

The binary random access list enables fast insertion and removal of elements at the head of the sequence, and fast random access of elements. However, it performs poorly when concatenating lists, and there is no good way to append an element at the end.

The concatenate-able list is capable of concatenating multiple lists on the fly, and it performs well for adding new elements both on head and tail. However, it doesn't support random access of an element at a given index.

These two examples bring us some ideas:

- In order to support fast manipulation both on head and tail of the sequence, there must be some way to easily access the head and tail positions;

- Tree-like data structures help to turn random access into divide and conquer search; if the tree is well balanced, the search can be ensured to be logarithmic time.
12.6.1 Definition

The finger tree [6], first invented in 1977, can help to realize an efficient sequence, and it is also well implemented in purely functional settings [5].

As we mentioned, the balance of the tree is critical to ensure the performance of search. One option is to use a balanced tree as the underlying data structure for the finger tree, for example the 2-3 tree, which is a special B-tree (readers can refer to the chapter of B-tree of this book).

A 2-3 tree contains either 2 or 3 children. It can be defined as below in Haskell.
data Node a = Br2 a a | Br3 a a a
In imperative settings, a node can be defined with a list of sub-nodes, which contains at most 3 children. For instance, the following ANSI C code defines the node.

union Node {
    Key* keys;
    union Node** children;
};

Note that in this definition, a node can either contain 2~3 keys, or 2~3 sub-nodes, where Key is the type of the elements stored in a leaf node.
We mark the left-most non-leaf node as the front finger and the right-most non-leaf node as the rear finger. Since both fingers are essentially 2-3 trees with all leaves as children, they can be directly represented as lists of 2 or 3 leaves. Of course a finger tree can also be empty or contain only one element as a leaf.

So the definition of a finger tree is specified like this:

- A finger tree is either empty;
- or a singleton leaf;
- or contains three parts: a left finger, which is a list containing at most 3 elements; a sub finger tree; and a right finger, which is also a list containing at most 3 elements.
Note that this definition is recursive, so it's quite possible to translate it to functional settings. The following Haskell definition summarizes these cases, for example.
data Tree a = Empty
| Lf a
| Tr [a] (Tree (Node a)) [a]
In imperative settings, we can define the finger tree in a similar manner. What's more, we can add a parent field, so that it's possible to back-track to the root from any tree node. The below ANSI C code defines the finger tree accordingly.

struct Tree {
    union Node** front;
    union Node** rear;
    struct Tree* mid;
    struct Tree* parent;
};
We can use a NIL pointer to represent an empty tree; a leaf tree contains only one element in its front finger, while both its rear finger and middle part are empty.

Figures 12.8 and 12.9 show some examples of finger trees.
[Figure 12.8: Examples of finger tree, 1. (a) An empty tree. (b) A singleton leaf. (c) The front finger and the rear finger contain one element each; the middle part is empty.]
The first example is an empty finger tree; the second one shows the result after inserting one element to the empty tree: it becomes a leaf of one node; the third example shows a finger tree containing 2 elements, one in the front finger and the other in the rear.

If we continuously insert new elements to the tree, those elements are put into the front finger one by one, until they exceed the limit of the 2-3 tree. The 4-th example shows such a condition: there are 4 elements in the front finger, which isn't balanced any more.
[Figure 12.9: Examples of finger tree, 2. (a) After inserting 3 extra elements to the front finger, it exceeds the 2-3 tree constraint and isn't balanced any more. (b) The tree resumes balance: there are 2 elements in the front finger; the middle part is a leaf, which contains a 3-branch 2-3 tree.]
The last example shows that the finger tree gets fixed so that it resumes balance: there are two elements in the front finger. Note that the middle part is not empty any longer; it's a leaf of a 2-3 tree. The content of the leaf is a tree with 3 branches, each containing an element.
We can express these 5 examples as the following Haskell expressions.

Empty
Lf a
Tr [b] Empty [a]
Tr [e, d, c, b] Empty [a]
Tr [f, e] (Lf (Br3 d c b)) [a]
As we mentioned, the definition of the finger tree is recursive. The middle part besides the front and rear fingers is a deeper finger tree, which is defined as Tree(Node(a)). Every time we go deeper, the Node() is embedded one more level: if the element type of the first level tree is a, the element type of the second level tree is Node(a), the third level is Node(Node(a)), ..., and the n-th level is $Node(Node(...(a)...)) = Node^n(a)$, where $n$ indicates that Node() is applied n times.
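For instance, the last expression above is well-typed, with the middle part one level deeper than the whole tree. A small sketch with Char elements; the name example is for illustration only.

-- example :: Tree Char, while its middle part, the leaf holding the
-- 3-branch 2-3 tree, has type Tree (Node Char).
example :: Tree Char
example = Tr ['f', 'e'] (Lf (Br3 'd' 'c' 'b')) ['a']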
12.6.2 Insert element to the head of sequence

The examples listed above actually reveal the typical process of inserting elements one by one into a finger tree. It's possible to summarize these examples into several cases for the insertion-on-head algorithm.

When we insert an element x to a finger tree T:

- If the tree is empty, the result is a leaf which contains the singleton element x;
- If the tree is a singleton leaf of element y, the result is a new finger tree: the front finger contains the new element x, the rear finger contains the previous element y, and the middle part is an empty finger tree;
- If the number of elements stored in the front finger isn't bigger than the upper limit of the 2-3 tree, which is 3, the new element is just inserted at the head of the front finger;
- Otherwise, the number of elements stored in the front finger exceeds the upper limit of the 2-3 tree. The last 3 elements in the front finger are wrapped in a 2-3 tree and recursively inserted into the middle part; the new element x is inserted in front of the rest of the elements in the front finger.
Suppose that function leaf(x) creates a leaf from element x, and function tree(F, T', R) creates a finger tree from three parts: F is the front finger, which is a list containing several elements; similarly, R is the rear finger, which is also a list; T' is the middle part, which is a deeper finger tree. Function tr3(a, b, c) creates a 2-3 tree from 3 elements a, b, c, while tr2(a, b) creates a 2-3 tree from 2 elements a and b.
insertT(x, T) = \begin{cases}
leaf(x) & : T = \emptyset \\
tree(\{x\}, \emptyset, \{y\}) & : T = leaf(y) \\
tree(\{x, x_1\}, insertT(tr3(x_2, x_3, x_4), T'), R) & : T = tree(\{x_1, x_2, x_3, x_4\}, T', R) \\
tree(\{x\} \cup F, T', R) & : \text{otherwise}
\end{cases} \qquad (12.25)
The performance of this algorithm is dominated by the recursive case. All the other cases are constant O(1) time. The recursion depth is proportional to the height of the tree, so the algorithm is bound to O(h) time, where h is the height. As we use a 2-3 tree to ensure that the tree is well balanced, h = O(lg N), where N is the number of elements stored in the finger tree.

Further analysis reveals that the amortized performance of insertT() is O(1), because we can amortize the expensive recursive case over the other trivial cases. Please refer to [6] and [5] for the detailed proof.
Translating the algorithm yields the below Haskell program.
cons :: a -> Tree a -> Tree a
cons a Empty = Lf a
cons a (Lf b) = Tr [a] Empty [b]
cons a (Tr [b, c, d, e] m r) = Tr [a, b] (cons (Br3 c d e) m) r
cons a (Tr f m r) = Tr (a:f) m r
Here we follow the LISP naming convention of cons for inserting a new element at the head of a list.
The insertion algorithm can also be implemented in an imperative approach. Suppose function Tree() creates an empty tree, in which all fields, including the front and rear fingers, the middle part inner tree, and the parent, are empty. Function Node() creates an empty node.
function Prepend-Node(n, T)
    r ← Tree()
    p ← r
    Connect-Mid(p, T)
    while Full?(Front(T)) do
        F ← Front(T)    ▷ F = {n_1, n_2, n_3, ...}
        Front(T) ← {n, F[1]}    ▷ F[1] = n_1
        n ← Node()
        Children(n) ← F[2..]    ▷ F[2..] = {n_2, n_3, ...}
        p ← T
        T ← Mid(T)
    if T = NIL then
        T ← Tree()
        Front(T) ← {n}
    else if |Front(T)| = 1 ∧ Rear(T) = ∅ then
        Rear(T) ← Front(T)
        Front(T) ← {n}
    else
        Front(T) ← {n} ∪ Front(T)
    Connect-Mid(p, T)
    return Flat(r)
Where the notation L[i..] means the sub-list of L with the first i - 1 elements removed; that is, if $L = \{a_1, a_2, ..., a_n\}$, then $L[i..] = \{a_i, a_{i+1}, ..., a_n\}$.
}.
Functions Front(), Rear(), Mid(), and Parent() are used to access the
front nger, the rear nger, the middle part inner tree and the parent tree
respectively; Function Children() accesses the children of a node.
Function Connect-Mid(T
1
, T
2
), connect T
2
as the inner middle part tree
of T
1
, and set the parent of T
2
as T
1
if T
2
isnt empty.
In this algorithm, we perform a one-pass top-down traversal along the middle part inner trees while the front finger is full and can't afford to store any more. The criterion of being full for a 2-3 tree is that the finger already contains 3 elements. In such a case, we extract all the elements except the first one off, wrap them into a new node (a one-level deeper node), and continue inserting this new node into the middle inner tree. The first element is left in the front finger, and the element to be inserted is put in front of it, so that this element becomes the new first one in the front finger.

After this traversal, the algorithm either reaches an empty tree, or the tree still has room to hold more elements in its front finger. We create a new leaf for the former case, and perform a trivial list insertion on the front finger for the latter.

During the traversal, we use p to record the parent of the current tree we are processing, so any newly created tree is connected as the middle part inner tree of p.

Finally, we return the root of the tree, r. The last trick of this algorithm is the Flat() function. In order to simplify the logic, we create an empty ground tree and set it as the parent of the root. We need to eliminate this extra ground level before returning the root. This flattening algorithm is realized as the following.
function Flat(T)
    while T ≠ NIL ∧ T is empty do
        T ← Mid(T)
    if T ≠ NIL then
        Parent(T) ← NIL
    return T

The while loop tests if T is trivially empty: it's not NIL, while both its front and rear fingers are empty.
The below Python code implements the insertion algorithm for the finger tree.
def insert(x, t):
return prepend_node(wrap(x), t)
def prepend_node(n, t):
root = prev = Tree()
prev.set_mid(t)
while frontFull(t):
f = t.front
t.front = [n] + f[:1]
n = wraps(f[1:])
prev = t
t = t.mid
if t is None:
t = leaf(n)
elif len(t.front)==1 and t.rear == []:
t = Tree([n], None, t.front)
else:
t = Tree([n]+t.front, t.mid, t.rear)
prev.set_mid(t)
return flat(root)
def flat(t):
while t is not None and t.empty():
t = t.mid
if t is not None:
t.parent = None
return t
The implementations of functions set_mid(), frontFull(), wrap(), wraps(), empty(), and the tree constructor are trivial enough that we skip their details here. Readers can take these as exercises.
12.6.3 Remove element from the head of sequence

It's easy to implement the reverse operation, removing the first element from the list, by reversing the insertT() algorithm line by line.

Let's denote F = {f_1, f_2, ...} as the front finger list, M as the middle part inner finger tree, and R = {r_1, r_2, ...} as the rear finger list of a finger tree. R' = {r_2, r_3, ...} is the rest of the elements with the first one removed from R.
extractT(T) = \begin{cases}
(x, \emptyset) & : T = leaf(x) \\
(x, leaf(y)) & : T = tree(\{x\}, \emptyset, \{y\}) \\
(x, tree(\{r_1\}, \emptyset, R')) & : T = tree(\{x\}, \emptyset, R) \\
(x, tree(toList(F'), M', R)) & : T = tree(\{x\}, M, R), (F', M') = extractT(M) \\
(f_1, tree(\{f_2, f_3, ...\}, M, R)) & : \text{otherwise}
\end{cases} \qquad (12.26)
Where function toList(T) converts a 2-3 tree to a plain list as the following.

toList(T) = \begin{cases}
\{x, y\} & : T = tr2(x, y) \\
\{x, y, z\} & : T = tr3(x, y, z)
\end{cases} \qquad (12.27)
Here we skip the error handling such as trying to remove an element from an empty tree etc. If the finger tree is a leaf, the result after removal is an empty tree. If the finger tree contains two elements, one in the front finger and the other in the rear, we return the element stored in the front finger as the first element, and the resulting tree after removal is a leaf. If there is only one element in the front finger, the middle part inner tree is empty, and the rear finger isn't empty, we return the only element in the front finger, and borrow one element from the rear finger to the front. If there is only one element in the front finger but the middle part inner tree isn't empty, we recursively remove a node from the inner tree, flatten that node to a plain list to replace the front finger, and remove the original only element in the front finger. The last case says that if the front finger contains more than one element, we can just remove the first element from the front finger and keep all the other parts unchanged.
Figure 12.10 shows the steps of removing two elements from the head of a sequence. There are 10 elements stored in the finger tree. When the first element is removed, there is still one element left in the front finger. However, when the next element is removed, the front finger becomes empty. So we borrow one tree node from the middle part inner tree. This node is a 2-3 tree; it is converted to a list of 3 elements, and this list is used as the new front finger. The middle part inner tree changes from three parts to a singleton leaf, which contains only one 2-3 tree node with three elements stored in it.
Below is the corresponding Haskell program for uncons.
uncons :: Tree a -> (a, Tree a)
uncons (Lf a) = (a, Empty)
uncons (Tr [a] Empty [b]) = (a, Lf b)
uncons (Tr [a] Empty (r:rs)) = (a, Tr [r] Empty rs)
uncons (Tr [a] m r) = (a, Tr (nodeToList f) m' r) where (f, m') = uncons m
uncons (Tr f m r) = (head f, Tr (tail f) m r)
And the function nodeToList is defined like this.
nodeToList :: Node a -> [a]
nodeToList (Br2 a b) = [a, b]
nodeToList (Br3 a b c) = [a, b, c]
Similar to the above, we can define head and tail functions from uncons, again with apostrophes to distinguish them from the standard ones.

head' = fst . uncons
tail' = snd . uncons
12.6.4 Handling the ill-formed finger tree when removing

The strategy used so far to remove an element from a finger tree is a kind of removing and borrowing: if the front finger becomes empty after removal, we borrow more nodes from the middle part inner tree. However, there exist cases where the tree is ill-formed; for example, both the front finger of the tree and that of its middle part inner tree are empty. Such ill-formed trees can result from imperative splitting, which we'll introduce later.

Here we develop an imperative algorithm which can remove the first element from a finger tree even if it is ill-formed. The idea is to first perform a top-down traversal to find a sub-tree which either has a non-empty front finger, or has both an empty front finger and an empty middle part inner tree. For the former case, we can safely extract the first element, which is a node, from the front finger; for the latter case, since only the rear finger isn't empty, we can swap it with the empty front finger and turn it into the former case.
[Figure 12.10: Examples of removing 2 elements from the head of a sequence. (a) A sequence of 10 elements represented as a finger tree. (b) The first element is removed; there is one element left in the front finger. (c) Another element is removed from the head: one node is borrowed from the middle part inner tree, changed from a 2-3 tree into a list, and used as the new front finger; the middle part inner tree becomes a leaf of one 2-3 tree node.]
[Figure 12.11: Example of an ill-formed tree. The front fingers of the upper levels are empty; the front finger of the i-th level sub-tree isn't empty.]
After that, we need to examine whether the node we extracted from the front finger is a leaf node (how to do that is left as an exercise to the reader). If not, we need to go on extracting the first sub-node from the children of this node, and leave the rest of the children as the new front finger of the parent of the current tree. We repeatedly go up along the parent field until the node we extracted is a leaf; at that point, we arrive at the root of the tree. Figure 12.12 illustrates this process.
Based on this idea, the following algorithm realizes the removal operation on head. The algorithm assumes that the tree passed in isn't empty.

function Extract-Head(T)
    r ← Tree()
    Connect-Mid(r, T)
    while Front(T) = ∅ ∧ Mid(T) ≠ NIL do
        T ← Mid(T)
    if Front(T) = ∅ ∧ Rear(T) ≠ ∅ then
        Exchange Front(T) ↔ Rear(T)
    n ← Node()
    Children(n) ← Front(T)
    repeat
        L ← Children(n)    ▷ L = {n_1, n_2, n_3, ...}
        n ← L[1]    ▷ n ← n_1
        Front(T) ← L[2..]    ▷ L[2..] = {n_2, n_3, ...}
        T ← Parent(T)
        if Mid(T) becomes empty then
            Mid(T) ← NIL
    until n is a leaf
    return (Elem(n), Flat(r))
[Figure 12.12: Traverse bottom-up till a leaf is extracted. (a) Extract the first node n[i][1] at level i and put its children into the front finger of the upper level tree. (b) Repeat this process i times; finally x[1] is extracted.]
Note that function Elem(n) returns the only element stored inside leaf node n. Similar to the imperative insertion algorithm, a stub ground tree is used as the parent of the root, which simplifies the logic a bit; that's why we need to flatten the tree at the end.

The below Python program translates the algorithm.
def extract_head(t):
root = Tree()
root.set_mid(t)
while t.front == [] and t.mid is not None:
t = t.mid
if t.front == [] and t.rear != []:
(t.front, t.rear) = (t.rear, t.front)
n = wraps(t.front)
while True: # a repeat-until loop
ns = n.children
n = ns[0]
t.front = ns[1:]
t = t.parent
if t.mid.empty():
t.mid.parent = None
t.mid = None
if n.leaf:
break
return (elem(n), flat(root))
Member function Tree.empty() returns true if all three parts, the front finger, the rear finger and the middle part inner tree, are empty. We put a flag Node.leaf to mark whether a node is a leaf or a compound node. The exercise of this section asks the reader to consider some alternatives.

As ill-formed trees are allowed, the algorithms to access the first and last elements of the finger tree must be modified so that they don't blindly return the first or last child of a finger, as the finger can be empty if the tree is ill-formed.

The idea is quite similar to Extract-Head: in case a finger is empty while the middle part inner tree isn't, we need to traverse along the inner tree until a point where either the finger becomes non-empty or all the nodes are stored in the other finger. For instance, the following algorithm returns the first leaf node even if the tree is ill-formed.
function First-Lf(T)
    while Front(T) = ∅ ∧ Mid(T) ≠ NIL do
        T ← Mid(T)
    if Front(T) = ∅ ∧ Rear(T) ≠ ∅ then
        n ← Rear(T)[1]
    else
        n ← Front(T)[1]
    while n is NOT a leaf do
        n ← Children(n)[1]
    return n
Note the second loop in this algorithm: it keeps traversing into the first sub-node while the current node isn't a leaf. So we always end up with a leaf node, and it's trivial to get the element inside it.

function First(T)
    return Elem(First-Lf(T))
The following Python code translates the algorithm to real program.
def first(t):
return elem(first_leaf(t))
def first_leaf(t):
while t.front == [] and t.mid is not None:
t = t.mid
if t.front == [] and t.rear != []:
n = t.rear[0]
else:
n = t.front[0]
while not n.leaf:
n = n.children[0]
return n
Accessing the last element is quite similar, and we leave it as an exercise to the reader.
12.6.5 Append element to the tail of the sequence

Because the finger tree is symmetric, we can give the realization of appending an element on tail by referring to the insertT() algorithm.
appendT(T, x) = \begin{cases}
leaf(x) & : T = \emptyset \\
tree(\{y\}, \emptyset, \{x\}) & : T = leaf(y) \\
tree(F, appendT(M, tr3(x_1, x_2, x_3)), \{x_4, x\}) & : T = tree(F, M, \{x_1, x_2, x_3, x_4\}) \\
tree(F, M, R \cup \{x\}) & : \text{otherwise}
\end{cases} \qquad (12.28)
Generally speaking, if the rear finger is still a valid 2-3 tree, that is, the number of elements is not greater than 4, the new element is directly appended to the rear finger. Otherwise, we break the rear finger, take the first 3 elements of it to create a new 2-3 tree, and recursively append that to the middle part inner tree. The cases where the finger tree is empty or a singleton leaf are handled by the first two cases.
Translating the equation to Haskell yields the below program.
snoc :: Tree a -> a -> Tree a
snoc Empty a = Lf a
snoc (Lf a) b = Tr [a] Empty [b]
snoc (Tr f m [a, b, c, d]) e = Tr f (snoc m (Br3 a b c)) [d, e]
snoc (Tr f m r) a = Tr f m (r++[a])
The function name snoc is the mirror of cons, which indicates the symmetric relationship.
Appending a new element to the end imperatively is quite similar. The following algorithm realizes appending.

function Append-Node(T, n)
    r ← Tree()
    p ← r
    Connect-Mid(p, T)
    while Full?(Rear(T)) do
        R ← Rear(T)    ▷ R = {n_1, n_2, ..., n_{m-1}, n_m}
        Rear(T) ← {Last(R), n}    ▷ the last element n_m
        n ← Node()
        Children(n) ← R[1 ... m-1]    ▷ {n_1, n_2, ..., n_{m-1}}
        p ← T
        T ← Mid(T)
    if T = NIL then
        T ← Tree()
        Front(T) ← {n}
    else if |Rear(T)| = 1 ∧ Front(T) = ∅ then
        Front(T) ← Rear(T)
        Rear(T) ← {n}
    else
        Rear(T) ← Rear(T) ∪ {n}
    Connect-Mid(p, T)
    return Flat(r)

The corresponding Python program is given as below.
def append_node(t, n):
root = prev = Tree()
prev.set_mid(t)
while rearFull(t):
r = t.rear
t.rear = r[-1:] + [n]
n = wraps(r[:-1])
prev = t
t = t.mid
if t is None:
t = leaf(n)
elif len(t.rear) == 1 and t.front == []:
t = Tree(t.rear, None, [n])
else:
t = Tree(t.front, t.mid, t.rear + [n])
prev.set_mid(t)
return flat(root)
12.6.6 Remove element from the tail of the sequence

Similar to appendT(), we can realize the algorithm removing the last element from a finger tree in a manner symmetric to extractT().

We denote the non-empty, non-leaf finger tree as tree(F, M, R), where F is the front finger, M is the middle part inner tree, and R is the rear finger.
removeT(T) = \begin{cases}
(\emptyset, x) & : T = leaf(x) \\
(leaf(y), x) & : T = tree(\{y\}, \emptyset, \{x\}) \\
(tree(init(F), \emptyset, \{last(F)\}), x) & : T = tree(F, \emptyset, \{x\}) \wedge F \neq \emptyset \\
(tree(F, M', toList(R')), x) & : T = tree(F, M, \{x\}), (M', R') = removeT(M) \\
(tree(F, M, init(R)), last(R)) & : \text{otherwise}
\end{cases} \qquad (12.29)
Function toList(T) is used to flatten a 2-3 tree to a plain list, as defined previously. Function init(L) returns all elements except for the last one in list L; that is, if $L = \{a_1, a_2, ..., a_{n-1}, a_n\}$, then $init(L) = \{a_1, a_2, ..., a_{n-1}\}$. Function last(L) returns the last element, so that $last(L) = a_n$. Please refer to the appendix of this book for their implementations.

The removeT() algorithm can be translated to the following Haskell program; we name it unsnoc to indicate that it's the reverse function of snoc.
unsnoc :: Tree a -> (Tree a, a)
unsnoc (Lf a) = (Empty, a)
unsnoc (Tr [a] Empty [b]) = (Lf a, b)
unsnoc (Tr f@(_:_) Empty [a]) = (Tr (init f) Empty [last f], a)
unsnoc (Tr f m [a]) = (Tr f m' (nodeToList r), a) where (m', r) = unsnoc m
unsnoc (Tr f m r) = (Tr f m (init r), last r)
And we can define special last and init functions for the finger tree, similar to their counterparts for lists (again with apostrophes to avoid clashing with the standard ones).

last' = snd . unsnoc
init' = fst . unsnoc
Imperatively removing an element from the end is almost the same as removing from the head. There seems to be a special case though: as we always store the only element (or sub-node) in the front finger while the rear finger and middle part inner tree are empty (e.g. Tree({n}, NIL, ∅)), we might get nothing if we always try to fetch the last element from the rear finger.

This can be solved by swapping the front and the rear fingers if the rear is empty, as in the following algorithm.
function Extract-Tail(T)
    r ← Tree()
    Connect-Mid(r, T)
    while Rear(T) = ∅ ∧ Mid(T) ≠ NIL do
        T ← Mid(T)
    if Rear(T) = ∅ ∧ Front(T) ≠ ∅ then
        Exchange Front(T) ↔ Rear(T)
    n ← Node()
    Children(n) ← Rear(T)
    repeat
        L ← Children(n)    ▷ L = {n_1, n_2, ..., n_{m-1}, n_m}
        n ← Last(L)    ▷ n ← n_m
        Rear(T) ← L[1 ... m-1]    ▷ {n_1, n_2, ..., n_{m-1}}
        T ← Parent(T)
        if Mid(T) becomes empty then
            Mid(T) ← NIL
    until n is a leaf
    return (Elem(n), Flat(r))
How to access the last element, as well as implementing this algorithm as a working program, are left as exercises.
12.6.7 Concatenate

Consider the non-trivial case of concatenating two finger trees $T_1 = tree(F_1, M_1, R_1)$ and $T_2 = tree(F_2, M_2, R_2)$. One natural idea is to use $F_1$ as the new front finger of the concatenated result, and keep $R_2$ as the new rear finger. The rest of the work is to merge $M_1$, $R_1$, $F_2$ and $M_2$ into a new middle part inner tree.

Note that both $R_1$ and $F_2$ are plain lists of nodes, so the sub-problem is to realize an algorithm like this:

merge(M_1, R_1 \cup F_2, M_2) = ?
More observation reveals that both $M_1$ and $M_2$ are also finger trees, except that they are one level deeper than $T_1$ and $T_2$ in terms of Node(a), where a is the type of element stored in the tree. We can recursively use the same strategy: keep the front finger of $M_1$ and the rear finger of $M_2$, then merge the middle part inner trees of $M_1$ and $M_2$, together with the rear finger of $M_1$ and the front finger of $M_2$.

If we denote function front(T) as returning the front finger, rear(T) as returning the rear finger, and mid(T) as returning the middle part inner tree, the above merge() algorithm can be expressed for the non-trivial case as the following.
merge(M_1, R_1 \cup F_2, M_2) = tree(front(M_1), S, rear(M_2))
\text{where } S = merge(mid(M_1), rear(M_1) \cup R_1 \cup F_2 \cup front(M_2), mid(M_2)) \qquad (12.30)
If we look back at the original concatenation solution, it can be expressed as below.

concat(T_1, T_2) = tree(F_1, merge(M_1, R_1 \cup F_2, M_2), R_2) \qquad (12.31)
Comparing it with equation (12.30), it's easy to note that concatenating is essentially merging. So we have the final algorithm like this.

concat(T_1, T_2) = merge(T_1, \emptyset, T_2) \qquad (12.32)
By adding the edge cases, the merge() algorithm can be completed as below.

merge(T_1, S, T_2) = \begin{cases}
foldR(insertT, T_2, S) & : T_1 = \emptyset \\
foldL(appendT, T_1, S) & : T_2 = \emptyset \\
merge(\emptyset, \{x\} \cup S, T_2) & : T_1 = leaf(x) \\
merge(T_1, S \cup \{x\}, \emptyset) & : T_2 = leaf(x) \\
tree(F_1, merge(M_1, nodes(R_1 \cup S \cup F_2), M_2), R_2) & : \text{otherwise}
\end{cases} \qquad (12.33)
Most of these cases are straightforward. If either $T_1$ or $T_2$ is empty, the algorithm repeatedly inserts/appends all elements of S to the other tree. Functions foldL and foldR are kinds of for-each processes in imperative settings; the difference is that foldL processes the list S from left to right, while foldR processes it from right to left.
Here are their definitions. Suppose list $L = \{a_1, a_2, ..., a_{n-1}, a_n\}$, and $L' = \{a_2, a_3, ..., a_{n-1}, a_n\}$ is the rest of the elements except for the first one.
foldL(f, e, L) =
_
e : L =
foldL(f, f(e, a
1
), L

) : otherwise
(12.34)
foldR(f, e, L) =
_
e : L =
f(a
1
, foldR(f, e, L

)) : otherwise
(12.35)
They are detailed explained in the appendix of this book.
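To make the direction concrete, below is a small Python sketch that renders (12.34) and (12.35) directly (an illustration only, not the library code used later in this chapter); a non-commutative function such as subtraction makes the left/right difference visible.

def foldL(f, e, xs):
    # process the list from left to right, threading the accumulator e
    for x in xs:
        e = f(e, x)
    return e

def foldR(f, e, xs):
    # process the list from right to left
    for x in reversed(xs):
        e = f(x, e)
    return e

print(foldL(lambda e, x: e - x, 0, [1, 2, 3]))  # ((0-1)-2)-3 = -6
print(foldR(lambda x, e: x - e, 0, [1, 2, 3]))  # 1-(2-(3-0)) = 2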
If either of the trees is a leaf, we can insert or append the element of this leaf to S, so that it becomes the trivial case of concatenating one empty tree with another.
Function nodes() is used to wrap a list of elements into a list of 2-3 trees. This is because the contents of the middle part inner tree, compared to the contents of a finger, are one level deeper in terms of Node(). Consider the time point of transforming from the recursive case to the edge case. Suppose M₁ is empty at that time; we then need to repeatedly insert all elements from R₁ ∪ S ∪ F₂ to M₂. However, we can't do the insertion directly. If the element type is a, we can only insert Node(a), which is a 2-3 tree, to M₂. This is just like what we did in the insertT() algorithm: take out the last 3 elements, wrap them in a 2-3 tree, and recursively perform insertT(). Here is the definition of nodes().
nodes(L) =
    {tr2(x₁, x₂)}                             : L = {x₁, x₂}
    {tr3(x₁, x₂, x₃)}                         : L = {x₁, x₂, x₃}
    {tr2(x₁, x₂), tr2(x₃, x₄)}                : L = {x₁, x₂, x₃, x₄}
    {tr3(x₁, x₂, x₃)} ∪ nodes({x₄, x₅, ...})  : otherwise
(12.36)
Function nodes() follows the constraint of 2-3 trees: if there are only 2 or 3 elements in the list, it just wraps them in a singleton list containing one 2-3 tree; if there are 4 elements in the list, it splits them into two trees, each consisting of 2 branches; otherwise, if there are more than 4 elements, it wraps the first three into one tree with 3 branches, and recursively calls nodes() to process the rest.
The performance of concatenation is determined by merging. Analyzing the recursive case of merging reveals that the depth of recursion is proportional to the smaller height of the two trees. As the trees are ensured to be balanced by using 2-3 trees, their heights are bound to O(lg N'), where N' is the number of elements. The edge case of merging performs the same as insertion (it calls insertT() at most 8 times), which is amortized O(1) time, and O(lg M) in the worst case, where M is the difference in height of the two trees. So the overall performance is bound to O(lg N), where N is the total number of elements contained in the two finger trees.
The following Haskell program implements the concatenation algorithm.

concat :: Tree a -> Tree a -> Tree a
concat t1 t2 = merge t1 [] t2
Note that there is a concat function defined in the standard Prelude, so we need to distinguish them, either by a hiding import or by taking a different name.
merge :: Tree a -> [a] -> Tree a -> Tree a
merge Empty ts t2 = foldr cons t2 ts
merge t1 ts Empty = foldl snoc t1 ts
merge (Lf a) ts t2 = merge Empty (a:ts) t2
merge t1 ts (Lf a) = merge t1 (ts++[a]) Empty
merge (Tr f1 m1 r1) ts (Tr f2 m2 r2) = Tr f1 (merge m1 (nodes (r1 ++ ts ++ f2)) m2) r2
And the implementation of nodes() is as below.

nodes :: [a] -> [Node a]
nodes [a, b] = [Br2 a b]
nodes [a, b, c] = [Br3 a b c]
nodes [a, b, c, d] = [Br2 a b, Br2 c d]
nodes (a:b:c:xs) = Br3 a b c:nodes xs
To concatenate two finger trees T₁ and T₂ in the imperative approach, we can traverse the two trees along the middle part inner trees till either tree becomes empty. In every iteration, we create a new tree T, choose the front finger of T₁ as the front finger of T, and choose the rear finger of T₂ as the rear finger of T. The other two fingers (the rear finger of T₁ and the front finger of T₂) are put together as a list, and this list is then balanced-grouped into several 2-3 tree nodes as N. Note that N grows along with the traversal, and not only in terms of length: the depth of its elements increases by one in each iteration. We attach this new tree as the middle part inner tree of the upper level result tree to end this iteration.
Once either tree becomes empty, we stop traversing, repeatedly insert the 2-3 tree nodes in N to the other, non-empty, tree, and set it as the new middle part inner tree of the upper level result.
The algorithm below describes this process in detail.
function Concat(T₁, T₂)
    return Merge(T₁, ∅, T₂)

function Merge(T₁, N, T₂)
    r ← Tree()
    p ← r
    while T₁ ≠ NIL ∧ T₂ ≠ NIL do
        T ← Tree()
        Front(T) ← Front(T₁)
        Rear(T) ← Rear(T₂)
        Connect-Mid(p, T)
        p ← T
        N ← Nodes(Rear(T₁) ∪ N ∪ Front(T₂))
        T₁ ← Mid(T₁)
        T₂ ← Mid(T₂)
    if T₁ = NIL then
        T ← T₂
        for each n ∈ Reverse(N) do
            T ← Prepend-Node(n, T)
    else if T₂ = NIL then
        T ← T₁
        for each n ∈ N do
            T ← Append-Node(T, n)
    Connect-Mid(p, T)
    return Flat(r)
Note that the for-each loops in the algorithm can also be replaced by folding from the left and from the right respectively. Translating this algorithm to Python yields the code below.
def concat(t1, t2):
    return merge(t1, [], t2)

def merge(t1, ns, t2):
    root = prev = Tree()  # sentinel dummy tree
    while t1 is not None and t2 is not None:
        t = Tree(t1.size + t2.size + sizeNs(ns), t1.front, None, t2.rear)
        prev.set_mid(t)
        prev = t
        ns = nodes(t1.rear + ns + t2.front)
        t1 = t1.mid
        t2 = t2.mid
    if t1 is None:
        prev.set_mid(foldR(prepend_node, ns, t2))
    elif t2 is None:
        prev.set_mid(reduce(append_node, ns, t1))
    return flat(root)
Because Python only provides the folding function from the left as reduce(), a folding function from the right is given below, like what we showed in the pseudo code: it repeatedly applies the function in reverse order of the list.
def foldR(f, xs, z):
    for x in reversed(xs):
        z = f(x, z)
    return z
The only remaining question is how to balanced-group nodes into bigger 2-3 trees. As a 2-3 tree can hold at most 3 sub trees, we can first take 3 nodes and wrap them into a ternary tree if there are more than 4 nodes in the list, and continue to deal with the rest. If there are just 4 nodes, they can be wrapped into two binary trees. For the other cases (3 nodes, 2 nodes, or 1 node), we simply wrap them all into one tree.
Denote the node list L = {n₁, n₂, ...}. The following algorithm realizes this process.
function Nodes(L)
    N ← ∅
    while |L| > 4 do
        n ← Node()
        Children(n) ← L[1..3]    ▷ {n₁, n₂, n₃}
        N ← N ∪ {n}
        L ← L[4...]              ▷ {n₄, n₅, ...}
    if |L| = 4 then
        x ← Node()
        Children(x) ← {L[1], L[2]}
        y ← Node()
        Children(y) ← {L[3], L[4]}
        N ← N ∪ {x, y}
    else if L ≠ ∅ then
        n ← Node()
        Children(n) ← L
        N ← N ∪ {n}
    return N
It's straightforward to translate the algorithm to the Python program below, where the function wraps() helps to create an empty node and then sets a list as the children of this node.
def nodes(xs):
    res = []
    while len(xs) > 4:
        res.append(wraps(xs[:3]))
        xs = xs[3:]
    if len(xs) == 4:
        res.append(wraps(xs[:2]))
        res.append(wraps(xs[2:]))
    elif xs != []:
        res.append(wraps(xs))
    return res
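As a quick sanity check of the grouping rule, we can run the nodes() function above with a stand-in wraps() that simply returns its argument list (the real wraps() builds a sized 2-3 tree node, as shown in the next section); the grouping boundaries then become directly visible.

def wraps(xs):  # hypothetical stand-in, for illustration only
    return xs

print(nodes([1, 2]))           # [[1, 2]]
print(nodes([1, 2, 3, 4]))     # [[1, 2], [3, 4]]
print(nodes([1, 2, 3, 4, 5]))  # [[1, 2, 3], [4, 5]]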
Exercise 12.5

1. Implement the complete finger tree insertion program in your favorite imperative programming language. Don't check the example programs along with this chapter before having a try.

2. How do we determine whether a node is a leaf? Does it contain only a raw element inside, or a compound node which contains sub nodes as children? Note that we can't distinguish them by testing the size, as there are cases where a node contains a singleton leaf, such as node(1, {node(1, {x})}). Try to solve this problem both in a dynamically typed language (e.g. Python, Lisp etc.) and in a strongly, statically typed language (e.g. C++).

3. Implement the Extract-Tail algorithm in your favorite imperative programming language.

4. Realize an algorithm to return the last element of a finger tree in both the functional and imperative approaches. The latter should be able to handle an ill-formed tree.

5. Try to implement the concatenation algorithm without using folding. You can either use recursive methods, or use the imperative for-each method.
12.6.8 Random access of finger tree

Size augmentation
The strategy to provide fast random access is to turn the lookup into a tree search. In order to avoid calculating the size of the tree many times, we augment the tree and node definitions with an extra field. The definitions should be modified accordingly; for example, the following Haskell definition adds a size field to the constructor.

data Tree a = Empty
            | Lf a
            | Tr Int [a] (Tree (Node a)) [a]
And the previous ANSI C structure is augmented with a size as well.

struct Tree {
    union Node* front;
    union Node* rear;
    struct Tree* mid;
    struct Tree* parent;
    int size;
};
Suppose the function tree(s, F, M, R) creates a finger tree from size s, front finger F, rear finger R, and middle part inner tree M. When the size of the tree is needed, we can call a size(T) function. It will be something like this.

size(T) =
    0 : T = ∅
    ? : T = leaf(x)
    s : T = tree(s, F, M, R)

If the tree is empty, the size is definitely zero; and if it can be expressed as tree(s, F, M, R), the size is s. However, what if the tree is a singleton leaf? Is it 1? No, it can be 1 only if T = leaf(a) and a isn't a tree node, but a raw element stored in the finger tree. In most cases, the size is not 1, because a can again be a tree node. That's why we put a '?' in the above equation.
The correct way is to call some size function on the tree node, as the following.

size(T) =
    0        : T = ∅
    size'(x) : T = leaf(x)
    s        : T = tree(s, F, M, R)
(12.37)

Note that this isn't a recursive definition, since size ≠ size'; the argument to size' is either a tree node, which is a 2-3 tree, or a plain element stored in the finger tree. To unify these two cases, we can anyway wrap the single plain element into a tree node of only one element, so that all situations can be expressed as a tree node augmented with a size field. The following Haskell program modifies the definition of the tree node.

data Node a = Br Int [a]

The ANSI C node definition is modified accordingly.

struct Node {
    Key key;
    struct Node* children;
    int size;
};
We change it from a union to a structure, although there is an overhead field key if the node isn't a leaf.
Suppose the function tr(s, L) creates such a node (either one element being wrapped or a 2-3 tree) from the size information s and a list L. Here are some examples.

tr(1, {x})        a tree containing only one element
tr(2, {x, y})     a 2-3 tree containing two elements
tr(3, {x, y, z})  a 2-3 tree containing three elements

So the function size' can be implemented as returning the size information of a tree node; we have size'(tr(s, L)) = s.
Wrapping an element x is just calling tr(1, {x}). We can define auxiliary functions wrap and unwrap, for instance:

wrap(x) = tr(1, {x})
unwrap(n) = x : n = tr(1, {x})
(12.38)
As both the front finger and the rear finger are lists of tree nodes, in order to calculate the total size of a finger, we can provide a size'(L) function, which sums up the sizes of all the nodes stored in the list. Denote L = {a₁, a₂, ...} and L' = {a₂, a₃, ...}.

size'(L) =
    0 : L = ∅
    size'(a₁) + size'(L') : otherwise
(12.39)

It's quite OK to define size'(L) by using some higher order functions. For example:

size'(L) = sum(map(size', L))    (12.40)

And we can turn a list of tree nodes into one deeper 2-3 tree and vice versa:

wraps(L) = tr(size'(L), L)
unwraps(n) = L : n = tr(s, L)
(12.41)
These helper functions are translated to the following Haskell code.

size (Br s _) = s
sizeL = sum . (map size)
sizeT Empty = 0
sizeT (Lf a) = size a
sizeT (Tr s _ _ _) = s

Here are the wrap and unwrap auxiliary functions.

wrap x = Br 1 [x]
unwrap (Br 1 [x]) = x
wraps xs = Br (sizeL xs) xs
unwraps (Br _ xs) = xs
We omit their type definitions for illustration purposes.
In imperative settings, the size information for a node and a tree can be accessed through the size field. The size of a list of nodes can be summed over this field, as in the algorithm below.

function Size-Nodes(L)
    s ← 0
    for n ∈ L do
        s ← s + Size(n)
    return s
The following Python code, for example, translates this algorithm by using the standard sum() and map() functions provided in the library.
def sizeNs(xs):
    return sum(map(lambda x: x.size, xs))
As NIL is typically used to represent an empty tree in imperative settings, it's convenient to provide an auxiliary size function to uniformly calculate the size of a tree, no matter whether it is NIL.
function Size-Tr(T)
    if T = NIL then
        return 0
    else
        return Size(T)
The algorithm is trivial.
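For completeness, a one-line Python counterpart could look as below (assuming None represents the empty tree, as in the other Python fragments of this chapter); this sizeT() is the helper used by the splitting program later in this section.

def sizeT(t):
    # NIL (None) tree has size 0; otherwise read the augmented field
    return 0 if t is None else t.size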
Modification due to the augmented size
The algorithms presented so far need to be modified to cope with the augmented size. For example, the insertT() function now inserts a tree node instead of a plain element.

insertT(x, T) = insertT'(wrap(x), T)    (12.42)

The corresponding Haskell program is changed as below.

cons a t = cons' (wrap a) t
After being wrapped, x is augmented with size information of 1. In the implementation of the previous insertion algorithm, the function tree(F, M, R) is used to create a finger tree from a front finger, a middle part inner tree and a rear finger. This function should also be modified to add the size information of these three arguments.

tree'(F, M, R) =
    fromL(F)                  : M = ∅ ∧ R = ∅
    fromL(R)                  : M = ∅ ∧ F = ∅
    tree'(unwraps(F'), M', R) : F = ∅, (F', M') = extractT'(M)
    tree'(F, M', unwraps(R')) : R = ∅, (M', R') = removeT'(M)
    tree(size'(F) + size(M) + size'(R), F, M, R) : otherwise
(12.43)

Where fromL() helps to turn a list of nodes into a finger tree by repeatedly inserting all the elements, one by one, to an empty tree.

fromL(L) = foldR(insertT', ∅, L)

Of course it can be implemented in a pure recursive manner without using folding as well.
The last case is the most straightforward one. If none of F, M, and R is empty, it adds up the sizes of these three parts and constructs the tree along with this size information by calling the tree(s, F, M, R) function. If both the middle part inner tree and one of the fingers are empty, the algorithm repeatedly inserts all elements stored in the other finger to an empty tree, so that the result is constructed from a list of tree nodes. If the middle part inner tree isn't empty, and one of the fingers is empty, the algorithm borrows one tree node from the middle part, either by extracting from the head if the front finger is empty, or by removing from the tail if the rear finger is empty. Then the algorithm unwraps the borrowed tree node to a list, and recursively calls the tree'() function to construct the result.
This algorithm can be translated to the following Haskell code for example.

tree' f Empty [] = foldr cons' Empty f
tree' [] Empty r = foldr cons' Empty r
tree' [] m r = let (f, m') = uncons' m in tree' (unwraps f) m' r
tree' f m [] = let (m', r) = unsnoc' m in tree' f m' (unwraps r)
tree' f m r = Tr (sizeL f + sizeT m + sizeL r) f m r
Function tree'() helps to minimize the modification. insertT'() can be realized with it as the following.

insertT'(x, T) =
    leaf(x)                                               : T = ∅
    tree'({x}, ∅, {y})                                    : T = leaf(y)
    tree'({x, x₁}, insertT'(wraps({x₂, x₃, x₄}), M), R)   : T = tree(s, {x₁, x₂, x₃, x₄}, M, R)
    tree'({x} ∪ F, M, R)                                  : otherwise
(12.44)

And its corresponding Haskell code is a line by line translation.

cons' a Empty = Lf a
cons' a (Lf b) = tree' [a] Empty [b]
cons' a (Tr _ [b, c, d, e] m r) = tree' [a, b] (cons' (wraps [c, d, e]) m) r
cons' a (Tr _ f m r) = tree' (a:f) m r
Similar modifications for the augmented size should also be applied to the imperative algorithms. For example, when a new node is prepended to the head of the finger tree, we should update the size while traversing the tree.

function Prepend-Node(n, T)
    r ← Tree()
    p ← r
    Connect-Mid(p, T)
    while Full?(Front(T)) do
        F ← Front(T)
        Front(T) ← {n, F[1]}
        Size(T) ← Size(T) + Size(n)    ▷ update size
        n ← Node()
        Children(n) ← F[2..]
        p ← T
        T ← Mid(T)
    if T = NIL then
        T ← Tree()
        Front(T) ← {n}
    else if |Front(T)| = 1 ∧ Rear(T) = ∅ then
        Rear(T) ← Front(T)
        Front(T) ← {n}
    else
        Front(T) ← {n} ∪ Front(T)
    Size(T) ← Size(T) + Size(n)    ▷ update size
    Connect-Mid(p, T)
    return Flat(r)
The corresponding Python code is modified accordingly, as below.
def prepend_node(n, t):
    root = prev = Tree()
    prev.set_mid(t)
    while frontFull(t):
        f = t.front
        t.front = [n] + f[:1]
        t.size = t.size + n.size
        n = wraps(f[1:])
        prev = t
        t = t.mid
    if t is None:
        t = leaf(n)
    elif len(t.front) == 1 and t.rear == []:
        t = Tree(n.size + t.size, [n], None, t.front)
    else:
        t = Tree(n.size + t.size, [n] + t.front, t.mid, t.rear)
    prev.set_mid(t)
    return flat(root)
Note that the tree constructor is also modified to take a size argument as the first parameter. And the leaf() helper function does not only construct the tree from a node; it also sets the size of the tree to the same size as the node inside it.
For simplicity, we skip the detailed description of what is modified in the extractT'(), appendT(), removeT(), and concat() algorithms. They are left as exercises to the reader.
Split a finger tree at a given position
With the size information augmented, it's easy to locate a node at a given position by performing a tree search. What's more, as the finger tree is constructed from three parts F, M, and R, and is recursive by nature, it's also possible to split it into three sub parts with a given position i: the left part, the node at i, and the right part.
The idea is straightforward. Since we have the size information for F, M, and R, denote these three sizes as S_f, S_m, and S_r. If the given position i ≤ S_f, the node must be stored in F, and we can go on seeking the node inside F; if S_f < i ≤ S_f + S_m, the node must be stored in M, and we need to recursively perform the search in M; otherwise, the node should be in R, and we need to search inside R.
If we skip the error handling of trying to split an empty tree, there is only one edge case, as below.

splitAt(i, T) =
    (∅, x, ∅) : T = leaf(x)
    ...       : otherwise

Splitting a leaf results in both the left and right parts being empty; the node stored in the leaf is the resulting node.
The recursive case handles the three sub cases by comparing i with the sizes. Suppose the function splitAtL(i, L) splits a list of nodes at given position i into three parts: (A, x, B) = splitAtL(i, L), where x is the i-th node in L, A is a sub list containing all nodes before position i, and B is a sub list containing all the rest of the nodes after i.
splitAt(i, T) =
    (∅, x, ∅)                           : T = leaf(x)
    (fromL(A), x, tree'(B, M, R))       : i ≤ S_f, (A, x, B) = splitAtL(i, F)
    (tree'(F, M_l, A), x, tree'(B, M_r, R)) : S_f < i ≤ S_f + S_m
    (tree'(F, M, A), x, fromL(B))       : otherwise, (A, x, B) = splitAtL(i - S_f - S_m, R)
(12.45)

Where M_l, x, M_r, A, B in the third case are calculated as the following.

(M_l, t, M_r) = splitAt(i - S_f, M)
(A, x, B) = splitAtL(i - S_f - size(M_l), unwraps(t))
The function splitAtL() is just a linear traverse; since the length of the list is limited by the constraint of the 2-3 tree, the performance is still ensured to be constant O(1) time. Denote L = {x₁, x₂, ...} and L' = {x₂, x₃, ...}.

splitAtL(i, L) =
    (∅, x₁, ∅)       : i = 0 ∧ L = {x₁}
    (∅, x₁, L')      : i < size'(x₁)
    ({x₁} ∪ A, x, B) : otherwise, (A, x, B) = splitAtL(i - size'(x₁), L')
(12.46)
The solution of splitting is a typical divide and conquer strategy. The performance of this algorithm is determined by the recursive case, searching in the middle part inner tree; the other cases are all constant time, as we've analyzed. The depth of recursion is proportional to the height of the tree h, so the algorithm is bound to O(h). Because the tree is well balanced (by using 2-3 trees, and all the insertion/removal algorithms keep the tree balanced), h = O(lg N), where N is the number of elements stored in the finger tree. The overall performance of splitting is O(lg N).
Let's first give the Haskell program for the splitAtL() function.

splitNodesAt 0 [x] = ([], x, [])
splitNodesAt i (x:xs) | i < size x = ([], x, xs)
                      | otherwise = let (xs', y, ys) = splitNodesAt (i - size x) xs
                                    in (x:xs', y, ys)

Then the program for splitAt(). As there is already a function with this name defined in the standard library, we slightly change the name by adding an apostrophe.

splitAt' _ (Lf x) = (Empty, x, Empty)
splitAt' i (Tr _ f m r)
    | i < szf = let (xs, y, ys) = splitNodesAt i f
                in ((foldr cons' Empty xs), y, tree' ys m r)
    | i < szf + szm = let (m1, t, m2) = splitAt' (i - szf) m
                          (xs, y, ys) = splitNodesAt (i - szf - sizeT m1) (unwraps t)
                      in (tree' f m1 xs, y, tree' ys m2 r)
    | otherwise = let (xs, y, ys) = splitNodesAt (i - szf - szm) r
                  in (tree' f m xs, y, foldr cons' Empty ys)
    where
        szf = sizeL f
        szm = sizeT m
Random access

With the help of splitting at any arbitrary position, it's trivial to realize random access in O(lg N) time. Denote mid(x) the function returning the second element of a tuple, and left(x) and right(x) the ones returning the first and the third elements respectively.

getAt(S, i) = unwrap(mid(splitAt(i, S)))    (12.47)

It first splits the sequence at position i, then unwraps the node to get the element stored inside it. To mutate the i-th element of a sequence S represented by a finger tree, we first split it at i, then replace the middle with what we want, and re-construct them into one finger tree by using concatenation.

setAt(S, i, x) = concat(L, insertT(x, R))    (12.48)

where (L, y, R) = splitAt(i, S).
What's more, we can also realize a removeAt(S, i) function, which removes the i-th element from the sequence S. The idea is to first split at i, unwrap and return the element of the i-th node, then concatenate the left and the right parts into a new finger tree.

removeAt(S, i) = (unwrap(y), concat(L, R))    (12.49)

where, again, (L, y, R) = splitAt(i, S).
These handy algorithms can be translated to the following Haskell program.

getAt t i = unwrap x where (_, x, _) = splitAt' i t
setAt t i x = let (l, _, r) = splitAt' i t in concat l (cons x r)
removeAt t i = let (l, x, r) = splitAt' i t in (unwrap x, concat l r)
Imperative random access

As we can directly mutate the tree in imperative settings, it's possible to realize Get-At(T, i) and Set-At(T, i, x) without using splitting. The idea is to first implement an algorithm which applies some operation to a given position. The following algorithm takes three arguments: a finger tree T, a position index i which ranges from zero to the number of elements stored in the tree, and a function f, which will be applied to the element at i.

function Apply-At(T, i, f)
    while Size(T) > 1 do
        S_f ← Size-Nodes(Front(T))
        S_m ← Size-Tr(Mid(T))
        if i < S_f then
            return Lookup-Nodes(Front(T), i, f)
        else if i < S_f + S_m then
            T ← Mid(T)
            i ← i - S_f
        else
            return Lookup-Nodes(Rear(T), i - S_f - S_m, f)
    n ← First-Lf(T)
    x ← Elem(n)
    Elem(n) ← f(x)
    return x
This algorithm is essentially a divide and conquer tree search. It repeatedly examines the current tree till it reaches a tree with a size of 1 (can it be determined to be a leaf? Please consider the ill-formed case and refer to the exercise later). Every time, it checks the position to be located against the size information of the front finger and the middle part inner tree.
If the index i is less than the size of the front finger, the location is at some node in it; the algorithm calls a sub procedure to look up in the front finger. If the index is between the size of the front finger and the total size including the middle part inner tree, the location is at some node inside the middle; the algorithm goes on traversing along the middle part inner tree with an updated index, reduced by the size of the front finger. Otherwise the location is at some node in the rear finger, and the similar looking up procedure is called accordingly.
After this loop, we've got a node (possibly a compound node) with what we are looking for at the first leaf inside this node. We can extract the element out, apply the function f on it, and store the new value back.
The algorithm returns the previous element, before applying f, as the final result.
What hasn't been covered yet is the algorithm Lookup-Nodes(L, i, f). It takes a list of nodes, a position index, and a function to be applied. This algorithm can be implemented by checking every node in the list. If the node is a leaf and the index is zero, we are at the right position to be looked up: the function can be applied on the element stored in this leaf, and the previous value is returned. Otherwise, we need to compare the size of this node and the index to determine whether the position is inside this node, and search inside the children of the node if necessary.
function Lookup-Nodes(L, i, f)
    loop
        for n ∈ L do
            if n is leaf ∧ i = 0 then
                x ← Elem(n)
                Elem(n) ← f(x)
                return x
            if i < Size(n) then
                L ← Children(n)
                break
            i ← i - Size(n)
The following Python code implements these algorithms.
def applyAt(t, i, f):
    while t.size > 1:
        szf = sizeNs(t.front)
        szm = sizeT(t.mid)
        if i < szf:
            return lookupNs(t.front, i, f)
        elif i < szf + szm:
            t = t.mid
            i = i - szf
        else:
            return lookupNs(t.rear, i - szf - szm, f)
    n = first_leaf(t)
    x = elem(n)
    n.children[0] = f(x)
    return x

def lookupNs(ns, i, f):
    while True:
        for n in ns:
            if n.leaf and i == 0:
                x = elem(n)
                n.children[0] = f(x)
                return x
            if i < n.size:
                ns = n.children
                break
            i = i - n.size
With the auxiliary algorithm that can apply a function at a given position, it's trivial to implement Get-At() and Set-At() by passing special functions for applying.

function Get-At(T, i)
    return Apply-At(T, i, λx.x)

function Set-At(T, i, x)
    return Apply-At(T, i, λy.x)

That is, we pass the identity function to implement getting the element at a position, which doesn't change anything at all; and we pass a constant function to implement setting, which sets the element to the new value by ignoring its previous value.
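The corresponding Python wrappers are one-liners on top of the applyAt() program above; the identity and the constant functions are passed as plain lambdas.

def getAt(t, i):
    return applyAt(t, i, lambda x: x)  # identity: read without changing

def setAt(t, i, x):
    return applyAt(t, i, lambda y: x)  # constant: overwrite, return old value

Note that both return the previous element at position i, since Apply-At is defined to do so.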
Imperative splitting

It's not enough to just realize the Apply-At algorithm in imperative settings, because removing an element at an arbitrary position is also a typical case.
Almost all the imperative finger tree algorithms so far work in a one-pass, top-down manner, although we sometimes need to do bookkeeping of the root. This means we can even realize all of them without using the parent field.
The splitting operation, however, can be easily implemented by using the parent field. We first perform a top-down traverse along the middle part inner trees, as long as the splitting position isn't located in the front or rear finger. After that, we need a bottom-up traverse along the parent fields of the two split trees to fill out the necessary fields.
function Split-At(T, i)
    T₁ ← Tree()
    T₂ ← Tree()
    while S_f ≤ i < S_f + S_m do    ▷ Top-down pass
        T₁' ← Tree()
        T₂' ← Tree()
        Front(T₁') ← Front(T)
        Rear(T₂') ← Rear(T)
        Connect-Mid(T₁, T₁')
        Connect-Mid(T₂, T₂')
        T₁ ← T₁'
        T₂ ← T₂'
        i ← i - S_f
        T ← Mid(T)
    if i < S_f then
        (X, n, Y) ← Split-Nodes(Front(T), i)
        T₁' ← From-Nodes(X)
        T₂' ← T
        Size(T₂') ← Size(T) - Size-Nodes(X) - Size(n)
        Front(T₂') ← Y
    else if S_f + S_m ≤ i then
        (X, n, Y) ← Split-Nodes(Rear(T), i - S_f - S_m)
        T₂' ← From-Nodes(Y)
        T₁' ← T
        Size(T₁') ← Size(T) - Size-Nodes(Y) - Size(n)
        Rear(T₁') ← X
    Connect-Mid(T₁, T₁')
    Connect-Mid(T₂, T₂')
    i ← i - Size-Tr(T₁')
    while n is NOT leaf do    ▷ Bottom-up pass
        (X, n, Y) ← Split-Nodes(Children(n), i)
        i ← i - Size-Nodes(X)
        Rear(T₁) ← X
        Front(T₂) ← Y
        Size(T₁) ← Sum-Sizes(T₁)
        Size(T₂) ← Sum-Sizes(T₂)
        T₁ ← Parent(T₁)
        T₂ ← Parent(T₂)
    return (Flat(T₁), Elem(n), Flat(T₂))
The algorithm first creates two trees T₁ and T₂ to hold the split results. Note that they are created as ground trees which are the parents of the roots. The first pass is a top-down pass. Suppose S_f and S_m retrieve the size of the front finger and the size of the middle part inner tree respectively. If the position at which the tree is to be split is located in the middle part inner tree, we reuse the front finger of T for the newly created T₁', and reuse the rear finger of T for T₂'. At this point we can't fill the other fields of T₁' and T₂'; they are left empty, and we'll finish filling them later. After that, we connect T₁ and T₁' so that the latter becomes the middle part inner tree of the former. A similar connection is done for T₂ and T₂' as well. Finally, we update the position by reducing it by the size of the front finger, and go on traversing along the middle part inner tree.
When the first pass finishes, we are at a position where the splitting should be performed either in the front finger or in the rear finger. Splitting the nodes in a finger results in a tuple: the first and the third parts are the lists before and after the splitting point, while the second part is a node containing the element at the original position to be split. As both fingers hold only a few nodes, because they are essentially 2-3 trees, the node splitting algorithm can be performed by a linear search.
function Split-Nodes(L, i)
    for j ∈ [1, Length(L)] do
        if i < Size(L[j]) then
            return (L[1...j-1], L[j], L[j+1...Length(L)])
        i ← i - Size(L[j])
We next create two new result trees T₁' and T₂' from this tuple, and connect them as the final middle part inner trees of T₁ and T₂.
Next we need to perform a bottom-up traverse along the result trees to fill out all the empty information we skipped in the first pass.
We loop on the second part of the tuple, the node, till it becomes a leaf. In each iteration, we repeatedly split the children of the node with an updated position i. The first list of nodes returned from the splitting is used to fill the rear finger of T₁, and the other list of nodes is used to fill the front finger of T₂.
After that, since all the three parts of a finger tree (the front finger, the rear finger, and the middle part inner tree) are filled, we can calculate the size of the tree by summing these three parts up.

function Sum-Sizes(T)
    return Size-Nodes(Front(T)) + Size-Tr(Mid(T)) + Size-Nodes(Rear(T))

Next, the iteration goes on along the parent fields of T₁ and T₂. The last black-box algorithm is From-Nodes(L), which creates a finger tree from a list of nodes. It can easily be realized by repeatedly performing insertion on an empty tree. The implementation is left as an exercise to the reader.
The example Python code for splitting is given below.

def splitAt(t, i):
    (t1, t2) = (Tree(), Tree())
    while szf(t) <= i and i < szf(t) + szm(t):
        fst = Tree(0, t.front, None, [])
        snd = Tree(0, [], None, t.rear)
        t1.set_mid(fst)
        t2.set_mid(snd)
        (t1, t2) = (fst, snd)
        i = i - szf(t)
        t = t.mid
    if i < szf(t):
        (xs, n, ys) = splitNs(t.front, i)
        sz = t.size - sizeNs(xs) - n.size
        (fst, snd) = (fromNodes(xs), Tree(sz, ys, t.mid, t.rear))
    elif szf(t) + szm(t) <= i:
        (xs, n, ys) = splitNs(t.rear, i - szf(t) - szm(t))
        sz = t.size - sizeNs(ys) - n.size
        (fst, snd) = (Tree(sz, t.front, t.mid, xs), fromNodes(ys))
    t1.set_mid(fst)
    t2.set_mid(snd)
    i = i - sizeT(fst)
    while not n.leaf:
        (xs, n, ys) = splitNs(n.children, i)
        i = i - sizeNs(xs)
        (t1.rear, t2.front) = (xs, ys)
        t1.size = sizeNs(t1.front) + sizeT(t1.mid) + sizeNs(t1.rear)
        t2.size = sizeNs(t2.front) + sizeT(t2.mid) + sizeNs(t2.rear)
        (t1, t2) = (t1.parent, t2.parent)
    return (flat(t1), elem(n), flat(t2))
The program to split a list of nodes at a given position is listed below.
def splitNs(ns, i):
    for j in range(len(ns)):
        if i < ns[j].size:
            return (ns[:j], ns[j], ns[j+1:])
        i = i - ns[j].size
With splitting defined, removing an element at an arbitrary position can be realized trivially: first perform a splitting, then concatenate the two result trees into one big tree, and return the element at that position.

function Remove-At(T, i)
    (T₁, x, T₂) ← Split-At(T, i)
    return (x, Concat(T₁, T₂))
Exercise 12.6

1. Another way to realize insertT'() is to force increasing the size field by one, so that we needn't write the function tree'(). Try to realize the algorithm by using this idea.

2. Try to handle the augmented size information, as done in the insertT'() algorithm, for the following algorithms (both functional and imperative): extractT'(), appendT(), removeT(), and concat(). The head, tail, init and last functions should be kept unchanged. Don't refer to the downloadable programs along with this book before you take a try.

3. The imperative Apply-At() algorithm tests whether the size of the current tree is greater than one. Why don't we test whether the current tree is a leaf? Tell the difference between these two approaches.

4. Implement From-Nodes(L) in your favorite imperative programming language. You can either use looping or create a folding-from-right sub algorithm.
12.7 Notes and short summary

Although we haven't been able to give a purely functional realization that matches the O(1) constant time random access of arrays in imperative settings, the resulting finger tree data structure achieves an overall well-performed sequence: it manipulates fast, in amortized O(1) time, both on the head and on the tail; it can concatenate two sequences in logarithmic time; and it can break one sequence into two sub sequences at any position, also in logarithmic time. Neither arrays in imperative settings nor linked lists in functional settings satisfy all these goals. Some functional programming languages adopt this sequence realization in their standard libraries [7].
Just as the title of this chapter says, we've presented the last corner stone of elementary data structures in both functional and imperative settings. We needn't worry about lacking elementary data structures when solving problems with typical algorithms.
For example, when writing an MTF (move-to-front) encoding algorithm [8], with the help of the sequence data structure explained in this chapter, we can implement it quite straightforwardly:

mtf(S, i) = {x} ∪ S'

where (x, S') = removeAt(S, i).
In the following chapters, we'll first explain some typical divide and conquer sorting methods, including quick sort, merge sort and their variants; then some elementary searching algorithms and string matching algorithms; finally, we'll give a real-world example of algorithms: the BWT (Burrows-Wheeler transform) compressor, which is among the best compression tools in the world.
Bibliography

[1] Chris Okasaki. "Purely Functional Data Structures". Cambridge University Press, (July 1, 1999), ISBN-13: 978-0521663502

[2] Chris Okasaki. "Purely Functional Random-Access Lists". Functional Programming Languages and Computer Architecture, June 1995, pages 86-95.

[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. "Introduction to Algorithms, Second Edition". The MIT Press, 2001. ISBN: 0262032937.

[4] Miran Lipovaca. "Learn You a Haskell for Great Good! A Beginner's Guide". No Starch Press; 1st edition, April 2011, 400 pp. ISBN: 978-1-59327-283-8

[5] Ralf Hinze and Ross Paterson. "Finger Trees: A Simple General-purpose Data Structure". Journal of Functional Programming 16:2 (2006), pages 197-217. http://www.soi.city.ac.uk/~ross/papers/FingerTree.html

[6] Guibas, L. J., McCreight, E. M., Plass, M. F., Roberts, J. R. (1977), "A new representation for linear lists". Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, pp. 49-60.

[7] Generic finger-tree structure. http://hackage.haskell.org/packages/archive/fingertree/0.0/doc/html/Data-FingerTree.html

[8] Wikipedia. Move-to-front transform. http://en.wikipedia.org/wiki/Move-to-front_transform
Part IV
Sorting and Searching
Chapter 13

Divide and conquer, Quick sort V.S. Merge sort
13.1 Introduction

It is proved that O(n lg n) is the lower bound of comparison based sorting [1]. In this chapter, two divide and conquer sorting algorithms are introduced, both performing in O(n lg n) time. One is quick sort, the most popular sorting algorithm. Quick sort has been well studied, and many programming libraries provide sorting tools based on it.
In this chapter, we'll first introduce the idea of quick sort, which demonstrates the power of the divide and conquer strategy well. Several variants will be explained, and we'll see when quick sort performs poorly in some special cases: when the algorithm is not able to partition the sequence in balance.
In order to solve the unbalanced partition problem, we'll next introduce merge sort, which ensures the sequence to be well partitioned in all cases. Some variants of merge sort, including nature merge sort and bottom-up merge sort, are shown as well.
As in other chapters, all the algorithms will be realized in both imperative and functional approaches.
13.2 Quick sort

Consider a teacher arranging a group of kindergarten kids to stand in a line for some game. The kids need to stand in order of their heights: the shortest one stands at the left most end, while the tallest stands at the right most end. How can the teacher instruct these kids, so that they can stand in a line by themselves?
There are many strategies, and the quick sort approach can be applied here:

1. The first kid raises his/her hand. The kids who are shorter than him/her stand to the left of this child; the kids who are taller than him/her stand to the right of this child;

2. All the kids who moved to the left, if there are any, repeat the above step; all the kids who moved to the right repeat the same step as well.
Figure 13.1: Instruct the kids to stand in a line.
Suppose a group of kids have the heights {102, 100, 98, 95, 96, 99, 101, 97}, with cm as the unit. The following table illustrates how they stand in order of height by following this method.
102 100 98 95 96 99 101 97
100 98 95 96 99 101 97 102
98 95 96 99 97 100 101 102
95 96 97 98 99 100 101 102
95 96 97 98 99 100 101 102
95 96 97 98 99 100 101 102
95 96 97 98 99 100 101 102
At the beginning, the first child, with height 102 cm, raises his/her hand. We call this kid the pivot. It happens that this is the tallest kid, so all the others stand to the left side, as represented in the second row of the above table; the child with height 102 cm is then in the final ordered position. Next the kid with height 100 cm raises his/her hand, so the children with heights 98, 95, 96 and 99 cm stand to his/her left, and there is only one child, of height 101 cm, who is taller than this pivot kid, so he stands to the right hand side. The third row in the table shows this stage accordingly. After that, the child of 98 cm height is selected as the pivot on the left hand side, while the child of 101 cm height is selected as the pivot on the right. Since there are no other kids in the unsorted group with the 101 cm pivot, this small group is ordered already, and the kid of height 101 cm is in the final proper position. The same method is applied to the groups of kids which haven't been put in the correct order yet, until all of them stand in their final positions.
13.2.1 Basic version

Summarizing the above instruction leads to the recursive description of quick sort. In order to sort a sequence of elements L:

- If L is empty, the result is obviously empty; this is the trivial edge case;
- Otherwise, select an arbitrary element in L as a pivot, recursively sort all elements not greater than the pivot and put the result on the left hand of the pivot, and recursively sort all elements greater than the pivot and put the result on the right hand of the pivot.
Note the emphasized word 'and'; we don't use 'then' here, which indicates that it's quite OK for the recursive sorts on the left and the right to be done in parallel. We'll return to this parallelism topic soon.
Quick sort was first developed by C. A. R. Hoare in 1960 [1], [15]. What we describe here is a basic version. Note that it doesn't state how to select the pivot. We'll see soon that the pivot selection affects the performance of quick sort dramatically.
The simplest method to select the pivot is to always choose the first element, so that quick sort can be formalized as the following.

sort(L) =
    ∅ : L = ∅
    sort({x | x ∈ L', x ≤ l₁}) ∪ {l₁} ∪ sort({x | x ∈ L', l₁ < x}) : otherwise
(13.1)
Where l₁ is the first element of the non-empty list L, and L' contains the rest of the elements, {l₂, l₃, ...}. Note that we use the Zermelo-Fraenkel expression (ZF expression for short), which is also known as list comprehension. A ZF expression {a | a ∈ S, p₁(a), p₂(a), ...} means taking all the elements in set S which satisfy all the predicates p₁, p₂, .... ZF expressions were originally used for representing sets; we extend them to express lists for the sake of brevity. There can be duplicated elements, and different permutations represent different lists. Please refer to the appendix about lists in this book for details.
It's quite straightforward to translate this equation to real code if list comprehension is supported. The following Haskell code is given for example:

sort [] = []
sort (x:xs) = sort [y | y <- xs, y <= x] ++ [x] ++ sort [y | y <- xs, x < y]

This might be the shortest quick sort program in the world at the time this book is written. Even a verbose version is still very expressive:

sort [] = []
sort (x:xs) = as ++ [x] ++ bs where
    as = sort [a | a <- xs, a <= x]
    bs = sort [b | b <- xs, x < b]
There are some variants of this basic quick sort program, such as using explicit filtering instead of list comprehension. The following Python program demonstrates this:

def sort(xs):
    if xs == []:
        return []
    pivot = xs[0]
    small = sort(filter(lambda x: x <= pivot, xs[1:]))
    big = sort(filter(lambda x: pivot < x, xs[1:]))
    return small + [pivot] + big
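For instance, a quick check of this program (the expected output is shown in the comment):

print(sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]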
13.2.2 Strict weak ordering
We have assumed so far that the elements are sorted in monotonic non-decreasing order. It's quite possible to customize the algorithm so that it can sort the elements by other ordering criteria. This is necessary in practice, because users may sort numbers, strings, or other complex objects (even lists of lists, for example).
The typical generic solution is to abstract the comparison as a parameter, as we mentioned in the chapters about insertion sort and selection sort. Although it needn't be a total ordering, the comparison must satisfy strict weak ordering at least [17] [16].
For the sake of brevity, we only consider sorting elements by 'less than or equal' (equivalently, 'not greater than') in the rest of this chapter.
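As a sketch of this abstraction (an illustration only, not the final algorithm of this chapter), the basic Python version can take the 'less than' predicate as a parameter; any predicate satisfying strict weak ordering works, for example comparing strings by length.

def sort_by(lt, xs):
    # lt is a "less than" predicate satisfying strict weak ordering
    if xs == []:
        return []
    pivot, rest = xs[0], xs[1:]
    small = sort_by(lt, [x for x in rest if not lt(pivot, x)])
    big = sort_by(lt, [x for x in rest if lt(pivot, x)])
    return small + [pivot] + big

print(sort_by(lambda a, b: len(a) < len(b), ["ccc", "a", "bb"]))  # ['a', 'bb', 'ccc']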
13.2.3 Partition

Observe that the basic version actually takes two passes: one to find all the elements greater than the pivot, and another to find the ones which are not. Such partitioning can be accomplished in only one pass. We explicitly define the partition as below.

partition(p, L) =
    (∅, ∅)        : L = ∅
    ({l₁} ∪ A, B) : p(l₁), (A, B) = partition(p, L')
    (A, {l₁} ∪ B) : ¬p(l₁)
(13.2)

Note that the operation {x} ∪ L is just a cons operation, which only takes constant time. The quick sort can be modified accordingly.

sort(L) =
    ∅ : L = ∅
    sort(A) ∪ {l₁} ∪ sort(B) : otherwise, (A, B) = partition(λx · x ≤ l₁, L')
(13.3)

Translating this new algorithm into Haskell yields the code below.

sort [] = []
sort (x:xs) = sort as ++ [x] ++ sort bs where
    (as, bs) = partition (<= x) xs

partition _ [] = ([], [])
partition p (x:xs) = let (as, bs) = partition p xs in
                     if p x then (x:as, bs) else (as, x:bs)
The concept of partition is very critical to quick sort. Partition is also very important to many other sorting algorithms. We'll explain how it generally affects the sorting methodology by the end of this chapter. Before further discussion about fine tuning the quick sort specific partition, let's see how to realize it in-place imperatively.
There are many partition methods. The one given by Nico Lomuto [4] [2] will be used here, as it's easy to understand. We'll show other partition algorithms soon and see how partitioning affects the performance.
Figure 13.2 shows the idea of this one-pass partition method. The array is processed from left to right. At any time, the array consists of the following parts, as shown in figure 13.2 (a):

- The left most cell contains the pivot; by the end of the partition process, the pivot will be moved to its final proper position;
- A segment containing all elements which are not greater than the pivot; the right boundary of this segment is marked 'left';
Figure 13.2: Partition a range of the array by using the left most element as the pivot. (a) Partition invariant: the pivot is at x[l]; the elements up to the 'left' mark are not greater than the pivot; the elements between the 'left' and 'right' marks are greater than it; the rest, up to x[u], are unprocessed. (b) Start: 'left' points to the pivot, 'right' to x[l+1]. (c) Finish: the pivot is swapped with x[left].
- A segment containing all elements which are greater than the pivot; the right boundary of this segment is marked 'right'. Elements between the 'left' and 'right' marks are greater than the pivot;
- The rest of the elements after the 'right' mark haven't been processed yet. They may be greater than the pivot or not.

At the beginning of the partition, the 'left' mark points to the pivot and the 'right' mark points to the second element in the array, as in figure 13.2 (b). The algorithm then repeatedly advances the 'right' mark one element after the other till it passes the end of the array.
In every iteration, the element pointed to by the 'right' mark is compared with the pivot. If it is greater than the pivot, it should stay in the segment between the 'left' and 'right' marks, so the algorithm goes on to advance the 'right' mark and examine the next element. Otherwise, since the element pointed to by the 'right' mark is less than or equal to the pivot (not greater than it), it should be put before the 'left' mark. To achieve this, the 'left' mark is advanced by one, and then the elements pointed to by the 'left' and 'right' marks are exchanged.
Once the 'right' mark passes the last element, all the elements have been processed: the elements greater than the pivot have been moved to the right hand of the 'left' mark, while the others are to the left hand of this mark. Note that the pivot should move between the two segments. An extra exchange between the pivot and the element pointed to by the 'left' mark moves it to its correct location. This is shown by the swap bi-directional arrow in figure 13.2 (c).
The 'left' mark (which finally points to the pivot) partitions the whole array into two parts; it is returned as the result. We typically increase the 'left' mark by one, so that it points to the first element greater than the pivot, for convenience. Note that the array is modified in-place.
The partition algorithm can be described as the following. It takes three arguments: the array A, and the lower and the upper bounds to be partitioned¹.

1: function Partition(A, l, u)
2:     p ← A[l]    ▷ the pivot
3:     L ← l    ▷ the left mark
4:     for R ∈ [l + 1, u] do    ▷ iterate on the right mark
5:         if ¬(p < A[R]) then    ▷ the negation of < is enough for strict weak order
6:             L ← L + 1
7:             Exchange A[L] ↔ A[R]
8:     Exchange A[L] ↔ p
9:     return L + 1    ▷ the partition position
The table below shows the steps of partitioning the array {3, 2, 5, 4, 0, 1, 6, 7}.
(l) 3 (r) 2 5 4 0 1 6 7 initialize, pivot = 3, l = 1, r = 2
3 (l)(r) 2 5 4 0 1 6 7 2 < 3, advances l, (r = l)
3 (l) 2 (r) 5 4 0 1 6 7 5 > 3, moves on
3 (l) 2 5 (r) 4 0 1 6 7 4 > 3, moves on
3 (l) 2 5 4 (r) 0 1 6 7 0 < 3
3 2 (l) 0 4 (r) 5 1 6 7 Advances l, then swap with r
3 2 (l) 0 4 5 (r) 1 6 7 1 < 3
3 2 0 (l) 1 5 (r) 4 6 7 Advances l, then swap with r
3 2 0 (l) 1 5 4 (r) 6 7 6 > 3, moves on
3 2 0 (l) 1 5 4 6 (r) 7 7 > 3, moves on
1 2 0 3 (l+1) 5 4 6 7 r passes the end, swap pivot and l
This version of the partition algorithm can be implemented in ANSI C as the following.

int partition(Key* xs, int l, int u) {
    int pivot, r;
    for (pivot = l, r = l + 1; r < u; ++r)
        if (!(xs[pivot] < xs[r])) {
            ++l;
            swap(xs[l], xs[r]);
        }
    swap(xs[pivot], xs[l]);
    return l + 1;
}

Where swap(a, b) can be defined either as a function or as a macro. In ISO C++, swap(a, b) is provided as a function template. The type of the elements can be defined somewhere, or abstracted as a template parameter in ISO C++. We omit these language specific details here.
With the in-place partition realized, the imperative in-place quick sort can be accomplished by using it.

1: procedure Quick-Sort(A, l, u)
2:     if l < u then
3:         m ← Partition(A, l, u)
4:         Quick-Sort(A, l, m - 1)
5:         Quick-Sort(A, m, u)

¹The partition algorithm used here is slightly different from the one in [2]. The latter uses the last element in the slice as the pivot.

When sorting an array, this procedure is called by passing the whole range as the lower and upper bounds: Quick-Sort(A, 1, |A|). Note that when l ≥ u, the array slice is either empty or contains only one element; both can be treated as ordered, so the algorithm does nothing in such cases.
The ANSI C example program below completes the basic in-place quick sort.

void quicksort(Key* xs, int l, int u) {
    int m;
    if (l < u) {
        m = partition(xs, l, u);
        quicksort(xs, l, m - 1);
        quicksort(xs, m, u);
    }
}
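For experimentation, the same in-place algorithm transcribes to Python almost mechanically (a sketch, sorting the half-open range [l, u), so a whole list is sorted by calling quicksort(xs, 0, len(xs))):

def partition(xs, l, u):
    pivot, left = l, l
    for r in range(l + 1, u):
        if not (xs[pivot] < xs[r]):  # negation of < suffices
            left = left + 1
            xs[left], xs[r] = xs[r], xs[left]
    xs[pivot], xs[left] = xs[left], xs[pivot]  # move pivot in place
    return left + 1

def quicksort(xs, l, u):
    if l < u:
        m = partition(xs, l, u)
        quicksort(xs, l, m - 1)
        quicksort(xs, m, u)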
13.2.4 Minor improvement in functional partition
Before exploring how to improve the partition for the basic version of quick sort, it's worth noting that the one presented so far can be defined by using folding. Please refer to appendix A of this book for the definition of folding.

partition(p, L) = fold(f(p), (∅, ∅), L)    (13.4)

Where the function f compares the element to the pivot with the predicate p (which is passed to f as a parameter, so that f is in curried form; see appendix A for detail. Alternatively, f can be a lexical closure within the scope of partition, so that it can access the predicate in this scope), and updates the result pair accordingly.

f(p, x, (A, B)) =
    ({x} ∪ A, B) : p(x)
    (A, {x} ∪ B) : otherwise
(13.5)

Note that we actually use a pattern-matching style definition here. In environments without pattern-matching support, the pair (A, B) should be represented by a variable, for example P, with access functions to extract its first and second parts.
The example Haskell program needs to be modified accordingly.

sort [] = []
sort (x:xs) = sort small ++ [x] ++ sort big where
    (small, big) = foldr f ([], []) xs
    f a (as, bs) = if a <= x then (a:as, bs) else (as, a:bs)
Accumulated partition

The partition algorithm using folding actually accumulates to the result pair of lists (A, B): if an element is not greater than the pivot, it's accumulated to A, otherwise to B. We can express this explicitly, which saves space and is friendly for tail-recursive call optimization (refer to appendix A of this book for detail).

partition(p, L, A, B) =
    (A, B)                        : L = ∅
    partition(p, L', {l₁} ∪ A, B) : p(l₁)
    partition(p, L', A, {l₁} ∪ B) : otherwise
(13.6)

Where l₁ is the first element in L if L isn't empty, and L' contains the rest of the elements except for l₁, that is L' = {l₂, l₃, ...}. The quick sort algorithm then uses this accumulated partition function by passing λx · x ≤ l₁ as the partition predicate.

sort(L) =
    ∅ : L = ∅
    sort(A) ∪ {l₁} ∪ sort(B) : otherwise
(13.7)

Where A and B are computed by the accumulated partition function defined above:

(A, B) = partition(λx · x ≤ l₁, L', ∅, ∅)
Accumulated quick sort

Observe the recursive case in the last quick sort definition: the list concatenation operations sort(A) ∪ {l₁} ∪ sort(B) are actually proportional to the length of the lists being concatenated. Of course we can use some general solutions introduced in appendix A of this book to improve it. Another way is to change the sort algorithm to an accumulated manner, something like below:

sort'(L, S) =
    S   : L = ∅
    ... : otherwise

Where S is the accumulator, and we call this version by passing the empty list as the accumulator to start sorting: sort(L) = sort'(L, ∅). The key intuition is that after the partition finishes, the two sub lists need to be recursively sorted. We can first recursively sort the list containing the elements greater than the pivot, then link the pivot in front of it, and use it as the accumulator for the next step of sorting.
Based on this idea, the '...' part in the above definition can be realized as the following.

sort'(L, S) =
    S                            : L = ∅
    sort'(A, {l₁} ∪ sort'(B, ?)) : otherwise

The problem is what the accumulator should be when sorting B. There is an important invariant: at every moment, the accumulator S holds the elements that have been sorted so far. So we should sort B by accumulating to S.

sort'(L, S) =
    S                            : L = ∅
    sort'(A, {l₁} ∪ sort'(B, S)) : otherwise
(13.8)

The following Haskell example program implements the accumulated quick sort algorithm.

asort xs = asort' xs []

asort' [] acc = acc
asort' (x:xs) acc = asort' as (x : asort' bs acc) where
    (as, bs) = part xs [] []
    part [] as bs = (as, bs)
    part (y:ys) as bs | y <= x = part ys (y:as) bs
                      | otherwise = part ys as (y:bs)
Exercise 13.1

- Implement the recursive basic quick sort algorithm in your favorite imperative programming language.
- As in the imperative algorithm, one minor improvement is that besides the empty case, we needn't sort a singleton list; implement this idea in the functional algorithm as well.
- The accumulated quick sort algorithm developed in this section uses the intermediate variables A and B. They can be eliminated by defining the partition function to mutually recursively call the sort function. Implement this idea in your favorite functional programming language. Please don't refer to the downloadable example program along with this book before you try it.
13.3 Performance analysis for quick sort
Quick sort performs well in practice; however, it's not easy to give a theoretical analysis. It needs the tool of probability to prove the average case performance.
Nevertheless, it's intuitive to calculate the best case and worst case performance. It's obvious that the best case happens when every partition divides the sequence into two slices of equal size. Thus it takes O(lg n) recursive calls, as shown in figure 13.3.
There are in total O(lg n) levels of recursion. In the first level, it executes one partition, which processes n elements; in the second level, it executes the partition two times, each processing n/2 elements, so the total time in the second level is bounded by 2O(n/2) = O(n) as well; in the third level, it executes the partition four times, each processing n/4 elements, so the total time in the third level is also bounded by O(n); ...; in the last level, there are n small slices, each containing a single element, and the time is bounded by O(n). Summing up the time of all the levels gives the total performance of quick sort in the best case: O(n lg n).
However, in the worst case, the partition process unluckily divides the sequence into two slices of unbalanced lengths most of the time: one slice of length O(1), the other of O(n). Thus the recursion depth degrades to O(n). If we draw a similar figure, unlike the best case, which forms a balanced binary tree, the worst case degrades into a very unbalanced tree, where every node has only one child while the other is empty. The binary tree turns into a linked list of O(n) length. And in every level all the elements are processed, so the total performance in the worst case is O(n²), which is as poor as insertion sort and selection sort.
Figure 13.3: In the best case, quick sort divides the sequence into two slices of the same length at every step: n at the top level, two slices of n/2, four slices of n/4, and so on through O(lg n) levels, down to n single-element slices.
Let's consider when the worst case happens. One special case is when all the elements (or most of them) are the same. Nico Lomuto's partition method deals with such sequences poorly. We'll see how to solve this problem by introducing another partition algorithm in the next section.
The other two obvious cases which lead to the worst case happen when the sequence is already in ascending or descending order. Partitioning an ascending sequence makes an empty sub list before the pivot, while the list after the pivot contains all the rest of the elements; partitioning a descending sequence gives the opposite result.
There are other cases which lead quick sort to perform poorly. There is no completely satisfactory solution which can avoid the worst case; we'll see some engineering practices in the next section which make meeting the worst case very rare.
13.3.1 Average case analysis
In the average case, quick sort performs well. There is a vivid example: even if the partition divides the list every time into two parts with a 1 to 9 length ratio, the performance is still bound to O(n lg n), as shown in [2].
This subsection needs some mathematical background; the reader can safely skip to the next part.
There are two methods to proof the average case performance, one uses
an important fact that the performance is proportion to the total comparing
operations during quick sort [2]. Dierent with the selections sort that every two
elements have been compared. Quick sort avoid many unnecessary comparisons.
For example suppose a partition operation on list {a
1
, a
2
, a
3
, ..., a
n
}. Select a
1
13.3. PERFORMANCE ANALYSIS FOR QUICK SORT 515
as the pivot, the partition builds two sub lists A = {x
1
, x
2
, ..., x
k
} and B =
{y
1
, y
2
, ..., y
nk1
}. In the rest time of quick sort, The element in A will never
be compared with any elements in B.
Denote the final sorted result as $\{a_1, a_2, \ldots, a_n\}$. This indicates that if $a_i < a_j$, they will never be compared if and only if some element $a_k$, where $a_i < a_k < a_j$, has been selected as a pivot before either $a_i$ or $a_j$ is selected as a pivot.
That is to say, the only chance for $a_i$ and $a_j$ to be compared is that either $a_i$ or $a_j$ is chosen as a pivot before any other element in the ordered range $a_{i+1} < a_{i+2} < \ldots < a_{j-1}$ is selected. In other words, among the $j - i + 1$ elements $a_i, a_{i+1}, \ldots, a_j$, one of the two end points must be the first to be picked as a pivot. Since every element is equally likely to be picked first, letting $P(i, j)$ be the probability that $a_i$ and $a_j$ get compared, we have:

\[ P(i, j) = \frac{2}{j - i + 1} \qquad (13.9) \]
The expected total number of comparison operations can then be given as:

\[ C(n) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} P(i, j) \qquad (13.10) \]

Note that if we compare $a_i$ and $a_j$, we won't compare $a_j$ and $a_i$ again during the quick sort, and we never compare $a_i$ with itself. That's why the upper bound of $i$ is $n - 1$, and the lower bound of $j$ is $i + 1$.
Substituting the probability yields:

\[ C(n) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} = \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k + 1} \qquad (13.11) \]

Using the harmonic series [18]

\[ H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots = \ln n + \gamma + \epsilon_n \]

the inner sum is bound to $O(\lg n)$, so that:

\[ C(n) = \sum_{i=1}^{n-1} O(\lg n) = O(n \lg n) \qquad (13.12) \]
The other method to prove the average performance uses the recursive fact that sorting a list of length $n$ involves a partition which splits the list into two sub-lists of lengths $i$ and $n - i - 1$. The partition process itself takes $cn$ time because it examines every element against the pivot. So we have the following equation:

\[ T(n) = T(i) + T(n - i - 1) + cn \qquad (13.13) \]

Where $T(n)$ is the total time of quick sort on a list of length $n$. Since $i$ is equally likely to be any of $0, 1, \ldots, n - 1$, taking the mathematical expectation of this equation gives:
\[
\begin{aligned}
T(n) &= E(T(i)) + E(T(n - i - 1)) + cn \\
     &= \frac{1}{n}\sum_{i=0}^{n-1} T(i) + \frac{1}{n}\sum_{i=0}^{n-1} T(n - i - 1) + cn \\
     &= \frac{1}{n}\sum_{i=0}^{n-1} T(i) + \frac{1}{n}\sum_{j=0}^{n-1} T(j) + cn \\
     &= \frac{2}{n}\sum_{i=0}^{n-1} T(i) + cn
\end{aligned} \qquad (13.14)
\]
Multiplying both sides by $n$, the equation changes to:

\[ nT(n) = 2\sum_{i=0}^{n-1} T(i) + cn^2 \qquad (13.15) \]
Substituting $n$ with $n - 1$ gives another equation:

\[ (n-1)T(n-1) = 2\sum_{i=0}^{n-2} T(i) + c(n-1)^2 \qquad (13.16) \]
Subtracting equation (13.16) from (13.15) eliminates all the $T(i)$ for $0 \le i < n - 1$:

\[ nT(n) = (n+1)T(n-1) + 2cn - c \qquad (13.17) \]
Since the constant $c$ at the end doesn't affect the bound, we can drop it and transform the equation one step further:

\[ \frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2c}{n+1} \qquad (13.18) \]
Next, substituting $n$ with $n-1, n-2, \ldots$ gives us $n - 1$ equations:

\[
\begin{aligned}
\frac{T(n-1)}{n} &= \frac{T(n-2)}{n-1} + \frac{2c}{n} \\
\frac{T(n-2)}{n-1} &= \frac{T(n-3)}{n-2} + \frac{2c}{n-1} \\
&\;\;\vdots \\
\frac{T(2)}{3} &= \frac{T(1)}{2} + \frac{2c}{3}
\end{aligned}
\]
Summing them all up and eliminating the identical components on both sides, we deduce a function of $n$:

\[ \frac{T(n)}{n+1} = \frac{T(1)}{2} + 2c\sum_{k=3}^{n+1} \frac{1}{k} \qquad (13.19) \]
Using the harmonic series mentioned above, the final result is:

\[ O\!\left(\frac{T(n)}{n+1}\right) = O\!\left(\frac{T(1)}{2} + 2c(\ln n + \gamma + \epsilon_n)\right) = O(\lg n) \qquad (13.20) \]

Thus

\[ O(T(n)) = O(n \lg n) \qquad (13.21) \]
Exercise 13.2
Why does Lomuto's partition method perform poorly when there are many duplicated elements?
13.4 Engineering Improvement
Quick sort performs well in most cases, as mentioned in the previous section. However, there do exist worst cases which downgrade the performance to quadratic. If the data is randomly prepared, such cases are rare; however, some particular sequences lead to the worst case, and these kinds of sequences are very common in practice.

In this section, some engineering practices are introduced. They either help to avoid poor performance in handling special input data, with an improved partition algorithm, or try to make the possibilities uniform among all cases.
13.4.1 Engineering solution to duplicated elements
As presented in the exercise of the above section, N. Lomuto's partition method isn't good at handling sequences with many duplicated elements. Consider a sequence of n equal elements: {x, x, ..., x}. There are actually two methods to sort it.

1. The normal basic quick sort: we select an arbitrary element, which is x, as the pivot, and partition the sequence into two sub-sequences; one is {x, x, ..., x}, which contains n − 1 elements, while the other is empty. Then we recursively sort the first one; this is obviously a quadratic O(n²) solution.

2. The other way is to only pick those elements strictly smaller than x, and those strictly greater than x. Such a partition results in two empty sub-sequences, and n elements equal to the pivot. The recursive calls on the two empty sub-sequences return immediately; the only thing left is to concatenate the sorted results in front of and after the list of elements which are equal to the pivot.

The latter performs in O(n) time if all elements are equal. This indicates an important improvement for partition: instead of binary partition (split into two sub-lists and a pivot), ternary partition (split into three sub-lists) handles duplicated elements better.
We can define the ternary quick sort as follows:

\[ sort(L) = \begin{cases} \varnothing & : L = \varnothing \\ sort(S) \cup sort(E) \cup sort(G) & : \text{otherwise} \end{cases} \qquad (13.22) \]

Where $S$, $E$, $G$ are sub-lists containing all the elements which are less than, equal to, and greater than the pivot respectively:

\[
\begin{aligned}
S &= \{ x \mid x \in L, x < l_1 \} \\
E &= \{ x \mid x \in L, x = l_1 \} \\
G &= \{ x \mid x \in L, l_1 < x \}
\end{aligned}
\]
The basic ternary quick sort can be implemented in Haskell as the following example code.

sort [] = []
sort (x:xs) = sort [a | a <- xs, a < x] ++
              x : [b | b <- xs, b == x] ++ sort [c | c <- xs, c > x]
Note that the comparison between elements must support abstract 'less-than' and 'equal-to' operations. This basic version of ternary sort takes linear O(n) time to concatenate the three sub-lists. It can be improved by using the standard accumulator technique.
Suppose the function $sort'(L, A)$ is the accumulated ternary quick sort definition, where $L$ is the sequence to be sorted, and the accumulator $A$ contains the intermediate sorted result so far. We initialize the sorting with an empty accumulator: $sort(L) = sort'(L, \varnothing)$.

It's easy to give the trivial edge case:

\[ sort'(L, A) = \begin{cases} A & : L = \varnothing \\ \ldots & : \text{otherwise} \end{cases} \]
For the recursive case, as the ternary partition splits into three sub-lists $S$, $E$, $G$, only $S$ and $G$ need recursive sorting; $E$ contains all elements equal to the pivot, which is already in correct order and needn't be sorted any more. The idea is to sort $G$ with accumulator $A$, concatenate it behind $E$, then use this result as the new accumulator and start to sort $S$:

\[ sort'(L, A) = \begin{cases} A & : L = \varnothing \\ sort'(S, E \cup sort'(G, A)) & : \text{otherwise} \end{cases} \qquad (13.23) \]
The partition can also be realized with accumulators. It is similar to what has been developed for the basic version of quick sort. Note that we can't just pass one predicate for the pivot comparison; it actually needs two, one for less-than and the other for equality testing. For the sake of brevity, we pass the pivot element instead.
\[
partition(p, L, S, E, G) = \begin{cases}
(S, E, G) & : L = \varnothing \\
partition(p, L', \{l_1\} \cup S, E, G) & : l_1 < p \\
partition(p, L', S, \{l_1\} \cup E, G) & : l_1 = p \\
partition(p, L', S, E, \{l_1\} \cup G) & : p < l_1
\end{cases} \qquad (13.24)
\]
Where $l_1$ is the first element in $L$ if $L$ isn't empty, and $L'$ contains the rest of the elements except for $l_1$. The Haskell program below implements this algorithm. It starts the recursive sorting immediately in the edge case of partition.
sort xs = sort' xs []

sort' [] r = r
sort' (x:xs) r = part xs [] [x] [] r where
    part [] as bs cs r = sort' as (bs ++ sort' cs r)
    part (x':xs') as bs cs r | x' < x = part xs' (x':as) bs cs r
                             | x' == x = part xs' as (x':bs) cs r
                             | x' > x = part xs' as bs (x':cs) r
Richard Bird developed another version in [1]: instead of concatenating the recursively sorted results, it maintains a list of sorted sub-lists, and performs the concatenation finally.

sort xs = concat $ pass xs []

pass [] xss = xss
pass (x:xs) xss = step xs [] [x] [] xss where
    step [] as bs cs xss = pass as (bs : pass cs xss)
    step (x':xs') as bs cs xss | x' < x = step xs' (x':as) bs cs xss
                               | x' == x = step xs' as (x':bs) cs xss
                               | x' > x = step xs' as bs (x':cs) xss
2-way partition
Sequences with many duplicated elements can also be handled imperatively. Robert Sedgewick presented a partition method [3], [4] which holds two pointers: one moves from left to right, the other moves from right to left. The two pointers are initialized as the left and right boundaries of the array.
When the partition starts, the leftmost element is selected as the pivot. Then the left pointer i keeps advancing to the right until it meets any element which is not less than the pivot; meanwhile, the right pointer j repeatedly scans to the left until it meets any element which is not greater than the pivot. (The two scans are independent of each other; it's quite OK to perform them in parallel.)

At this time, all elements before the left pointer i are strictly less than the pivot, while all elements after the right pointer j are greater than the pivot; i points to an element which is greater than or equal to the pivot, while j points to an element which is less than or equal to the pivot. The situation at this stage is illustrated in figure 13.4 (a).
In order to partition all elements less than or equal to the pivot to the left, and the others to the right, we exchange the two elements pointed to by i and j. After that the scan is resumed, until either i meets j, or they overlap.

At any time point during partition, there is an invariant: all elements before i (including the one pointed to by i) are not greater than the pivot, while all elements after j (including the one pointed to by j) are not less than the pivot. The elements between i and j haven't been examined yet. This invariant is shown in figure 13.4 (b).
After the left pointer i meets the right pointer j, or they overlap, we need one extra exchange to move the pivot, located at the first position, to its correct place, which is pointed to by j. Next, the elements between the lower bound and j, as well as the sub-slice between i and the upper bound of the array, are recursively sorted.

Figure 13.4: Partition a range of the array using the leftmost element as the pivot. (a) When the pointers i and j stop: elements before i are strictly less than the pivot, elements after j are greater; i points to an element ≥ the pivot, j to an element ≤ the pivot. (b) The partition invariant: elements before i are not greater than the pivot, elements after j are not less than it, and the range between i and j is unexamined.
This algorithm can be described as the following.

1: procedure Sort(A, l, u)  ▷ sort the range [l, u)
2:   if u − l > 1 then  ▷ More than 1 element for the non-trivial case
3:     i ← l, j ← u
4:     pivot ← A[l]
5:     loop
6:       repeat
7:         i ← i + 1
8:       until A[i] ≥ pivot  ▷ Need to handle the error case i ≥ u in fact.
9:       repeat
10:        j ← j − 1
11:      until A[j] ≤ pivot  ▷ Need to handle the error case j < l in fact.
12:      if j < i then
13:        break
14:      Exchange A[i] ↔ A[j]
15:    Exchange A[l] ↔ A[j]  ▷ Move the pivot
16:    Sort(A, l, j)
17:    Sort(A, i, u)
Consider the extreme case that all elements are equal: this in-place quick sort will partition the list into two sub-lists of equal length, at the cost of n/2 unnecessary swaps. As the partition is balanced, the overall performance is O(n lg n), which avoids the downgrade to quadratic. The following ANSI C example program implements this algorithm.
void qsort(Key* xs, int l, int u) {
    int i, j, pivot;
    if (l < u - 1) {
        pivot = i = l; j = u;
        while (1) {
            while (i < u && xs[++i] < xs[pivot]);
            while (j >= l && xs[pivot] < xs[--j]);
            if (j < i) break;
            swap(xs[i], xs[j]);
        }
        swap(xs[pivot], xs[j]);
        qsort(xs, l, j);
        qsort(xs, i, u);
    }
}
Comparing this algorithm with the basic version based on N. Lomuto's partition method, we can find that it swaps fewer elements, because it skips those which are already on the proper side of the pivot.
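The C examples in this chapter call a swap helper which the book defines elsewhere. For the value-style calls like swap(xs[i], xs[j]) above, a minimal sketch could be the macro below (an assumption for illustration, not the book's exact definition; it requires the Key type to be defined first):

/* Sketch of a value-style swap: exchanges two Key lvalues via a temporary. */
#define swap(a, b) do { Key _tmp = (a); (a) = (b); (b) = _tmp; } while (0)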
3-way partition
It's obvious that we should avoid the unnecessary swapping of duplicated elements. Moreover, the algorithm can be developed with the idea of ternary sort (also known as 3-way partition in some materials): all the elements strictly less than the pivot are put in the left sub-slice, those greater than the pivot are put in the right one, and the middle part holds all the elements equal to the pivot. With such a ternary partition, we only need to recursively sort the parts which differ from the pivot. Thus in the above extreme case, there aren't any elements needing further sorting, so the overall performance is linear O(n).

The difficulty is how to do the 3-way partition. Jon Bentley and Douglas McIlroy developed a solution which keeps the elements equal to the pivot at the leftmost and rightmost sides, as shown in figure 13.5 (a) [5] [6].
Figure 13.5: 3-way partition. (a) The invariant: elements equal to the pivot are kept at the two ends, between the boundaries p and i on the left and between j and q on the right, with the unexamined part in the middle. (b) At the end, the equal parts are swapped to the middle, between the 'less than' and 'greater than' parts.
The major part of the scan process is the same as the one developed by Robert Sedgewick: i and j keep advancing toward each other until they meet an element which is greater than or equal to the pivot (for i), or less than or equal to the pivot (for j), respectively. At this time, if i and j don't meet each other or overlap, the two elements are not only exchanged, but also examined to see if they are identical to the pivot; if so, the necessary exchange happens between i and p, as well as between j and q.

By the end of the partition process, the elements equal to the pivot need to be swapped from the left and right ends to the middle part. The number of such extra exchange operations is proportional to the number of duplicated elements; it's zero if all the elements are unique, so there is no overhead in that case. The final partition result is shown in figure 13.5 (b). After that we only need to recursively sort the 'less-than' and 'greater-than' sub-slices.
This algorithm can be given by modifying the 2-way partition as below.
1: procedure Sort(A, l, u)
2:   if u − l > 1 then
3:     i ← l, j ← u
4:     p ← l, q ← u  ▷ point to the boundaries for equal elements
5:     pivot ← A[l]
6:     loop
7:       repeat
8:         i ← i + 1
9:       until A[i] ≥ pivot  ▷ Skip the error handling for i ≥ u
10:      repeat
11:        j ← j − 1
12:      until A[j] ≤ pivot  ▷ Skip the error handling for j < l
13:      if j ≤ i then
14:        break  ▷ Note the difference from the above algorithm
15:      Exchange A[i] ↔ A[j]
16:      if A[i] = pivot then  ▷ Handle the equal elements
17:        p ← p + 1
18:        Exchange A[p] ↔ A[i]
19:      if A[j] = pivot then
20:        q ← q − 1
21:        Exchange A[q] ↔ A[j]
22:    if i = j ∧ A[i] = pivot then  ▷ A special case
23:      j ← j − 1, i ← i + 1
24:    for k from l to p do  ▷ Swap the equal elements to the middle part
25:      Exchange A[k] ↔ A[j]
26:      j ← j − 1
27:    for k from u − 1 down-to q do
28:      Exchange A[k] ↔ A[i]
29:      i ← i + 1
30:    Sort(A, l, j + 1)
31:    Sort(A, i, u)
This algorithm can be translated to the following ANSI C example program.
void qsort2(Key* xs, int l, int u) {
    int i, j, k, p, q; Key pivot;
    if (l < u - 1) {
        i = p = l; j = q = u; pivot = xs[l];
        while (1) {
            while (i < u && xs[++i] < pivot);
            while (j >= l && pivot < xs[--j]);
            if (j <= i) break;
            swap(xs[i], xs[j]);
            if (xs[i] == pivot) { ++p; swap(xs[p], xs[i]); }
            if (xs[j] == pivot) { --q; swap(xs[q], xs[j]); }
        }
        if (i == j && xs[i] == pivot) { --j, ++i; }
        for (k = l; k <= p; ++k, --j) swap(xs[k], xs[j]);
        for (k = u - 1; k >= q; --k, ++i) swap(xs[k], xs[i]);
        qsort2(xs, l, j + 1);
        qsort2(xs, i, u);
    }
}
It can be seen that the algorithm becomes a bit complex when it evolves to 3-way partition; there are some tricky edge cases which should be handled with caution. Actually, we just need a ternary partition algorithm. This reminds us of N. Lomuto's method, which is straightforward enough to be a starting point. The idea is to change the invariant a bit. We still select the first element as the pivot; as shown in figure 13.6, at any time, the leftmost section contains the elements strictly less than the pivot, the next section contains the elements equal to the pivot, and the rightmost section holds all the elements strictly greater than the pivot. The boundaries of the three sections are marked as i, k, and j respectively. The rest part, between k and j, holds the elements that haven't been scanned yet.

At the beginning of this algorithm, the 'less-than' section is empty, and the 'equal-to' section contains only one element, the pivot; so i is initialized to the lower bound of the array, and k points to the element next to i. The 'greater-than' section is also initialized as empty, thus j is set to the upper bound.
Figure 13.6: 3-way partition based on N. Lomuto's method: the 'less-than' section, the 'equal' section, the unexamined part, and the 'greater-than' section, bounded by i, k, and j.
When the partition process starts, the element pointed to by k is examined. If it's equal to the pivot, k just advances to the next one. If it's greater than the pivot, we swap it with the last element in the unknown area, so that the length of the 'greater-than' section increases by one, and its boundary j moves to the left; since we don't know whether the element just swapped to position k is still greater than the pivot, it must be examined again. Otherwise, if the element is less than the pivot, we exchange it with the first one in the 'equal-to' section to restore the invariant. The partition algorithm stops when k meets j.
1: procedure Sort(A, l, u)
2:   if u − l > 1 then
3:     i ← l, j ← u, k ← l + 1
4:     pivot ← A[i]
5:     while k < j do
6:       while pivot < A[k] do
7:         j ← j − 1
8:         Exchange A[k] ↔ A[j]
9:       if A[k] < pivot then
10:        Exchange A[k] ↔ A[i]
11:        i ← i + 1
12:      k ← k + 1
13:    Sort(A, l, i)
14:    Sort(A, j, u)
Compared with the previous 3-way partition quick sort algorithm, this one is simpler, at the cost of more swap operations. The ANSI C program below implements this algorithm.
void qsort(Key* xs, int l, int u) {
    int i, j, k; Key pivot;
    if (l < u - 1) {
        i = l; j = u; pivot = xs[l];
        for (k = l + 1; k < j; ++k) {
            while (pivot < xs[k]) { --j; swap(xs[j], xs[k]); }
            if (xs[k] < pivot) { swap(xs[i], xs[k]); ++i; }
        }
        qsort(xs, l, i);
        qsort(xs, j, u);
    }
}
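As a quick sanity check, this 3-way partition version can be driven as below (an illustrative sketch assuming typedef int Key and the swap helper sketched earlier; note that the function name qsort shadows the one in the C standard library, so renaming may be needed in practice):

#include <stdio.h>

int main() {
    Key xs[] = {3, 1, 4, 1, 5, 9, 2, 6};
    int i, n = sizeof(xs) / sizeof(xs[0]);
    qsort(xs, 0, n);  /* the 3-way partition version defined above */
    for (i = 0; i < n; ++i)
        printf("%d ", xs[i]);  /* expected output: 1 1 2 3 4 5 6 9 */
    return 0;
}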
Exercise 13.3
All the imperative quick sort algorithms above use the first element as the pivot; another method is to choose the last one as the pivot. Realize the quick sort algorithms, including the basic version, the Sedgewick version, and the ternary (3-way partition) version, using this approach.
13.5 Engineering solution to the worst case
Although the ternary quick sort (3-way partition) solves the issue of duplicated elements, it can't handle some typical worst cases. For example, if many of the elements in the sequence are ordered, no matter whether in ascending or descending order, the partition result will be two unbalanced sub-sequences: one with few elements, the other containing all the rest.
Consider the two extreme cases $\{x_1 < x_2 < \ldots < x_n\}$ and $\{y_1 > y_2 > \ldots > y_n\}$. The partition results are shown in figure 13.7.
It's easy to give some more worst cases; for example, $\{x_m, x_{m-1}, \ldots, x_2, x_1, x_{m+1}, x_{m+2}, \ldots, x_n\}$ where $\{x_1 < x_2 < \ldots < x_n\}$; another one is $\{x_n, x_1, x_{n-1}, x_2, \ldots\}$. Their partition result trees are shown in figure 13.8.
Figure 13.7: The two worst cases. (a) The partition tree for $\{x_1 < x_2 < \ldots < x_n\}$: there aren't any elements less than or equal to the pivot (the first element) in any partition. (b) The partition tree for $\{y_1 > y_2 > \ldots > y_n\}$: there aren't any elements greater than or equal to the pivot (the first element) in any partition.

Figure 13.8: Another two worst cases. (a) Except for the first partition, all the others are unbalanced. (b) A zig-zag partition tree.

Observing that the bad partition happens easily when blindly choosing the first element as the pivot, there is a popular workaround suggested by Robert Sedgewick in [3]. Instead of selecting a fixed position in the sequence, a small sampling helps to find a pivot which is less likely to cause a bad partition. One option is to examine the first element, the middle one, and the last one, then choose the median of these three elements. In the worst case, this ensures that there is at least one element in the shorter partitioned sub-list.
Note that there is one tricky issue in real-world implementations: since indices are typically represented by words of limited length, calculating the middle index with the naive expression (l + u) / 2 may overflow. To avoid this issue, it can be computed as l + (u - l) / 2 instead. There are two methods to find the median: one needs at most three comparisons [5]; the other is to move the minimum value to the first location, the maximum value to the last location, and the median value to the middle location by swapping. After that we can select the middle element as the pivot. The algorithm below illustrates the second idea, performed before calling the partition procedure.
1: procedure Sort(A, l, u)
2:   if u − l > 1 then
3:     m ← ⌊(l + u)/2⌋  ▷ Need to handle the overflow error in practice
4:     if A[m] < A[l] then  ▷ Ensure A[l] ≤ A[m]
5:       Exchange A[l] ↔ A[m]
6:     if A[u − 1] < A[l] then  ▷ Ensure A[l] ≤ A[u − 1]
7:       Exchange A[l] ↔ A[u − 1]
8:     if A[u − 1] < A[m] then  ▷ Ensure A[m] ≤ A[u − 1]
9:       Exchange A[m] ↔ A[u − 1]
10:    Exchange A[l] ↔ A[m]
11:    (i, j) ← Partition(A, l, u)
12:    Sort(A, l, i)
13:    Sort(A, j, u)
It's obvious that this algorithm performs well on the 4 special worst cases given above. The imperative implementation of median-of-three is left as an exercise to the reader.

However, in a purely functional setting, it's expensive to randomly access the middle and the last elements; we can't directly translate the imperative median selection algorithm. The idea of taking a small sample and then using its median as the pivot can alternatively be realized by taking the first 3 elements, as in the following Haskell program.
qsort [] = []
qsort [x] = [x]
qsort [x, y] = [min x y, max x y]
qsort (x:y:z:rest) = qsort (filter (< m) (s:rest)) ++ [m] ++
                     qsort (filter (>= m) (l:rest)) where
    xs = [x, y, z]
    [s, m, l] = [minimum xs, median xs, maximum xs]
Unfortunately, none of the above 4 worst cases can be well handled by this program, because the sampling isn't good: we need a telescope, not a microscope, to profile the whole list to be partitioned. We'll see the functional way to solve the partition problem later.

Besides median-of-three, there is another popular engineering practice to get a good partition result: instead of always taking the first or the last element as the pivot, select one at random, as in the following modification.
1: procedure Sort(A, l, u)
2:   if u − l > 1 then
3:     Exchange A[l] ↔ A[Random(l, u)]
4:     (i, j) ← Partition(A, l, u)
5:     Sort(A, l, i)
6:     Sort(A, j, u)
The function Random(l, u) returns a random integer i such that l ≤ i < u. The element at this position is exchanged with the first one, so that it is selected as the pivot for the further partition. This algorithm is called random quick sort [2].
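A possible C realization of Random(l, u), sketched with the standard rand function (an assumption for illustration; rand() % m is only roughly uniform, which is acceptable for pivot selection):

#include <stdlib.h>

/* Returns a pseudo-random integer i such that l <= i < u. */
int Random(int l, int u) {
    return l + rand() % (u - l);
}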
Theoretically, neither median-of-three nor random quick sort can avoid the worst case completely. If the sequence to be sorted is randomly distributed, choosing the first element as the pivot and choosing any other arbitrary one are equally effective. Considering that in the functional setting the underlying data structure of the sequence is a singly linked-list, it's expensive to strictly apply the idea of random quick sort in a purely functional approach. Even with this bad news, the engineering improvements still make sense in real world programming.
13.6 Other engineering practice
There are some other engineering practices which don't focus on solving the bad partition issue. Robert Sedgewick observed that when the list to be sorted is short, the overhead introduced by quick sort is relatively expensive, while insertion sort performs better in such cases [4], [5]. Sedgewick, Bentley and McIlroy tried different thresholds, known as 'Cut-Off': when there are fewer than Cut-Off elements, the sort algorithm falls back to insertion sort.
1: procedure Sort(A, l, u)
2:   if u − l > Cut-Off then
3:     Quick-Sort(A, l, u)
4:   else
5:     Insertion-Sort(A, l, u)

The implementation of this improvement is left as an exercise to the reader.
Exercise 13.4
Can you figure out more quick sort worst cases besides the four given in this section?

Implement the median-of-three method in your favorite imperative programming language.

Implement random quick sort in your favorite imperative programming language.

Implement the algorithm which falls back to insertion sort when the length of the list is small, in both the imperative and the functional approach.
13.7 Side words
It's sometimes called 'true quick sort' if the implementation is equipped with most of the engineering practices we introduced, including the insertion sort fall-back with cut-off, in-place exchanging, pivot selection by the median-of-three method, and 3-way partition. The purely functional version, which expresses the idea of quick sort perfectly, can't take all of them; thus some people think the functional quick sort is essentially tree sort.

Actually, quick sort does have a close relationship with tree sort. Richard Bird shows how to derive quick sort from binary tree sort by deforestation [7]. Consider a binary search tree creation algorithm called unfold, which turns a list of elements into a binary search tree:
\[ unfold(L) = \begin{cases} \varnothing & : L = \varnothing \\ tree(T_l, l_1, T_r) & : \text{otherwise} \end{cases} \qquad (13.25) \]

Where

\[
\begin{aligned}
T_l &= unfold(\{ a \mid a \in L', a \le l_1 \}) \\
T_r &= unfold(\{ a \mid a \in L', l_1 < a \})
\end{aligned} \qquad (13.26)
\]
The interesting point is that this algorithm creates the tree in a different way from the one introduced in the chapter on binary search trees. If the list to be unfolded is empty, the result is an empty tree; this is the trivial edge case. Otherwise, the algorithm sets the first element $l_1$ of the list as the key of the node, and recursively creates the left and right children: the elements used to form the left child are those in $L'$ which are less than or equal to the key, while the rest of the elements, which are greater than the key, are used to form the right child.
Recall the algorithm which turns a binary search tree into a list by in-order traversal:

\[ toList(T) = \begin{cases} \varnothing & : T = \varnothing \\ toList(left(T)) \cup \{ key(T) \} \cup toList(right(T)) & : \text{otherwise} \end{cases} \qquad (13.27) \]

We can define the quick sort algorithm by composing these two functions:

\[ quickSort = toList \circ unfold \qquad (13.28) \]
The binary search tree built in the first step of applying unfold is an intermediate result; it is consumed by toList and dropped after the second step. It's quite possible to eliminate this intermediate result, which leads to the basic version of quick sort. The elimination of the intermediate binary search tree is called deforestation. This concept is based on Burstall and Darlington's work [9].
13.8 Merge sort
Although quick sort performs perfectly in the average case, it can't avoid the worst case no matter what engineering practice is applied. Merge sort, on the other hand, ensures that the performance is bound to O(n lg n) in all cases, which makes it particularly useful in theoretical algorithm design and analysis. Another feature is that merge sort is friendly to linked-list settings, and thus suitable for sorting sequences which aren't stored consecutively. Several programming environments adopt merge sort as their standard library sorting solution, such as Haskell, Python, and Java (since Java 7).

In this section, we'll first brief the intuitive idea of merge sort and provide a basic version. After that, some variants of merge sort will be given, including nature merge sort and bottom-up merge sort.
13.8.1 Basic version
Same as quick sort, the essential idea behind merge sort is also divide and conquer. Different from quick sort, merge sort enforces the division to be strictly balanced: it always splits the sequence to be sorted at the middle point. After that, it recursively sorts the two sub-sequences and merges them into the final result. The algorithm can be described as follows. In order to sort a sequence L:

Trivial edge case: if the sequence to be sorted is empty, the result is empty;

Otherwise, split the sequence at the middle position, recursively sort the two sub-sequences, and merge the results.
The basic merge sort algorithm can be formalized with the following equation:

\[ sort(L) = \begin{cases} \varnothing & : L = \varnothing \\ merge(sort(L_1), sort(L_2)) & : \text{otherwise}, (L_1, L_2) = splitAt(\lfloor |L|/2 \rfloor, L) \end{cases} \qquad (13.29) \]
Merge
There are two black-boxes in the above merge sort definition: one is the splitAt function, which splits a list at a given position; the other is the merge function, which merges two sorted lists into one.

As presented in the appendix of this book, it's trivial to realize splitAt in imperative settings by using random access. However, in functional settings, it's typically realized as a linear algorithm:
\[ splitAt(n, L) = \begin{cases} (\varnothing, L) & : n = 0 \\ (\{l_1\} \cup A, B) & : \text{otherwise}, (A, B) = splitAt(n - 1, L') \end{cases} \qquad (13.30) \]

Where $l_1$ is the first element of $L$, and $L'$ represents the rest of the elements except for $l_1$ if $L$ isn't empty.
The idea of merge can be illustrated as in figure 13.9. Consider two lines of kids who already stand in order of their heights: the shortest one stands first, then a taller one, and the tallest one stands at the end of the line.
Figure 13.9: Two lines of kids pass a door.
Now let's ask the kids to pass a door one by one; at any time only one kid can pass. The kids must pass the door in the order of their heights: a kid can't pass the door before all the kids shorter than him/her. Since the two lines of kids have already been sorted, the solution is to ask the first two kids, one from each line, to compare their heights, and let the shorter kid pass the door; they repeat this step until one line is empty, after which all the remaining kids can pass the door one by one.
This idea can be formalized in the following equation:

\[ merge(A, B) = \begin{cases} A & : B = \varnothing \\ B & : A = \varnothing \\ \{a_1\} \cup merge(A', B) & : a_1 \le b_1 \\ \{b_1\} \cup merge(A, B') & : \text{otherwise} \end{cases} \qquad (13.31) \]
Where $a_1$ and $b_1$ are the first elements of the lists $A$ and $B$, and $A'$ and $B'$ are the rest of the elements except for the first ones respectively. The first two cases are trivial edge cases: merging a sorted list with an empty list results in the same sorted list. Otherwise, as both lists are non-empty, we take the first elements of the two lists, compare them, use the minimum as the first element of the result, and recursively merge the rest.

With merge defined, the basic version of merge sort can be implemented like the following Haskell example code.
msort [] = []
msort [x] = [x]
msort xs = merge (msort as) (msort bs) where
    (as, bs) = splitAt (length xs `div` 2) xs

merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) | x <= y = x : merge xs (y:ys)
                    | x > y  = y : merge (x:xs) ys
Note that the implementation differs from the algorithm definition in that it also treats the singleton list as a trivial edge case.
Merge sort can also be realized imperatively. The basic version can be developed as the algorithm below.

1: procedure Sort(A)
2:   if |A| > 1 then
3:     m ← ⌊|A|/2⌋
4:     X ← Copy-Array(A[1...m])
5:     Y ← Copy-Array(A[m+1...|A|])
6:     Sort(X)
7:     Sort(Y)
8:     Merge(A, X, Y)
When the array to be sorted contains at least two elements, the non-trivial sorting process starts. It first copies the first half to a newly created array X, and the second half to a second new array Y, recursively sorts them, and finally merges the sorted results back to A.

This version uses the same amount of extra space as A, because the Merge algorithm isn't in-place at the moment. We'll introduce the imperative in-place merge sort in a later section.
The merge process does almost the same thing as in the functional definition. There is a verbose version, and a simplified version which uses a sentinel.

The verbose merge algorithm repeatedly checks the front elements of the two input arrays, picks the smaller one and puts it to the result array A, then advances along the corresponding array, until either input array is exhausted. After that, the algorithm appends the rest of the elements in the other input array to A.
1: procedure Merge(A, X, Y)
2:   i ← 1, j ← 1, k ← 1
3:   m ← |X|, n ← |Y|
4:   while i ≤ m ∧ j ≤ n do
5:     if X[i] < Y[j] then
6:       A[k] ← X[i]
7:       i ← i + 1
8:     else
9:       A[k] ← Y[j]
10:      j ← j + 1
11:    k ← k + 1
12:  while i ≤ m do
13:    A[k] ← X[i]
14:    k ← k + 1
15:    i ← i + 1
16:  while j ≤ n do
17:    A[k] ← Y[j]
18:    k ← k + 1
19:    j ← j + 1
Although this algorithm is a bit verbose, it can be short in programming environments with enough tools to manipulate arrays. The following Python program is an example.
def msort(xs):
    n = len(xs)
    if n > 1:
        ys = [x for x in xs[:n//2]]
        zs = [x for x in xs[n//2:]]
        ys = msort(ys)
        zs = msort(zs)
        xs = merge(xs, ys, zs)
    return xs

def merge(xs, ys, zs):
    i = 0
    while ys != [] and zs != []:
        xs[i] = ys.pop(0) if ys[0] < zs[0] else zs.pop(0)
        i = i + 1
    xs[i:] = ys if ys != [] else zs
    return xs
Performance
Before diving into the improvement of this basic version, let's analyze the performance of merge sort. The algorithm consists of two steps: the divide step and the merge step. In the divide step, the sequence to be sorted is always divided into two sub-sequences of the same length. If we draw a partition tree similar to the one for quick sort, we find that it is a perfectly balanced binary tree, like the one shown in figure 13.3; thus the height of this tree is O(lg n), and the recursion depth of merge sort is bound to O(lg n). Merging happens at every level. The merge algorithm is intuitive to analyze: it compares elements from the two input sequences in pairs, and after one sequence is fully examined the rest of the other is copied one by one to the result; thus it's a linear algorithm, proportional to the length of the sequence. Based on these facts, denoting by T(n) the time for sorting a sequence of length n, we can write the recursive cost as below:
\[ T(n) = T\!\left(\frac{n}{2}\right) + T\!\left(\frac{n}{2}\right) + cn = 2T\!\left(\frac{n}{2}\right) + cn \qquad (13.32) \]
It states that the cost consists of three parts: merge sorting the first half takes T(n/2), merge sorting the second half also takes T(n/2), and merging the two results takes cn, where c is some constant. Solving this equation gives the result O(n lg n). Note that this performance doesn't vary between cases, as merge sort always divides the input uniformly.
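A sketch of solving the recurrence by repeated expansion:

\[ T(n) = 2T\!\left(\frac{n}{2}\right) + cn = 4T\!\left(\frac{n}{4}\right) + 2cn = \cdots = 2^{\lg n}\,T(1) + cn \lg n = O(n \lg n) \]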
Another significant performance indicator is space occupation; it varies a lot among different merge sort implementations. The detailed space bounds will be analyzed for each variant later.

For the basic imperative merge sort, observe that it demands the same amount of space as the input array in every recursion, copies the original elements into it for recursive sorting, and releases this space after that level of recursion. So the peak space requirement happens when the recursion enters the deepest level, and it is O(n lg n).
The functional merge sort consumes much less than this amount, because the underlying data structure of the sequence is a linked-list, so no extra space is needed for merging (the complex effects caused by lazy evaluation are ignored here; please refer to [7] for details). The only space requirement is for book-keeping the stack of recursive calls. This can be seen in the later explanation of the even-odd split algorithm.
Minor improvement
We'll next improve the basic merge sort bit by bit, for both the functional and the imperative realizations. The first observation is that the imperative merge algorithm is a bit verbose. [2] presents an elegant simplification which uses positive ∞ as a sentinel: we append ∞ as the last element of both ordered arrays before merging (for sorting in monotonic non-increasing order, −∞ can be used instead). Thus we needn't test which array is exhausted. Figure 13.10 illustrates this idea.

Figure 13.10: Merge with ∞ as sentinels: the two input arrays, each terminated by ∞ (INF), are merged into the result array.
1: procedure Merge(A, X, Y)
2:   Append(X, ∞)
3:   Append(Y, ∞)
4:   i ← 1, j ← 1
5:   for k from 1 to |A| do
6:     if X[i] < Y[j] then
7:       A[k] ← X[i]
8:       i ← i + 1
9:     else
10:      A[k] ← Y[j]
11:      j ← j + 1
The following ANSI C program implements this idea, with the merge embedded inside the sort. INF is defined as a big constant number of the same type as Key; the type can either be defined elsewhere, or we can abstract the type information away by passing a comparator as a parameter. We skip these implementation and language details here.
void msort(Key* xs, int l, int u) {
    int i, j, m;
    Key *as, *bs;
    if (u - l > 1) {
        m = l + (u - l) / 2;  /* avoid int overflow */
        msort(xs, l, m);
        msort(xs, m, u);
        as = (Key*) malloc(sizeof(Key) * (m - l + 1));
        bs = (Key*) malloc(sizeof(Key) * (u - m + 1));
        memcpy((void*)as, (void*)(xs + l), sizeof(Key) * (m - l));
        memcpy((void*)bs, (void*)(xs + m), sizeof(Key) * (u - m));
        as[m - l] = bs[u - m] = INF;
        for (i = j = 0; l < u; ++l)
            xs[l] = as[i] < bs[j] ? as[i++] : bs[j++];
        free(as);
        free(bs);
    }
}
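The program assumes that Key and INF are defined elsewhere; a minimal sketch of such definitions for integer keys (an assumption for illustration):

#include <limits.h>   /* INT_MAX */
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy */

typedef int Key;      /* the element type used by the examples */
#define INF INT_MAX   /* sentinel: no valid key is larger than this */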
}
Running this program takes much more time than the quick sort. Besides
the major reason well explain later, one problem is that this version frequently
allocates and releases memories for merging. While memory allocation is one of
the well known bottle-neck in real world as mentioned by Bentley in [4]. One
solution to address this issue is to allocate another array with the same size to
the original one as the working area. The recursive sort for the rst and second
halves neednt allocate any more extra spaces, but use the working area when
merging. Finally, the algorithm copies the merged result back.
This idea can be expressed as the following modified algorithm.

1: procedure Sort(A)
2:   B ← Create-Array(|A|)
3:   Sort(A, B, 1, |A|)

4: procedure Sort(A, B, l, u)
5:   if u − l > 0 then
6:     m ← ⌊(l + u)/2⌋
7:     Sort(A, B, l, m)
8:     Sort(A, B, m + 1, u)
9:     Merge(A, B, l, m, u)
This algorithm duplicates another array, and passes it along with the original array to be sorted to the Sort algorithm. In a real implementation, this working area should be released, either manually or by some automatic tool such as GC (garbage collection). The modified Merge algorithm also accepts the working area as a parameter.
1: procedure Merge(A, B, l, m, u)
2:   i ← l, j ← m + 1, k ← l
3:   while i ≤ m ∧ j ≤ u do
4:     if A[i] < A[j] then
5:       B[k] ← A[i]
6:       i ← i + 1
7:     else
8:       B[k] ← A[j]
9:       j ← j + 1
10:    k ← k + 1
11:  while i ≤ m do
12:    B[k] ← A[i]
13:    k ← k + 1
14:    i ← i + 1
15:  while j ≤ u do
16:    B[k] ← A[j]
17:    k ← k + 1
18:    j ← j + 1
19:  for i from l to u do  ▷ Copy back
20:    A[i] ← B[i]
By using this minor improvement, the space requirement is reduced from O(n lg n) to O(n). The following ANSI C program implements it. For illustration purposes, we manually copy the merged result back to the original array in a loop; this can also be realized by using a standard library tool such as memcpy.
void merge(Key* xs, Key* ys, int l, int m, int u) {
    int i, j, k;
    i = k = l; j = m;
    while (i < m && j < u)
        ys[k++] = xs[i] < xs[j] ? xs[i++] : xs[j++];
    while (i < m)
        ys[k++] = xs[i++];
    while (j < u)
        ys[k++] = xs[j++];
    for (; l < u; ++l)
        xs[l] = ys[l];
}

void msort(Key* xs, Key* ys, int l, int u) {
    int m;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        msort(xs, ys, l, m);
        msort(xs, ys, m, u);
        merge(xs, ys, l, m, u);
    }
}

void sort(Key* xs, int l, int u) {
    Key* ys = (Key*) malloc(sizeof(Key) * (u - l));
    msort(xs, ys, l, u);
    free(ys);
}
This new version runs faster than the previous one. On my test machine, it speeds up by about 20% to 25% when sorting 100,000 randomly generated numbers.

The basic functional merge sort can also be fine tuned. Observe that it splits the list at the middle point; however, as the underlying data structure representing the list is a singly linked-list, random access at a given position is a linear operation (refer to appendix A for details). Alternatively, one can split the list in an even-odd manner: all the elements at even positions are collected in one sub-list, while all the elements at odd positions are collected in another. For any list, the numbers of elements at even and odd positions are either equal or differ by one, so this divide strategy always leads to a well-balanced split, and the performance is ensured to be O(n lg n) in all cases.
The even-odd splitting algorithm can be defined as below.

\[ split(L) = \begin{cases} (\varnothing, \varnothing) & : L = \varnothing \\ (\{l_1\}, \varnothing) & : |L| = 1 \\ (\{l_1\} \cup A, \{l_2\} \cup B) & : \text{otherwise}, (A, B) = split(L'') \end{cases} \qquad (13.33) \]

Where $l_1$ and $l_2$ are the first two elements of $L$, and $L''$ holds the rest.
When the list is empty, the split result is two empty lists. If there is only one element in the list, we put this single element, which is at position 1, into the odd sub-list, and the even sub-list is empty. Otherwise, there are at least two elements in the list: we pick the first for the odd sub-list and the second for the even sub-list, and recursively split the rest. All the other functions are kept the same; the modified Haskell program is given as the following.
split [] = ([], [])
split [x] = ([x], [])
split (x:y:xs) = (x:xs', y:ys') where (xs', ys') = split xs
13.9 In-place merge sort
One drawback of the imperative merge sort is that it requires extra space for merging: the basic version without any optimization needs O(n lg n) at the peak, and the one allocating a working area needs O(n).

It's natural to seek an in-place merge sort which can reuse the original array without allocating any extra space. In this section, we'll introduce some solutions for realizing imperative in-place merge sort.
13.9.1 Naive in-place merge
The first idea is straightforward. As illustrated in figure 13.11, the sub-lists A and B are sorted; during the in-place merge, the invariant ensures that all elements before i are merged, so that they are in non-decreasing order. Each time we compare the i-th and the j-th elements: if the i-th is less than the j-th, the marker i just advances one step. This is the easy case. Otherwise, the j-th element is the next merge result, and it should be put in front of i. In order to achieve this, all elements between i and j, including the i-th, are shifted toward the end by one cell. We repeat this process until all the elements in A and B are put to the correct positions.
Figure 13.11: Naive in-place merge: the merged part is followed by the sorted sub-list A starting at xs[i] and the sorted sub-list B starting at xs[j]; if not xs[i] < xs[j], shift the elements between i and j by one and put xs[j] in front.
1: procedure Merge(A, l, m, u)
2:   while l < m ∧ m < u do
3:     if A[l] < A[m] then
4:       l ← l + 1
5:     else
6:       x ← A[m]
7:       for i ← m down-to l + 1 do  ▷ Shift
8:         A[i] ← A[i − 1]
9:       A[l] ← x
10:      l ← l + 1, m ← m + 1
However, this naive solution downgrades the overall performance of merge sort to quadratic O(n²)! This is because array shifting is a linear operation, proportional to the number of elements in the first sorted sub-array which haven't been compared so far.

The following ANSI C program based on this algorithm runs very slowly; it is about 12 times slower than the previous version when sorting 10,000 random numbers.
void naive_merge(Key* xs, int l, int m, int u) {
    int i; Key y;
    for (; l < m && m < u; ++l)
        if (!(xs[l] < xs[m])) {
            y = xs[m++];
            for (i = m - 1; i > l; --i)  /* shift */
                xs[i] = xs[i-1];
            xs[l] = y;
        }
}

void msort3(Key* xs, int l, int u) {
    int m;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        msort3(xs, l, m);
        msort3(xs, m, u);
        naive_merge(xs, l, m, u);
    }
}
13.9.2 In-place working area
In order to implement the in-place merge sort in O(n lg n) time, when sorting a sub-array, the rest of the array must be reused as the working area for merging. As the elements stored in the working area will be sorted later, they can't be overwritten. We can modify the previous algorithm, the one which duplicates extra space for merging, a bit to achieve this. The idea is that every time we compare the first elements of the two sorted sub-arrays and want to put the lesser element to the target position in the working area, we in turn exchange what is stored in the working area with this element. Thus after merging, the two sub-arrays contain what the working area previously stored. This idea is illustrated in figure 13.12.
Figure 13.12: Merge without overwriting the working area: compare A[i] with B[j]; if A[i] < B[j], swap A[i] with C[k], the next target cell in the working area.
In our algorithm, both the two sorted sub-arrays and the working area for merging are parts of the original array to be sorted. We need to supply the following arguments when merging: the start and end points of the sorted sub-arrays, which can be represented as ranges, and the start point of the working area. The following algorithm, for example, uses [a, b) to indicate the range including a and excluding b. It merges the sorted ranges [i, m) and [j, n) into the working area starting from k.
1: procedure Merge(A, [i, m), [j, n), k)
2:   while i < m ∧ j < n do
3:     if A[i] < A[j] then
4:       Exchange A[k] ↔ A[i]
5:       i ← i + 1
6:     else
7:       Exchange A[k] ↔ A[j]
8:       j ← j + 1
9:     k ← k + 1
10:  while i < m do
11:    Exchange A[k] ↔ A[i]
12:    i ← i + 1
13:    k ← k + 1
14:  while j < n do
15:    Exchange A[k] ↔ A[j]
16:    j ← j + 1
17:    k ← k + 1
Note that the following two constraints must be satisfied when merging:

1. The working area should be within the bounds of the array; in other words, it should be big enough to hold the elements exchanged in, without causing any out-of-bound error;

2. The working area can overlap with either of the two sorted arrays; however, it must be ensured that none of the unmerged elements are overwritten.
This algorithm can be implemented in ANSI C as the following example.
void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m)
        swap(xs, w++, i++);
    while (j < n)
        swap(xs, w++, j++);
}
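Note that swap here takes the array and two indices, unlike the value-style swap(xs[i], xs[j]) used in the quick sort examples; a minimal sketch of this three-argument variant, inferred from the call sites:

/* Exchange the elements at indices i and j of array xs. */
void swap(Key* xs, int i, int j) {
    Key tmp = xs[i];
    xs[i] = xs[j];
    xs[j] = tmp;
}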
With this merging algorithm defined, it's easy to imagine a solution which can sort half of the array. The next question is how to deal with the rest of the unsorted part, stored in the working area, as shown in figure 13.13.

Figure 13.13: Half of the array is sorted; the other half is the unsorted working area.
One intuitive idea is to recursively sort another half of the working area; then only 1/4 of the elements haven't been sorted yet, as shown in figure 13.14. The key point at this stage is that, sooner or later, we must merge the sorted 1/4 elements B with the sorted 1/2 elements A.

Figure 13.14: A and B must be merged at some time: the unsorted 1/4, the sorted B (1/4), and the sorted A (1/2).
Is the working area left, which only holds 1/4 of the elements, big enough for merging A and B? Unfortunately, it isn't in the setting shown in figure 13.14.

However, the second constraint mentioned before gives us a hint: we can exploit it by arranging the working area to overlap with either sub-array, as long as we can ensure that the unmerged elements won't be overwritten under some well designed merging schema. Actually, instead of sorting the second half of the working area, we can sort the first half, and put the working area between the two sorted arrays, as shown in figure 13.15 (a). This setup effectively arranges the working area to overlap with the sub-array A. This idea is proposed in [10].

Figure 13.15: Merge A and B with the working area. (a) The sorted B (1/4) is followed by the working area (1/4), then the sorted A (1/2). (b) After merging, the working area (1/4) ends up at the leftmost side, followed by the merged 3/4.
Let's consider two extreme cases:

1. All the elements in B are less than any element in A. In this case, the merge algorithm finally moves the whole contents of B to the working area; the cells of B hold what was previously stored in the working area. As the size of the working area equals that of B, it's OK to exchange their contents;
2. All the elements in A are less than any element in B. In this case, the merge algorithm continuously exchanges elements between A and the working area. After the first 1/4 cells of the working area are filled with elements from A, the algorithm starts to overwrite the first half of A. Fortunately, the contents being overwritten are not unmerged elements: the working area in effect advances toward the end of the array, and finally moves to the right side. From this point on, the merge algorithm starts exchanging the contents of B with the working area. The result is that the working area moves to the leftmost side, as shown in figure 13.15 (b).
We can repeat this step: always sort the second half of the unsorted part, and exchange the sorted sub-array to the first half, which becomes the working area. Thus the working area keeps shrinking, from 1/2 of the array, to 1/4, to 1/8, ..., and the scale of the merge problem keeps reducing. When there is only one element left in the working area, we needn't sort it any more, since a singleton array is sorted by nature. Merging a singleton array into the other is equivalent to inserting the element; in practice, the algorithm can finalize the last few elements by switching to insertion sort.
The whole algorithm can be described as the following.
1: procedure Sort(A, l, u)
2:   if u − l > 0 then
3:     m ← ⌊(l + u)/2⌋
4:     w ← l + u − m
5:     Sort(A, l, m, w)  ▷ The second half contains the sorted elements
6:     while w − l > 1 do
7:       u' ← w
8:       w ← ⌈(l + u')/2⌉  ▷ Ensure the working area is big enough
9:       Sort(A, w, u', l)  ▷ The first half holds the sorted elements
10:      Merge(A, [l, l + u' − w], [u', u], w)
11:    for i ← w down-to l do  ▷ Switch to insertion sort
12:      j ← i
13:      while j ≤ u ∧ A[j] < A[j − 1] do
14:        Exchange A[j] ↔ A[j − 1]
15:        j ← j + 1
Note that in order to satisfy the first constraint, we must ensure the working area is big enough to hold all the elements exchanged in; that's why we round by the ceiling when sorting the second half of the working area. Note also that we actually pass the ranges including the end points to the Merge algorithm.

Next, we develop the Sort algorithm which accepts the working area as a parameter; it mutually recursively calls Sort and exchanges the result to the working area.
1: procedure Sort(A, l, u, w)
2:   if u − l > 0 then
3:     m ← ⌊(l + u)/2⌋
4:     Sort(A, l, m)
5:     Sort(A, m + 1, u)
6:     Merge(A, [l, m], [m + 1, u], w)
7:   else  ▷ Exchange all elements to the working area
8:     while l ≤ u do
9:       Exchange A[l] ↔ A[w]
10:      l ← l + 1
11:      w ← w + 1
Different from the naive in-place sort, this algorithm doesn't shift the array during merging. The main algorithm reduces the unsorted part by the sequence n/2, n/4, n/8, ..., so it takes O(lg n) steps to complete the sorting. In every step, it recursively sorts half of the remaining elements and performs linear-time merging.

Denoting the time cost of sorting n elements as T(n), we have the following equation:
\[ T(n) = T\!\left(\frac{n}{2}\right) + c\,\frac{n}{2} + T\!\left(\frac{n}{4}\right) + c\,\frac{3n}{4} + T\!\left(\frac{n}{8}\right) + c\,\frac{7n}{8} + \cdots \qquad (13.34) \]
Solving this equation with the telescoping method gives the result O(n lg n); the detailed process is left as an exercise to the reader.

The following ANSI C code completes the implementation, using the example wmerge program given above.
void imsort(Key* xs, int l, int u);

void wsort(Key* xs, int l, int u, int w) {
    int m;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        imsort(xs, l, m);
        imsort(xs, m, u);
        wmerge(xs, l, m, m, u, w);
    }
    else
        while (l < u)
            swap(xs, l++, w++);
}

void imsort(Key* xs, int l, int u) {
    int m, n, w;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        w = l + u - m;
        wsort(xs, l, m, w); /* the last half contains sorted elements */
        while (w - l > 2) {
            n = w;
            w = l + (n - l + 1) / 2; /* ceiling */
            wsort(xs, w, n, l); /* the first half contains sorted elements */
            wmerge(xs, l, l + n - w, n, u, w);
        }
        for (n = w; n > l; --n) /* switch to insertion sort */
            for (m = n; m < u && xs[m] < xs[m-1]; ++m)
                swap(xs, m, m - 1);
    }
}
However, this program doesn't run faster than the version developed in the previous section, which allocates a second array in advance as the working area. On my machine, it is about 60% slower when sorting 100,000 random numbers, due to the many swap operations.
13.9.3 In-place merge sort V.S. linked-list merge sort
The in-place merge sort is still a live area for research. In order to save the extra space for merging, some overhead has been introduced, which increases the complexity of the merge sort algorithm. However, if the underlying data structure isn't an array but a linked-list, merging can be achieved without any extra space, as shown in the even-odd functional merge sort algorithm presented in the previous section.

In order to make this clearer, we can develop a purely imperative linked-list merge sort solution. The linked-list can be defined as a record type, as shown in appendix A, like below.
struct Node {
    Key key;
    struct Node* next;
};
We can define an auxiliary function for node linking. Assuming the list to be linked isn't empty, it can be implemented as the following.
struct Node* link(struct Node* xs, struct Node* ys) {
    xs->next = ys;
    return xs;
}
One method to realize the imperative even-odd splitting is to initialize two empty sub-lists, then iterate over the list to be split. Each time, we link the current node in front of the first sub-list, and then exchange the two sub-lists; thus the second sub-list will receive the next node in the following iteration. This idea can be illustrated as below.
1: function Split(L)
2:   (A, B) ← (∅, ∅)
3:   while L ≠ ∅ do
4:     p ← L
5:     L ← Next(L)
6:     A ← Link(p, A)
7:     Exchange A ↔ B
8:   return (A, B)
The following example ANSI C program implements the sort with this splitting algorithm embedded.
struct Node* msort(struct Node* xs) {
    struct Node *p, *as, *bs;
    if (!xs || !xs->next) return xs;
    as = bs = NULL;
    while (xs) {
        p = xs;
        xs = xs->next;
        as = link(p, as);
        swap(as, bs);
    }
    as = msort(as);
    bs = msort(bs);
    return merge(as, bs);
}
The only thing left is to develop the imperative merging algorithm for linked-lists. The idea is quite similar to the array merging version: as long as neither of the sub-lists is exhausted, we pick the lesser head element and append it to the result list; after that, we just need to link the non-empty list to the tail of the result, rather than looping to copy. Some care is needed to initialize the result list, as its head node is the lesser of the two sub-lists' heads. One simple method is to use a dummy sentinel head, and drop it before returning. This implementation detail is given as the following.
struct Node* merge(struct Node* as, struct Node* bs) {
    struct Node s, *p;
    p = &s;
    while (as && bs) {
        if (as->key < bs->key) {
            link(p, as);
            as = as->next;
        }
        else {
            link(p, bs);
            bs = bs->next;
        }
        p = p->next;
    }
    if (as)
        link(p, as);
    if (bs)
        link(p, bs);
    return s.next;
}
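As a usage sketch, one can build a list from an array and sort it; the helper fromArray and the driver below are illustrative assumptions (they suppose Key is int), not part of the book's code:

#include <stdio.h>
#include <stdlib.h>

/* Build a linked-list holding the n elements of xs, preserving their order. */
struct Node* fromArray(Key* xs, int n) {
    struct Node* head = NULL;
    while (--n >= 0) {  /* prepend from the back to keep the original order */
        struct Node* p = (struct Node*) malloc(sizeof(struct Node));
        p->key = xs[n];
        head = link(p, head);
    }
    return head;
}

int main() {
    Key xs[] = {3, 1, 4, 1, 5, 9, 2, 6};
    struct Node* p = msort(fromArray(xs, 8));
    for (; p; p = p->next)
        printf("%d ", p->key);  /* expected output: 1 1 2 3 4 5 6 9 */
    return 0;
}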
Exercise 13.5
Prove that the performance of in-place merge sort is bound to O(n lg n).
13.10 Nature merge sort
Knuth gives another way to interpret the idea of divide and conquer merge sort: it is like burning a candle from both ends [1]. This leads to the nature merge sort algorithm.
Figure 13.16: Burn a candle from both ends
For any given sequence, we can always find a non-decreasing sub-sequence starting at any position. One particular case is that we can find such a sub-sequence from the left-most position. The following table lists some examples; the leading non-decreasing sub-sequences are {15}, {8, 12, 14}, and the whole sequence, respectively.

15, 0, 4, 3, 5, 2, 7, 1, 12, 14, 13, 8, 9, 6, 10, 11
8, 12, 14, 0, 1, 4, 11, 2, 3, 5, 9, 13, 10, 6, 15, 7
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

The first row illustrates the worst case: the second element is less than the first one, so the non-decreasing sub-sequence is a singleton list which only contains the first element. The last row shows the best case: the sequence is ordered, and the non-decreasing sub-sequence is the whole. The second row shows the average case.
Symmetrically, we can always find a non-decreasing sub-sequence scanning from the end of the sequence to the left. This suggests that we can merge the two non-decreasing sub-sequences, one from the beginning and the other from the end, into a longer sorted sequence. The advantage of this idea is that we utilize the naturally ordered sub-sequences, so we needn't recursive sorting at all.

Figure 13.17 illustrates this idea. We start the algorithm by scanning from both ends, finding the longest non-decreasing sub-sequences respectively. After that, these two sub-sequences are merged into the working area. The merged result starts from the beginning. Next we repeat this step, going on scanning toward the center of the original sequence. This time we merge the two ordered sub-sequences to the right end of the working area, toward the left. Such a setup is easy for the next round of scanning. When all the elements in the original sequence have been scanned and merged to the target, we switch to use the elements stored in the working area for sorting, and use the previous sequence as the new working area. Such switching happens repeatedly in each round. Finally, we copy all the elements from the working area to the original array if necessary.
Figure 13.17: Nature merge sort: runs found from both ends of the sequence (e.g. 8, 12, 14 from the front and 7, 15 scanned from the rear) are merged into the working area, leaving free cells in the middle; merging targets the front in odd rounds and the rear in even rounds.
The only question left is when this algorithm stops. The answer is: when we start a new round of scanning and find that the longest non-decreasing sub list spans to the end, the whole list is ordered, and the sorting is done.

Because this kind of merge sort processes the sequence from both directions, and uses the natural ordering of sub sequences, it is named natural two-way merge sort. In order to realize it, some care must be taken. Figure 13.18 shows the invariant during the natural merge sort. At any time, all elements before marker a and after marker d have already been scanned and merged. We try to span the non-decreasing sub sequence [a, b) as long as possible; at the same time, we span the sub sequence [c, d) from right to left as long as possible as well. The invariant for the working area is shown in the second row of the figure. All elements before f and after r have already been sorted. (Note that they may contain several ordered sub sequences.) For the odd rounds (1, 3, 5, ...), we merge [a, b) and [c, d) from f toward the right; while for the even rounds (2, 4, 6, ...), we merge the two spanned sub sequences from r toward the left.
... scanned ... | span [a, b) | ... ? ... | span [c, d) | ... scanned ...
... merged ... | ... unused free cells ... | ... merged ...

Figure 13.18: Invariant during natural merge sort (markers a, b, c, d on the original sequence; f and r on the working area)
For the imperative realization, the sequence is represented by an array. Before sorting starts, we duplicate the array to create a working area. The pointers a, b are initialized to point to the left-most position, while c, d point to the right-most position. Pointer f starts by pointing to the front of the working area, and r points to the rear position.
1: function Sort(A)
2:   if |A| > 1 then
3:     n ← |A|
4:     B ← Create-Array(n)              ▷ Create the working area
5:     loop
6:       [a, b) ← [1, 1)
7:       [c, d) ← [n + 1, n + 1)
8:       f ← 1, r ← n                   ▷ front and rear pointers to the working area
9:       t ← False                      ▷ merge to front or rear
10:      while b < c do                 ▷ There are still elements to scan
11:        repeat                       ▷ Span [a, b)
12:          b ← b + 1
13:        until b ≥ c ∨ A[b] < A[b − 1]
14:        repeat                       ▷ Span [c, d)
15:          c ← c − 1
16:        until c ≤ b ∨ A[c − 1] < A[c]
17:        if c < b then                ▷ Avoid overlap
18:          c ← b
19:        if b − a ≥ n then            ▷ Done if [a, b) spans the whole array
20:          return A
21:        if t then                    ▷ merge to front
22:          f ← Merge(A, [a, b), [c, d), B, f, 1)
23:        else                         ▷ merge to rear
24:          r ← Merge(A, [a, b), [c, d), B, r, −1)
25:        a ← b, d ← c
26:        t ← ¬t                       ▷ Switch the merge direction
27:      Exchange A ↔ B                 ▷ Switch working area
28:   return A
The merge algorithm is almost the same as before, except that we need to pass a parameter Δ to indicate the direction for merging (Δ = 1 merges forward from the front, Δ = −1 merges backward from the rear).
1: function Merge(A, [a, b), [c, d), B, w, Δ)
2:   while a < b ∧ c < d do
3:     if A[a] < A[d − 1] then
4:       B[w] ← A[a]
5:       a ← a + 1
6:     else
7:       B[w] ← A[d − 1]
8:       d ← d − 1
9:     w ← w + Δ
10:  while a < b do
11:    B[w] ← A[a]
12:    a ← a + 1
13:    w ← w + Δ
14:  while c < d do
15:    B[w] ← A[d − 1]
16:    d ← d − 1
17:    w ← w + Δ
18:  return w
The following ANSI C program implements this two-way natural merge sort algorithm. Note that it doesn't release the allocated working area explicitly.
int merge(Key* xs, int a, int b, int c, int d, Key* ys, int k, int delta) {
    for(; a < b && c < d; k += delta)
        ys[k] = xs[a] < xs[d-1] ? xs[a++] : xs[--d];
    for(; a < b; k += delta)
        ys[k] = xs[a++];
    for(; c < d; k += delta)
        ys[k] = xs[--d];
    return k;
}
Key* sort(Key* xs, Key* ys, int n) {
    int a, b, c, d, f, r, t;
    if(n < 2)
        return xs;
    for(;;) {
        a = b = 0;
        c = d = n;
        f = 0;
        r = n-1;
        t = 1;
        while(b < c) {
            do {                /* span [a, b) as much as possible */
                ++b;
            } while( b < c && xs[b-1] <= xs[b] );
            do {                /* span [c, d) as much as possible */
                --c;
            } while( b < c && xs[c] <= xs[c-1] );
            if( c < b )
                c = b;          /* eliminate overlap if any */
            if( b - a >= n )
                return xs;      /* sorted */
            if( t )
                f = merge(xs, a, b, c, d, ys, f, 1);
            else
                r = merge(xs, a, b, c, d, ys, r, -1);
            a = b;
            d = c;
            t = !t;
        }
        swap(&xs, &ys);
    }
    return xs; /* can't be here */
}
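A remark worth adding here: since the roles of xs and ys are swapped after each round, the sorted result may reside in either buffer, so the caller should use the returned pointer rather than assuming xs holds the result, and copy back if necessary, as noted earlier.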
The performance of natural merge sort depends on the actual ordering of the sub arrays. However, it in fact performs well even in the worst case. Suppose that we are unlucky when scanning the array, and the length of the non-decreasing sub arrays is always 1 during the first round of scanning. This leaves the working area filled with merged ordered sub arrays of length 2. Suppose that we are unlucky again in the second round of scanning; the previous result however ensures that the non-decreasing sub arrays in this round are no shorter than 2, so this time the working area will be filled with merged ordered sub arrays of length 4, and so on. Repeating this, the length of the non-decreasing sub arrays doubles in every round, so there are at most O(lg n) rounds, and in every round we scan all the elements. The overall performance for this worst case is therefore bound to O(n lg n). We'll come back to this interesting phenomenon in the next section about bottom-up merge sort.
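As a quick sanity check of this doubling argument, here is a small Python sketch we add (not the book's program) that splits a sequence into non-decreasing runs, merges them pairwise, and counts the rounds; for a reversed array of 16 elements it reports 4 = lg 16 rounds.

def runs(xs):
    # split a non-empty list into maximal non-decreasing runs
    rs, cur = [], [xs[0]]
    for a, b in zip(xs, xs[1:]):
        if a <= b:
            cur.append(b)
        else:
            rs.append(cur)
            cur = [b]
    rs.append(cur)
    return rs

def merge(xs, ys):
    zs = []
    while xs and ys:
        zs.append(xs.pop(0) if xs[0] < ys[0] else ys.pop(0))
    return zs + xs + ys

def count_rounds(xs):
    rs, n = runs(xs), 0
    while len(rs) > 1:
        # merge adjacent pairs; run lengths at least double per round
        rs = [merge(rs[i], rs[i + 1]) if i + 1 < len(rs) else rs[i]
              for i in range(0, len(rs), 2)]
        n += 1
    return n

# count_rounds(list(range(16, 0, -1))) evaluates to 4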
In purely functional settings, however, it's not sensible to scan the list from both ends, since the underlying data structure is a singly linked list. The natural merge sort can be realized in another approach.

Observe that the list to be sorted consists of several non-decreasing sub lists, and that we can pick every two of such sub lists and merge them into a bigger one. We repeatedly pick and merge, so that the number of non-decreasing sub lists keeps halving, and finally there is only one such list, which is the sorted result. This idea can be formalized in the following equation.
sort(L) = sort'(group(L))    (13.35)
Where function group(L) groups the elements of the list into non-decreasing sub lists. This function can be described as below; the first two are trivial edge cases.

- If the list is empty, the result is a list containing an empty list;
- If there is only one element in the list, the result is a list containing a singleton list;
- Otherwise, the first two elements are compared. If the first one is less than or equal to the second, it is linked in front of the first sub list of the recursive grouping result; otherwise, a singleton list containing the first element is put as the first sub list before the recursive result.
group(L) =
    {L}                    : |L| ≤ 1
    {{l1} ∪ L1, L2, ...}   : l1 ≤ l2, where {L1, L2, ...} = group(L')
    {{l1}, L1, L2, ...}    : otherwise
(13.36)
It's quite possible to abstract the grouping criterion as a parameter to develop a generic grouping function, for instance, as in the following Haskell code.

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = [[]]
groupBy' _ [x] = [[x]]
groupBy' f (x:xs@(x':_)) | f x x' = (x:ys):yss
                         | otherwise = [x]:r
  where
    r@(ys:yss) = groupBy' f xs

(There is a groupBy function provided in the Haskell standard library Data.List. However, it doesn't fit here, because it accepts an equality testing function as parameter, which must satisfy the properties of being reflexive, transitive, and symmetric; the less-than-or-equal-to operation we use here isn't symmetric. Refer to appendix A of this book for details.)
Different from the sort function, which sorts a list of elements, function sort' accepts a list of sub lists, which is the result of grouping.

sort'(L) =
    ∅                       : L = ∅
    L1                      : L = {L1}
    sort'(mergePairs(L))    : otherwise
(13.37)

The first two are the trivial edge cases. If the list to be sorted is empty, the result is obviously empty. If it contains only one sub list, then we are done; we need just extract this single sub list as the result. For the recursive case, we call a function mergePairs to merge every two sub lists, then recursively call sort'.
The next undefined function is mergePairs. As the name indicates, it repeatedly merges pairs of non-decreasing sub lists into bigger ones.

mergePairs(L) =
    L                                    : |L| ≤ 1
    {merge(L1, L2)} ∪ mergePairs(L'')    : otherwise
(13.38)
When there are fewer than two sub lists in the list, we are done; otherwise, we merge the first two sub lists L1 and L2, and recursively merge the rest of the pairs in L''. The type of the result of mergePairs is a list of lists; however, it will be flattened by the sort' function finally.
The merge function is the same as before. The complete example Haskell program is given below.

mergesort = sort' . groupBy' (<=)

sort' [] = []
sort' [xs] = xs
sort' xss = sort' (mergePairs xss) where
  mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss
  mergePairs xss = xss
Alternatively, observe that we can first pick two sub lists and merge them to an intermediate result, then repeatedly pick the next sub list and merge it to the ordered result we've gotten so far, until all the rest of the sub lists are merged. This is a typical folding algorithm as introduced in appendix A.

sort(L) = fold(merge, ∅, group(L))    (13.39)
Translating this version to Haskell yields the folding version.

mergesort = foldl merge [] . groupBy' (<=)
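For example, both variants sort the second row of the earlier table: mergesort [8, 12, 14, 0, 1, 4, 11] first groups the input into the runs [8, 12, 14] and [0, 1, 4, 11], then merges them to [0, 1, 4, 8, 11, 12, 14].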
Exercise 13.6

Is the natural merge sort algorithm realized by folding equivalent to the one using mergePairs in terms of performance? If yes, prove it; if not, which one is faster?
13.11 Bottom-up merge sort

The worst case analysis for natural merge sort raises an interesting topic: instead of realizing merge sort in a top-down manner, we can develop a bottom-up version. The great advantage is that we needn't do bookkeeping any more, so the algorithm is quite friendly for purely iterative implementation.

The idea of bottom-up merge sort is to turn the sequence to be sorted into n small sub sequences, each containing only one element. Then we merge every two of such small sub sequences, so that we get ⌊n/2⌋ ordered sub sequences, each of length 2; if n is an odd number, we leave the last singleton sequence untouched. We repeatedly merge these pairs, and finally we get the sorted result. Knuth names this variant straight two-way merge sort [1]. The bottom-up merge sort is illustrated in figure 13.19.
Figure 13.19: Bottom-up merge sort
Different from the basic version and the even-odd version, we needn't explicitly split the list to be sorted in every recursion. The whole list is split into n singletons at the very beginning, and we merge these sub lists in the rest of the algorithm.

sort(L) = sort'(wraps(L))    (13.40)

wraps(L) =
    ∅                     : L = ∅
    {{l1}} ∪ wraps(L')    : otherwise
(13.41)

Of course wraps can be implemented by using mapping as introduced in appendix A.

sort(L) = sort'(map(λx · {x}, L))    (13.42)
We reuse the functions sort' and mergePairs which were defined in the section on natural merge sort. They repeatedly merge pairs of sub lists until there is only one.

Implementing this version in Haskell gives the following example code.

sort = sort' . map (\x -> [x])
This version is based on what Okasaki presented in [11]. It is quite similar to the natural merge sort, only differing in the way of grouping. Actually, it can be deduced as a special case (the worst case) of natural merge sort by the following equation.

sort(L) = sort'(groupBy'(λx,y · False, L))    (13.43)

That is, instead of spanning the non-decreasing sub lists as long as possible, the predicate always evaluates to false, so each sub list spans only one element.
Similar to natural merge sort, bottom-up merge sort can also be defined by folding. The detailed implementation is left as an exercise to the reader.

Observing the bottom-up sort, we find it is in tail-recursive call manner; thus it's quite easy to translate it into a purely iterative algorithm without any recursion.
1: function Sort(A)
2:   B ← ∅
3:   for each a ∈ A do
4:     B ← Append(B, {a})
5:   N ← |B|
6:   while N > 1 do
7:     for i from 1 to ⌊N/2⌋ do
8:       B[i] ← Merge(B[2i − 1], B[2i])
9:     if Odd(N) then
10:      B[⌈N/2⌉] ← B[N]
11:    N ← ⌈N/2⌉
12:  if B = ∅ then
13:    return ∅
14:  return B[1]
The following example Python program implements the purely iterative
bottom-up merge sort.
def mergesort(xs):
    ys = [[x] for x in xs]
    while len(ys) > 1:
        ys.append(merge(ys.pop(0), ys.pop(0)))
    return [] if ys == [] else ys.pop()

def merge(xs, ys):
    zs = []
    while xs != [] and ys != []:
        zs.append(xs.pop(0) if xs[0] < ys[0] else ys.pop(0))
    return zs + (xs if xs != [] else ys)
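For instance, mergesort([15, 9, 2, 11, 7]) merges [15] with [9], [2] with [11], and so on from the head of the queue, and finally returns [2, 7, 9, 11, 15].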
The Python implementation exploits the fact that, instead of starting the next round of merging only after all pairs have been merged, we can combine these rounds of merging by consuming the pair of lists at the head, and appending the merged result to the tail. This greatly simplifies the logic of handling the odd-sub-list case, compared with the above pseudo code.
Exercise 13.7

- Implement the functional bottom-up merge sort by using folding.
- Implement the iterative bottom-up merge sort only with array indexing. Don't use any library-supported tools, such as list, vector, etc.
13.12 Parallelism

We mentioned in the basic version of quick sort that the two sub sequences can be sorted in parallel after the divide phase finishes. This strategy is also applicable to merge sort. Actually, the parallel versions of quick sort and merge sort do not only distribute the recursive sub sequence sorting into two parallel processes, but divide the sequence into p sub sequences, where p is the number of processors. Ideally, if we can achieve sorting in T_p time with parallelism, satisfying O(n lg n) = pT_p, we say it is a linear speed up, and the algorithm is parallel optimal.

However, a straightforward parallel extension to the sequential quick sort algorithm, which samples several pivots, divides into p sub sequences, and independently sorts them in parallel, isn't optimal. The bottleneck exists in the divide phase, for which we can only achieve O(n) time in the average case.

The straightforward parallel extension to merge sort, on the other hand, is blocked at the merge phase. Both parallel merge sort and parallel quick sort in practice need good designs in order to achieve the optimal speed up. Actually, the divide and conquer nature makes merge sort and quick sort relatively easy to parallelize. Richard Cole found an O(lg n) parallel merge sort algorithm with n processors in 1986 [13].

Parallelism is a big and complex topic which is out of the scope of this elementary book. Readers can refer to [13] and [14] for details.
13.13 Short summary

In this chapter, two popular divide and conquer sorting methods, quick sort and merge sort, are introduced. Both of them meet the upper performance limit of comparison-based sorting algorithms, O(n lg n). Sedgewick said that quick sort is the greatest algorithm invented in the 20th century. Almost all programming environments adopt quick sort as the default sorting tool. As time goes on, some environments, especially those manipulating abstract sequences which are dynamic and not based on pure arrays, switch to merge sort as the general purpose sorting tool. (Actually, most of these are a kind of hybrid sort, balanced with insertion sort to achieve good performance when the sequence is short.)

The reason for this interesting phenomenon can be partly explained by the treatment in this chapter. Quick sort performs perfectly in most cases; it needs fewer swaps than most other algorithms. However, the quick sort algorithm is based on swapping, and in purely functional settings swapping isn't the most efficient way, because the underlying data structure is a singly linked list, not a vectorized array. Merge sort, on the other hand, is friendly in such an environment: it costs constant space, and its performance can be ensured even in what would be the worst case for quick sort, while the latter downgrades to quadratic time. However, merge sort doesn't perform as well as quick sort in purely imperative settings with arrays. It either needs extra space for merging, which is sometimes unreasonable, for example in an embedded system with limited memory, or causes many overhead swaps by the in-place workaround. In-place merging is still an active research area.
Although the title of this chapter is 'quick sort V.S. merge sort', it's not the case that one algorithm has nothing to do with the other. Quick sort can be viewed as the optimized version of tree sort, as explained in this chapter. Similarly, merge sort can also be deduced from tree sort, as shown in [12].

There are many ways to categorize sorting algorithms, such as in [1]. One way is from the point of view of easy/hard partition and easy/hard merge [7].

Quick sort, for example, is quite easy for merging, because all the elements in the sub sequence before the pivot are no greater than any one after the pivot. The merging for quick sort is actually trivial sequence concatenation.

Merge sort, on the other hand, is more complex in merging than quick sort. However, it's quite easy to divide, no matter what concrete divide method is taken: simple division at the middle point, even-odd splitting, natural splitting, or bottom-up straight splitting. Compared to merge sort, it's more difficult for quick sort to achieve a perfect division. We showed that in theory the worst case can't be completely avoided, no matter what engineering practice is taken: median-of-three, randomized quick sort, 3-way partition, etc.

We've shown some elementary sorting algorithms in this book up to this chapter, including insertion sort, tree sort, selection sort, heap sort, quick sort and merge sort. Sorting is still a hot research area in computer science. At the time this chapter was written, people were challenged by the buzz word 'big data': the traditional convenient methods can't handle more and more huge data within reasonable time and resources. Sorting a sequence of hundreds of Gigabytes becomes a routine in some fields.
Exercise 13.8

Design an algorithm to create a binary search tree by using the merge sort strategy.
Bibliography

[1] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition). Addison-Wesley Professional; 2 edition (May 4, 1998). ISBN-10: 0201896850, ISBN-13: 978-0201896855

[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press. 2001. ISBN: 0262032937

[3] Robert Sedgewick. Implementing quick sort programs. Communications of the ACM. Volume 21, Number 10. 1978. pp. 847-857.

[4] Jon Bentley. Programming Pearls, Second Edition. Addison-Wesley Professional; 1999. ISBN-13: 978-0201657883

[5] Jon Bentley, Douglas McIlroy. Engineering a sort function. Software Practice and Experience, VOL. 23(11), 1249-1265. 1993.

[6] Robert Sedgewick, Jon Bentley. Quicksort is optimal. http://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf

[7] Richard Bird. Pearls of Functional Algorithm Design. Cambridge University Press. 2010. ISBN: 9781139490603

[8] Fethi Rabhi, Guy Lapalme. Algorithms: a functional programming approach. Second edition. Addison-Wesley, 1999. ISBN: 0201-59604-0

[9] Simon Peyton Jones. The Implementation of Functional Programming Languages. Prentice-Hall International, 1987. ISBN: 0-13-453333-X

[10] Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. Practical in-place mergesort. Nordic Journal of Computing, 1996.

[11] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, (July 1, 1999). ISBN-13: 978-0521663502

[12] José Bacelar Almeida and Jorge Sousa Pinto. Deriving Sorting Algorithms. Technical report, Data structures and Algorithms. 2008.

[13] Richard Cole. Parallel merge sort. SIAM J. Comput. 17 (4): 770-785, August 1988. doi:10.1137/0217049

[14] David M. W. Powers. Parallelized Quicksort and Radixsort with Optimal Speedup. Proceedings of International Conference on Parallel Computing Technologies. Novosibirsk. 1991.

[15] Wikipedia. Quicksort. http://en.wikipedia.org/wiki/Quicksort

[16] Wikipedia. Strict weak order. http://en.wikipedia.org/wiki/Strict_weak_order

[17] Wikipedia. Total order. http://en.wikipedia.org/wiki/Total_order

[18] Wikipedia. Harmonic series (mathematics). http://en.wikipedia.org/wiki/Harmonic_series_(mathematics)
Searching
Chapter 14
Searching
14.1 Introduction
Searching is quite a big and important area. Computers make many hard searching problems realistic which are almost impossible for human beings. A modern industrial robot can even search for and pick the correct gadget from the pipeline for assembly. A GPS car navigator can search the map for the best route to a specific place. The modern mobile phone is not only equipped with such a map navigator, but can also search for the best price in Internet shopping.

This chapter just scratches the surface of elementary searching. One good thing that computers offer is brute-force scanning for a certain result in a large sequence. The divide and conquer search strategy will be briefed with two problems: one is to find the k-th smallest element among a list of unsorted elements; the other is the popular binary search among a list of sorted elements. We'll also introduce the extension of binary search to multiple-dimension data.
Text matching is also very important in our daily life. Two well-known searching algorithms, the Knuth-Morris-Pratt (KMP) and Boyer-Moore algorithms, will be introduced. They set good examples for another searching strategy: information reusing.

Besides sequence search, some elementary methods for searching solutions to some interesting problems will be introduced. They were mostly well studied in the early phase of AI (artificial intelligence), including the basic DFS (depth-first search) and BFS (breadth-first search).

Finally, dynamic programming will be briefed for searching optimal solutions, and we'll also introduce the greedy algorithm, which is applicable in some special cases.
All algorithms will be realized in both imperative and functional approaches.
14.2 Sequence search

Although the modern computer offers fast speed for brute-force searching, and even if Moore's law could be strictly followed, the growth of huge data is too fast to be handled well in this way. We've seen a vivid example in the introduction chapter of this book. This is why people study computer search algorithms.
14.2.1 Divide and conquer search

One solution is to use the divide and conquer approach: if we can repeatedly scale down the search domain, the data being dropped needn't be examined at all. This will definitely speed up the search.
k-selection problem

Consider the problem of finding the k-th smallest one among n elements. The most straightforward idea is to find the minimum first, then drop it and find the second minimum element among the rest. Repeating this minimum finding and dropping k steps gives the k-th smallest one. Finding the minimum among n elements costs linear O(n) time, thus this method performs in O(kn) time, if k is much smaller than n.

Another method is to use the heap data structure we've introduced. No matter what concrete heap is used, e.g. binary heap with implicit array, Fibonacci heap or others, accessing the top element followed by popping is typically bound to O(lg n) time. Thus this method, as formalized in equations (14.1) and (14.2), performs in O(k lg n) time, if k is much smaller than n.
top(k, L) = find(k, heapify(L))    (14.1)

find(k, H) =
    top(H)                 : k = 1
    find(k − 1, pop(H))    : otherwise
(14.2)
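Before moving on, here is a minimal Python sketch of this heap-based selection, using the standard heapq module; the function name top_heap is our own, added for illustration.

import heapq

def top_heap(k, xs):
    # return the k-th smallest element (1-based) of xs
    h = list(xs)
    heapq.heapify(h)            # O(n) heapify
    for _ in range(k - 1):      # k - 1 pops cost O(k lg n) in total
        heapq.heappop(h)
    return h[0]

# top_heap(3, [5, 1, 4, 2, 3]) evaluates to 3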
However, the heap adds some complexity to the solution. Is there any simple, fast method to find the k-th element?
The divide and conquer strategy can help us. If we can divide all the elements into two sub lists A and B, and ensure that all the elements in A are not greater than any elements in B, we can scale down the problem by following this method. (This actually demands a more accurate definition of the k-th smallest in L: it's equal to the k-th element of L', where L' is a permutation of L in monotonic non-decreasing order.)

1. Compare the length of sub list A with k;
2. If k < |A|, the k-th smallest one must be contained in A; we can drop B and further search in A;
3. If |A| < k, the k-th smallest one must be contained in B; we can drop A and further search for the (k − |A|)-th smallest one in B.

Note that the italic font emphasizes the fact of recursion. The ideal case always divides the list into two equally big sub lists A and B, so that we can halve the problem each time. Such an ideal case leads to a performance of O(n) linear time.

Thus the key problem is how to realize the division, which collects the first m smallest elements in one sub list, and puts the rest in another.

This reminds us of the partition algorithm in quick sort, which moves all the elements smaller than the pivot in front of it, and moves those greater than the pivot behind it. Based on this idea, we can develop a divide and conquer k-selection algorithm, which is called the quick selection algorithm.
1. Randomly select an element (the first, for instance) as the pivot;
2. Move all elements which aren't greater than the pivot into a sub list A, and move the rest to sub list B;
3. Compare the length of A with k; if |A| = k − 1, then the pivot is the k-th smallest one;
4. If |A| > k − 1, recursively find the k-th smallest one among A;
5. Otherwise, recursively find the (k − |A| − 1)-th smallest one among B.
This algorithm can be formalized in the below equation. Suppose 0 < k ≤ |L|, where L is a non-empty list of elements. Denote the first element of L as l1; it is chosen as the pivot. L' contains the rest of the elements except for l1. (A, B) = partition(λx · x ≤ l1, L') partitions L' by using the same algorithm defined in the chapter of quick sort.

top(k, L) =
    l1                        : |A| = k − 1
    top(k − 1 − |A|, B)       : |A| < k − 1
    top(k, A)                 : otherwise
(14.3)

partition(p, L) =
    (∅, ∅)                                            : L = ∅
    ({l1} ∪ A, B)    : p(l1), where (A, B) = partition(p, L')
    (A, {l1} ∪ B)                                     : ¬p(l1)
(14.4)
The following Haskell example program implements this algorithm.
top n (x:xs) | len == n - 1 = x
| len < n - 1 = top (n - len - 1) bs
| otherwise = top n as
where
(as, bs) = partition (x) xs
len = length as
The partition function is provided in the Haskell standard library; for the detailed implementation refer to the previous chapter about quick sort.

The lucky case is that the k-th smallest element is selected as the pivot at the very beginning. The partition function examines the whole list and finds that there are k − 1 elements not greater than the pivot; we are done in just O(n) time. The worst case is that either the maximum or the minimum element is selected as the pivot every time. The partition then always produces an empty sub list: either A or B is empty. If we always pick the minimum as the pivot, the performance is bound to O(kn). If we always pick the maximum as the pivot, the performance is O((n − k)n). If k is much less than n, it downgrades to quadratic O(n²) time.

The best case (not the lucky case) is that the pivot always partitions the list perfectly. The length of A is nearly the same as the length of B, and the list is halved every time. It needs about O(lg n) partitions, each taking linear time proportional to the length of the halved list. This can be expressed as O(n + n/2 + n/4 + ... + n/2^m), where m is the smallest number satisfying n/2^m < k. Summing the series leads to the result of O(n).

The average case analysis needs the tool of mathematical expectation. It's quite similar to the proof given in the previous chapter about quick sort, and is left as an exercise to the reader.
Similar to quick sort, this divide and conquer selection algorithm performs well most of the time in practice. We can take the same engineering practices, such as median-of-three or randomly selecting the pivot, as we did for quick sort. Below is the imperative realization for example.
1: function Top(k, A, l, u)
2:   Exchange A[l] ↔ A[Random(l, u)]    ▷ Randomly select in [l, u]
3:   p ← Partition(A, l, u)
4:   if p − l + 1 = k then
5:     return A[p]
6:   if k < p − l + 1 then
7:     return Top(k, A, l, p − 1)
8:   return Top(k − p + l − 1, A, p + 1, u)
This algorithm searches for the k-th smallest element in the range [l, u] of array A. The boundaries are included. It first randomly selects a position and swaps it with the first one. Then this element is chosen as the pivot for partitioning. The partition algorithm moves elements in-place and returns the position to which the pivot is moved. If the pivot is just located at position k, then we are done; if there are more than k − 1 elements not greater than the pivot, the algorithm recursively searches for the k-th smallest one in the range [l, p − 1]; otherwise, k is reduced by the number of elements before the pivot, and we recursively search the range after the pivot, [p + 1, u].
There are many methods to realize the partition algorithm. The one below is based on N. Lomuto's method. Other realizations are left as exercises to the reader.

1: function Partition(A, l, u)
2:   p ← A[l]    ▷ the pivot
3:   L ← l
4:   for R ← l + 1 to u do
5:     if ¬(p < A[R]) then
6:       L ← L + 1
7:       Exchange A[L] ↔ A[R]
8:   Exchange A[L] ↔ A[l]    ▷ move the pivot in place
9:   return L
The ANSI C example program below implements this algorithm. Note that it handles the special case that either the array is empty or k is out of the boundaries of the array; it returns -1 to indicate search failure.
int partition(Key* xs, int l, int u) {
    int r, p = l;
    for (r = l + 1; r < u; ++r)
        if (!(xs[p] < xs[r]))
            swap(xs, ++l, r);
    swap(xs, p, l);
    return l;
}

/* The k-th smallest is stored at the returned position; returns -1 if u - l < k */
int top(int k, Key* xs, int l, int u) {
    int p;
    if (l < u) {
        swap(xs, l, rand() % (u - l) + l);
        p = partition(xs, l, u);
        if (p - l + 1 == k)
            return p;
        return (k < p - l + 1) ? top(k, xs, l, p) :
                                 top(k - p + l - 1, xs, p + 1, u);
    }
    return -1;
}
There is a method proposed by Blum, Floyd, Pratt, Rivest and Tarjan in 1973, which ensures the worst case performance is bound to O(n) [2], [3]. It divides the list into small groups. Each group contains no more than 5 elements. The median of each group among these 5 elements is identified quickly. Then there are ⌈n/5⌉ median elements selected. We repeat this step, dividing them again into groups of 5 and recursively selecting the median of medians. It's obvious that the final median of medians can be found after O(lg n) rounds of selection. This is a good pivot for partitioning the list. Next, we halve the list by this pivot and recursively search for the k-th smallest one. The performance can be calculated as the following.

T(n) = c1 lg n + c2 n + T(n/2)    (14.5)

Where c1 and c2 are constant factors for the median-of-medians and partition computations respectively. Solving this equation with the telescope method or the master theorem in [2] gives the linear O(n) performance. The detailed algorithm realization is left as an exercise to the reader.
In case we just want to pick the top k smallest elements, but don't care about their order, the algorithm can be adjusted a little bit to fit.

tops(k, L) =
    ∅                                   : k = 0 ∨ L = ∅
    A                                   : |A| = k
    A ∪ {l1} ∪ tops(k − |A| − 1, B)     : |A| < k
    tops(k, A)                          : otherwise
(14.6)

Where A and B have the same meaning as before: (A, B) = partition(λx · x ≤ l1, L') if L isn't empty. The relevant example program in Haskell is given below.

tops _ [] = []
tops 0 _ = []
tops n (x:xs) | len == n  = as
              | len < n   = as ++ [x] ++ tops (n - len - 1) bs
              | otherwise = tops n as
  where
    (as, bs) = partition (<= x) xs
    len = length as
binary search

Another popular divide and conquer algorithm is binary search. We've shown it in the chapter about insertion sort. When I was in school, the teacher who taught math played a magic trick on me. He asked me to consider a natural number less than 1000. Then he asked me some questions, to which I only replied 'yes' or 'no', and finally he guessed my number. He typically asked questions like the following:

- Is it an even number?
- Is it a prime number?
- Are all digits the same?
- Can it be divided by 3?
- ...

Most of the time he guessed the number within 10 questions. My classmates and I all thought it was unbelievable.

This game would not be so interesting if it downgraded to a popular TV program, in which the price of a product is hidden, and you must figure out the exact price in 30 seconds. The host of the program tells you whether your guess is higher or lower than the fact. If you win, the product is yours. The best strategy is to use a similar divide and conquer approach to perform a binary search. So it's common to find a conversation like the following between the player and the host:

P: 1000;
H: High;
P: 500;
H: Low;
P: 750;
H: Low;
P: 890;
H: Low;
P: 990;
H: Bingo.

My math teacher told us that, because the number we considered is within 1000, if he could halve the numbers every time by designing good questions, the number would be found in 10 questions. This is because 2^10 = 1024 > 1000. However, it would be boring to just ask: is it higher than 500? Is it lower than 250? ... Actually, the question 'is it even' is very good, because it always halves the numbers.
Come back to the binary search algorithm. It is only applicable to an ordered sequence of numbers. I've seen programmers try to apply it to an unsorted array, and take several hours to figure out why it doesn't work. The idea is quite straightforward: in order to find a number x in an ordered sequence A, we first check the middle point number and compare it with x. If they are the same, then we are done; if x is smaller, as A is ordered, we need only recursively search among the first half; otherwise we search among the second half. Once A gets empty and we haven't found x yet, it means x doesn't exist.

Before formalizing this algorithm, there is a surprising fact that needs to be noted. Donald Knuth stated that 'Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky'. Jon Bentley pointed out that most binary search implementations contain errors, and even the one given by him in the first version of Programming Pearls contained an error undetected for over twenty years [4].

There are two kinds of realization: one is recursive, the other is iterative. The recursive solution is the same as what we described. Suppose the lower and upper boundaries of the array are l and u inclusive.
1: function Binary-Search(x, A, l, u)
2:   if u < l then
3:     Not found error
4:   else
5:     m ← l + ⌊(u − l)/2⌋    ▷ avoid overflow of ⌊(l + u)/2⌋
6:     if A[m] = x then
7:       return m
8:     if x < A[m] then
9:       return Binary-Search(x, A, l, m − 1)
10:    else
11:      return Binary-Search(x, A, m + 1, u)
As the comment highlights, if integers are represented with a limited number of words, we can't merely use ⌊(l + u)/2⌋, because it may cause overflow if l and u are big.

Binary search can also be realized in an iterative manner: we keep updating the boundaries according to the middle point comparison result.
1: function Binary-Search(x, A, l, u)
2:   while l ≤ u do
3:     m ← l + ⌊(u − l)/2⌋
4:     if A[m] = x then
5:       return m
6:     if x < A[m] then
7:       u ← m − 1
8:     else
9:       l ← m + 1
10:  return NIL
The implementation is a very good exercise; we leave it to the reader. Please try all kinds of methods to verify your program.

Since the array is halved in every iteration, the performance of binary search is bound to O(lg n) time.
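As one way to follow this advice, below is a small Python sketch we add (the helper names are ours, not the book's) that checks an iterative implementation against randomized data using the search invariant:

import random

def binary_search(x, xs):
    # iterative binary search over the sorted list xs;
    # returns an index of x, or None when x is absent
    l, u = 0, len(xs) - 1
    while l <= u:
        m = l + (u - l) // 2    # avoid overflow with bounded integers
        if xs[m] == x:
            return m
        elif x < xs[m]:
            u = m - 1
        else:
            l = m + 1
    return None

def verify(times=1000):
    for _ in range(times):
        xs = sorted(random.sample(range(1000), random.randint(1, 100)))
        x = random.randrange(1000)
        i = binary_search(x, xs)
        if x in xs:
            assert xs[i] == x   # a correct position was found
        else:
            assert i is None    # absence correctly reported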
In purely functional settings, the list is represented as a singly linked list, so it takes linear time to randomly access the element at a given position. Binary search doesn't make sense in such a case. However, it is good to analyze what the performance would downgrade to. Consider the following equation.
bsearch(x, L) =
    Err              : L = ∅
    b1               : x = b1, where (A, B) = splitAt(⌊|L|/2⌋, L)
    bsearch(x, A)    : B = ∅ ∨ x < b1
    bsearch(x, B')   : otherwise

Where b1 is the first element if B isn't empty, and B' holds the rest except for b1. The splitAt function takes O(n) time to divide the list into the two subs A and B (see appendix A, and the chapter about merge sort for details). If B isn't empty and x is equal to b1, the search returns; otherwise, if it is less than b1, as the list is sorted, we recursively search in A; otherwise, we search in B'. If the list is empty, we raise an error to indicate search failure.
As we always split the list at the middle point, the number of elements halves in each recursion. In every recursive call, we take linear time for splitting. The splitting function only traverses the first half of the linked list, thus the total time can be expressed as:

T(n) = c·n/2 + c·n/4 + c·n/8 + ...

This results in O(n) time, which is the same as the brute-force search from head to tail:
search(x, L) =
    Err              : L = ∅
    l1               : x = l1
    search(x, L')    : otherwise
As we mentioned in the chapter about insertion sort, the functional approach to binary search is through the binary search tree: the ordered sequence is represented by a tree (a self-balanced tree if necessary), which offers logarithmic time searching. (Some readers may argue that an array should be used instead of a linked list, for example in Haskell. This book only deals with purely functional sequences, based on finger trees; different from the Haskell array, they can't support constant time random access.)
Although it doesn't make sense to apply divide and conquer binary search on a linked list, binary search can still be very useful in purely functional settings. Consider solving the equation a^x = y for given natural numbers a and y, where a ≤ y. We want to find the integer solution for x if there is one. Of course brute-force naive searching can solve it. We can examine the numbers a^0, a^1, a^2, ... one by one, stopping if a^i = y, or reporting that there is no solution if a^i < y < a^(i+1) for some i. We initialize the solution domain as X = {0, 1, 2, ...}, and call the exhaustive searching function below as solve(a, y, X).
solve(a, y, X) =
    x1                  : a^(x1) = y
    solve(a, y, X')     : a^(x1) < y
    Err                 : otherwise
This function examines the solution domain in monotonic increasing order. It takes the first candidate element x1 from X and compares a^(x1) with y. If they are equal, then x1 is the solution and we are done. If it is less than y, then x1 is dropped, and we search among the rest of the elements, represented as X'. Otherwise, since f(x) = a^x is a non-decreasing function when a is a natural number, the rest of the elements will only make f(x) bigger and bigger: there is no integer solution for this equation, and the function returns an error to indicate that.
The computation of a^x is expensive for big a and x if precision must be kept. (One alternative is to reuse the result of a^n when computing a^(n+1) = a·a^n; here we consider the general form of a monotonic function f(n).) Can it be improved so that we compute as little as possible? The divide and conquer binary search can help. Actually, we can estimate the upper limit of the solution domain. As a^y ≥ y, we can search in the range {0, 1, ..., y}. As the function f(x) = a^x is non-decreasing in its argument x, we can first check the middle point candidate x_m = ⌊(0 + y)/2⌋. If a^(x_m) = y, the solution is found; if it is less than y, we can drop all candidate solutions before x_m; otherwise we drop all candidate solutions after it. Either way the solution domain is halved. We repeat this approach until either the solution is found or the solution domain becomes empty, which indicates there is no integer solution.
The binary search method can be formalized as the following equation. The non-decreasing function is abstracted as a parameter. To solve our problem, we can just call it as bsearch(f, y, 0, y), where f(x) = a^x.

bsearch(f, y, l, u) =
    Err                       : u < l
    m                         : f(m) = y, where m = ⌊(l + u)/2⌋
    bsearch(f, y, l, m − 1)   : f(m) > y
    bsearch(f, y, m + 1, u)   : f(m) < y
(14.7)
As we halve the solution domain in every recursion, this method computes f(x) O(log y) times. It is much faster than the brute-force searching.
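To make (14.7) concrete, here is a small Python sketch we add (the names are ours); it solves a^x = y over the natural numbers by calling bsearch(f, y, 0, y):

def bsearch(f, y, l, u):
    # binary search for x in [l, u] with f(x) = y, where f is
    # monotonically non-decreasing; returns None when no solution
    while l <= u:
        m = (l + u) // 2
        if f(m) == y:
            return m
        elif f(m) > y:
            u = m - 1
        else:
            l = m + 1
    return None

# Solve 3^x = 729: bsearch(lambda x: 3 ** x, 729, 0, 729) evaluates to 6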
2 dimensions search

It's quite natural to think that the idea of binary search can be extended to a 2-dimensional, or even a more general multiple-dimension, domain. However, it is not so easy.

Consider the example of an m × n matrix M. The elements in each row and each column are in strict increasing order. Figure 14.1 illustrates such a matrix, for example.

1 2 3 4 ...
2 4 5 6 ...
3 5 7 8 ...
4 6 8 9 ...
...

Figure 14.1: A matrix in strict increasing order for each row and column.

Given a value x, how can we locate all elements equal to x in the matrix quickly? We need to develop an algorithm which returns a list of locations (i, j) so that M(i, j) = x.
Richard Bird in [1] mentioned that he used this problem to interview candidates for entry to Oxford. The interesting story was that those who had some computer background at school tended to use binary search, but it's easy to get stuck that way.
The usual way following the binary search idea is to examine the element at M(m/2, n/2). If it is less than x, we can only drop the elements in the top-left area; if it is greater than x, only the bottom-right area can be dropped. Both cases are illustrated in figure 14.2; the gray areas indicate elements that can be dropped.

Figure 14.2: Left: the middle point element is smaller than x; all elements in the gray area are less than x. Right: the middle point element is greater than x; all elements in the gray area are greater than x.
The problem is that the solution domain changes from a rectangle to an 'L' shape in both cases. We can't just recursively apply the search on it. In order to solve this problem systematically, we define the problem more generally, using brute-force search as a starting point, and keep improving it bit by bit.

Consider a function f(x, y), which is strictly increasing in both of its arguments, for instance f(x, y) = a^x + b^y, where a and b are natural numbers. Given a value z, which is a natural number too, we want to solve the equation f(x, y) = z by finding all candidate pairs (x, y).

With this definition, the matrix search problem can be specialized by the below function.

f(x, y) =
    M(x, y)    : 1 ≤ x ≤ m, 1 ≤ y ≤ n
    −1         : otherwise
Brute-force 2D search

As all solutions should be found for f(x, y) = z, one can immediately give the brute-force solution by nested looping.

1: function Solve(f, z)
2:   A ← ∅
3:   for x ∈ {0, 1, 2, ..., z} do
4:     for y ∈ {0, 1, 2, ..., z} do
5:       if f(x, y) = z then
6:         A ← A ∪ {(x, y)}
7:   return A

This definitely calculates f (z + 1)² times. It can be formalized as in (14.8).

solve(f, z) = {(x, y) | x ∈ {0, 1, ..., z}, y ∈ {0, 1, ..., z}, f(x, y) = z}    (14.8)
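Equation (14.8) translates almost literally into Python; a quick sketch we add for comparison with the faster methods below:

def solve(f, z):
    # brute-force 2D search: (z + 1)^2 evaluations of f
    return [(x, y) for x in range(z + 1)
                   for y in range(z + 1) if f(x, y) == z]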
Saddleback search

We haven't utilized the fact that f(x, y) is strictly increasing yet. Dijkstra pointed out in [6] that, instead of searching from the bottom-left corner, starting from the top-left leads to an effective solution. As illustrated in figure 14.3, the search starts from (0, z); for every point (p, q), we compare f(p, q) with z:

- If f(p, q) < z, since f is strictly increasing, for all 0 ≤ y < q we have f(p, y) < z. We can drop all points in the vertical line section (in red color);
- If f(p, q) > z, then f(x, q) > z for all p < x ≤ z. We can drop all points in the horizontal line section (in blue color);
- Otherwise, if f(p, q) = z, we mark (p, q) as one solution; then both line sections can be dropped.

This is a systematic way to scale down the solution domain rectangle. We keep dropping a row, or a column, or both.

Figure 14.3: Search from top-left.
This method can be formalized as a function search(f, z, p, q), which searches solutions of the equation f(x, y) = z in the rectangle with top-left corner (p, q) and bottom-right corner (z, 0). We start the searching by initializing (p, q) = (0, z), as solve(f, z) = search(f, z, 0, z).

search(f, z, p, q) =
    ∅                                          : p > z ∨ q < 0
    search(f, z, p + 1, q)                     : f(p, q) < z
    search(f, z, p, q − 1)                     : f(p, q) > z
    {(p, q)} ∪ search(f, z, p + 1, q − 1)      : otherwise
(14.9)
The first clause is the edge case: there is no solution if (p, q) isn't top-left of (z, 0). The following example Haskell program implements this algorithm.

solve f z = search 0 z where
  search p q | p > z || q < 0 = []
             | z' < z = search (p + 1) q
             | z' > z = search p (q - 1)
             | otherwise = (p, q) : search (p + 1) (q - 1)
    where z' = f p q
Considering that the calculation of f may be expensive, this program stores the result of f(p, q) in the variable z'. This algorithm can also be implemented in an iterative manner, where the boundaries of the solution domain keep being updated in a loop.
1: function Solve(f, z)
2:   p ← 0, q ← z
3:   S ← ∅
4:   while p ≤ z ∧ q ≥ 0 do
5:     z' ← f(p, q)
6:     if z' < z then
7:       p ← p + 1
8:     else if z' > z then
9:       q ← q − 1
10:    else
11:      S ← S ∪ {(p, q)}
12:      p ← p + 1, q ← q − 1
13:  return S
It's intuitive to translate this imperative algorithm into a real program, as in the following example Python code.

def solve(f, z):
    (p, q) = (0, z)
    res = []
    while p <= z and q >= 0:
        z1 = f(p, q)
        if z1 < z:
            p = p + 1
        elif z1 > z:
            q = q - 1
        else:
            res.append((p, q))
            (p, q) = (p + 1, q - 1)
    return res
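For instance, with f(x, y) = 2^x + 3^y and z = 35, calling solve(lambda x, y: 2 ** x + 3 ** y, 35) walks from (0, 35) toward the bottom-right and returns [(3, 3), (5, 1)] (an example we add for illustration).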
It is clear that in every iteration at least one of p and q advances toward the bottom-right corner by one. Thus it takes at most 2(z + 1) steps to complete the search. This is the worst case. There are three best cases. The first one happens when in every iteration both p and q advance by one, so that only z + 1 steps are needed. The second case keeps advancing horizontally to the right and ends when p exceeds z. The last case is similar: it keeps moving down vertically to the bottom until q becomes negative.

Figure 14.4 illustrates the best cases and the worst case respectively. Figure 14.4 (a) is the case where every point (x, z − x) on the diagonal satisfies f(x, z − x) = z; it uses z + 1 steps to arrive at (z, 0). (b) is the case where every point (x, z) along the top horizontal line gives the result f(x, z) < z; the algorithm takes z + 1 steps to finish. (c) is the case where every point (0, x) along the left vertical line gives the result f(0, x) > z, so the algorithm takes z + 1 steps to finish. (d) is the worst case: if we project all the horizontal sections along the search path to the x axis, and all the vertical sections to the y axis, they give the total of 2(z + 1) steps.
Figure 14.4: The best cases and the worst cases.
Compared to the quadratic brute-force method (O(z²)), we have improved to a linear algorithm bound to O(z).
Bird imagined that the name 'saddleback' comes from the 3D plot of f: with the smallest value at the bottom-left, the largest at the top-right, and two wings, it looks like a saddle, as shown in figure 14.5.

Figure 14.5: Plot of f(x, y) = x² + y².
Improved saddleback search

We haven't utilized the binary search tool so far, even though the problem extends to a 2-dimensional domain. The basic saddleback search starts from the top-left corner (0, z) and ends at the bottom-right corner (z, 0). This is actually an over-general domain; we can constrain it to be a bit more accurate.

Since f is strictly increasing, we can find the biggest number m, with 0 ≤ m ≤ z, along the y axis which satisfies f(0, m) ≤ z. Similarly, we can find the biggest n, with 0 ≤ n ≤ z, along the x axis, which satisfies f(n, 0) ≤ z. The solution domain then shrinks from (0, z) − (z, 0) to (0, m) − (n, 0), as shown in figure 14.6.

Figure 14.6: A more accurate search domain shown in gray color.

Of course m and n can be found by brute-force like below.

m = max({y | 0 ≤ y ≤ z, f(0, y) ≤ z})
n = max({x | 0 ≤ x ≤ z, f(x, 0) ≤ z})
(14.10)
When searching for m, the x variable of f is bound to 0. This turns into a one-dimension search problem for a strictly increasing function (or in functional terms, a curried function f(0, y)). Binary search works in such a case. However, we need a bit of modification to equation (14.7). Different from searching for a solution l ≤ x ≤ u such that f(x) = y for a given y, we need to search for a solution l ≤ x ≤ u such that f(x) ≤ y < f(x + 1).

bsearch(f, y, l, u) =
    l                         : u ≤ l
    m                         : f(m) ≤ y < f(m + 1), where m = ⌊(l + u)/2⌋
    bsearch(f, y, m + 1, u)   : f(m) ≤ y
    bsearch(f, y, l, m − 1)   : otherwise
(14.11)
The first clause handles the edge case of an empty range; the lower boundary is returned in such a case. If the middle point produces a value less than or equal to the target, while the next one evaluates to a bigger value, then the middle point is what we are looking for. Otherwise, if the point next to the middle also evaluates to a value not greater than the target, the lower bound is moved past the middle, and we recursively perform binary search. In the last case, the middle point evaluates to a value greater than the target, and the upper bound is updated to the point before the middle for further recursive searching. The following Haskell example code implements this modified binary search.
bsearch f y (l, u) | u <= l = l
                   | f m <= y = if f (m + 1) <= y
                                then bsearch f y (m + 1, u) else m
                   | otherwise = bsearch f y (l, m - 1)
  where m = (l + u) `div` 2
Then m and n can be found with this binary search function.

m = bsearch(λy · f(0, y), z, 0, z)
n = bsearch(λx · f(x, 0), z, 0, z)
(14.12)
(14.12)
And the improved saddleback search shrinks to this new search domain
solve(f, z) = search(f, z, 0, m):
search(f, z, p, q) =
_

_
: p > n q < 0
search(f, z, p + 1, q) : f(p, q) < z
search(f, z, p, q 1) : f(p, q) > z
{(p, q)} search(f, z, p + 1, q 1) : otherwise
(14.13)
It's almost the same as the basic saddleback version, except that it stops if p exceeds n, rather than z. In the real implementation, the result of f(p, q) can be calculated once and stored in a variable, as shown in the following Haskell example.

solve f z = search 0 m where
  search p q | p > n || q < 0 = []
             | z' < z = search (p + 1) q
             | z' > z = search p (q - 1)
             | otherwise = (p, q) : search (p + 1) (q - 1)
    where z' = f p q
  m = bsearch (f 0) z (0, z)
  n = bsearch (\x -> f x 0) z (0, z)
This improved saddleback search first performs two rounds of binary search to find the proper m and n. Each round is bound to O(lg z) calculations of f. After that, it takes O(m + n) time in the worst case, and O(min(m, n)) time in the best case. The overall performance is given in the following table.

case          number of evaluations of f
worst case    2 log z + m + n
best case     2 log z + min(m, n)

For a function like f(x, y) = a^x + b^y, with positive integers a and b, m and n will be relatively small, so the performance is close to O(lg z).
This algorithm can also be realized in an imperative approach. Firstly, the binary search should be modified.

1: function Binary-Search(f, y, (l, u))
2:   while l < u do
3:     m ← ⌊(l + u)/2⌋
4:     if f(m) ≤ y then
5:       if y < f(m + 1) then
6:         return m
7:       l ← m + 1
8:     else
9:       u ← m
10:  return l
Utilizing this algorithm, the boundaries m and n can be found before performing the saddleback search.

1: function Solve(f, z)
2:   m ← Binary-Search(λy · f(0, y), z, (0, z))
3:   n ← Binary-Search(λx · f(x, 0), z, (0, z))
4:   p ← 0, q ← m
5:   S ← ∅
6:   while p ≤ n ∧ q ≥ 0 do
7:     z' ← f(p, q)
8:     if z' < z then
9:       p ← p + 1
10:    else if z' > z then
11:      q ← q − 1
12:    else
13:      S ← S ∪ {(p, q)}
14:      p ← p + 1, q ← q − 1
15:  return S

The implementation is left as an exercise to the reader.
More improvement to saddleback search

In figure 14.2, two cases are shown for comparing the value of the middle point in a matrix with the given value. In one case the middle value is smaller than the given value; in the other it is bigger. In both cases we can only throw away 1/4 of the candidates, and an 'L' shape is left for further searching.

Actually, one important case is missing. We can extend the observation to any point inside the rectangular searching area, as shown in figure 14.7. Suppose we are searching in a rectangle from the lower-left corner (a, b) to the upper-right corner (c, d). If (p, q) isn't the middle point and f(p, q) ≠ z, we can't ensure the area to be dropped is always 1/4. However, if f(p, q) = z, as f is strictly increasing, we are not only sure that both the lower-left and the upper-right sub areas can be thrown away, but also all the other points in column p and row q. The problem can be scaled down fast, because only 1/2 of the area is left.
This suggests that, instead of jumping to the middle point to start searching, a more efficient way is to find a point which evaluates to the target value. One straightforward way to find such a point is to perform binary search along the center horizontal line or the center vertical line of the rectangle.

The performance of binary search along a line is logarithmic in the length of that line. A good idea is to always pick the shorter center line, as shown in figure 14.8: if the height of the rectangle is longer than the width, we perform binary search along the horizontal center line; otherwise we choose the vertical center line.
However, what if we can't find a point (p, q) on the center line that satisfies f(p, q) = z? Let's take the center horizontal line for example. Even in such a case, we can still find a point such that f(p, q) < z < f(p + 1, q). The only difference is that we can't drop the points in column p and row q completely.

Combining these conditions, the binary search along the horizontal line is to find a p satisfying f(p, q) ≤ z < f(p + 1, q), while the vertical line search condition is f(p, q) ≤ z < f(p, q + 1).

(a) If f(p, q) ≠ z, only the lower-left or upper-right sub area (in gray color) can be thrown away; both leave an 'L' shape. (b) If f(p, q) = z, both sub areas can be thrown away; the scale of the problem is halved.

Figure 14.7: The efficiency of scaling down the search domain.

Figure 14.8: Binary search along the shorter center line.
The modified binary search ensures that if all the points on the line segment give f(p, q) < z, the upper bound will be found, and the lower bound will be found if they are all greater than z. We can drop the whole area on one side of the center line in such a case.
Summing up all these ideas, we can develop the efficient improved saddleback search as follows.

1. Perform binary search along the y axis and the x axis to find the tight boundaries from (0, m) to (n, 0);
2. Denote the candidate rectangle as (a, b) − (c, d); if the candidate rectangle is empty, the solution is empty;
3. If the height of the rectangle is longer than the width, perform binary search along the center horizontal line; otherwise, perform binary search along the center vertical line; denote the search result as (p, q);
4. If f(p, q) = z, record (p, q) as a solution, and recursively search the two sub rectangles (a, b) − (p − 1, q + 1) and (p + 1, q − 1) − (c, d);
5. Otherwise, f(p, q) ≠ z; recursively search the same two sub rectangles plus a line section. The line section is either (p, q + 1) − (p, b) as shown in figure 14.9 (a), or (p + 1, q) − (c, q) as shown in figure 14.9 (b).

Figure 14.9: Recursively search the gray areas; the bold line section should be included if f(p, q) ≠ z.
This algorithm can be formalized as follows. Equations (14.11) and (14.12) are the same as before. A new search function should be defined. Define search((a,b),(c,d)) as a function for searching the rectangle with top-left corner (a, b) and bottom-right corner (c, d).

search((a,b),(c,d)) =
    ∅          : c < a ∨ b < d
    csearch    : c − a < b − d
    rsearch    : otherwise
(14.14)
Function csearch performs binary search along the center horizontal line to find a point (p, q) such that f(p, q) ≤ z < f(p + 1, q). This is shown in figure 14.9 (a). There is a special edge case, when all points on the line evaluate to values greater than z. The general binary search will return the lower bound as the result, so that (p, q) = (a, ⌊(b + d)/2⌋). The whole upper side including the center line can be dropped, as shown in figure 14.10 (a).

Figure 14.10: Edge cases when performing binary search in the center line.
csearch =
    search((p,q−1),(c,d))                                             : z < f(p, q)
    search((a,b),(p−1,q+1)) ∪ {(p, q)} ∪ search((p+1,q−1),(c,d))      : f(p, q) = z
    search((a,b),(p,q+1)) ∪ search((p+1,q−1),(c,d))                   : otherwise
(14.15)

Where
    q = ⌊(b + d)/2⌋
    p = bsearch(λx · f(x, q), z, (a, c))
Function rsearch is quite similar, except that it searches along the center vertical line.

rsearch =
    search((a,b),(p−1,q))                                             : z < f(p, q)
    search((a,b),(p−1,q+1)) ∪ {(p, q)} ∪ search((p+1,q−1),(c,d))      : f(p, q) = z
    search((a,b),(p−1,q+1)) ∪ search((p+1,q),(c,d))                   : otherwise
(14.16)

Where
    p = ⌊(a + c)/2⌋
    q = bsearch(λy · f(p, y), z, (d, b))
The following Haskell program implements this algorithm.

search f z (a, b) (c, d) | c < a || b < d = []
                         | c - a < b - d = let q = (b + d) `div` 2 in
                             csearch (bsearch (\x -> f x q) z (a, c), q)
                         | otherwise = let p = (a + c) `div` 2 in
                             rsearch (p, bsearch (f p) z (d, b))
  where
    csearch (p, q) | z < f p q = search f z (p, q - 1) (c, d)
                   | f p q == z = search f z (a, b) (p - 1, q + 1) ++
                                  (p, q) : search f z (p + 1, q - 1) (c, d)
                   | otherwise = search f z (a, b) (p, q + 1) ++
                                 search f z (p + 1, q - 1) (c, d)
    rsearch (p, q) | z < f p q = search f z (a, b) (p - 1, q)
                   | f p q == z = search f z (a, b) (p - 1, q + 1) ++
                                  (p, q) : search f z (p + 1, q - 1) (c, d)
                   | otherwise = search f z (a, b) (p - 1, q + 1) ++
                                 search f z (p + 1, q) (c, d)
And the main program calls this function after performing binary search along the x and y axes.

solve f z = search f z (0, m) (n, 0) where
  m = bsearch (f 0) z (0, z)
  n = bsearch (\x -> f x 0) z (0, z)
Since we drop half of the area in every recursion, it takes O(log(mn)) rounds of search. However, in order to locate the point (p, q) which halves the problem, we must perform a binary search along the center line, which calls f about O(log(min(m, n))) times. Denote the time of searching an m × n rectangle as T(m, n); the recurrence relation is:

T(m, n) = log(min(m, n)) + 2T(m/2, n/2)   (14.17)
Suppose m < n. Using the telescoping method, with m = 2^i and n = 2^j, we have:

T(2^i, 2^j) = j + 2T(2^{i−1}, 2^{j−1})
            = Σ_{k=0}^{i−1} 2^k (j − k)
            = O(2^i (j − i))
            = O(m log(n/m))
(14.18)
Richard Bird proved that this is asymptotically optimal, by giving a lower bound for searching a given value in an m × n rectangle [1]. The imperative algorithm is almost the same as the functional version; we skip it in the text for the sake of brevity.
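For reference, below is a minimal Python sketch of the imperative version under the same conventions (rectangles given as top-left (a, b) and bottom-right (c, d), f strictly increasing in both arguments). The names search2d and bsearch are ours, not from the original text; this is a sketch, not the author's implementation.

def bsearch(f, z, lo, hi):
    # Largest x in [lo, hi] with f(x) <= z; returns the lower bound lo
    # when every value exceeds z (the edge case discussed above).
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if f(mid) <= z:
            lo = mid
        else:
            hi = mid - 1
    return lo

def search2d(f, z):
    m = bsearch(lambda y: f(0, y), z, 0, z)   # solution bound on the y axis
    n = bsearch(lambda x: f(x, 0), z, 0, z)   # solution bound on the x axis
    res = []
    stack = [((0, m), (n, 0))]                # rectangles as (top-left, bottom-right)
    while stack:
        (a, b), (c, d) = stack.pop()
        if c < a or b < d:
            continue                          # empty rectangle
        if c - a < b - d:                     # taller than wide: cut along line y = q
            q = (b + d) // 2
            p = bsearch(lambda x: f(x, q), z, a, c)
            if z < f(p, q):                   # the whole line is above z
                stack.append(((p, q - 1), (c, d)))
            elif f(p, q) == z:
                res.append((p, q))
                stack.append(((a, b), (p - 1, q + 1)))
                stack.append(((p + 1, q - 1), (c, d)))
            else:
                stack.append(((a, b), (p, q + 1)))
                stack.append(((p + 1, q - 1), (c, d)))
        else:                                 # wider than tall: cut along line x = p
            p = (a + c) // 2
            q = bsearch(lambda y: f(p, y), z, d, b)
            if z < f(p, q):
                stack.append(((a, b), (p - 1, q)))
            elif f(p, q) == z:
                res.append((p, q))
                stack.append(((a, b), (p - 1, q + 1)))
                stack.append(((p + 1, q - 1), (c, d)))
            else:
                stack.append(((a, b), (p - 1, q + 1)))
                stack.append(((p + 1, q), (c, d)))
    return res

For example, search2d(lambda x, y: x + y, 5) yields the six lattice points whose coordinates sum to 5.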
Exercise 14.1
- Prove that the average case for the divide and conquer solution to the k-selection problem is O(n). Please refer to the previous chapter about quick sort.
- Implement the imperative k-selection problem with 2-way partition and median-of-three pivot selection.
- Implement the imperative k-selection problem so that it handles duplicated elements effectively.
- Realize the median-of-median k-selection algorithm and implement it in your favorite programming language.
- The tops(k, L) algorithm uses list concatenation like A ∪ {l₁} ∪ tops(k − |A| − 1, B). This is a linear operation, proportional to the length of the list being concatenated. Modify the algorithm so that the sub lists are concatenated in one pass.
- The author considered another divide and conquer solution for the k-selection problem. It finds the maximum of the first k elements and the minimum of the rest; denote them as x and y. If x is smaller than y, all the first k elements are smaller than the rest, so they are exactly the top k smallest; otherwise, some elements among the first k should be swapped, as in the procedure below.
1: procedure Tops(k, A)
2:   l ← 1
3:   u ← |A|
4:   loop
5:     i ← Max-At(A[l..k])
6:     j ← Min-At(A[k + 1..u])
7:     if A[i] < A[j] then
8:       break
9:     Exchange A[l] ↔ A[j]
10:    Exchange A[k + 1] ↔ A[i]
11:    l ← Partition(A, l, k)
12:    u ← Partition(A, k + 1, u)
Explain why this algorithm works. What is its performance?
- Implement the binary search algorithm in both recursive and iterative manners, and try to verify your version automatically. You can either generate randomized data and test your program against the binary search invariant, or compare it with the built-in binary search tool in your standard library.
- Implement the improved saddleback search, which first performs binary search to find a more accurate solution domain, in your favorite imperative programming language.
- Realize the improved 2D search, which performs binary search along the shorter center line, in your favorite imperative programming language.
- Someone considers that the 2D search can be designed as the following: when searching a rectangle, since the minimum value is at the bottom-left and the maximum at the top-right, if the target value is less than the minimum or greater than the maximum, there is no solution; otherwise, the rectangle is divided into 4 sub rectangles at the center point, which are then searched recursively, as in the procedure below.
1: procedure Search(f, z, a, b, c, d)   ▷ (a, b): bottom-left, (c, d): top-right
2:   if z ≤ f(a, b) ∨ f(c, d) ≤ z then
3:     if z = f(a, b) then
4:       record (a, b) as a solution
5:     if z = f(c, d) then
6:       record (c, d) as a solution
7:     return
8:   p ← ⌊(a + c)/2⌋
9:   q ← ⌊(b + d)/2⌋
10:  Search(f, z, a, q, p, d)
11:  Search(f, z, p, q, c, d)
12:  Search(f, z, a, b, p, q)
13:  Search(f, z, p, b, c, q)
What is the performance of this algorithm?
14.2.2 Information reuse

One interesting behavior is that people learn while searching: we not only remember the lessons from failed attempts, but also learn the patterns that lead to success. This is a kind of information reuse, no matter whether the information is positive or negative. However, it's not easy to determine what information should be kept: too little isn't enough to guide an effective search, while keeping too much is expensive in terms of space.

In this section, we'll first introduce two interesting problems: the Boyer-Moore majority number problem and the maximum sum of sub vector problem. Both reuse as little information as possible. After that, two popular string matching algorithms, the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm, will be introduced.
Boyer-Moore majority number

Voting is quite critical to people. We use voting to choose a leader, make a decision, or reject a proposal. In the months when this chapter was written, three countries in the world voted for their presidents; all three used computers to calculate the results.

Suppose a small island country wants a new president. According to the constitution, only a candidate who wins more than half of the votes can be elected. Given a series of votes, such as A, B, A, C, B, B, D, ..., can we develop a program that tells who the new president is, if any, or indicates that nobody wins more than half of the votes?

Of course this problem can be solved with brute-force by using a map, as we did in the chapter about binary search trees. (There is also a probabilistic, sub-linear space counting algorithm published in 2004, named the Count-Min sketch [8].)
template<typename T>
T majority(const T* xs, int n, T fail) {
    map<T, int> m;
    int i, max = 0;
    T r;
    for (i = 0; i < n; ++i)
        ++m[xs[i]];
    for (typename map<T, int>::iterator it = m.begin(); it != m.end(); ++it)
        if (it->second > max) {
            max = it->second;
            r = it->first;
        }
    return max * 2 > n ? r : fail;
}
This program first scans the votes, accumulating the number of votes for each individual in a map. After that, it traverses the map to find the one with the most votes. If that number is bigger than half, the winner has been found; otherwise, it returns a value indicating failure.

The following pseudo code describes this algorithm.
1: function Majority(A)
2:   M ← empty map
3:   for ∀a ∈ A do
4:     Put(M, a, 1 + Get(M, a))
5:   max ← 0, m ← NIL
6:   for ∀(k, v) ∈ M do
7:     if max < v then
8:       max ← v, m ← k
9:   if max > |A|/2 then
10:    return m
11:  else
12:    fail
For m individuals and n votes, this program firstly takes about O(n log m) time to build the map if the map is implemented as a self-balanced tree (a red-black tree for instance), or about O(n) time if the map is hash table based; however, the hash table needs more space. Next, the program takes O(m) time to traverse the map and find the majority vote. The following table lists the time and space performance for different maps.

map                 time        space
self-balanced tree  O(n log m)  O(m)
hashing             O(n)        O(m) at least
Boyer and Moore invented a clever algorithm in 1980, which can pick the majority element with only one scan, if one exists. Their algorithm needs only O(1) space [7].

The idea is to record the first candidate as the winner so far, and mark him with 1 vote. During the scan process, if the currently selected winner gets another vote, we just increase the vote counter; otherwise, it means somebody votes against this candidate, so the vote counter is decreased by one. If the vote counter becomes zero, this candidate has been voted out; we select the next candidate as the new winner and repeat the above scanning process.

Suppose there is a series of votes: A, B, C, B, B, C, A, B, A, B, B, D, B. The table below illustrates the steps of this process.
winner  count  votes scanned so far
A       1      A
A       0      A, B
C       1      A, B, C
C       0      A, B, C, B
B       1      A, B, C, B, B
B       0      A, B, C, B, B, C
A       1      A, B, C, B, B, C, A
A       0      A, B, C, B, B, C, A, B
A       1      A, B, C, B, B, C, A, B, A
A       0      A, B, C, B, B, C, A, B, A, B
B       1      A, B, C, B, B, C, A, B, A, B, B
B       0      A, B, C, B, B, C, A, B, A, B, B, D
B       1      A, B, C, B, B, C, A, B, A, B, B, D, B
The key point is that, if there exists a majority winning more than 50% of the votes, it can't be voted out by all the others. However, if no candidate wins more than half of the votes, the recorded winner is invalid. Thus it is necessary to perform a second round of scanning for verification.
The following pseudo code illustrates this algorithm.

1: function Majority(A)
2:   c ← 0
3:   for i ← 1 to |A| do
4:     if c = 0 then
5:       x ← A[i]
6:     if A[i] = x then
7:       c ← c + 1
8:     else
9:       c ← c − 1
10:  return x
If there is a majority element, this algorithm takes one pass to scan the votes. In every iteration, it either increases or decreases the counter according to whether the vote supports or is against the current selection. If the counter becomes zero, the current selection has been voted out, and the next element is selected as the updated candidate for further scanning.

The process takes linear O(n) time, and the space needed is just two variables: one records the selected candidate so far, the other is for vote counting.

Although this algorithm finds the majority element if one exists, it still picks an element even when there is none. The following modified algorithm verifies the final result with another round of scanning.
1: function Majority(A)
2:   c ← 0
3:   for i ← 1 to |A| do
4:     if c = 0 then
5:       x ← A[i]
6:     if A[i] = x then
7:       c ← c + 1
8:     else
9:       c ← c − 1
10:  c ← 0
11:  for i ← 1 to |A| do
12:    if A[i] = x then
13:      c ← c + 1
14:  if c > |A|/2 then
15:    return x
16:  else
17:    fail
Even with this verification process, the algorithm is still bound to O(n) time, and the space needed is constant. The following ISO C++ program implements this algorithm. (We actually use the ANSI C style; the C++ template is only used to generalize the type of the elements.)

template<typename T>
T majority(const T* xs, int n, T fail) {
    T m;
    int i, c;
    for (i = 0, c = 0; i < n; ++i) {
        if (!c)
            m = xs[i];
        c += xs[i] == m ? 1 : -1;
    }
    for (i = 0, c = 0; i < n; ++i)
        c += xs[i] == m;
    return c * 2 > n ? m : fail;
}
The Boyer-Moore majority algorithm can also be realized in a purely functional approach. Different from the imperative settings, which use variables to record and update information, accumulators are used to define the core algorithm. Define the function maj(c, n, L), which takes a list of votes L, the selected candidate c so far, and a counter n. For a non-empty list L, we initialize c as the first vote l₁ and set the counter to 1 to start the algorithm: maj(l₁, 1, L′), where L′ is the rest of the votes except for l₁. Below is the definition of this function.
maj(c, n, L) =
    c                 : L = ∅
    maj(c, n + 1, L′) : l₁ = c
    maj(l₁, 1, L′)    : n = 0 ∧ l₁ ≠ c
    maj(c, n − 1, L′) : otherwise
(14.19)
We also need to define a function which verifies the result. The idea is that, if the list of votes is empty, the final result is a failure; otherwise, we start the Boyer-Moore algorithm to find a candidate c, then we scan the list again to count the total votes c wins, and verify that this number is more than half.

majority(L) =
    fail : L = ∅
    c    : c = maj(l₁, 1, L′), |{x | x ∈ L, x = c}| > |L|/2
    fail : otherwise
(14.20)
Below Haskell example code implements this algorithm.

majority :: (Eq a) => [a] -> Maybe a
majority [] = Nothing
majority (x:xs) = let m = maj x 1 xs in verify m (x:xs)

maj c n [] = c
maj c n (x:xs) | c == x = maj c (n+1) xs
               | n == 0 = maj x 1 xs
               | otherwise = maj c (n-1) xs

verify m xs = if 2 * (length $ filter (==m) xs) > length xs
              then Just m else Nothing
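For instance, with the vote series from the table above (our own check, not from the original text), majority "ABCBBCABABBDB" evaluates to Just 'B', while majority "ABC" evaluates to Nothing.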
Maximum sum of sub vector

Jon Bentley presents another interesting puzzle in [4], which can be solved with a quite similar idea: find the maximum sum of a sub vector. For example, in the following array, the sub vector {19, −12, 1, 9, 18} yields the biggest sum, 35.

3  −13  19  −12  1  9  18  −16  15  −15

Note that it is only required to output the value of the maximum sum. If all the numbers are positive, the answer is definitely the sum of all of them. Another special case is that all the numbers are negative; we define the maximum sum as 0 for an empty sub vector.
Of course we can find the answer with brute-force, by calculating the sums of all sub vectors and picking the maximum. Such a naive method is quadratic.

1: function Max-Sum(A)
2:   m ← 0
3:   for i ← 1 to |A| do
4:     s ← 0
5:     for j ← i to |A| do
6:       s ← s + A[j]
7:       m ← Max(m, s)
8:   return m
The brute force algorithm does not reuse any information from previous searches. Similar to the Boyer-Moore majority vote algorithm, we can record the maximum sum ending at the position being scanned, along with the biggest sum found so far. Figure 14.11 illustrates this idea and the invariant during the scan.

Figure 14.11: The invariant during the scan: the maximum sum found so far, and the maximum sum ending at the current position i.

At any time when we scan to the i-th position, the maximum sum found so far is recorded as A. At the same time, we also record the biggest sum ending at i as B. Note that A and B may not be the same; in fact, we always maintain B ≤ A. When B becomes greater than A by adding the next element, we update A with this new value. When B becomes negative, which happens when the next element is a negative number, we reset it to 0. The following table illustrates the steps when we scan the example vector {3, −13, 19, −12, 1, 9, 18, −16, 15, −15}.
max sum  max ending at i  list to be scanned
0        0                {3, −13, 19, −12, 1, 9, 18, −16, 15, −15}
3        3                {−13, 19, −12, 1, 9, 18, −16, 15, −15}
3        0                {19, −12, 1, 9, 18, −16, 15, −15}
19       19               {−12, 1, 9, 18, −16, 15, −15}
19       7                {1, 9, 18, −16, 15, −15}
19       8                {9, 18, −16, 15, −15}
19       17               {18, −16, 15, −15}
35       35               {−16, 15, −15}
35       19               {15, −15}
35       34               {−15}
35       19               {}
This algorithm can be described as below.

1: function Max-Sum(V)
2:   A ← 0, B ← 0
3:   for i ← 1 to |V| do
4:     B ← Max(B + V[i], 0)
5:     A ← Max(A, B)
6:   return A

It is trivial to implement this linear time algorithm, so we skip the details in the text.
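For reference, here is a minimal Python sketch of this linear scan (our own, with the hypothetical name max_sum; the original text skips the implementation):

def max_sum(v):
    a = 0  # the biggest sum found so far
    b = 0  # the biggest sum ending at the current position
    for x in v:
        b = max(b + x, 0)  # a negative running sum is never worth keeping
        a = max(a, b)
    return a

# max_sum([3, -13, 19, -12, 1, 9, 18, -16, 15, -15]) == 35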
This algorithm can also be defined in a functional approach. Instead of mutating variables, we use accumulators to record A and B. In order to search the maximum sum of list L, we call the function below as max_sum(0, 0, L).

max_sum(A, B, L) =
    A                   : L = ∅
    max_sum(A′, B′, L′) : otherwise
(14.21)

Where
B′ = max(l₁ + B, 0)
A′ = max(A, B′)
Below Haskell example code implements this algorithm.
maxsum = msum 0 0 where
msum a _ [] = a
msum a b (x:xs) = let b = max (x+b) 0
a = max a b
in msum a b xs
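For the example vector, maxsum [3, -13, 19, -12, 1, 9, 18, -16, 15, -15] evaluates to 35.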
KMP

String matching is another important type of searching. Almost all software editors are equipped with tools to find strings in text. In the chapters about Trie, Patricia, and suffix trees, we introduced some powerful data structures which can help to search strings. In this section, we introduce two other string matching algorithms, both based on information reuse.

Some programming environments provide built-in string search tools; however, most of them are brute-force solutions, including the strstr function in the ANSI C standard library, find in the C++ standard template library, indexOf in the Java Development Kit, etc. Figure 14.12 illustrates how such a character-by-character comparison process works.
(a) The offset s = 4; after matching q = 4 characters, the 5th mismatches.
(b) Move s = 4 + 2 = 6 directly.

Figure 14.12: Match 'ananym' in 'any ananthous ananym flower'.
Suppose we search a pattern P in text T. As shown in figure 14.12 (a), at offset s = 4, the process examines every character in P and T to check whether they are the same. It successfully matches the first 4 characters 'anan'. However, the 5th character in the pattern string is 'y', which doesn't match the corresponding character in the text, 't'.

At this stage, the brute-force solution terminates the attempt, increases s by one to 5, and restarts the comparison between 'ananym' and 'nantho...'. Actually, we can increase s by more than one. This is because we already know that the first four characters 'anan' have been matched, and the failure happened at the 5th position. Observe that the two-letter prefix 'an' of the pattern string is also a suffix of the matched part 'anan'. A more effective way is to shift s by two instead of one, as shown in figure 14.12 (b). By this means, we reuse the information that 4 characters have been matched; this helps us to skip as many invalid positions as possible.

Knuth, Morris and Pratt presented this idea in [9] and developed a novel string matching algorithm, later called KMP, after the three authors' initials.
For the sake of brevity, we denote the first k characters of text T as T_k; that is, T_k is the k-character prefix of T.

The key point to shifting s effectively is to find a function of q, where q is the number of characters matched successfully. For instance, q is 4 in figure 14.12 (a), as the 5th character doesn't match.

Consider in what situation we can shift s by more than 1. As shown in figure 14.13, if we can shift the pattern P ahead, there must exist k such that the first k characters are the same as the last k characters of P_q. In other words, the prefix P_k is a suffix of P_q.
Figure 14.13: P_k is both a prefix and a suffix of P_q.
It's possible that there is no such prefix that is also a suffix. If we treat the empty string as both a prefix and a suffix of any string, there is at least one solution, k = 0. It's also quite possible that multiple values of k satisfy the property. To avoid missing any possible matching positions, we have to find the biggest k. We can define a prefix function π(q), which tells us where we can fall back when the (q + 1)-th character does not match [2].

π(q) = max{k | k < q ∧ P_k ⊐ P_q}   (14.22)

Where ⊐ is read as 'is a suffix of'; for instance, A ⊐ B means A is a suffix of B. This function is used as follows: when we match pattern P against text T from offset s and the match fails after q characters, we look up π(q) to get a fallback q′, and retry comparing P[q′ + 1] with the previously unmatched character. Based on this idea, the core algorithm of KMP can be described as the following.
1: function KMP(T, P)
2:   n ← |T|, m ← |P|
3:   build the prefix function π from P
4:   q ← 0   ▷ How many characters have been matched so far.
5:   for i ← 1 to n do
6:     while q > 0 ∧ P[q + 1] ≠ T[i] do
7:       q ← π(q)
8:     if P[q + 1] = T[i] then
9:       q ← q + 1
10:    if q = m then
11:      found one solution at i − m
12:      q ← π(q)   ▷ look for the next solution
Although the definition of the prefix function π(q) is given in equation (14.22), realizing it blindly by finding the longest suffix isn't effective. Actually, we can use the idea of information reuse again to build the prefix function.

The trivial edge case is that the first character doesn't match. In this case the longest prefix which is also a suffix is definitely empty, so k = 0 and π(1) = 0. We record the longest such prefix as P_k; in this edge case P_k = P_0 is the empty string.

After that, when we scan the q-th character of the pattern string P, we hold the invariant that the prefix function values π(i) for i in {1, 2, ..., q − 1} have already been recorded, and that P_k is the longest prefix which is also a suffix of P_{q−1}. As shown in figure 14.14, if P[q] = P[k + 1], a bigger k than before is found, and we can increase the maximum of k by one. Otherwise, we can use π(k) to fall back to a shorter prefix P_{k′}, where k′ = π(k), and check whether the character after this new prefix is the same as the q-th character. We repeat this step until either k becomes zero (which means only the empty string satisfies the property), or the q-th character matches.
Figure 14.14: P_k is a suffix of P_{q−1}; P[q] and P[k + 1] are compared.
Realizing this idea gives the KMP prefix building algorithm.

1: function Build-Prefix-Function(P)
2:   m ← |P|, k ← 0
3:   π(1) ← 0
4:   for q ← 2 to m do
5:     while k > 0 ∧ P[q] ≠ P[k + 1] do
6:       k ← π(k)
7:     if P[q] = P[k + 1] then
8:       k ← k + 1
9:     π(q) ← k
10:  return π

The following table lists the steps of building the prefix function for pattern string 'ananym'. Note that the k in the table actually means the maximum k satisfying equation (14.22).
q  P_q     k  P_k
1  a       0  ''
2  an      0  ''
3  ana     1  a
4  anan    2  an
5  anany   0  ''
6  ananym  0  ''
Translating the KMP algorithm to Python gives the example code below.

def kmp_match(w, p):
    n = len(w)
    m = len(p)
    fallback = fprefix(p)
    k = 0  # how many characters have been matched so far
    res = []
    for i in range(n):
        while k > 0 and p[k] != w[i]:
            k = fallback[k]  # fall back
        if p[k] == w[i]:
            k = k + 1
        if k == m:
            res.append(i + 1 - m)
            k = fallback[k]  # look for the next match
    return res

def fprefix(p):
    m = len(p)
    t = [0] * (m + 1)  # fallback table; t[i] is the longest border of p[:i]
    k = 0
    for i in range(2, m + 1):
        while k > 0 and p[i-1] != p[k]:
            k = t[k]  # fall back
        if p[i-1] == p[k]:
            k = k + 1
        t[i] = k
    return t
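As a quick check of this Python version (our own example, not from the original text): fprefix("ananym") evaluates to [0, 0, 0, 1, 2, 0, 0], agreeing with the k column of the table above, and kmp_match("any ananthous ananym flower", "ananym") evaluates to [14], the left position of the only occurrence.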
The KMP algorithm builds the prefix function for the pattern string as a kind of pre-processing before the search. Because of this, it can reuse as much information from the previous matching as possible.

The amortized performance of building the prefix function is O(m). This can be proved with the potential method, as in [2]. Using a similar method, it can be proved that the matching algorithm itself is also linear. Thus the total performance is O(m + n), at the expense of O(m) space to record the prefix function table.

It seems that different pattern strings would affect the performance of KMP. Consider the case of finding the pattern string 'aaa...a' of length m in a text 'aaa...a' of length n. All the characters are the same; when the last character in the pattern is examined, we can only fall back by 1, and this one-character fallback repeats until it falls back to zero. Even in this extreme case, the KMP algorithm still holds its linear performance (why?). Please try to consider more cases, such as P = 'aaaa...b', T = 'aaaa...a', and so on.
Purely functional KMP algorithm

It is not easy to realize the KMP matching algorithm in a purely functional manner. The imperative algorithm presented so far intensively uses arrays to record prefix function values. Although it is possible to utilize sequence-like structures in purely functional settings, such sequences are typically implemented with finger trees, which, unlike native arrays, need logarithmic time for random access. (Again, we don't use native arrays, even though they are supported in some functional programming environments like Haskell.)

Richard Bird presents a formal program deduction of the KMP algorithm using the fold fusion law in chapter 17 of [1]. In this section, we show how to develop a purely functional KMP algorithm step by step, starting from a brute-force prefix function creation method.

Both the text string and the pattern are represented as singly linked lists in the purely functional settings. During the scan process, these two lists are further partitioned; each one is broken into two parts. As shown in figure 14.15, the first j characters in the pattern string have been matched; T[i + 1] and P[j + 1] will be compared next. If they are the same, we need to append the character to the matched part. However, since strings are essentially singly linked lists, such appending is proportional to j.
Figure 14.15: The first j characters of P have been matched; T[i + 1] and P[j + 1] are to be compared next.
Denote the first i characters of T as T_p (the prefix of T) and the rest of the characters as T_s (the suffix); similarly, denote the first j characters of P as P_p, and the rest as P_s. Denote the first character of T_s as t, and the first character of P_s as p. We have the following 'cons' relationship:

T_s = cons(t, T_s′)
P_s = cons(p, P_s′)
If t = p, note that the following updating process is bound to linear time:

T_p′ = T_p ∪ {t}
P_p′ = P_p ∪ {p}
We introduced a method in the chapter about purely functional queues which can solve this problem: by using a pair of front and rear lists, the linear time appending turns into constant time linking. The key point is to represent the prefix part in reverse order. Writing ←T_p for reverse(T_p) and ←P_p for reverse(P_p):

T = T_p ∪ T_s = reverse(reverse(T_p)) ∪ T_s = reverse(←T_p) ∪ T_s
P = P_p ∪ P_s = reverse(reverse(P_p)) ∪ P_s = reverse(←P_p) ∪ P_s
(14.23)
The idea is to use the pairs (←T_p, T_s) and (←P_p, P_s) instead. With this change, if t = p, we can update the prefix parts in constant time:

←T_p′ = cons(t, ←T_p)
←P_p′ = cons(p, ←P_p)
(14.24)
The KMP matching algorithm starts by initializing the successfully matched prefix parts to empty strings, as the following:

search(P, T) = kmp(π, (∅, P), (∅, T))   (14.25)

Where π is the prefix function we explained before. The core part of the KMP algorithm, except for the prefix function building, can be defined as below.
kmp(π, (←P_p, P_s), (←T_p, T_s)) =
    {|←T_p|}                                             : P_s = ∅ ∧ T_s = ∅
    ∅                                                    : P_s ≠ ∅ ∧ T_s = ∅
    {|←T_p|} ∪ kmp(π, π(←P_p, P_s), (←T_p, T_s))         : P_s = ∅ ∧ T_s ≠ ∅
    kmp(π, (cons(p, ←P_p), P_s′), (cons(t, ←T_p), T_s′)) : t = p
    kmp(π, (←P_p, P_s), (cons(t, ←T_p), T_s′))           : t ≠ p ∧ ←P_p = ∅
    kmp(π, π(←P_p, P_s), (←T_p, T_s))                    : t ≠ p ∧ ←P_p ≠ ∅
(14.26)
The first clause states that, if the scan successfully ends for both the pattern and the text strings, we get a solution and the algorithm terminates. Note that we use the right position in the text string as the matching point; it's easy to obtain the left position by subtracting the length of the pattern string. For the sake of brevity, we stick to right positions in the functional solutions.

The second clause states that, if the scan arrives at the end of the text string while there are still characters in the pattern string that haven't been matched, there is no solution and the algorithm terminates.

The third clause states that, if all the characters in the pattern string have been successfully matched while there are still characters in the text that haven't been examined, we get a solution, and we fall back by calling the prefix function π to go on searching for other solutions.

The fourth clause deals with the case that the next characters in the pattern string and the text are the same; the algorithm advances one character ahead and recursively performs the search.

If the next characters are not the same and nothing of the pattern has been matched yet (the fifth clause), we just advance to the next character in the text and try again. Otherwise (the sixth clause), we call the prefix function π to fall back, and try again.
The brute-force way to build the prefix function is just to follow the definition in equation (14.22):

π(P_p, P_s) = (P_p′, P_s′)   (14.27)

where
P_p′ = longest({s | s ∈ prefixes(P_p), s ⊐ P_p})
P_s′ = P − P_p′
Every time the fallback position is calculated, the algorithm naively enumerates all prefixes of P_p, checks whether each is also a suffix of P_p, and then picks the longest one as the result. Note that we reuse the subtraction symbol here for the list difference operation.

There is a tricky case which should be avoided: since any string is both a prefix and a suffix of itself, we shouldn't enumerate P_p itself as a candidate prefix. One realization of such a prefix enumeration is the following.
as the following.
prefixes(L) =
_
{} : L = |L| = 1
cons(, map(
s
cons(l
1
, s), prefixes(L

))) : otherwise
(14.28)
Below Haskell example program implements this version of the string matching algorithm.

kmpSearch1 ptn text = kmpSearch next ([], ptn) ([], text)

kmpSearch _ (sp, []) (sw, []) = [length sw]
kmpSearch _ _ (_, []) = []
kmpSearch f (sp, []) (sw, ws) = length sw : kmpSearch f (f sp []) (sw, ws)
kmpSearch f (sp, (p:ps)) (sw, (w:ws))
    | p == w = kmpSearch f ((p:sp), ps) ((w:sw), ws)
    | otherwise = if sp == [] then kmpSearch f (sp, (p:ps)) ((w:sw), ws)
                  else kmpSearch f (f sp (p:ps)) (sw, (w:ws))

next sp ps = (sp', ps') where
    prev = reverse sp
    prefix = longest [xs | xs <- inits prev, xs `isSuffixOf` prev]
    sp' = reverse prefix
    ps' = (prev ++ ps) \\ prefix
    longest = maximumBy (compare `on` length)
    inits [] = [[]]
    inits [_] = [[]]
    inits (x:xs) = [] : (map (x:) $ inits xs)
This version not only performs poorly, it is also complex. We can simplify it a bit. Observing that the KMP matching is a scan process from the left to the right of the text, it can be represented with folding (refer to Appendix A for details). Firstly, we can augment each character with an index for folding, like below:

zip(T, {1, 2, ...})   (14.29)

Zipping the text string with the infinite natural numbers gives a list of pairs. For example, the text string 'The quick brown fox jumps over the lazy dog' turns into (T, 1), (h, 2), (e, 3), ..., (o, 42), (g, 43).
The initial state for folding contains two parts. One is the pair of pattern parts (P_p, P_s), with the prefix starting from empty and the suffix being the whole pattern string: (∅, P). For illustration purposes only, we revert to normal pairs rather than the (←P_p, P_s) notation; it can easily be replaced with the reversed form in the finalized version, which is left as an exercise to the reader. The other part is a list of positions where successful matches are found; it starts from the empty list. After the folding finishes, this list contains all the solutions, and what we need is to extract it from the final state. The core KMP search algorithm is simplified like this:

kmp(P, T) = snd(fold(search, ((∅, P), ∅), zip(T, {1, 2, ...})))   (14.30)
The only black box is the search function, which takes a state and a pair of character and index, and returns a new state as result. Denote the first character of P_s as p and the rest of the characters as P_s′ (so P_s = cons(p, P_s′)); we have the following definition.

search(((P_p, P_s), L), (c, i)) =
    ((P_p ∪ {p}, P_s′), L ∪ {i})     : p = c ∧ P_s′ = ∅
    ((P_p ∪ {p}, P_s′), L)           : p = c ∧ P_s′ ≠ ∅
    ((P_p, P_s), L)                  : P_p = ∅
    search((π(P_p, P_s), L), (c, i)) : otherwise
(14.31)
If the first character of P_s matches the current character c during the scan, we further check whether all the characters in the pattern have been examined; if so, we have successfully found a solution, and this position i is recorded in list L. Otherwise, we advance one character ahead and go on. If p does not match c, we need to fall back for a further retry. However, there is an edge case in which we can't fall back any more: P_p is empty. In this case we do nothing but keep the current state.
The prefix function π developed so far can also be improved a bit. Since we want to find the longest prefix of P_p which is also its suffix, we can scan from right to left instead. For any non-empty list L, denote the first element as l₁, and all the rest except the first one as L′. Define a function init(L), which returns all the elements except for the last one:

init(L) =
    ∅                  : |L| = 1
    cons(l₁, init(L′)) : otherwise
(14.32)
Note that this function cannot handle empty lists. The idea of scanning P_p from right to left is to first check whether init(P_p) ⊐ P_p; if yes, we are done. Otherwise, we examine whether init(init(P_p)) works, and repeat this until the left-most position. Based on this idea, the prefix function can be modified as the following.
π(P_p, P_s) =
    (P_p, P_s)                                : P_p = ∅
    fallback(init(P_p), cons(last(P_p), P_s)) : otherwise
(14.33)

Where

fallback(A, B) =
    (A, B)                              : A ⊐ P_p
    fallback(init(A), cons(last(A), B)) : otherwise
(14.34)
Note that fallback always terminates, because the empty string is a suffix of any string. The last(L) function returns the last element of a list; it is also a linear time operation (refer to Appendix A for details), although it becomes a constant time operation with the ←P_p approach. This improved prefix function is bound to linear time, yet it is still quite a bit slower than the imperative algorithm, which looks up the prefix function in constant O(1) time. The following Haskell example program implements this minor improvement.
failure ([], ys) = ([], ys)
failure (xs, ys) = fallback (init xs) (last xs:ys) where
    fallback as bs | as `isSuffixOf` xs = (as, bs)
                   | otherwise = fallback (init as) (last as:bs)

kmpSearch ws txt = snd $ foldl f (([], ws), []) (zip txt [1..]) where
    f (p@(xs, (y:ys)), ns) (x, n) | x == y = if ys == [] then ((xs++[y], ys), ns++[n])
                                             else ((xs++[y], ys), ns)
                                  | xs == [] = (p, ns)
                                  | otherwise = f (failure p, ns) (x, n)
    f (p, ns) e = f (failure p, ns) e
The bottleneck is that we cannot use a native array to record the prefix function in purely functional settings. In fact, the prefix function can be understood as a state transform function: it transfers from one state to another according to whether the match succeeds or fails. We can abstract such state changing as a tree. In environments supporting algebraic data types, Haskell for example, such a state tree can be defined like below.

data State a = E | S a (State a) (State a)

A state is either empty, or contains three parts: the current state, the new state if the match fails, and the new state if the match succeeds. Such a definition is quite similar to a binary tree; we can call it a 'left-fail, right-success' tree. The state we use here is (P_p, P_s).
Similar to the imperative KMP algorithm, which builds the prefix function from the pattern string, the state transform tree can also be built from the pattern. The idea is to start from the very beginning state (∅, P), with both children empty. We fill the left child with the new state obtained by calling the π function defined above, and the right child with the state obtained by advancing one character ahead. There is an edge case: when the state transfers to (P, ∅), we cannot advance any more in the success case, so such a node only contains a child for the failure case. The build function is defined as the following.

build((P_p, P_s), ∅, ∅) =
    ((P_p, P_s), L, ∅) : P_s = ∅
    ((P_p, P_s), L, R) : otherwise
(14.35)

Where
L = build(π(P_p, P_s), ∅, ∅)
R = build((P_p ∪ {p}, P_s′), ∅, ∅)
The meanings of p and P_s′ are the same as before: p is the first character of P_s, and P_s′ is the rest of the characters. The most interesting point is that the build function never stops; it endlessly builds an infinite tree. In a strict programming environment, calling this function would freeze; however, in environments supporting lazy evaluation, only the nodes that need to be used are created. For example, both Haskell and Scheme/Lisp are capable of constructing such an infinite state tree. (In imperative settings, it is typically realized by using pointers which link to the ancestors of a node.)

Figure 14.16 illustrates such an infinite state tree for the pattern string 'ananym'. Note that the right-most edge represents the case that the match continuously succeeds for all characters; after that, since we can't match any more, the right sub tree is empty. Based on this fact, we can define an auxiliary function to test whether a state indicates that the whole pattern has been successfully matched.
match(((P_p, P_s), L, R)) =
    True  : P_s = ∅
    False : otherwise
(14.36)
Figure 14.16: The infinite state tree for pattern 'ananym'.
With the help of the state transform tree, we can realize the KMP algorithm in an automaton manner.

kmp(P, T) = snd(fold(search, (Tr, ∅), zip(T, {1, 2, ...})))   (14.37)

Where the tree Tr = build((∅, P), ∅, ∅) is the infinite state transform tree. The function search utilizes this tree to transform the state according to match or fail. Denote the first character of P_s as p, the rest of the characters as P_s′, and the matched positions found so far as A.
search((((P_p, P_s), L, R), A), (c, i)) =
    (R, A ∪ {i})            : p = c ∧ match(R)
    (R, A)                  : p = c ∧ ¬match(R)
    (((P_p, P_s), L, R), A) : P_p = ∅
    search((L, A), (c, i))  : otherwise
(14.38)
The following Haskell example program implements this algorithm.

data State a = E | S a (State a) (State a) -- state, fail-state, ok-state
    deriving (Eq, Show)

build :: (Eq a) => State ([a], [a]) -> State ([a], [a])
build (S s@(xs, []) E E) = S s (build (S (failure s) E E)) E
build (S s@(xs, (y:ys)) E E) = S s l r where
    l = build (S (failure s) E E) -- fail state
    r = build (S (xs++[y], ys) E E)

matched (S (_, []) _ _) = True
matched _ = False

kmpSearch3 :: (Eq a) => [a] -> [a] -> [Int]
kmpSearch3 ws txt = snd $ foldl f (auto, []) (zip txt [1..]) where
    auto = build (S ([], ws) E E)
    f (s@(S (xs, ys) l r), ns) (x, n)
        | [x] `isPrefixOf` ys = if matched r then (r, ns++[n])
                                else (r, ns)
        | xs == [] = (s, ns)
        | otherwise = f (l, ns) (x, n)
The bottleneck is that the state tree building function calls π to fall back, and the current definition of π isn't effective enough, because it enumerates all candidates from right to left every time.

Since the state tree is infinite, we can adopt some common treatments for infinite structures. One good example is the Fibonacci series. The first two Fibonacci numbers are defined as 0 and 1; the rest are obtained by adding the previous two numbers.

F_0 = 0
F_1 = 1
F_n = F_{n−1} + F_{n−2}
(14.39)
Thus the Fibonacci numbers can be listed one by one as the following:

F_0 = 0
F_1 = 1
F_2 = F_1 + F_0
F_3 = F_2 + F_1
...
(14.40)
We can collect the numbers on both sides, defining F = {F_0, F_1, F_2, ...}, and obtain the following equation:

F = {0, 1, F_1 + F_0, F_2 + F_1, ...}
  = {0, 1} ∪ {x + y | x ∈ {F_0, F_1, F_2, ...}, y ∈ {F_1, F_2, F_3, ...}}
  = {0, 1} ∪ {x + y | x ∈ F, y ∈ F′}
(14.41)

Where F′ = tail(F) is all the Fibonacci numbers except for the first one. In environments supporting lazy evaluation, Haskell for instance, this definition can be expressed like below.
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
The recursive definition of the infinite Fibonacci series suggests an idea which can be used to get rid of the fallback function π. Denote the state transfer tree as T; we can define the transfer function for matching a character c on this tree as the following.

trans(T, c) =
    root        : T = ∅
    R           : T = ((P_p, P_s), L, R) ∧ c = p
    trans(L, c) : otherwise
(14.42)
If we match a character against an empty node, we transfer to the root of the tree (we will define root shortly). Otherwise, we compare whether the character c is the same as the first character p of P_s: if they match, we transfer to the right sub tree for the success case; otherwise, we transfer to the left sub tree for the failure case.

With the transfer function defined, we can modify the previous tree building function accordingly. This is quite similar to the previous Fibonacci series definition.
build(T, (P_p, P_s)) = ((P_p, P_s), T, build(trans(T, p), (P_p ∪ {p}, P_s′)))
The right hand side of this equation contains three parts. The first is the state being matched, (P_p, P_s). If the match fails, since T itself can handle any failure case, we use it directly as the left sub tree. Otherwise, we recursively build the right sub tree for the success case by advancing one character ahead and calling the transfer function defined above.

However, there is an edge case which has to be handled specially: if P_s is empty, indicating a successful match, there is no right sub tree any more, as defined before. Combining these cases gives the final building function.
build(T, (P_p, P_s)) =
    ((P_p, P_s), T, ∅)                                     : P_s = ∅
    ((P_p, P_s), T, build(trans(T, p), (P_p ∪ {p}, P_s′))) : otherwise
(14.43)
The last brick is to define the root of the infinite state transfer tree, which initializes the building:

root = build(∅, (∅, P))   (14.44)

And the new KMP matching algorithm is modified with this root:

kmp(P, T) = snd(fold(trans, (root, ∅), zip(T, {1, 2, ...})))   (14.45)
The following Haskell example program implements this nal version.
kmpSearch ws txt = snd $ foldl tr (root, []) (zip txt [1..]) where
root = build E ([], ws)
build fails (xs, []) = S (xs, []) fails E
build fails s@(xs, (y:ys)) = S s fails succs where
succs = build (fst (tr (fails, []) (y, 0))) (xs++[y], ys)
tr (E, ns) _ = (root, ns)
tr ((S (xs, ys) fails succs), ns) (x, n)
| [x] isPrefixOf ys = if matched succs then (succs, ns++[n]) else (succs, ns)
| otherwise = tr (fails, ns) (x, n)
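As a quick check (our own example, not from the original text): kmpSearch "ananym" "any ananthous ananym flower" evaluates to [20], the right position of the single match, consistent with the convention chosen earlier.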
Figure 14.17 shows the first 4 steps when searching 'ananym' in the text 'anal'. Since the first 3 steps all succeed, the left sub trees of these 3 states are not actually constructed; they are marked as '?'. In the fourth step the match fails, thus the right sub tree needn't be built. On the other hand, we must construct the left sub tree, which is on top of the result of trans(right(right(right(T))), n), where the function right(T) returns the right sub tree of T. This can be further expanded according to the definitions of the building and state transforming functions, until we get the concrete state (('a', 'nanym'), L, R). The detailed deduction process is left as an exercise to the reader.
Figure 14.17: On-demand construction of the state transform tree when searching 'ananym' in the text 'anal'.
This algorithm depends critically on lazy evaluation: all the states to be transferred are built on demand, so the building process is amortized O(m), and the total performance is amortized O(n + m). Readers can refer to [1] for a detailed proof.

It's worth comparing the final purely functional algorithm and the imperative one. In many cases we obtain an expressive functional realization; however, for the KMP matching algorithm, the imperative approach is much simpler and more intuitive. This is because we have to mimic the raw array with an infinite state transfer tree.
Boyer-Moore

The Boyer-Moore string matching algorithm is another effective solution, invented in 1977 [10]. The idea of the Boyer-Moore algorithm comes from the following observations.

The bad character heuristics

When attempting to match the pattern, even if several characters from the left are the same, the attempt fails if the last one does not match, as shown in figure 14.18. What's more, we wouldn't find a match even if we slid the pattern down by 1 or 2. Actually, the length of the pattern 'ananym' is 6; the last character is 'm', but the corresponding character in the text is 'h', which does not appear in the pattern at all. We can directly slide the pattern down by its whole length, 6.
Figure 14.18: Since character 'h' doesn't appear in the pattern, we wouldn't find a match even if we slid the pattern down by less than its length.
This leads to the bad-character rule. We can pre-process the pattern: if the character set of the text is known in advance, we can find all the characters which don't appear in the pattern string. During the later scan process, as soon as we find such a bad character, we can immediately slide the pattern down by its length. The question is what to do if the unmatched character does appear in the pattern. In that case, in order not to miss any potential matches, we have to slide the pattern down to a position where we check again. This is shown in figure 14.19.

It's quite possible that the unmatched character appears in the pattern at more than one position. Denote the length of the pattern as |P|, and suppose the character appears at positions p₁, p₂, ..., p_i. In such a case, we take the right-most one to avoid missing any matches:

s = |P| − p_i   (14.46)
Note that the shifting length is 0 for the last position in the pattern according to the above equation; thus we can skip it in the realization.

(a) The last character in the pattern, 'e', doesn't match 'p'. However, 'p' appears in the pattern.
(b) We have to slide the pattern down by 2 to check again.

Figure 14.19: Slide the pattern when the unmatched character appears in the pattern.

Another important point is that, since the shifting length is calculated against the position aligned with the last character of the pattern string (we deduce it from |P|), no matter where the mismatch happens when we scan from right to left, we slide the pattern down by looking up the bad character table with the text character aligned with the last character of the pattern. This is shown in figure 14.20.

Figure 14.20: Even when the mismatch happens in the middle, between characters 'i' and 'a', we look up the shifting value with character 'e', which is 6 (calculated from the first 'e'; the second 'e' is skipped to avoid a zero shift).
There is a good result in practice: using only the bad-character rule leads to a simple and fast string matching algorithm, called the Boyer-Moore-Horspool algorithm [11].
1: procedure Boyer-Moore-Horspool(T, P)
2:   for ∀c ∈ Σ do
3:     π[c] ← |P|
4:   for i ← 1 to |P| − 1 do   ▷ Skip the last position
5:     π[P[i]] ← |P| − i
6:   s ← 0
7:   while s + |P| ≤ |T| do
8:     i ← |P|
9:     while i ≥ 1 ∧ P[i] = T[s + i] do   ▷ scan from the right
10:      i ← i − 1
11:    if i < 1 then
12:      found one solution at s
13:      s ← s + 1   ▷ go on finding the next
14:    else
15:      s ← s + π[T[s + |P|]]
The character set is denoted as Σ. We first initialize all the values of the sliding table to the length of the pattern string, |P|. After that we process the pattern from left to right, updating the sliding values; if a character appears multiple times in the pattern, the latter value, which is on the right hand side, overwrites the previous one. We start the matching scan process by aligning the pattern and the text string at the very left. For every alignment s, we scan from right to left until either there is an unmatched character or all the characters in the pattern have been examined. The latter case indicates that we've found a match, while for the former case, we look up the table to slide the pattern down to the right.
The following example Python code implements this algorithm accordingly.

def bmh_match(w, p):
    n = len(w)
    m = len(p)
    tab = [m for _ in range(256)]  # table for the bad character rule
    for i in range(m-1):
        tab[ord(p[i])] = m - 1 - i
    res = []
    offset = 0
    while offset + m <= n:
        i = m - 1
        while i >= 0 and p[i] == w[offset+i]:
            i = i - 1
        if i < 0:
            res.append(offset)
            offset = offset + 1
        else:
            offset = offset + tab[ord(w[offset + m - 1])]
    return res
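As a quick check (our own example): bmh_match("any ananthous ananym flower", "ananym") evaluates to [14].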
The algorithm firstly takes about O(|Σ| + |P|) time to build the sliding table. If the character set is small, the performance is dominated by the pattern and the text. There is definitely a worst case, in which all the characters in the pattern and the text are the same, e.g. searching 'aa...a' (m 'a's, denoted as a^m) in the text 'aa......a' (n 'a's, denoted as a^n); the performance in this worst case is O(mn). This algorithm performs well if the pattern is long and there are only a constant number of matches; the result is then bound to linear time. This is the same as the best case of the full Boyer-Moore algorithm, which will be explained next.
The good suffix heuristics

Consider searching the pattern 'abbabab' in the text 'bbbababbabab...', as shown in figure 14.21. By using the bad-character rule, the pattern will be slid by two.
Figure 14.21: According to the bad-character rule, the pattern is slid by 2, so that the next 'b' is aligned.
Actually, we can do better than this. Observe that, before the unmatched point, we have already successfully matched 6 characters, 'bbabab', from right to left. Since 'ab', the prefix of the pattern, is also a suffix of what we have matched so far, we can directly slide the pattern down to align this suffix, as shown in figure 14.22.
Figure 14.22: As the prefix 'ab' is also a suffix of what we've matched, we can slide the pattern down to a position where 'ab' is aligned.
This is quite similar to the pre-processing of the KMP algorithm. However, we can't always skip so many characters. Consider the example shown in figure 14.23: we have matched the characters 'bab' when the mismatch happens. Although the prefix 'ab' of the pattern is also a suffix of 'bab', we can't slide the pattern so far, because 'bab' appears somewhere else, starting from the 3rd character of the pattern. In order not to miss any potential matching, we can only slide the pattern by two.
Figure 14.23: We've matched 'bab', which appears somewhere else in the pattern (from the 3rd to the 5th character). We can only slide the pattern down by 2 to avoid missing any potential matches.
The above situations form the two cases of the good-suffix rule, as shown in figure 14.24.

(a) Case 1: only a part of the matching suffix occurs as a prefix of the pattern.
(b) Case 2: the matching suffix occurs somewhere else in the pattern.

Figure 14.24: The light gray section in the text represents the characters that have been matched; the dark gray parts indicate the same content in the pattern.
Both cases of the good-suffix rule handle the situation that multiple characters have been matched from the right. We can slide the pattern to the right if either of the following happens:

- Case 1: a part of the matching suffix occurs as a prefix of the pattern, and the matching suffix doesn't appear anywhere else in the pattern; we can slide the pattern to the right to make this prefix aligned.
- Case 2: the matching suffix occurs somewhere else in the pattern; we can slide the pattern to make the right-most occurrence aligned.

Note that in the scan process, we should apply case 2 first whenever it is possible, and examine case 1 only if the whole matched suffix does not appear elsewhere in the pattern. Observe that both cases of the good-suffix rule depend only on the pattern string, so a table can be built by pre-processing the pattern for later look-up.
For the sake of brevity, we denote the suffix string starting from the i-th character of P as P_i; that is, P_i is the sub-string P[i]P[i + 1]...P[m].

For case 1, we can check every suffix of P, which includes P_m, P_{m−1}, P_{m−2}, ..., P_2, to examine whether it is a prefix of P. This can be achieved by a round of scanning from right to left.

For case 2, we can check every prefix of P and examine whether its longest suffix is also a suffix of the whole pattern P. This can be achieved by another round of scanning from left to right.
1: function Good-Suffix(P)
2:   m ← |P|
3:   π_s ← {0, 0, ..., 0}   ▷ Initialize the table of length m
4:   l ← 0   ▷ The last suffix which is also a prefix of P
5:   for i ← m − 1 down-to 1 do   ▷ First loop, for case 1
6:     if P_i ⊏ P then   ▷ ⊏ means 'is a prefix of'
7:       l ← i
8:     π_s[i] ← l
9:   for i ← 1 to m do   ▷ Second loop, for case 2
10:    s ← Suffix-Length(P_i)
11:    if s ≠ 0 ∧ P[i − s] ≠ P[m − s] then
12:      π_s[m − s] ← m − i
13:  return π_s
This algorithm builds the good-suffix heuristics table π_s. It first checks every suffix of P from the shortest to the longest: if the suffix P_i is also a prefix of P, we record it, and use it for all the entries until we find another suffix P_j, j < i, which is also a prefix of P.

After that, the algorithm checks every prefix of P from the shortest to the longest. It calls the function Suffix-Length(P_i) to calculate the length of the longest suffix of P_i which is also a suffix of P. If this length s isn't zero, there exists a sub-string of length s that appears as the suffix of the pattern, which indicates that case 2 applies; the algorithm then overwrites the s-th entry from the right of the table π_s. Note that, to avoid finding the same occurrence of the matched suffix, we test whether P[i − s] and P[m − s] differ.

The function Suffix-Length is designed as the following.
1: function Suffix-Length(P_i)
2:   m ← |P|
3:   j ← 0
4:   while P[m − j] = P[i − j] ∧ j < i do
5:     j ← j + 1
6:   return j
The following Python example program implements the good-suffix rule.

def good_suffix(p):
    m = len(p)
    tab = [0 for _ in range(m)]
    last = 0
    # first loop, for case 1
    for i in range(m-1, 0, -1): # m-1, m-2, ..., 1
        if is_prefix(p, i):
            last = i
        tab[i - 1] = last
    # second loop, for case 2
    for i in range(m):
        slen = suffix_len(p, i)
        if slen != 0 and p[i - slen] != p[m - 1 - slen]:
            tab[m - 1 - slen] = m - 1 - i
    return tab

# test if p[i..m-1] is a prefix of p
def is_prefix(p, i):
    for j in range(len(p) - i):
        if p[j] != p[i+j]:
            return False
    return True

# length of the longest suffix of p[..i] which is also a suffix of p
def suffix_len(p, i):
    m = len(p)
    j = 0
    while p[m - 1 - j] == p[i - j] and j < i:
        j = j + 1
    return j
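As a quick check (our own computation, not from the original text): good_suffix("abbabab") evaluates to [5, 5, 5, 2, 5, 4, 0]. A mismatch at 0-based position 3, after matching 'bab', gives a shift of tab[3] = 2, and a mismatch at position 0, after matching 'bbabab', gives tab[0] = 5, matching figures 14.23 and 14.22 respectively.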
It's quite possible that both the bad-character rule and the good-suffix rule can be applied when a mismatch happens. The Boyer-Moore algorithm compares the two and picks the bigger shift, so that it can find the solution as quickly as possible. The bad-character rule table can be built explicitly as below.

1: function Bad-Character(P)
2:   for ∀c ∈ Σ do
3:     π_b[c] ← |P|
4:   for i ← 1 to |P| − 1 do
5:     π_b[P[i]] ← |P| − i
6:   return π_b
The following Python program implements the bad-character rule accordingly.
def bad_char(p):
m = len(p)
tab = [m for _ in range(256)]
for i in range(m-1):
tab[ord(p[i])] = m - 1 - i
return tab
The final Boyer-Moore algorithm first builds the two rule tables from the pattern, then aligns the pattern to the beginning of the text, and scans from right to left for every alignment. If a mismatch happens, it tries both rules and slides the pattern by the bigger shift.

1: function Boyer-Moore(T, P)
2:   n ← |T|, m ← |P|
3:   π_b ← Bad-Character(P)
4:   π_s ← Good-Suffix(P)
5:   s ← 0
6:   while s + m ≤ n do
7:     i ← m
8:     while i ≥ 1 ∧ P[i] = T[s + i] do
9:       i ← i − 1
10:    if i < 1 then
11:      found one solution at s
12:      s ← s + 1   ▷ go on finding the next
13:    else
14:      s ← s + max(π_b[T[s + m]], π_s[i])
Here is the example implementation of the Boyer-Moore algorithm in Python.

def bm_match(w, p):
    n = len(w)
    m = len(p)
    tab1 = bad_char(p)
    tab2 = good_suffix(p)
    res = []
    offset = 0
    while offset + m <= n:
        i = m - 1
        while i >= 0 and p[i] == w[offset + i]:
            i = i - 1
        if i < 0:
            res.append(offset)
            offset = offset + 1
        else:
            offset = offset + max(tab1[ord(w[offset + m - 1])], tab2[i])
    return res
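As a quick check (our own example): bm_match("bbbababbabab", "abbabab") evaluates to [5], the single occurrence of the pattern from figure 14.21.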
The Boyer-Moore algorithm published in the original paper is bound to O(n + m) in the worst case only when the pattern doesn't appear in the text [10]; Knuth, Morris, and Pratt proved this fact in 1977 [12]. However, when the pattern appears in the text, as shown above, Boyer-Moore performs O(nm) in the worst case. Richard Bird shows a purely functional realization of the Boyer-Moore algorithm in chapter 16 of [1]; we skip it in this book.
Exercise 14.2

- Prove that the Boyer-Moore majority vote algorithm is correct.
- Given a list, find the element that occurs most often. Are there any divide and conquer solutions? Are there any divide and conquer data structures, such as a map, that can be used?
- Bentley presents a divide and conquer algorithm to find the maximum sum in O(n log n) time in [4]. The idea is to split the list at the middle point: we can recursively find the maximum sum in the first half and in the second half; however, we also need to find the maximum sum crossing the middle point. The method is to scan from the middle point to both ends as the following.

1: function Max-Sum(A)
2:   if A = ∅ then
3:     return 0
4:   else if |A| = 1 then
5:     return Max(0, A[1])
6:   else
7:     m ← ⌊|A|/2⌋
8:     a ← Max-From(Reverse(A[1...m]))
9:     b ← Max-From(A[m + 1...|A|])
10:    c ← Max-Sum(A[1...m])
11:    d ← Max-Sum(A[m + 1...|A|])
12:    return Max(a + b, c, d)

13: function Max-From(A)
14:   sum ← 0, m ← 0
15:   for i ← 1 to |A| do
16:     sum ← sum + A[i]
17:     m ← Max(m, sum)
18:   return m

It's easy to deduce that the time performance is T(n) = 2T(n/2) + O(n). Implement this algorithm in your favorite programming language.
- Explain why the KMP algorithm performs in linear time even in the seemingly worst case.
- Implement the purely functional KMP algorithm using the reversed prefix ←P_p to avoid the linear time appending operation.
- Deduce the state of the tree left(right(right(right(T)))) when searching 'ananym' in the text 'anal'.
14.3 Solution searching

One interesting thing that computer programming offers is solving puzzles. In the early phase of classic artificial intelligence, people developed many methods to search for solutions. Different from sequence searching and string matching, the solution doesn't obviously exist among a set of candidates; it typically needs to be constructed while trying various attempts. Some problems are solvable, while others are not. Among the solvable problems, not all of them have just one unique solution; for example, a maze may have multiple ways out, and people sometimes search for the best one.
14.3.1 DFS and BFS

DFS and BFS stand for depth-first search and breadth-first search. They are typically introduced as graph algorithms in textbooks. Graphs are a comprehensive topic which is hard to cover in this elementary book. In this section, we'll show how to use DFS and BFS to solve some real puzzles, without a formal introduction of the graph concept.
Maze

Maze is a classic and popular puzzle, amazing to both kids and adults. Figure 14.25 shows an example maze. There are also real maze gardens in parks for fun. In the late 1990s, maze-solving contests were often held in robot mouse competitions all over the world.

Figure 14.25: A maze
There are multiple methods to solve the maze puzzle. We'll introduce an effective, yet not the best, one in this section. There are some well-known sayings about how to find the way out of a maze, but not all of them are true.

For example, one method states that, whenever you have multiple ways, always turn right. This doesn't work, as shown in figure 14.26. The obvious solution is to first go along the top horizontal line, then turn right, and keep going ahead at the T section. However, if we always turn right, we'll loop endlessly around the inner big block.

Figure 14.26: Always turning right leads to an endless loop.

This example tells us that the decision made when there are multiple choices matters to the solution. Like in the fairy tale we read in our childhood, we can drop bread crumbs in the maze. When there are multiple ways, we simply select one, leaving a piece of bread crumb to mark the attempt. If we enter a dead end, we go back to the last place where we made a decision, by back-tracking along the bread crumbs; then we can try another way.

At any time, if we find bread crumbs already left, it means we have entered a loop, and we must go back and try a different way. Repeating these try-and-check steps, we can either find a way out, or conclude that there is no solution; in the latter case, we back-track to the start point.
One easy way to describe a maze, is by a m n matrix, each element is
either 0 or 1, which indicates if there is a way at this cell. The maze illustrated
in gure 14.26 can be dened as the following matrix.
0 0 0 0 0 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 0 0 0 0 0
1 1 1 1 1 0
Given a start point s = (i, j) and a goal e = (p, q), we need to find all solutions,
that is, all the paths from s to e.

There is an obvious recursive exhaustive search method: in order
to find all paths from s to e, we can check all points connected to s; for every
such point k, we recursively find all paths from k to e. This method can be
illustrated as the following.

Trivial case: if the start point s is the same as the target point e, we are
done;

Otherwise, for every point k connected to s, recursively find the paths
from k to e; if e can be reached via k, put section s-k in front of each path
between k and e.

However, we have to leave bread crumbs to avoid repeating the same
attempts. Otherwise, in the recursive case, we start from s, find
a connected point k, then we further try to find paths from k to e. Since s is
connected to k as well, in the next recursion we'll try to find paths from s
to e again. This turns out to be the very same original problem, and we are trapped in
infinite recursion.

Our solution is to initialize an empty list, and use it to record all the points we've
visited so far. For every connected point, we look up the list to examine if it
has already been visited. We skip all the visited candidates and only try the
new ones. The corresponding algorithm can be defined like this:
solveMaze(m, s, e) = solve(s, {∅})    (14.47)

Where m is the matrix which defines the maze, s is the start point, and e is
the end point. Function solve is defined in the context of solveMaze, so that
the maze and the end point can be accessed. It can be realized recursively like
what we described above. (Function concat flattens a list of lists, for example
concat({{a, b, c}, {x, y, z}}) = {a, b, c, x, y, z}; refer to appendix A for details.)

solve(s, P) = {{s} ∪ p | p ∈ P} : s = e
              concat({solve(s', {{s} ∪ p | p ∈ P}) | s' ∈ adj(s), ¬visited(s', P)}) : otherwise
    (14.48)

Note that P also serves as an accumulator. Every connected point is recorded
in all the possible paths to the current position. But they are stored in reversed
order; that is, the newly visited point is put at the head of all the lists, and
the starting point is the last one. This is because the appending operation is
linear (O(n), where n is the number of elements stored in a list), while linking
to the head takes only constant time. We can output the result in the correct order by
reversing all possible solutions in equation (14.47) (the detailed definition of
reverse can be found in appendix A):

solveMaze(m, s, e) = map(reverse, solve(s, {∅}))    (14.49)
We need to define the functions adj(p) and visited(p, P), which find all the points
connected to p, and test if point p has been visited, respectively. Two points are
connected if and only if they are neighbor cells horizontally or vertically in the maze
matrix, and both have zero value.

adj((x, y)) = {(x', y') | (x', y') ∈ {(x − 1, y), (x + 1, y), (x, y − 1), (x, y + 1)},
               1 ≤ x' ≤ M, 1 ≤ y' ≤ N, m[x', y'] = 0}
    (14.50)

Where M and N are the width and height of the maze.

Function visited(p, P) examines if point p has been recorded in any list in P.

visited(p, P) = ∃ path ∈ P, p ∈ path    (14.51)
The following Haskell example code implements this algorithm.

solveMaze m from to = map reverse $ solve from [[]] where
  solve p paths | p == to = map (p:) paths
                | otherwise = concat [solve p' (map (p:) paths) |
                                        p' <- adjacent p,
                                        not $ visited p' paths]
  adjacent (x, y) = [(x', y') |
                       (x', y') <- [(x-1, y), (x+1, y), (x, y-1), (x, y+1)],
                       inRange (bounds m) (x', y'),
                       m ! (x', y') == 0]
  visited p paths = any (p `elem`) paths
For a maze defined as a matrix like the below example, all the solutions can be
given by this program.

mz = [[0, 0, 1, 0, 1, 1],
      [1, 0, 1, 0, 1, 1],
      [1, 0, 0, 0, 0, 0],
      [1, 1, 0, 1, 1, 1],
      [0, 0, 0, 0, 0, 0],
      [0, 0, 0, 1, 1, 0]]

maze = listArray ((1,1), (6, 6)) . concat

solveMaze (maze mz) (1,1) (6, 6)
As we mentioned, this is a style of exhaustive search. It recursively searches
all the connected points as candidates. In a real maze solving game, a robot
mouse competition for instance, it's enough to just find one route. We can adopt
a method close to what was described at the beginning of this section. The robot
mouse always tries the first connected point, and skips the others until it gets
stuck. We need some data structure to store the bread crumbs, which help to
remember the decisions made. As we always attempt to find the way on
top of the latest decision, it is in last-in, first-out manner. A stack can be used
to realize it.

At the very beginning, only the starting point s is stored in the stack. We
pop it out, and find, for example, that points a and b are connected to s. We push
the two possible paths {a, s} and {b, s} onto the stack. Next we pop {a, s} out, and
examine the points connected to a. Then all the paths with 3 steps will be pushed
back. We repeat this process. At any time, each element stored in the stack is
a path, from the starting point to the farthest place it can reach, in reversed
order. This is illustrated in figure 14.27.
The stack can be realized with a list. The latest option is picked from the
head, and the new candidates are also added at the head. The maze puzzle can
then be solved by using such a list of paths:

solveMaze'(m, s, e) = reverse(solve'({{s}}))    (14.52)

As we are searching for the first solution rather than all of them, map isn't used here.
When the stack is empty, it means that we've tried all the options and failed
to find a way out: there is no solution. Otherwise, the top option is popped,
expanded with all the adjacent points which haven't been visited before, and
pushed back onto the stack.
Figure 14.27: The stack is initialized with a singleton list of the starting point
s. s is connected with points a and b. Paths {a, s} and {b, s} are pushed back.
In some step, the path ending with point p is popped. p is connected with points
i, j, and k. These 3 points are expanded as different options and pushed back
onto the stack. The candidate path ending with q won't be examined unless all the
options above it fail.
Denote the stack as S. If it isn't empty, the top element is s₁, and the
remaining stack after popping is S'. s₁ is a list of points representing a path P.
Denote the first point in this path as p₁, and the rest as P'. The solution
can be formalized as the following.

solve'(S) = ∅ : S = ∅
            s₁ : p₁ = e
            solve'(S') : C = ∅
            solve'({{p} ∪ P | p ∈ C} ∪ S') : otherwise
    (14.53)

Where C = {c | c ∈ adj(p₁), c ∉ P'} contains the candidate points to expand.
The adj function is as defined above. This updated maze solution can
be implemented with the below example Haskell program (the code of the
adjacent function is the same as before and skipped here).

dfsSolve m from to = reverse $ solve [[from]] where
  solve [] = []
  solve (c@(p:path):cs)
      | p == to = c -- stop at the first solution
      | otherwise = let os = filter (`notElem` path) (adjacent p) in
                      if os == []
                      then solve cs
                      else solve ((map (:c) os) ++ cs)
It's quite easy to modify this algorithm to find all solutions. When we find
a path in the second clause, instead of returning it immediately, we record it
and go on checking the rest of the memorized options in the stack until the stack
becomes empty. We leave this as an exercise to the reader.

The same idea can also be realized imperatively. We maintain a stack to store
all possible paths from the starting point. In each iteration, the top option path
is popped; if the farthest position is the end point, a solution is found; otherwise,
all the adjacent, not yet visited points are appended as new paths and pushed
back onto the stack. This is repeated till all the candidate paths in the stack are
checked.
We use the same notation to represent the stack S. But the paths will
be stored as arrays instead of lists in the imperative setting, as the former is more
effective. Because of this, the starting point is the first element in the path
array, while the farthest reached place is the right most element. We use p_n
to represent Last(P) for path P. The imperative algorithm can be given as
below.
1: function Solve-Maze(m, s, e)
2:   S ← ∅
3:   Push(S, {s})
4:   L ← ∅    ▷ the result list
5:   while S ≠ ∅ do
6:     P ← Pop(S)
7:     if e = p_n then
8:       Add(L, P)
9:     else
10:      for each p ∈ Adjacent(m, p_n) do
11:        if p ∉ P then
12:          Push(S, P ∪ {p})
13:  return L
The following example Python program implements this maze solving algorithm.

def solve(m, src, dst):
    stack = [[src]]
    s = []
    while stack != []:
        path = stack.pop()
        if path[-1] == dst:
            s.append(path)
        else:
            for p in adjacent(m, path[-1]):
                if not p in path:
                    stack.append(path + [p])
    return s

def adjacent(m, p):
    (x, y) = p
    ds = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    ps = []
    for (dx, dy) in ds:
        x1 = x + dx
        y1 = y + dy
        if 0 <= x1 and x1 < len(m[0]) and \
           0 <= y1 and y1 < len(m) and m[y1][x1] == 0:
            ps.append((x1, y1))
    return ps
And the same maze example given above can be solved by this program like
the following.

mz = [[0, 0, 1, 0, 1, 1],
      [1, 0, 1, 0, 1, 1],
      [1, 0, 0, 0, 0, 0],
      [1, 1, 0, 1, 1, 1],
      [0, 0, 0, 0, 0, 0],
      [0, 0, 0, 1, 1, 0]]

solve(mz, (0, 0), (5, 5))
It seems that in the worst case, there are 4 options (up, down, left, and right)
at each step; each option is pushed onto the stack and eventually examined during
backtracking. Thus the complexity would be bound to O(4^n). The actual time won't
be so large, because we filter out the places which have been visited before. In
the worst case, all the reachable points are visited exactly once. So the time is
bound to O(n), where n is the number of connected points in total. As a stack
is used to store candidate solutions, the space complexity is O(n²).
Eight queens puzzle

The eight queens puzzle is also a famous problem. Although chess has a very
long history, this puzzle was first published in 1848 by Max Bezzel [13]. The queen
in the game of chess is quite powerful. It can attack any other piece in the same
row, column, or diagonal at any distance. The puzzle is to find a way to
put 8 queens on the board, so that none of them attack each other. Figure
14.28 (a) illustrates the places that can be attacked by a queen, and 14.28 (b) shows
a solution of the 8 queens puzzle.

(a) A queen piece. (b) An example solution

Figure 14.28: The eight queens puzzle.
It's obvious that the puzzle can be solved by brute force, which takes
$P_{64}^{8}$ attempts, a number about 4 × 10^10. It can be easily improved by observing
that no two queens can be in the same row, and each queen must be put in one
column between 1 and 8. Thus we can represent an arrangement as a permutation
of {1, 2, 3, 4, 5, 6, 7, 8}. For instance, the arrangement {6, 2, 7, 1, 3, 5, 8, 4} means
we put the first queen at row 1, column 6, the second queen at row 2, column 2,
..., and the last queen at row 8, column 4. By this means, we need only examine
8! = 40320 possibilities.
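For illustration, this permutation-based brute force can be sketched in Python with the standard itertools library (a sketch of ours, not the book's code):

from itertools import permutations

# Enumerate all 8! = 40320 permutations; p[i] is the column of the queen
# in row i + 1. Two queens share a diagonal iff the column distance equals
# the row distance.
def queens_by_permutation():
    return [p for p in permutations(range(1, 9))
            if all(abs(p[i] - p[j]) != j - i
                   for i in range(8) for j in range(i + 1, 8))]

print(len(queens_by_permutation()))   # 92 solutions in total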
We can find better methods than this. Similar to the maze puzzle, we put
queens one by one, starting from the first row. For the first queen, there are 8 options:
we can put it in one of the eight columns. Then for the next queen, we
again examine the 8 candidate columns. Some of them are not valid because
those positions would be attacked by the first queen. We repeat this process: for
the i-th queen, we examine the 8 columns in row i to find which are safe.
If no column is valid, it means all the columns in this row will be attacked
by some queen we've previously arranged, and we have to backtrack, as we did
in the maze puzzle. When all the 8 queens are successfully put on the board,
we find a solution. In order to find all the possible solutions, we record
it and go on examining other candidate columns, backtracking where
necessary. This process terminates when all the columns in the first row have
been examined. The below equation starts the search.
solve({∅}, ∅)    (14.54)
In order to manage the candidate attempts, a stack S is used, as in
the maze puzzle. The stack is initialized with one empty element. A
list L is used to record all possible solutions. Denote the top element in the
stack as s₁. It's actually an intermediate state of assignment, which is a partial
permutation of 1 to 8. After popping s₁, the stack becomes S'. The solve function
can be defined as the following.
solve(S, L) = L : S = ∅
              solve(S', {s₁} ∪ L) : |s₁| = 8
              solve({{i} ∪ s₁ | i ∈ [1, 8], i ∉ s₁, safe(i, s₁)} ∪ S', L) : otherwise
    (14.55)
If the stack is empty, all the possible candidates have been examined and it's not
possible to backtrack any more; L has accumulated all the found solutions and
is returned as the result. Otherwise, if the length of the top element in the stack
is 8, a valid solution has been found. We add it to L, and go on to find other solutions.
If the length is less than 8, we need to try to put the next queen. Among all the
columns from 1 to 8, we pick those not already occupied by previous queens
(through the i ∉ s₁ clause) and not attacked along a diagonal
(through the safe predicate). The valid assignments are pushed onto the
stack for further searching.
Function safe(x, C) detects whether assigning a queen at column x would be
attacked by the other queens in C along a diagonal. There are 2 possible
cases: the 45° and the 135° directions. Since the row of this new queen is y = 1 + |C|,
where |C| is the length of C, the safe function can be defined as the following.

safe(x, C) = ∀(c, r) ∈ zip(reverse(C), {1, 2, ...}), |x − c| ≠ |y − r|    (14.56)
Where zip takes two lists and pairs their elements into a new
list. Thus if C = {c_{i−1}, c_{i−2}, ..., c₂, c₁} represents the columns of the first
i − 1 queens assigned so far, the above function checks whether any of the pairs
{(c₁, 1), (c₂, 2), ..., (c_{i−1}, i − 1)} forms a diagonal line with position (x, y).
Translating this algorithm into Haskell gives the below example program
(note the import of the list difference operator \\ from Data.List).

import Data.List ((\\))

solve = dfsSolve [[]] [] where
  dfsSolve [] s = s
  dfsSolve (c:cs) s
      | length c == 8 = dfsSolve cs (c:s)
      | otherwise = dfsSolve ([(x:c) | x <- [1..8] \\ c,
                                        not $ attack x c] ++ cs) s
  attack x cs = let y = 1 + length cs in
      any (\(c, r) -> abs(x - c) == abs(y - r)) $
      zip (reverse cs) [1..]
Observing that the algorithm is tail recursive, it's easy to transform it into
an imperative realization. Instead of using a list, we use an array to represent the
queens' assignment. Denote the stack as S, and the result list as L. The
imperative algorithm can be described as the following.

1: function Solve-Queens
2:   S ← {∅}
3:   L ← ∅    ▷ the result list
4:   while S ≠ ∅ do
5:     A ← Pop(S)    ▷ A is an intermediate assignment
6:     if |A| = 8 then
7:       Add(L, A)
8:     else
9:       for i ← 1 to 8 do
10:        if Valid(i, A) then
11:          Push(S, A ∪ {i})
12:  return L
The stack is initialized with the empty assignment. The main process repeatedly
pops the top candidate from the stack. If there are still queens left to place, the
algorithm examines the possible columns in the next row from 1 to 8. If a column
is safe, i.e. it won't be attacked by any previous queen, this column is
appended to the assignment, and pushed back onto the stack. Different from the
functional approach, since an array, but not a list, is used, we needn't reverse the
solution assignment any more.

Function Valid checks if column x is safe given the previous queens in A.
It filters out the columns that have already been occupied, and checks that no
diagonal line is formed with existing queens.
1: function Valid(x, A)
2:   y ← 1 + |A|
3:   for i ← 1 to |A| do
4:     if x = A[i] ∨ |y − i| = |x − A[i]| then
5:       return False
6:   return True
The following Python example program implements this imperative algorithm.
def solve():
stack = [[]]
s = []
while stack != []:
a = stack.pop()
if len(a) == 8:
s.append(a)
else:
for i in range(1, 9):
if valid(i, a):
stack.append(a+[i])
return s
def valid(x, a):
y = len(a) + 1
for i in range(1, y):
if x == a[i-1] or abs(y - i) == abs(x - a[i-1]):
return False
return True
Although there are 8 optional columns for each queen, not all of them are
valid and thus further expanded. Only those columns not occupied by
previous queens are tried. The algorithm examines only 15720 possibilities, which
is far less than 8^8 = 16777216 [13].

It's quite easy to extend the algorithm to solve the N queens puzzle,
where N ≥ 4. However, the time cost increases fast. The backtracking algorithm
is only slightly better than the one permuting the sequence of 1 to 8 (which is
bound to O(N!)). Another extension to the algorithm is based on the fact that
the chess board is square, and symmetric both vertically and horizontally;
thus one solution can generate other solutions by rotating and flipping, as the
sketch below illustrates. These aspects are left as exercises to the reader.
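For instance, mirroring a solution given in the permutation representation is a one-liner; rotation can be treated in a similar spirit (a sketch of ours):

# Mirror a solution horizontally: column c maps to n + 1 - c.
def mirror(solution, n=8):
    return [n + 1 - c for c in solution]

print(mirror([6, 2, 7, 1, 3, 5, 8, 4]))   # [3, 7, 2, 8, 6, 4, 1, 5]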
Peg puzzle

I once received a puzzle of leap frogs. It is said to be homework for 2nd grade
students in China. As illustrated in figure 14.29, there are 6 frogs on 7 stones.
Each frog can either hop to the next stone if it is not occupied, or leap over one
frog to another empty stone. The frogs on the left side can only move to the
right, while the ones on the right side can only move to the left. These rules are
described in figure 14.30.

Figure 14.29: The leap frogs puzzle.

The goal of this puzzle is to arrange the frogs to jump according to the
rules, so that the positions of the 3 frogs on the left are finally exchanged with
the ones on the right. If we denote a frog on the left as 'A', one on the right as
'B', and the empty stone as 'O', the puzzle is to find a solution to transform
from 'AAAOBBB' to 'BBBOAAA'.

This puzzle is just a special form of the peg puzzles. The number of pegs is
not limited to 6; it can be 8 or another bigger even number. Figure 14.31 shows
some variants.
(a) Jump to the next stone (b) Jump over to the right (c) Jump over to the left

Figure 14.30: Moving rules.

(a) Solitaire (b) Hop over (c) Draught board

Figure 14.31: Variants of the peg puzzles, from
http://home.comcast.net/~stegmann/jumping.htm
We can solve this puzzle by programming. The idea is similar to the 8 queens
puzzle. Denote the positions from the left most stone as 1, 2, ..., 7. In the ideal
case, there are 4 options for each move. For example, at the start, the frog
on the 3rd stone can hop right to the empty stone; symmetrically, the frog on the
5th stone can hop left; alternatively, the frog on the 2nd stone can leap right,
while the frog on the 6th stone can leap left.

We can record the state and try one of these 4 options at every step. Of
course not all of them are possible at any given time. If we get stuck, we backtrack
and try other options.

As we restrict the left side frogs to only moving right, and the right side frogs
to only moving left, the moves are not reversible. There won't be any
repeated states, unlike in the maze puzzle. However, we
still need to record the steps in order to print them out finally.
In order to enforce these restrictions, let 'A', 'O', 'B' in the representation 'AAAOBBB'
be −1, 0, and 1 respectively. A state L is a list of elements, each being one
of these 3 values. It starts from {−1, −1, −1, 0, 1, 1, 1}. L[i] accesses the i-th
element; its value indicates whether the i-th stone is empty, occupied by a frog from
the left side, or occupied by a frog from the right side. Denote the position of the
vacant stone as p. The 4 moving options can be stated as below.

Leap left: p < 6 and L[p + 2] > 0, swap L[p] ↔ L[p + 2];

Hop left: p < 7 and L[p + 1] > 0, swap L[p] ↔ L[p + 1];

Leap right: p > 2 and L[p − 2] < 0, swap L[p − 2] ↔ L[p];

Hop right: p > 1 and L[p − 1] < 0, swap L[p − 1] ↔ L[p].
Four functions leap_l(L), hop_l(L), leap_r(L) and hop_r(L) are defined
accordingly. If the state L does not satisfy the move restriction, these functions return
L unchanged; otherwise, the changed state L' is returned.
We can explicitly maintain a stack S to hold the attempts as well as the
historic movements. The stack is initialized with a singleton list of the starting
state. The solutions are accumulated in a list M, which is empty at the beginning:

solve({{−1, −1, −1, 0, 1, 1, 1}}, ∅)    (14.57)
As long as the stack isn't empty, we pop one intermediate attempt. If the
latest state equals {1, 1, 1, 0, −1, −1, −1}, a solution is found; we append
the series of moves leading to this state to the result list M. Otherwise, we expand to
the next possible states by trying all four possible moves, and push them back onto the
stack for further search. Denote the top element in the stack S as s₁, and the
latest state in s₁ as L. The algorithm can be defined as the following.

solve(S, M) = M : S = ∅
              solve(S', {reverse(s₁)} ∪ M) : L = {1, 1, 1, 0, −1, −1, −1}
              solve(P ∪ S', M) : otherwise
    (14.58)
Where P holds the possible moves from the latest state L:

P = {L' | L' ∈ {leap_l(L), hop_l(L), leap_r(L), hop_r(L)}, L' ≠ L}
Note that the starting state is stored as the last element, while the final state
is the first. That is the reason why we reverse it when adding it to the solution list.
Translating this algorithm to Haskell gives the following example program.
solve = dfsSolve [[[-1, -1, -1, 0, 1, 1, 1]]] [] where
dfsSolve [] s = s
dfsSolve (c:cs) s
| head c == [1, 1, 1, 0, -1, -1, -1] = dfsSolve cs (reverse c:s)
| otherwise = dfsSolve ((map (:c) $ moves $ head c) ++ cs) s
moves s = filter (/=s) [leapLeft s, hopLeft s, leapRight s, hopRight s] where
leapLeft [] = []
leapLeft (0:y:1:ys) = 1:y:0:ys
leapLeft (y:ys) = y:leapLeft ys
hopLeft [] = []
hopLeft (0:1:ys) = 1:0:ys
hopLeft (y:ys) = y:hopLeft ys
leapRight [] = []
leapRight (-1:y:0:ys) = 0:y:(-1):ys
leapRight (y:ys) = y:leapRight ys
hopRight [] = []
hopRight (-1:0:ys) = 0:(-1):ys
hopRight (y:ys) = y:hopRight ys
Running this program finds 2 symmetric solutions, each taking 15 steps. One
solution is listed in the table below.
step -1 -1 -1 0 1 1 1
1 -1 -1 0 -1 1 1 1
2 -1 -1 1 -1 0 1 1
3 -1 -1 1 -1 1 0 1
4 -1 -1 1 0 1 -1 1
5 -1 0 1 -1 1 -1 1
6 0 -1 1 -1 1 -1 1
7 1 -1 0 -1 1 -1 1
8 1 -1 1 -1 0 -1 1
9 1 -1 1 -1 1 -1 0
10 1 -1 1 -1 1 0 -1
11 1 -1 1 0 1 -1 -1
12 1 0 1 -1 1 -1 -1
13 1 1 0 -1 1 -1 -1
14 1 1 1 -1 0 -1 -1
15 1 1 1 0 -1 -1 -1
Observing that the algorithm is in tail recursive manner, it can also be realized
imperatively. The algorithm can be generalized to solve the puzzle
of n frogs on each side. We represent the start state {-1, -1, ..., -1, 0, 1, 1, ...,
1} as s, and the mirrored end state as e.
1: function Solve(s, e)
2:   S ← {{s}}
3:   M ← ∅
4:   while S ≠ ∅ do
5:     s₁ ← Pop(S)
6:     if s₁[1] = e then
7:       Add(M, Reverse(s₁))
8:     else
9:       for each m ∈ Moves(s₁[1]) do
10:        Push(S, {m} ∪ s₁)
11:  return M
The possible moves can also be generalized, with procedure Moves handling
an arbitrary number of frogs. The following Python program implements this
solution.
def solve(start, end):
    stack = [[start]]
    s = []
    while stack != []:
        c = stack.pop()
        if c[0] == end:
            s.append(list(reversed(c)))
        else:
            for m in moves(c[0]):
                stack.append([m] + c)
    return s
def moves(s):
ms = []
n = len(s)
p = s.index(0)
if p < n - 2 and s[p+2] > 0:
ms.append(swap(s, p, p+2))
if p < n - 1 and s[p+1] > 0:
ms.append(swap(s, p, p+1))
if p > 1 and s[p-2] < 0:
ms.append(swap(s, p, p-2))
if p > 0 and s[p-1] < 0:
ms.append(swap(s, p, p-1))
return ms
def swap(s, i, j):
a = s[:]
(a[i], a[j]) = (a[j], a[i])
return a
For 3 frogs on each side, we know that it takes 15 steps to exchange them.
It's interesting to examine how the number of steps needed grows with the
number of frogs on each side. Our program gives the following result.

number of frogs:  1  2  3   4   5  ...
number of steps:  3  8  15  24  35 ...

It seems that the numbers of steps are all square numbers minus one. It's
natural to guess that the number of steps for n frogs on one side is (n + 1)² − 1.
Actually we can prove it is true.
Compare the final state with the start state: each frog moves ahead n + 1
stones in its direction, so the 2n frogs move 2n(n + 1) stones in total.
Another important fact is that each frog on the left has to meet every frog on
the right exactly once, and a leap happens at each meeting. Since a frog moves
two stones ahead with one leap, and there are n² meetings in total, these
meetings account for 2n² stones of movement. The rest of the moves are not leaps,
but hops, each advancing one stone; the number of hops is therefore
2n(n + 1) − 2n² = 2n. Summing up the n² leaps and 2n hops, the total number
of steps is n² + 2n = (n + 1)² − 1. We can verify this empirically with the
Python program above, as the sketch below shows.
Summary of DFS

Looking back at the above three puzzles, although they vary in many aspects, their
solutions show quite similar common structures. They all have some starting
state: the maze starts from the entrance point; the 8 queens puzzle starts
from the empty board; the leap frogs start from the state 'AAAOBBB'. The
solution is a kind of searching; at each attempt, there are several possible ways.
For the maze puzzle, there are four different directions to try; for the 8 queens
puzzle, there are eight columns to choose; for the leap frogs puzzle, there are
four movements of leap or hop. We don't know how far we can go when making a
decision, although the final state is clear: for the maze, it's the exit point; for
the 8 queens puzzle, we are done when all 8 queens have been assigned on the
board; for the leap frogs puzzle, the final state is that all frogs have been exchanged.

We use a common approach to solve them. We repeatedly select one possible
candidate to try, and record what we've achieved. If we get stuck, we backtrack
and try other options. Using this strategy, we can either find a
solution, or tell that the problem is unsolvable.

Of course there can be variations: we can stop when one
answer is found, or go on searching for all the solutions.
If we draw a tree rooted at the starting state, and expand it so that every branch
stands for a different attempt, our searching process proceeds in a manner that
goes deeper and deeper. We won't consider any other options at the same
depth unless the searching fails and we have to backtrack to an upper level of
the tree. Figure 14.32 illustrates the order in which we search a state tree. The arrows
indicate how we go down and backtrack up; the numbers on the nodes show
the order in which we visit them.

Figure 14.32: Example of DFS search order.
This kind of search strategy is called DFS (depth-first search). We use it
widely, sometimes unintentionally. Some programming environments, Prolog for instance,
adopt DFS as the default evaluation model. A maze can be given by a set of rules,
such as:

c(a, b). c(a, e).
c(b, c). c(b, f).
c(e, d). c(e, f).
c(f, c).
c(g, d). c(g, h).
c(h, f).
Where the predicate c(X, Y) means place X is connected with Y. Note that
this is a directed predicate; we can make Y connected with X as well, by
either adding a symmetric rule or creating an undirected predicate. Figure 14.33
shows such a directed graph. Given two places X and Y, Prolog can tell if they
are connected with the following program.

go(X, X).
go(X, Y) :- c(X, Z), go(Z, Y).

This program says that a place is connected with itself; and given two different
places X and Y, if X is connected with Z, and Z is connected with Y, then X
is connected with Y. Note that there might be multiple choices for Z. Prolog
selects one candidate and goes on searching further. It only tries other candidates
if the recursive searching fails; in that case, Prolog backtracks and tries the
alternatives. This is exactly what DFS does.
DFS is quite straightforward when we only need a solution but don't care
whether it takes the fewest steps. For example, the solution it gives may
not be the shortest path out of the maze. We'll see some more puzzles next; they
demand the solution with the minimum number of attempts.
Figure 14.33: A directed graph.
The wolf, goat, and cabbage puzzle

This puzzle says that a farmer wants to cross a river with a wolf, a goat, and a
bucket of cabbage. There is a boat, and only the farmer can drive it. But the boat
is small: it can only hold one of the wolf, the goat, and the bucket of cabbage
together with the farmer at a time. The farmer has to take them one by one to the
other side of the river. However, the wolf would eat the goat, and the goat would
eat the cabbage, if the farmer is absent. The puzzle asks for the fastest solution,
so that they can all safely cross the river.

Figure 14.34: The wolf, goat, cabbage puzzle
The key point to this puzzle is that the wolf does not eat the cabbage. The
farmer can safely take the goat to the other side. But on the next trip, no matter
whether he takes the wolf or the cabbage across the river, he has to take one thing
back to avoid a conflict. In order to find the fastest solution, at any time the farmer
has multiple options, we can examine all of them in parallel, so that these different
decisions compete. If we count the number of times the farmer crosses the
river, without considering the direction (so that crossing back and forth counts as
2), we are actually checking the complete possibilities after 1 crossing,
2 crossings, 3 crossings, ... When we find a situation where they have all arrived at
the other bank, we are done. That solution wins the competition: it is the fastest
one.

The problem is that we can't really examine all the possible solutions in parallel.
Even a super computer equipped with many CPU cores would be too expensive
a setup for such a simple puzzle.
Let's consider a lucky draw game. People blindly pick from a box of
colored balls. There is only one black ball; all the others are white. The one
who picks the black ball wins the game; anyone else must return his ball to
the box and wait for the next chance. To be fair, we can set up
a rule that no one may try a second time before all the others have tried. We
line people up in a queue. Every time, the first person picks a ball; if he does not win,
he goes to the tail of the queue to wait for his second try. This queue
helps to ensure our rule.

Figure 14.35: A lucky-draw game: the i-th person leaves the head of the queue, picks a
ball, then joins the queue at the tail if he fails to pick the black ball.
We can use quite the same idea to solve our puzzle. The two banks of the
river can be represented as two sets A and B. A contains the wolf, the goat,
the cabbage, and the farmer, while B is empty. We take an element from one
set to the other each time. Neither set may hold conflicting things if the farmer
is absent. The goal is to exchange the contents of A and B in the fewest steps.

We initialize a queue with the state A = {w, g, c, p}, B = ∅ as the only element.
As long as the queue isn't empty, we pick the first element from the head, expand
it with all possible options, and put these new expanded candidates at the tail
of the queue. If the first element at the head is the final goal, that is A = ∅, B =
{w, g, c, p}, we are done. Figure 14.36 illustrates the idea of this search order.
Note that as all possibilities at the same level are examined, there is no need
for back-tracking.
Figure 14.36: Starting from state 1, check all possible options 2, 3, and 4 for the next
step; then all nodes at level 3, ...
There is a simple way to treat the sets. A four-bit binary number can be
used, where each bit stands for a thing: for example, the wolf w = 1, the goat g = 2,
the cabbage c = 4, and the farmer p = 8. Then 0 stands for the empty set, and 15
for the full set. Value 3 means there are a wolf and a goat on the
river bank; in this case, the wolf will eat the goat. Similarly, value 6 stands for
the other conflicting case. Every time, we move the highest bit (which is 8), either
alone or together with one of the other bits (4, 2, or 1), from one number to the other.
The possible moves can be defined as below.
mv(A, B) = {(A − 8 − i, B + 8 + i) | i ∈ {0, 1, 2, 4}, i = 0 ∨ A ∧ i ≠ 0} : B < 8
           {(A + 8 + i, B − 8 − i) | i ∈ {0, 1, 2, 4}, i = 0 ∨ B ∧ i ≠ 0} : otherwise
    (14.59)

Where ∧ is the bitwise-and operation.
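To make the encoding concrete, here is a tiny Python sketch (the constant names are our own, not from the original text):

WOLF, GOAT, CABBAGE, FARMER = 1, 2, 4, 8
FULL = WOLF | GOAT | CABBAGE | FARMER   # 15: everything on one bank

# The two conflicting combinations when the farmer is absent:
assert WOLF | GOAT == 3        # the wolf eats the goat
assert GOAT | CABBAGE == 6     # the goat eats the cabbage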
The solution can be given by reusing the queue defined in a previous chapter.
Denote the queue as Q, which is initialized with the singleton list {(15, 0)}. If Q is
not empty, function DeQ(Q) extracts the head element M, and the updated queue
becomes Q'. M is a list of pairs, standing for a series of movements between the
river banks. The first element of M, m₁ = (A', B'), is the latest state. Function
EnQ'(Q, L) is a slightly different enqueue operation: it pushes all the possible
moving sequences in L to the tail of the queue one by one, and returns the
updated queue. With these notations, the solution function is defined like below.
solve(Q) = ∅ : Q = ∅
           reverse(M) : A' = 0
           solve(EnQ'(Q', {{m} ∪ M | m ∈ mv(m₁), valid(m, M)})) : otherwise
    (14.60)
Where function valid(m, M) checks if the new moving candidate m = (A', B')
is valid: neither A' nor B' may be 3 or 6, and m must not have been tried before in
M, to avoid repeated attempts.

valid(m, M) = A' ∉ {3, 6} ∧ B' ∉ {3, 6} ∧ m ∉ M    (14.61)
The following example Haskell program implements this solution. Note that
it uses a plain list to represent the queue for illustration purpose.

import Data.Bits

solve = bfsSolve [[(15, 0)]] where
  bfsSolve :: [[(Int, Int)]] -> [(Int, Int)]
  bfsSolve [] = [] -- no solution
  bfsSolve (c:cs) | (fst $ head c) == 0 = reverse c
                  | otherwise = bfsSolve (cs ++ map (:c)
                                  (filter (`valid` c) $ moves $ head c))
  valid (a, b) r = not $ or [a `elem` [3, 6], b `elem` [3, 6],
                             (a, b) `elem` r]
  moves (a, b) = if b < 8 then trans a b else map swap (trans b a) where
      trans x y = [(x - 8 - i, y + 8 + i)
                      | i <- [0, 1, 2, 4], i == 0 || (x .&. i) /= 0]
      swap (x, y) = (y, x)
This algorithm can easily be modified to find all the possible solutions, rather
than stopping after finding the first one. This is left as an exercise to the reader.
The following shows the two best solutions to this puzzle.
Solution 1:

Left                          | Right
wolf, goat, cabbage, farmer   |
wolf, cabbage                 | goat, farmer
wolf, cabbage, farmer         | goat
cabbage                       | wolf, goat, farmer
goat, cabbage, farmer         | wolf
goat                          | wolf, cabbage, farmer
goat, farmer                  | wolf, cabbage
                              | wolf, goat, cabbage, farmer

Solution 2:

Left                          | Right
wolf, goat, cabbage, farmer   |
wolf, cabbage                 | goat, farmer
wolf, cabbage, farmer         | goat
wolf                          | goat, cabbage, farmer
wolf, goat, farmer            | cabbage
goat                          | wolf, cabbage, farmer
goat, farmer                  | wolf, cabbage
                              | wolf, goat, cabbage, farmer
This algorithm can also be realized imperatively. Observing that our solution
is in tail recursive manner, we can translate it directly to a loop. We use a list
S to hold all the solutions found. The singleton list {(15, 0)} is pushed
to the queue when initializing. As long as the queue isn't empty, we extract the
head C from the queue by calling the DeQ procedure, and examine whether it reaches
the final goal; if not, we expand all the possible moves and push them to the tail of the
queue for further searching.
1: function Solve
2:   S ← ∅
3:   Q ← ∅
4:   EnQ(Q, {(15, 0)})
5:   while Q ≠ ∅ do
6:     C ← DeQ(Q)
7:     if c₁ = (0, 15) then
8:       Add(S, Reverse(C))
9:     else
10:      for each m ∈ Moves(C) do
11:        if Valid(m, C) then
12:          EnQ(Q, {m} ∪ C)
13:  return S
Where the Moves and Valid procedures are the same as before. The following
Python example program implements this imperative algorithm.
def solve():
    s = []
    queue = [[(0xf, 0)]]
    while queue != []:
        cur = queue.pop(0)
        if cur[0] == (0, 0xf):
            s.append(list(reversed(cur)))
        else:
            for m in moves(cur):
                queue.append([m] + cur)
    return s
def moves(s):
(a, b) = s[0]
return valid(s, trans(a, b) if b < 8 else swaps(trans(b, a)))
def valid(s, mv):
return [(a, b) for (a, b) in mv
if a not in [3, 6] and b not in [3, 6] and (a, b) not in s]
def trans(a, b):
masks = [ 8 | (1<<i) for i in range(4)]
return [(a ^ mask, b | mask) for mask in masks if a & mask == mask]
def swaps(s):
return [(b, a) for (a, b) in s]
There is a minor difference between the program and the pseudo code: the
function generating the candidate moving options filters out the invalid cases inside
itself.

Every time, no matter whether the farmer drives the boat back or forth, there are
m options for him to choose, where m is the number of objects on the river bank
the farmer drives from. m is always less than 4, so the number of possibilities
examined within n steps is bounded by 4^n. This estimation is far more than the
actual time, because we avoid trying the invalid cases. Our solution examines all
the possible movements in the worst case. Because we check the recorded steps to
avoid repeated attempts, the algorithm takes about O(n²) time to search for n
possible steps.
Water jugs puzzle

This is a popular puzzle in classic AI, and its history should be quite long. It
says that there are two jugs, one of 9 quarts, the other of 4 quarts. How can they
be used to bring up from the river exactly 6 quarts of water?

There are various versions of this puzzle, differing in the volumes of the jugs and
the target volume of water. The solver is said to be the young Blaise Pascal,
the French mathematician and scientist, in one story, and
Siméon Denis Poisson in another. Later, in the popular Hollywood movie
Die Hard 3, actors Bruce Willis and Samuel L. Jackson were also confronted
with this puzzle.

Pólya gave a nice way to solve this problem backwards in [14].
Figure 14.37: Two jugs with volumes of 9 and 4.

Instead of thinking from the starting state as shown in figure 14.37, Pólya
pointed out that there will be 6 quarts of water in the bigger jug at the final
stage, which indicates the second-to-last step: we can fill the 9 quart jug, then
pour out 3 quarts from it. In order to achieve this, there should be 1 quart of
water left in the smaller jug, as shown in figure 14.38.

Figure 14.38: The last two steps.

It's easy to see that filling the 9 quart jug, then pouring into the 4 quart jug
twice, can bring 1 quart of water, as shown in figure 14.39. At this stage,
we've found the solution. By reversing our findings, we can give the correct
steps to bring up exactly 6 quarts of water.

Pólya's methodology is general, but it's still hard to solve the puzzle without
a concrete algorithm. For instance, how to bring up 2 gallons of water with jugs
of 899 and 1147 gallons?

There are 6 ways to deal with the 2 jugs in total. Denote the smaller jug as A,
and the bigger jug as B.
Figure 14.39: Fill the bigger jug, and pour into the smaller one twice.
Fill jug A from the river;
Fill jug B from the river;
Empty jug A;
Empty jug B;
Pour water from jug A to B;
Pour water from jug B to A.
The following sequence shows an example. Note that in this example, we
assume that a < b < 2a.

A       B       operation
0       0       start
a       0       fill A
0       a       pour A into B
a       a       fill A
2a − b  b       pour A into B
2a − b  0       empty B
0       2a − b  pour A into B
a       2a − b  fill A
3a − 2b b       pour A into B
...     ...     ...
No matter what operations are taken, the amount of water in each
jug can always be expressed as xa + yb, where a and b are the volumes of the jugs,
for some integers x and y. All the amounts of water we can get are linear
combinations of a and b. We can immediately tell, given two jugs, whether a goal g
is solvable or not. For instance, we can't bring up 5 gallons of water with two jugs
of volumes 4 and 6 gallons. Number theory ensures that the 2 water jugs puzzle
can be solved if and only if g can be divided by the greatest common divisor of a
and b; written as:

gcd(a, b) | g    (14.62)

Where m|n means n can be divided by m. What's more, if a and b are
relatively prime, i.e. gcd(a, b) = 1, it's possible to bring up any quantity
g of water, as the sketch below checks.
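Equation (14.62) translates directly into a solvability test; a minimal Python sketch:

from math import gcd

# g is reachable iff gcd(a, b) divides g.
def solvable(a, b, g):
    return g % gcd(a, b) == 0

print(solvable(4, 6, 5))   # False: gcd(4, 6) = 2 does not divide 5
print(solvable(3, 5, 4))   # True: gcd(3, 5) = 1 divides everything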
Although gcd(a, b) enables us to determine whether the puzzle is solvable, it doesn't
give us the detailed pouring sequence. If we can find integers x and y such that
g = xa + yb, we can arrange a sequence of operations (even if it may not be the
best solution) to solve it. The idea is that, without loss of generality, supposing
x > 0 and y < 0, we need to fill jug A x times, and empty jug B y times in total.

Let's take a = 3, b = 5, and g = 4 for example. Since 4 = 3 × 3 − 5, we can
arrange a sequence like the following.

A  B  operation
0  0  start
3  0  fill A
0  3  pour A into B
3  3  fill A
1  5  pour A into B
1  0  empty B
0  1  pour A into B
3  1  fill A
0  4  pour A into B

In this sequence, we fill A 3 times, and empty B 1 time. The procedure
can be described as the following (a Python sketch follows):

Repeat x times:

1. Fill jug A;

2. Pour jug A into jug B; whenever B is full, empty it.
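The procedure can be sketched in Python as below (ours; it assumes x > 0 comes from a linear combination g = xa + yb, so the loop terminates with g quarts in jug B):

def pour(a, b, g, x):
    p, q = 0, 0                      # current volumes in jugs A and B
    steps = [(p, q)]
    while q != g:
        if p == 0:
            p, x = a, x - 1          # fill A
        elif q == b:
            q = 0                    # empty B
        else:
            d = min(p, b - q)        # pour A into B
            p, q = p - d, q + d
        steps.append((p, q))
    return steps

print(pour(3, 5, 4, 3))   # reproduces the table above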
So the only problem left is to find x and y. There is a powerful tool
in number theory called the extended Euclid algorithm, which can achieve this.
Compared to the classic Euclid GCD algorithm, which only gives the greatest
common divisor, the extended Euclid algorithm also gives a pair x, y, so that:

(d, x, y) = gcd_ext(a, b)    (14.63)
Where d = gcd(a, b) and ax + by = d. Without loss of generality, suppose
a < b; then there exist a quotient q and a remainder r such that:

b = aq + r    (14.64)

Since d is the common divisor, it divides both a and b, and thus d divides
r as well. Because r is less than a, we can scale down the problem by finding
the GCD of r and a:

(d, x', y') = gcd_ext(r, a)    (14.65)

Where d = x'r + y'a according to the definition of the extended Euclid
algorithm. Transforming b = aq + r to r = b − aq, and substituting r in the above
equation yields:

d = x'(b − aq) + y'a
  = (y' − x'q)a + x'b    (14.66)
This is a linear combination of a and b, so we have:

x = y' − x'⌊b/a⌋
y = x'
    (14.67)

Note that this is a typical recursive relationship. The edge case happens
when a = 0:

gcd(0, b) = b = 0a + 1b    (14.68)

Summarizing the above results, the extended Euclid algorithm can be defined
as the following:
gcd_ext(a, b) = (b, 0, 1) : a = 0
                (d, y' − x'⌊b/a⌋, x') : otherwise
    (14.69)

Where d, x', y' are defined in equation (14.65).
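For example, tracing gcd_ext(3, 5) step by step (a worked example of ours):

gcd_ext(3, 5): 5 = 1 × 3 + 2, so recurse on gcd_ext(2, 3);
gcd_ext(2, 3): 3 = 1 × 2 + 1, so recurse on gcd_ext(1, 2);
gcd_ext(1, 2): 2 = 2 × 1 + 0, so recurse on gcd_ext(0, 1) = (1, 0, 1).

Back-substituting with equation (14.67): gcd_ext(1, 2) = (1, 1 − 0 × 2, 0) = (1, 1, 0); gcd_ext(2, 3) = (1, 0 − 1 × 1, 1) = (1, −1, 1); gcd_ext(3, 5) = (1, 1 − (−1) × 1, −1) = (1, 2, −1). Indeed 2 × 3 + (−1) × 5 = 1.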
The 2 water jugs puzzle is almost solved, but there are still two detailed
problems to tackle. First, the extended Euclid algorithm gives the linear
combination for the greatest common divisor d, while the target volume of
water g isn't necessarily equal to d. This is easily solved by multiplying x
and y by m = g / gcd(a, b). Second, we assumed x > 0 in order to form a
procedure that fills jug A x times; however, the extended Euclid algorithm
doesn't ensure that x is positive. For instance, gcd_ext(4, 9) = (1, −2, 1). Whenever
we get a negative x, since d = xa + yb, we can repeatedly add b to x and
decrease y by a until x is greater than zero.

At this stage, we are able to give the complete solution to the 2 water jugs
puzzle. Below is an example Haskell program.
extGcd 0 b = (b, 0, 1)
extGcd a b = let (d, x, y) = extGcd (b `mod` a) a in
  (d, y - x * (b `div` a), x)

solve a b g | g `mod` d /= 0 = [] -- no solution
            | otherwise = solve' (x * g `div` d)
  where
    (d, x, y) = extGcd a b
    solve' x | x < 0 = solve' (x + b)
             | otherwise = pour x [(0, 0)]
    pour 0 ps = reverse ((0, g):ps)
    pour x ps@((a', b'):_) | a' == 0 = pour (x - 1) ((a, b'):ps) -- fill a
                           | b' == b = pour x ((a', 0):ps) -- empty b
                           | otherwise = pour x ((max 0 (a' + b' - b),
                                                  min (a' + b') b):ps)
Although we can solve the 2 water jugs puzzle with the extended Euclid
algorithm, the solution may not be the best. For instance, when we are going
to bring up 4 gallons of water from jugs of 3 and 5 gallons, the extended Euclid
algorithm produces the following sequence:

[(0,0),(3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),
 (0,4),(3,4),(2,5),(2,0),(0,2),(3,2),(0,5),(3,5),
 (3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),(0,4)]
It takes 23 steps to achieve the goal, while the best solution needs only 6
steps:

[(0,0),(0,5),(3,2),(0,2),(2,0),(2,5),(3,4)]

Observing the 23 steps, we find that jug B already contains 4 gallons of
water at the 8-th step, but the algorithm ignores this fact and goes
on executing the remaining 15 steps. The reason is that the linear combination x and y
found with the extended Euclid algorithm is not the only pair of numbers satisfying
g = xa + yb. For all these pairs, the smaller |x| + |y| is, the fewer steps are
needed. There is an exercise addressing this problem in this section.
The interesting problem is how to find the best solution. We have two
approaches: one is to find x and y minimizing |x| + |y|; the other is to adopt
quite a similar idea to the wolf-goat-cabbage puzzle. We focus on the latter
in this section. Since there are at most 6 possible options (fill A, fill B, pour
A into B, pour B into A, empty A, and empty B), we can try them in parallel
and check which decision leads to the best solution. We need to record all the
states we've ever achieved to avoid any potential repetition. In order to realize this
parallel approach with reasonable resources, a queue can be used to arrange our
attempts. The elements stored in this queue are series of pairs (p, q), where p
and q represent the volumes of water contained in the jugs. These pairs record
the sequence of our operations from the beginning to the latest state. We initialize
the queue with the singleton list containing the starting state {(0, 0)}.
solve(a, b, g) = solve'({{(0, 0)}})    (14.70)

Every time, when the queue isn't empty, we pick a sequence from the head
of the queue. If this sequence ends with a pair containing the target volume g, we
have found a solution; we can print this sequence by reversing it. Otherwise, we expand
the latest pair by trying all the possible 6 options, remove any duplicated states,
and add the results to the tail of the queue. Denote the queue as Q, the first sequence
stored at the head of the queue as S, the latest pair in S as (p, q), and the rest
of the pairs as S'. After popping the head element, the queue becomes Q'. This
algorithm can be defined like below:
solve'(Q) = ∅ : Q = ∅
            reverse(S) : p = g ∨ q = g
            solve'(EnQ'(Q', {{s'} ∪ S | s' ∈ try(S)})) : otherwise
    (14.71)
Where function EnQ' pushes a list of sequences into the queue one by one.
Function try(S) tries all the possible 6 options to generate new pairs of water
volumes:

try(S) = {s' | s' ∈ {fillA(p, q), fillB(p, q),
                     pourA(p, q), pourB(p, q),
                     emptyA(p, q), emptyB(p, q)}, s' ∉ S'}
    (14.72)
It's intuitive to define the 6 options. For the fill operations, the resulting
volume of the filled jug is full; for the empty operations, the resulting volume is empty;
for the pour operations, we need to test if the receiving jug is big enough to hold all
the water.

fillA(p, q) = (a, q)            fillB(p, q) = (p, b)
emptyA(p, q) = (0, q)           emptyB(p, q) = (p, 0)
pourA(p, q) = (max(0, p + q − b), min(p + q, b))
pourB(p, q) = (min(p + q, a), max(0, p + q − a))
    (14.73)
The following example Haskell program implements this method:

solve a b g = bfs [[(0, 0)]] where
  bfs [] = []
  bfs (c:cs) | fst (head c) == g || snd (head c) == g = reverse c
             | otherwise = bfs (cs ++ map (:c) (expand c))
  expand ((x, y):ps) = filter (`notElem` ps) $ map (\f -> f x y)
      [fillA, fillB, pourA, pourB, emptyA, emptyB]
  fillA _ y = (a, y)
  fillB x _ = (x, b)
  emptyA _ y = (0, y)
  emptyB x _ = (x, 0)
  pourA x y = (max 0 (x + y - b), min (x + y) b)
  pourB x y = (min (x + y) a, max 0 (x + y - a))
This method always returns the fastest solution. It can also be realized in an
imperative approach. Instead of storing the complete sequence of operations in
every element of the queue, we can store each unique state in a global history
list, and use links to track the operation sequence; this saves space.
Figure 14.40: All attempted states are stored in a global list.
The idea is illustrated in figure 14.40. The initial state is (0, 0). Only fill
A and fill B are possible. They are tried and added to the record list. Next,
we can try and record fill B on top of (3, 0), which yields the new state (3, 5).
However, when trying empty A from state (3, 0), we would return to the start
state (0, 0). As this previous state has been recorded, it is ignored. All the
repeated states are shown in gray in the figure.
With such settings, we needn't remember the operation sequence in each
element of the queue explicitly. We can add a parent link to each node in
figure 14.40, and use it to back-traverse to the starting point from any state.
The following example ANSI C code shows such a definition.

struct Step {
    int p, q;
    struct Step *parent;
};

struct Step *make_step(int p, int q, struct Step *parent) {
    struct Step *s = (struct Step *) malloc(sizeof(struct Step));
    s->p = p;
    s->q = q;
    s->parent = parent;
    return s;
}
Where p, q are the volumes of water in the 2 jugs. For any state s, define
functions p(s) and q(s) to return these 2 values; the imperative algorithm can be
realized based on this idea as below.
1: function Solve(a, b, g)
2:   Q ← ∅
3:   Push-and-record(Q, (0, 0))
4:   while Q ≠ ∅ do
5:     s ← Pop(Q)
6:     if p(s) = g ∨ q(s) = g then
7:       return s
8:     else
9:       C ← Expand(s)
10:      for each c ∈ C do
11:        if c ≠ s ∧ ¬Visited(c) then
12:          Push-and-record(Q, c)
13:  return NIL
Where Push-and-record does not only push an element to the queue, but
also records this element as visited, so that we can later check whether an element
has been visited before. This can be implemented with a list. All push
operations append the new elements to the tail. For the pop operation, instead of
removing the element pointed to by head, the head pointer only advances to the
next one. This list contains historic data which has to be reset explicitly. The
following ANSI C code illustrates this idea.

struct Step *steps[1000], **head, **tail = steps;

void push(struct Step *s) { *tail++ = s; }

struct Step *pop() { return *head++; }

int empty() { return head == tail; }
void reset() {
    struct Step **p;
    for (p = steps; p != tail; ++p)
        free(*p);
    head = tail = steps;
}
In order to test whether a state has been visited, we can traverse the list to compare
p and q.

int eq(struct Step *a, struct Step *b) {
    return a->p == b->p && a->q == b->q;
}

int visited(struct Step *s) {
    struct Step **p;
    for (p = steps; p != tail; ++p)
        if (eq(*p, s)) return 1;
    return 0;
}
The main program can be implemented as below:

struct Step *solve(int a, int b, int g) {
    int i;
    struct Step *cur, *cs[6];
    reset();
    push(make_step(0, 0, NULL));
    while (!empty()) {
        cur = pop();
        if (cur->p == g || cur->q == g)
            return cur;
        else {
            expand(cur, a, b, cs);
            for (i = 0; i < 6; ++i)
                if (!eq(cur, cs[i]) && !visited(cs[i]))
                    push(cs[i]);
        }
    }
    return NULL;
}
Where function expand tries all the 6 possible options:

void expand(struct Step *s, int a, int b, struct Step **cs) {
    int p = s->p, q = s->q;
    cs[0] = make_step(a, q, s); /* fill A */
    cs[1] = make_step(p, b, s); /* fill B */
    cs[2] = make_step(0, q, s); /* empty A */
    cs[3] = make_step(p, 0, s); /* empty B */
    cs[4] = make_step(max(0, p + q - b), min(p + q, b), s); /* pour A */
    cs[5] = make_step(min(p + q, a), max(0, p + q - a), s); /* pour B */
}
And since the result steps are back tracked in reversed order, they can be output
with a recursive function:

void print(struct Step *s) {
    if (s) {
        print(s->parent);
        printf("%d, %d\n", s->p, s->q);
    }
}
Kloski

Kloski is a block sliding puzzle that appears in many countries, with different
sizes and layouts. Figure 14.41 illustrates a traditional Kloski game in China.

(a) Initial layout of blocks (b) Block layout after several movements

Figure 14.41: Huarong Dao, the traditional Kloski game in China.

In this puzzle, there are 10 blocks, each labeled with text or an icon. The
smallest block is 1 unit square; the biggest one is 2 × 2 units. Note
that there is a slot of 2 units wide at the middle-bottom of the board. The biggest
block represents a king of ancient times, while the others are his enemies. The goal
is to move the biggest block to the slot, so that the king can escape. This game
is named 'Huarong Dao' ('Huarong Escape') in China. Figure 14.42 shows
a similar Kloski puzzle in Japan, where the biggest block means a daughter and
the others are her family members. This game is named 'Daughter in the
box' in Japan (Japanese name: hakoiri musume).

Figure 14.42: Daughter in the box, the Kloski game in Japan.
In this section, we want to find a solution which slides the blocks from the
initial state to the final state with the minimum number of movements.

The intuitive idea to model this puzzle is to use a 5 × 4 matrix to represent
the board. All pieces are labeled with a number. The following matrix M, for
example, shows the initial state of the puzzle.

M = | 1 10 10 2 |
    | 1 10 10 2 |
    | 3  4  4 5 |
    | 3  7  8 5 |
    | 6  0  0 9 |
In this matrix, a cell of value i means the i-th piece covers this cell. The
special value 0 represents a free cell. By using the sequence 1, 2, ... to identify
the pieces, a layout can be further simplified as an array L. Each element
is the list of cells covered by the piece indexed by this element. For example,
L[4] = {(3, 2), (3, 3)} means the 4-th piece covers the cells at positions (3, 2) and
(3, 3), where (i, j) means the cell at row i and column j.

The starting layout can be written as the following array.

{{(1, 1), (2, 1)}, {(1, 4), (2, 4)}, {(3, 1), (4, 1)}, {(3, 2), (3, 3)}, {(3, 4), (4, 4)},
 {(5, 1)}, {(4, 2)}, {(4, 3)}, {(5, 4)}, {(1, 2), (1, 3), (2, 2), (2, 3)}}
When moving the Kloski blocks, we need to examine all the 10 blocks, checking
whether each block can move up, down, left, and right. It seems that this approach
would lead to a huge number of possibilities: each step might have
10 × 4 options, so there would be about 40ⁿ cases at the n-th step.

Actually, there won't be so many options. For example, in the first step,
there are only 4 valid moves: the 6-th piece moves right; the 7-th and 8-th
move down; and the 9-th moves left.
All other moves are invalid. Figure 14.43 shows how to test whether a move
is possible.

The left example illustrates sliding the block labeled 1 down. There are two
cells covered by this block. The upper 1 moves to a cell previously occupied
by this same block, which is also labeled 1; the lower 1 moves to a free
cell, which is labeled 0.

The right example, on the other hand, illustrates an invalid slide. In this case,
the upper cell could move to the cell occupied by the same block. However, the
lower cell labeled 1 can't move to the cell occupied by another block, which
is labeled 2.

In order to test the validity of a move, we need to examine all the cells the block
would cover. If each of them is labeled 0 or with a number the same as this block, the
move is valid; otherwise it conflicts with some other block. For a layout L
with corresponding matrix M, suppose we want to move the k-th block by
(Δx, Δy), where |Δx| ≤ 1, |Δy| ≤ 1. The following equation tells whether the move
is valid:

valid(L, k, Δx, Δy):
    ∀(i, j) ∈ L[k] ⇒ i' = i + Δy, j' = j + Δx,
    (1, 1) ≤ (i', j') ≤ (5, 4), M[i', j'] ∈ {k, 0}
    (14.74)

A Python sketch of this test follows the figure.
Figure 14.43: Left: both the upper and the lower 1 are OK; Right: the upper 1
is OK, but the lower 1 conflicts with 2.
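A direct Python rendering of this test might look like the following (a sketch of ours; L maps block numbers to covered cells, and M is the 5 × 4 matrix with 1-based coordinates):

def valid(L, M, k, dx, dy):
    # Moving block k by (dy, dx) is valid when every covered cell stays on
    # the board and lands on a free cell or on a cell of block k itself.
    for (i, j) in L[k]:
        i1, j1 = i + dy, j + dx
        if not (1 <= i1 <= 5 and 1 <= j1 <= 4):
            return False
        if M[i1 - 1][j1 - 1] not in (k, 0):
            return False
    return True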
Another important point in solving the Kloski puzzle is how to avoid repeated
attempts. The obvious case is that, after a series of slides, we end up with
a matrix we have reached before. However, it is not enough to only
avoid identical matrices. Consider the following two matrices. Although M₁ ≠ M₂,
we need to drop options leading to M₂, because the two are essentially the same.

M₁ = | 1 10 10 2 |        M₂ = | 2 10 10 1 |
     | 1 10 10 2 |             | 2 10 10 1 |
     | 3  4  4 5 |             | 3  4  4 5 |
     | 3  7  8 5 |             | 3  7  6 5 |
     | 6  0  0 9 |             | 8  0  0 9 |
This fact tells us that we should compare layouts, not merely matrices,
to avoid repetition. Denote the corresponding layouts as L₁ and L₂ respectively;
it's easy to verify that ||L₁|| = ||L₂||, where ||L|| is the normalized layout, which
is defined as below:

||L|| = sort({sort(lᵢ) | lᵢ ∈ L})    (14.75)

In other words, a normalized layout has all its elements ordered, and every
element is itself ordered. The ordering can be defined as (a, b) ≤ (c, d) ⟺
an + b ≤ cn + d, where n is the width of the matrix. A small sketch of this
normalization follows.
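In Python, tuple comparison already matches this ordering for on-board coordinates, so the normalization of (14.75) can be sketched as:

# Sort the cells of each piece, then sort the pieces, so that two
# equivalent layouts compare equal.
def normalize(layout):
    return sorted(sorted(piece) for piece in layout)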
Observing that the Kloski board is symmetric, a layout can be mirrored
from another one. A mirrored layout is also a kind of repetition, which should
be avoided. The following M_1 and M_2 show such an example.
M_1 =
   10 10  1  2
   10 10  1  2
    3  5  4  4
    3  5  8  9
    6  7  0  0

M_2 =
    3  1 10 10
    3  1 10 10
    4  4  2  5
    7  6  2  5
    0  0  9  8
Note that the normalized layouts of these two are mirrors of each other. It's
easy to get a mirrored layout like this:

mirror(L) = {{(i, n − j + 1) | (i, j) ∈ l} | l ∈ L}    (14.76)
We find that the matrix representation is useful for validating moves,
while the layout is handy to model the moves and to avoid repeated attempts.
We can use a similar approach to solve the Kloski puzzle. We need a queue;
every element in the queue contains two parts: a series of moves and the latest
layout these moves lead to. Each move is in the form (k, (Δy, Δx)), which means
moving the k-th block by Δy rows and Δx columns on the board.
The queue contains the starting layout when initialized. Whenever this
queue isn't empty, we pick the first element from the head, checking if the biggest
block is on target, that is, L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)}. If yes, then we are
done; otherwise, we try to move every block with 4 options: left, right, up, and
down, and store all the possible, unique new layouts to the tail of the queue.
During this searching, we need to record all the normalized layouts we've ever
found to avoid any duplication.
Denote the queue as Q, the historic layouts as H, the first layout on the head
of the queue as L, its corresponding matrix as M, and the move sequence leading
to this layout as S. The algorithm can be defined as the following.

solve(Q, H) =
    ∅ : Q = ∅
    reverse(S) : L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)}
    solve(Q', H') : otherwise    (14.77)
The first clause says that if the queue is empty, we've tried all the possibilities
and can't find a solution; the second clause finds a solution, and returns the
move sequence in reversed order. These are the two edge cases. Otherwise, the
algorithm expands the current layout, puts all the valid new layouts to the tail
of the queue to yield Q', and updates the normalized layouts to H'. Then it
performs the search recursively.
In order to expand a layout to all valid, unique new layouts, we can define a
function as below:

expand(L, H) = {(k, (Δy, Δx)) | k ∈ {1, 2, ..., 10},
                (Δy, Δx) ∈ {(0, −1), (0, 1), (−1, 0), (1, 0)},
                valid(L, k, Δx, Δy), unique(L', H)}    (14.78)

Where L' is the new layout obtained by moving the k-th block by (Δy, Δx)
from L, M' is the corresponding matrix, and M'' is the matrix of the mirrored
layout of L'. Function unique is defined like this:

unique(L', H) = M' ∉ H ∧ M'' ∉ H    (14.79)
We'll next show some example Haskell Kloski programs. As arrays aren't
mutable in the purely functional settings, a tree based map is used to represent
the layout (alternatively, the finger tree based sequence shown in the previous
chapter can be used). Some type synonyms are defined as below:
import qualified Data.Map as M
import Data.Ix
import Data.List (sort)
type Point = (Integer, Integer)
type Layout = M.Map Integer [Point]
type Move = (Integer, Point)
data Ops = Op Layout [Move]
The main program is almost the same as the solve(Q, H) function defined
above.
solve :: [Ops] -> [[[Point]]] -> [Move]
solve [] _ = [] -- no solution
solve (Op x seq : cs) visit
    | M.lookup 10 x == Just [(4, 2), (4, 3), (5, 2), (5, 3)] = reverse seq
    | otherwise = solve q visit'
  where
    ops = expand x visit
    visit' = map (layout . move x) ops ++ visit
    q = cs ++ [Op (move x op) (op:seq) | op <- ops]
Where function layout gives the normalized form by sorting, and move returns
the updated map by sliding the i-th block by (Δy, Δx).

layout = sort . map sort . M.elems

move x (i, d) = M.update (Just . map (flip shift d)) i x

shift (y, x) (dy, dx) = (y + dy, x + dx)
Function expand gives all the possible new moves. It can be directly trans-
lated from expand(L, H).

expand :: Layout -> [[[Point]]] -> [Move]
expand x visit = [(i, d) | i <- [1..10],
                           d <- [(0, -1), (0, 1), (-1, 0), (1, 0)],
                           valid i d, unique i d] where
    valid i d = all (\p -> let p' = shift p d in
                        inRange (bounds board) p' &&
                        (M.keys $ M.filter (elem p') x) `elem` [[i], []])
                    (maybe [] id $ M.lookup i x)
    unique i d = let mv = move x (i, d) in
                 all (`notElem` visit) (map layout [mv, mirror mv])

Note that we also filter out the mirrored layouts. The mirror function is
given as the following.

mirror = M.map (map (\(y, x) -> (y, 5 - x)))
This program takes several minutes to produce the best solution, which takes
116 steps. The final 3 steps are shown as below:
...

[5, 3, 2, 1]
[5, 3, 2, 1]
[7, 9, 4, 4]
[A, A, 6, 0]
[A, A, 0, 8]

[5, 3, 2, 1]
[5, 3, 2, 1]
[7, 9, 4, 4]
[A, A, 0, 6]
[A, A, 0, 8]

[5, 3, 2, 1]
[5, 3, 2, 1]
[7, 9, 4, 4]
[0, A, A, 6]
[0, A, A, 8]

total 116 steps
The Kloski solution can also be realized imperatively. Note that solve(Q, H)
is tail-recursive; it's easy to transform the algorithm into a loop. We can also
link each layout to its parent, so that the move sequence can be recorded
globally. This saves some space, as the queue needn't store the move infor-
mation in every element. When outputting the result, we only need to back-track
from the last layout to the starting one.
Suppose function Link(L', L) links a new layout L' to its parent layout L.
The following algorithm takes a starting layout, and searches for the best move
sequence.
1: function Solve(L_0)
2:     H ← {‖L_0‖}
3:     Q ← ∅
4:     Push(Q, Link(L_0, NIL))
5:     while Q ≠ ∅ do
6:         L ← Pop(Q)
7:         if L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)} then
8:             return L
9:         else
10:            for each L' ∈ Expand(L, H) do
11:                Push(Q, Link(L', L))
12:                Append(H, ‖L'‖)
13:    return NIL    ▷ No solution
The following example Python program implements this algorithm:

class Node:
    def __init__(self, l, p = None):
        self.layout = l
        self.parent = p

def solve(start):
    visit = [normalize(start)]
    queue = [Node(start)]
    while queue != []:
        cur = queue.pop(0)
        layout = cur.layout
        if layout[-1] == [(4, 2), (4, 3), (5, 2), (5, 3)]:
            return cur
        else:
            for brd in expand(layout, visit):
                queue.append(Node(brd, cur))
                visit.append(normalize(brd))
    return None # no solution
Where normalize and expand are implemented as below:
def normalize(layout):
    return sorted([sorted(r) for r in layout])

def expand(layout, visit):
    def bound(y, x):
        return 1 <= y and y <= 5 and 1 <= x and x <= 4
    def valid(m, i, y, x):
        return m[y - 1][x - 1] in [0, i]
    def unique(brd):
        (m, n) = (normalize(brd), normalize(mirror(brd)))
        return all(m != v and n != v for v in visit)
    s = []
    d = [(0, -1), (0, 1), (-1, 0), (1, 0)]
    m = matrix(layout)
    for i in range(1, 11):
        for (dy, dx) in d:
            if all(bound(y + dy, x + dx) and valid(m, i, y + dy, x + dx)
                       for (y, x) in layout[i - 1]):
                brd = move(layout, (i, (dy, dx)))
                if unique(brd):
                    s.append(brd)
    return s
Like most programming languages, Python indexes arrays from 0, not 1.
This has to be handled properly. The rest of the functions, including mirror,
matrix, and move, are implemented as the following.
def mirror(layout):
    return [[(y, 5 - x) for (y, x) in r] for r in layout]

def matrix(layout):
    m = [[0] * 4 for _ in range(5)]
    for (i, ps) in zip(range(1, 11), layout):
        for (y, x) in ps:
            m[y - 1][x - 1] = i
    return m

def move(layout, delta):
    (i, (dy, dx)) = delta
    m = dup(layout)
    m[i - 1] = [(y + dy, x + dx) for (y, x) in m[i - 1]]
    return m

def dup(layout):
    return [r[:] for r in layout]
It's possible to modify this Kloski algorithm so that it does not stop
at the first solution, but searches for all the solutions. In that case, the compu-
tation time is bound to the size of the search space V, which holds all the layouts
that can be transformed from the starting layout. If all these layouts are stored
globally, with a parent field pointing to the predecessor, the space requirement
of this algorithm is also bound to O(V).
Summary of BFS

The above three puzzles, the wolf-goat-cabbage puzzle, the water jugs puzzle,
and the Kloski puzzle, show a common solution structure. Similar to the
DFS problems, they all have a starting state and an end state. The wolf-
goat-cabbage puzzle starts with the wolf, the goat, the cabbage, and the farmer
all on one side, while the other side is empty; it ends in a state where they
have all moved to the other side. The water jugs puzzle starts with two empty jugs,
and ends when either jug contains a certain volume of water. The Kloski puzzle
starts from a layout and ends at another layout where the biggest block has been
slid to a given position.
All these problems specify a set of rules which can transfer one state to
another. Different from the DFS approach, we try all the possible options
'in parallel'. We won't search deeper until all the alternatives in the same step
have been examined. This method ensures that a solution with the minimum
number of steps is found before any longer one. Reviewing and comparing the two
figures we've drawn before shows the difference between these two approaches.
Because the latter expands the search horizontally, it is called
Breadth-first search (BFS for short).
(a) Depth First Search (b) Breadth First Search
Figure 14.44: Search orders for DFS and BFS.
As we can't really perform the search in parallel, a BFS realization typically
utilizes a queue to store the search options. The candidate with fewer steps is
popped from the head, while new candidates with more steps are pushed to the tail
of the queue. Note that the queue should provide constant time enqueue and de-
queue operations, as we explained in the previous chapter of queues. Strictly
speaking, the example functional programs shown above don't meet this crite-
rion. They use a list to mimic a queue, which only provides linear time pushing.
Readers can replace them with the functional queue we explained before.
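
For instance, in an imperative setting, Python's collections.deque meets this
requirement. A minimal BFS skeleton is sketched below; expand is a hypothetical
expansion function, and states are assumed to be hashable.

from collections import deque

def bfs(start, expand):
    q = deque([start])        # deque gives O(1) push at the tail...
    visited = {start}
    while q:
        cur = q.popleft()     # ...and O(1) pop at the head (list.pop(0) is O(n))
        for nxt in expand(cur):
            if nxt not in visited:
                visited.add(nxt)
                q.append(nxt)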
BFS provides a simple method to search for solutions that are optimal in terms
of the number of steps. However, it can't search for the optimal solution in a
more general sense. Consider the directed graph shown in figure 14.45, where
the length of each section varies. We can't use BFS to find the shortest route
from one city to another.
Figure 14.45: A weighted directed graph.
Note that the shortest route from city a to city c isn't the one with the
fewest steps, a → b → c; the total length of this route is 22. The route with
more steps, a → e → f → c, is the better one: its length is 20. The coming
sections introduce other algorithms to search for the optimal solution.
14.3.2 Search the optimal solution

Searching for the optimal solution is quite important in many aspects. People
need the best solution to save time, space, cost, or energy. However, it's not
easy to find the best solution with limited resources. Many optimization
problems can only be solved by brute-force. Nevertheless, we've found
that, for some of them, there exist special simplified ways to search for the
optimal solution.
Greedy algorithm
Huffman coding

Huffman coding is a solution to encode information with the shortest length of
code. Consider the popular ASCII code, which uses 7 bits to encode characters,
digits, and symbols. ASCII code can represent 2^7 = 128 different symbols. With
bits of 0 and 1, we need at least ⌈log_2 n⌉ bits to distinguish n different symbols.
For text with only case-insensitive English letters, we can define a code table
like below.
char code char code
A 00000 N 01101
B 00001 O 01110
C 00010 P 01111
D 00011 Q 10000
E 00100 R 10001
F 00101 S 10010
G 00110 T 10011
H 00111 U 10100
I 01000 V 10101
J 01001 W 10110
K 01010 X 10111
L 01011 Y 11000
M 01100 Z 11001
With this code table, the text 'INTERNATIONAL' is encoded to 65 bits.

00010101101100100100100011011000000110010001001110101100000011010

Observe the above code table, which actually maps the letters 'A' to 'Z' to
the numbers 0 to 25. Every code takes 5 bits; code zero is written as '00000',
not as a single '0', for example. Such a coding method is called fixed-length
coding.
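
As a quick check of the arithmetic (a small hypothetical helper, not from the
book): 13 letters times 5 bits per letter gives 65 bits.

def encode_fixed(text):
    # each of A..Z maps to its alphabet index, written as 5 bits
    return ''.join(format(ord(c) - ord('A'), '05b') for c in text)

print(len(encode_fixed("INTERNATIONAL")))   # 65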
Another coding method is variable-length coding. We could use just
one bit '0' for 'A', two bits '10' for 'C', and 5 bits '11001' for 'Z'. Although
this approach can shorten the total code length for 'INTERNATIONAL'
from 65 bits dramatically, it causes a problem when decoding. When processing
a sequence of bits like '1101', we don't know if it means '1' followed by '101',
which stands for 'BF'; or '110' followed by '1', which is 'GB'; or '1101', which is
'N'.
The famous Morse code is a variable-length coding system: the most
used letter 'E' is encoded as a dot, while 'Z' is encoded as two dashes and two
dots. Morse code uses a special pause separator to indicate the termination of
a code, so that the above problem won't happen. There is another solution to
avoid ambiguity. Consider the following code table.

char code char code
A 110 E 1110
I 101 L 1111
N 01 O 000
R 001 T 100

The text 'INTERNATIONAL' is encoded to only 38 bits:

10101100111000101110100101000011101111

If we decode the bits against the above code table, we won't meet any ambiguity.
This is because no symbol's code is the prefix of another's. Such a code is
called a prefix-code. (You may wonder why it isn't called a non-prefix code.)
By using a prefix-code, we needn't use separators at all, so the code length
can be shortened.
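
The prefix property of the table above can be verified mechanically; here is a
small hypothetical check in Python.

codes = ["110", "1110", "101", "1111", "01", "000", "001", "100"]
# no code may be a proper prefix of another one
ok = not any(a != b and b.startswith(a) for a in codes for b in codes)
print(ok)   # True: decoding is unambiguous without separators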
This is a very interesting problem: can we find a prefix-code table which
produces the shortest code for a given text? The very same problem was given
to David A. Huffman in 1951, when he was still a student at MIT [15]. His professor
Robert M. Fano told the class that those who could solve this problem needn't
take the final exam. Huffman had almost given up and started preparing for the
final exam when he found the most efficient answer.
The idea is to create the coding table according to the frequency with which
each symbol appears in the text. The more frequently used a symbol is, the shorter
its assigned code.
It's not hard to process some text and calculate the occurrences of each
symbol, so that we have a symbol set where each symbol is augmented with a weight.
The weight can be any number indicating the frequency of the symbol;
we can use the number of occurrences, or a probability, for example.
Huffman discovered that a binary tree can be used to generate prefix-codes.
All symbols are stored in the leaf nodes. The codes are generated by traversing
the tree from the root: when going left, we add a zero; when going right, we add
a one.
Figure 14.46 illustrates such a binary tree. Taking symbol 'N' for example, start-
ing from the root, we first go left, then right, and arrive at 'N'. Thus the code
for 'N' is '01'; while for symbol 'A', we go right, right, then left, so 'A' is
encoded as '110'. Note that this approach ensures that no code is the prefix of
any other.
13
├── 5
│   ├── 2
│   │   ├── O, 1
│   │   └── R, 1
│   └── N, 3
└── 8
    ├── 4
    │   ├── T, 2
    │   └── I, 2
    └── 4
        ├── A, 2
        └── 2
            ├── E, 1
            └── L, 1

Figure 14.46: An encoding tree.
Note that this tree can also be used directly for decoding: when scanning a
series of bits, if a bit is zero, we go left; if it is one, we go right. When we
arrive at a leaf, we decode the symbol from that leaf, and restart from the
root of the tree for the coming bits.
Given a list of symbols with weights, we need to build such a binary tree so
that symbols with greater weights have shorter paths from the root. Huffman
developed a bottom-up solution. At the start, every symbol is put into a leaf
node. Each time, we pick the two nodes with the smallest weights and merge
them into a branch node. The weight of this branch is the sum of its two
children. We repeatedly pick the two smallest weighted nodes and merge them
till there is only one tree left. Figure 14.47 illustrates such a building process.
We can reuse the binary tree definition to formalize Huffman coding. We
augment the weight information, and the symbols are only stored in leaf nodes.
(a) merge E,1 and L,1 into a branch of weight 2
(b) merge O,1 and R,1 into a branch of weight 2
(c) merge T,2 and I,2 into a branch of weight 4
(d) merge A,2 and the branch (E, L) into a branch of weight 4
(e) merge the branch (O, R) and N,3 into a branch of weight 5
(f) merge the two branches of weight 4 into a branch of weight 8
(g) merge the branches of weight 5 and 8 into the final tree of weight 13

Figure 14.47: Steps to build a Huffman tree.
The following C-like definition shows an example.

struct Node {
    int w;
    char c;
    struct Node *left, *right;
};
Some constraints can be added to the definition, as an empty tree isn't allowed.
A Huffman tree is either a leaf, which contains a symbol and its weight, or a
branch, which only holds the total weight of all its leaves. The following Haskell
code, for instance, explicitly specifies these two cases.

data HTr w a = Leaf w a | Branch w (HTr w a) (HTr w a)
When merging two Huffman trees T_1 and T_2 into a bigger one, the two trees
are set as its children. We can select either one as the left, and the other as
the right. The weight of the result tree T is the sum of its two children, so that
w = w_1 + w_2. Define T_1 < T_2 if w_1 < w_2. One possible Huffman tree building
algorithm can be realized as the following.
build(A) =
    T_1 : A = {T_1}
    build({merge(T_a, T_b)} ∪ A') : otherwise    (14.80)
A is a list of trees. It is initialized with leaves for all symbols and their weights.
If there is only one tree in this list, we are done; the tree is the final Huffman
tree. Otherwise, the two smallest trees T_a and T_b are extracted, and the rest of
the trees are held in list A'. T_a and T_b are merged into one bigger tree, which is
put back to the tree list for further recursive building.

(T_a, T_b, A') = extract(A)    (14.81)
We can scan the tree list to extract the two nodes with the smallest weights. The
below equation shows that when the scan begins, the first two elements are compared
and initialized as the two minimum ones. An empty accumulator is passed as
the last argument.
extract(A) = extract'(min(T_1, T_2), max(T_1, T_2), {T_3, T_4, ...}, ∅)    (14.82)
For every tree, if its weight is less than the smallest two we've found so far,
we update the result to contain this tree. For any given tree list A, denote the
first tree in it as T_1, and the rest of the trees except T_1 as A'. The scan process
can be defined as the following.
extract'(T_a, T_b, A, B) =
    (T_a, T_b, B) : A = ∅
    extract'(T_a', T_b', A', {T_b} ∪ B) : T_1 < T_b
    extract'(T_a, T_b, A', {T_1} ∪ B) : otherwise    (14.83)
Where T_a' = min(T_1, T_a) and T_b' = max(T_1, T_a) are the updated two trees with
the smallest weights.
The following Haskell example program implements this Huffman tree build-
ing algorithm.
build [x] = x
build xs = build ((merge x y) : xs') where
    (x, y, xs') = extract xs

extract (x:y:xs) = min2 (min x y) (max x y) xs [] where
    min2 x y [] xs = (x, y, xs)
    min2 x y (z:zs) xs | z < y = min2 (min z x) (max z x) zs (y:xs)
                       | otherwise = min2 x y zs (z:xs)
This building solution can also be realized imperatively. Given an array of
Huffman nodes, we can use the last two cells to hold the nodes with the smallest
weights. Then we scan the rest of the array from right to left. Whenever there
is a node with a smaller weight, this node is exchanged with the bigger
one of the last two. After all nodes have been examined, we merge the trees in
the last two cells, and drop the last cell. This shrinks the array by one. We
repeat this process till there is only one tree left.
1: function Huffman(A)
2:     while |A| > 1 do
3:         n ← |A|
4:         for i ← n − 2 down to 1 do
5:             if A[i] < Max(A[n], A[n − 1]) then
6:                 Exchange A[i] ↔ Max(A[n], A[n − 1])
7:         A[n − 1] ← Merge(A[n], A[n − 1])
8:         Drop(A[n])
9:     return A[1]
The following C++ example program implements this algorithm. Note that
this algorithm doesn't require the last two elements to be ordered.

typedef vector<Node> Nodes;

bool lessp(Node a, Node b) { return a->w < b->w; }

Node max(Node a, Node b) { return lessp(a, b) ? b : a; }

void swap(Nodes& ts, int i, int j, int k) {
    swap(ts[i], ts[lessp(ts[j], ts[k]) ? k : j]);
}
Node huffman(Nodes ts) {
    int n;
    while ((n = ts.size()) > 1) {
        for (int i = n - 3; i >= 0; --i)
            if (lessp(ts[i], max(ts[n-1], ts[n-2])))
                swap(ts, i, n-1, n-2);
        ts[n-2] = merge(ts[n-1], ts[n-2]);
        ts.pop_back();
    }
    return ts.front();
}
This algorithm merges all the leaves, and it needs to scan the list in each
iteration; thus the performance is quadratic. It can be improved. Observe
that each time, only the two trees with the smallest weights are merged. This
reminds us of the heap data structure. A heap ensures fast access to the smallest
element. We can put all the leaves in a heap; for a binary heap, this is typically a
linear operation. Then we extract the minimum element twice, merge the two
trees, and put the bigger tree back to the heap. Each extraction and insertion is
an O(lg n) operation if a binary heap is used, so the total performance is
O(n lg n), which is better than the above algorithm. The next algorithm extracts
the node from the heap, and starts the Huffman tree building.
build(H) = reduce(top(H), pop(H))    (14.84)

This algorithm stops when the heap is empty; otherwise, it extracts another
node from the heap for merging.

reduce(T, H) =
    T : H = ∅
    build(insert(merge(T, top(H)), pop(H))) : otherwise    (14.85)
Functions build and reduce are mutually recursive. The following Haskell
example program implements this algorithm by using the heap defined in the
previous chapter.

huffman :: (Num a, Ord a) => [(b, a)] -> HTr a b
huffman = build . Heap.fromList . map (\(c, w) -> Leaf w c) where
    build h = reduce (Heap.findMin h) (Heap.deleteMin h)
    reduce x Heap.E = x
    reduce x h = build $ Heap.insert (Heap.deleteMin h) (merge x (Heap.findMin h))
The heap solution can also be realized imperatively. The leaves are first
transformed into a heap, so that the one with the minimum weight is put on the
top. As long as there is more than one element in the heap, we extract the two
smallest, merge them into a bigger one, and put it back to the heap. The final
tree left in the heap is the resulting Huffman tree.
1: function Huffman(A)
2:     Build-Heap(A)
3:     while |A| > 1 do
4:         T_a ← Heap-Pop(A)
5:         T_b ← Heap-Pop(A)
6:         Heap-Push(A, Merge(T_a, T_b))
7:     return Heap-Pop(A)
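
For comparison, here is a compact sketch of the same idea using Python's
standard heapq module (a hypothetical helper, not the book's program). Trees are
nested pairs, and a tie-break index avoids comparing trees of equal weight.

import heapq

def huffman(symbols):                     # symbols: list of (symbol, weight)
    h = [(w, i, c) for (i, (c, w)) in enumerate(symbols)]
    heapq.heapify(h)                      # linear time Build-Heap
    i = len(h)
    while len(h) > 1:
        (w1, _, t1) = heapq.heappop(h)    # extract the two smallest trees
        (w2, _, t2) = heapq.heappop(h)
        heapq.heappush(h, (w1 + w2, i, (t1, t2)))   # merge and push back
        i += 1
    return h[0][2]                        # the final Huffman tree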
The following example C++ code implements this heap solution. The heap
used here is provided by the standard library. Because a max-heap, not a
min-heap, would be built by default, a 'greater than' predicate is explicitly
passed as the comparison argument.
bool greaterp(Node a, Node b) { return b->w < a->w; }

Node pop(Nodes& h) {
    Node m = h.front();
    pop_heap(h.begin(), h.end(), greaterp);
    h.pop_back();
    return m;
}

void push(Node t, Nodes& h) {
    h.push_back(t);
    push_heap(h.begin(), h.end(), greaterp);
}

Node huffman1(Nodes ts) {
    make_heap(ts.begin(), ts.end(), greaterp);
    while (ts.size() > 1) {
        Node t1 = pop(ts);
        Node t2 = pop(ts);
        push(merge(t1, t2), ts);
    }
    return ts.front();
}
When the symbol-weight list is already sorted, there exists a linear
time method to build the Huffman tree. Observe that during the Huffman tree
building, the merged trees are produced with weights in ascending order.
We can use a queue to manage the merged trees. Each time, we pick the two
trees with the smallest weights from both the queue and the list, merge them,
and push the result to the queue. All the trees in the list will be processed, and
there will be only one tree left in the queue. This tree is the resulting Huffman
tree. This process starts by passing an empty queue, as below.

build'(A) = reduce'(extract'(∅, A))    (14.86)
Suppose A is in ascending order by weight. At any time, the tree with the
smallest weight is either the head of the queue or the first element of the list.
Denote the head of the queue as T_a; after popping it, the queue becomes Q'. The
first element in A is T_b; the rest of the elements are held in A'. Function
extract' can be defined like the following.
extract'(Q, A) =
    (T_b, (Q, A')) : Q = ∅
    (T_a, (Q', A)) : A = ∅ ∨ T_a < T_b
    (T_b, (Q, A')) : otherwise    (14.87)
Actually, the pair of queue and tree list can be viewed as a special heap.
The tree with the minimum weight is continuously extracted and merged.
reduce'(T, (Q, A)) =
    T : Q = ∅ ∧ A = ∅
    reduce'(extract'(push(Q', merge(T, T')), A')) : otherwise    (14.88)

Where (T', (Q', A')) = extract'(Q, A), which means extracting another
tree. The following Haskell example program shows the implementation of this
method. Note that this program explicitly sorts the leaves, which isn't necessary
if the leaves are already ordered. Again, a list, rather than a real queue, is used
here for illustration purposes. Lists aren't good at pushing new elements; please
refer to the chapter of queues for details about it.
huffman :: (Num a, Ord a) => [(b, a)] -> HTr a b
huffman = reduce . wrap . sort . map (\(c, w) -> Leaf w c) where
    wrap xs = delMin ([], xs)
    reduce (x, ([], [])) = x
    reduce (x, h) = let (y, (q, xs)) = delMin h in
                    reduce $ delMin (q ++ [merge x y], xs)
    delMin ([], (x:xs)) = (x, ([], xs))
    delMin ((q:qs), []) = (q, (qs, []))
    delMin ((q:qs), (x:xs)) | q < x = (q, (qs, (x:xs)))
                            | otherwise = (x, ((q:qs), xs))
This algorithm can also be realized imperatively.

1: function Huffman(A)    ▷ A is ordered by weight
2:     Q ← ∅
3:     T ← Extract(Q, A)
4:     while Q ≠ ∅ ∨ A ≠ ∅ do
5:         Push(Q, Merge(T, Extract(Q, A)))
6:         T ← Extract(Q, A)
7:     return T
Where function Extract(Q, A) extracts the tree with the smallest weight
from the queue and the list. It mutates the queue and the list when necessary.
Denote the head of the queue as T_a, and the first element of the list as T_b.

1: function Extract(Q, A)
2:     if Q ≠ ∅ ∧ (A = ∅ ∨ T_a < T_b) then
3:         return Pop(Q)
4:     else
5:         return Detach(A)
Where procedure Detach(A) removes the first element from A, and returns
this element as the result. In most imperative settings, as detaching the first
element of an array is a slow linear operation, we can store the trees in descending
order by weight, and remove the last element instead. This is a fast constant time
operation. The below C++ example code shows this idea.
Node extract(queue<Node>& q, Nodes& ts) {
    Node t;
    if (!q.empty() && (ts.empty() || lessp(q.front(), ts.back()))) {
        t = q.front();
        q.pop();
    } else {
        t = ts.back();
        ts.pop_back();
    }
    return t;
}

Node huffman2(Nodes ts) {
    queue<Node> q;
    sort(ts.begin(), ts.end(), greaterp);
    Node t = extract(q, ts);
    while (!q.empty() || !ts.empty()) {
        q.push(merge(t, extract(q, ts)));
        t = extract(q, ts);
    }
    return t;
}
Note that the sorting isn't necessary if the trees are already ordered.
It can be replaced by a linear time reversing in case the trees are in ascending
order by weight.
Three different Huffman tree building methods have been explained.
Although they all follow the approach developed by Huffman, the resulting trees
vary. Figure 14.48 shows the three different Huffman trees built with these
methods.
(a) Created by the scan method.
(b) Created by the heap method.
(c) Created by the linear time method for a sorted list.

Figure 14.48: Variations of Huffman trees for the same symbol list.
Although these three trees are not identical, they are all able to generate the
most efficient code. The formal proof is skipped here; the detailed information
can be found in [15] and Section 16.3 of [2].
The Huffman tree building is the core idea of Huffman coding. Many things
can be easily achieved with the Huffman tree. For example, the code table can
be generated by traversing the tree. We start from the root with an empty
prefix p. For any branch, we append a zero to the prefix if we turn left, and
append a one if we turn right. When a leaf node is arrived at, the symbol
represented by this node and the prefix are put into the code table. Denote the
symbol of a leaf node as c, and the children of tree T as T_l and T_r respectively.
The code table association list can be built with code(T, ∅), which is defined as
below.
code(T, p) =
    {(c, p)} : leaf(T)
    code(T_l, p ∪ {0}) ∪ code(T_r, p ∪ {1}) : otherwise    (14.89)
Where function leaf(T) tests if tree T is a leaf or a branch node. The
following Haskell example program generates a map as the code table according
to this algorithm.
code tr = Map.fromList $ traverse [] tr where
traverse bits (Leaf _ c) = [(c, bits)]
traverse bits (Branch _ l r) = (traverse (bits ++ [0]) l) ++
(traverse (bits ++ [1]) r)
The imperative code table generating algorithm is left as an exercise. The
encoding process scans the text and looks up the code table to output the
bit sequence; the realization is skipped here.
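
A possible answer to the exercise, sketched in Python with an explicit stack
instead of recursion; it assumes the nested-pair tree shape produced by the heapq
sketch above (both helpers are hypothetical, not the book's solution).

def code_table(tree):
    table, stack = {}, [(tree, "")]
    while stack:
        (t, p) = stack.pop()
        if isinstance(t, tuple):           # a branch: (left, right)
            stack.append((t[0], p + "0"))  # going left appends a zero
            stack.append((t[1], p + "1"))  # going right appends a one
        else:
            table[t] = p                   # a leaf: record symbol -> code
    return table

def encode(table, text):
    return ''.join(table[c] for c in text)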
The decoding process is realized by walking the Huffman tree according
to the bit sequence. We start from the root; whenever a zero is received, we turn
left, otherwise, if a one is received, we turn right. If a leaf node is arrived at, the
symbol represented by this leaf is output, and we start another lookup from
the root. The decoding process ends when all the bits are consumed. Denote
the bit sequence as B = {b_1, b_2, ...}; all bits except the first one are held in B'.
The below definition realizes the decoding algorithm.
decode(T, B) =
    {c} : B = ∅ ∧ leaf(T)
    {c} ∪ decode(root(T), B) : leaf(T)
    decode(T_l, B') : b_1 = 0
    decode(T_r, B') : otherwise    (14.90)
Where root(T) returns the root of the Huffman tree. The following Haskell
example code implements this algorithm.
decode tr cs = find tr cs where
find (Leaf _ c) [] = [c]
find (Leaf _ c) bs = c : find tr bs
find (Branch _ l r) (b:bs) = find (if b == 0 then l else r) bs
Note that this is an on-line decoding algorithm with linear time performance.
It consumes one bit at a time. This can be clearly seen from the below imper-
ative realization, where the index keeps increasing by one.
1: function Decode(T, B)
2:     W ← ∅
3:     n ← |B|, i ← 1
4:     while i < n do
5:         R ← T
6:         while ¬ Leaf(R) do
7:             if B[i] = 0 then
8:                 R ← Left(R)
9:             else
10:                R ← Right(R)
11:            i ← i + 1
12:        W ← W ∪ Symbol(R)
13:    return W
This imperative algorithm can be implemented as the following example
C++ program.

string decode(Node root, const char* bits) {
    string w;
    while (*bits) {
        Node t = root;
        while (!isleaf(t))
            t = '0' == *bits++ ? t->left : t->right;
        w += t->c;
    }
    return w;
}
Huffman coding, especially the Huffman tree building, shows an interesting
strategy. Each time, there are multiple options for merging; among the trees
in the list, the Huffman method always selects the two trees with the smallest
weights. This is the best choice at that merge stage. However, this series of
locally best options generates a globally optimal prefix code.
It's not always the case that the locally optimal choice leads to the globally
optimal solution; in most cases, it doesn't. Huffman coding is a special one. We
call the strategy of always choosing the locally best option the greedy strategy.
The greedy method works for many problems. However, it's not easy to tell
whether the greedy method can be applied to get the globally optimal solution.
The generic formal proof is still an active research area. Section 16.4 in [2]
provides a good treatment of the matroid tool, which covers many problems where
the greedy algorithm can be applied.
Change-making problem

We often change money when visiting other countries. People tend to use credit
cards more often nowadays, because it's quite convenient to buy things
without considering changes much. If we changed some money in a bank, there
is often some foreign money left by the end of the trip. Some people like to
change it into coins for collection. Can we find a solution which changes a
given amount of money with the fewest coins?
Let's use the USA coin system for example. There are 5 different coins: 1 cent,
5 cents, 25 cents, 50 cents, and 1 dollar. A dollar is equal to 100 cents. Using the
greedy method introduced above, we can always pick the largest coin which is
not greater than the remaining amount of money to be changed. Denote the list
C = {1, 5, 25, 50, 100}, which stands for the values of the coins. For any given
amount of money X, the change coins can be generated as below.
change(X, C) =
    ∅ : X = 0
    {c_m} ∪ change(X − c_m, C) : otherwise, c_m = max({c ∈ C, c ≤ X})    (14.91)
If C is in descending order, c_m can be found as the first coin not greater
than X. If we want to change 1.42 dollar, this function produces the coin list
{100, 25, 5, 5, 5, 1, 1}. The output can easily be transformed into
pairs: {(100, 1), (25, 1), (5, 3), (1, 2)}. That is, we need one dollar, a quarter, three
coins of 5 cents, and 2 coins of 1 cent to make the change. The following Haskell
example program outputs the result as such.
solve x = assoc . change x where
    change 0 _ = []
    change x cs = let c = head $ filter (<= x) cs in c : change (x - c) cs
    assoc = map (\cs -> (head cs, length cs)) . group
As mentioned above, this program assumes the coins are in descending order,
for instance as below.

solve 142 [100, 50, 25, 5, 1]
This algorithm is tail recursive; it can be transformed into an imperative
loop.

1: function Change(X, C)
2:     R ← ∅
3:     while X ≠ 0 do
4:         c_m ← max({c ∈ C, c ≤ X})
5:         R ← {c_m} ∪ R
6:         X ← X − c_m
7:     return R
The following example Python program implements this imperative version
and manages the result with a dictionary.

def change(x, coins):
    cs = {}
    while x != 0:
        m = max([c for c in coins if c <= x])
        cs[m] = 1 + cs.setdefault(m, 0)
        x = x - m
    return cs
For a coin system like the USA's, the greedy approach finds the optimal
solution; the number of coins is minimal. Fortunately, the greedy method
works in most countries' coin systems. But it is not always so. For example,
suppose a country has coins of value 1, 3, and 4 units. The best change for value
6 is to use two coins of 3 units; however, the greedy method gives a result of
three coins: one coin of 4 and two coins of 1, which isn't optimal.
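
The failure can be reproduced directly with the change function just defined:

print(change(6, [1, 3, 4]))   # {4: 1, 1: 2}: three coins
# while the optimal change is two coins of 3 units: {3: 2}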
Summary of greedy method

As shown in the change-making problem, the greedy method doesn't always give
the best result. In order to find the optimal solution, we need dynamic programming,
which will be introduced in the next section.
However, the greedy result is often good enough in practice. Let's take the word-
wrap problem for example. In modern software editors and browsers, text spans
multiple lines if the content is too long to be held in one. With word-wrap
supported, the user needn't insert hard line breaks. Although dynamic program-
ming can wrap with the minimum number of lines, it's overkill. On the contrary,
a greedy algorithm can wrap with a number of lines close to the optimal result,
with a quite effective realization, as below. Here it wraps text T, not exceeding
line width W, with space s between each word.
1: L ← W
2: for w ∈ T do
3:     if |w| + s > L then
4:         Insert line break
5:         L ← W − |w|
6:     else
7:         L ← L − |w| − s
For each word w in the text, the algorithm greedily puts as many words
in a line as possible, unless doing so would exceed the line width. Many word
processors use a similar algorithm for word-wrapping.
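
A minimal Python rendering of this greedy wrap, counting one character per
space (an assumption made here for illustration):

def word_wrap(words, W, s=1):
    lines, L = [], W
    for w in words:
        if not lines or len(w) + s > L:   # break the line
            lines.append([w])
            L = W - len(w)
        else:                             # the word still fits
            lines[-1].append(w)
            L = L - len(w) - s
    return [' '.join(line) for line in lines]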
In many cases, the strictly optimal result, rather than an approximate one, is
necessary. Dynamic programming can help to solve such problems.
Dynamic programming

In the change-making problem, we mentioned that the greedy method can't always
give the optimal solution. For an arbitrary coin system, is there any way to find
the best change?
Suppose we have found the best solution which makes X value of money.
The coins needed are contained in C_m. We can partition these coins into two
collections, C_1 and C_2, which make money of X_1 and X_2 respectively. We'll
prove that C_1 is the optimal solution for X_1, and C_2 is the optimal solution for
X_2.
Proof. For X_1, suppose there exists another solution C_1' which uses fewer coins
than C_1. Then the solution C_1' ∪ C_2 uses fewer coins to make X than C_m does.
This conflicts with the fact that C_m is the optimal solution to X. Similarly,
we can prove that C_2 is the optimal solution to X_2.
Note that the reverse is not true. If we arbitrarily select a value Y < X,
and divide the original problem into finding the optimal solutions for the sub-
problems Y and X − Y, combining the two optimal solutions doesn't necessarily
yield an optimal solution for X. Consider this example: there are coins of value
1, 2, and 4. The optimal solution for making value 6 is to use two coins, of value
2 and 4; however, if we divide 6 = 3 + 3, since each 3 can be made optimally as
3 = 1 + 2, the combined solution contains 4 coins (1 + 1 + 2 + 2).
If an optimization problem can be divided into several optimal sub-problems, we
say it has optimal substructure. We see that the change-making problem has
optimal substructure, but the division has to be done based on the coins, not
on an arbitrary value.
The optimal substructure can be expressed recursively as the following.

change(X) =
    ∅ : X = 0
    least({{c} ∪ change(X − c) | c ∈ C, c ≤ X}) : otherwise    (14.92)
For any coin system C, the change for zero is empty; otherwise, we
check every candidate coin c which is not greater than value X, and recursively
find the best solution for X − c; we pick the coin collection which contains the
fewest coins as the result.
The below Haskell example program implements this top-down recursive solu-
tion.
import Data.List (minimumBy)
import Data.Function (on)

change _ 0 = []
change cs x = minimumBy (compare `on` length)
                        [c : change cs (x - c) | c <- cs, c <= x]
Although this program outputs the correct answer [2, 4] when evaluating change
[1, 2, 4] 6, it performs very badly when changing 1.42 dollar with the USA coin
system. It failed to find the answer within 15 minutes on a computer with a
2.7GHz CPU and 8GB memory.
The reason why it's slow is that there is a lot of duplicated computation
in the top-down recursive solution. When it computes change(142), it needs to
examine change(141), change(137), change(117), change(92), and change(42).
While change(141) next computes the smaller values by deducting 1, 5, 25,
50 and 100 cents, it will eventually meet the values 137, 117, 92, and 42 again. The
search space explodes with the power of 5.
This is quite similar to computing Fibonacci numbers in a top-down recursive
way.

F_n =
    1 : n = 1 ∨ n = 2
    F_{n−1} + F_{n−2} : otherwise    (14.93)
When we calculate F_8 for example, we recursively calculate F_7 and F_6; while
when we calculate F_7, we need to calculate F_6 again, and F_5, ... As shown in the
below expanded forms, the calculation doubles every time, and the same values
are calculated again and again.
F_8 = F_7 + F_6
    = F_6 + F_5 + F_5 + F_4
    = F_5 + F_4 + F_4 + F_3 + F_4 + F_3 + F_3 + F_2
    = ...
In order to avoid duplicated computation, a table F can be maintained when
calculating the Fibonacci numbers. The first two elements are filled as 1; all
others are left blank. During the top-down recursive calculation, if F_k is needed,
we first look up this table for the k-th cell: if it isn't blank, we use that value
directly; otherwise we need further calculation. Whenever a value is calculated,
we store it in the corresponding cell for future lookups.

1: F ← {1, 1, NIL, NIL, ...}
2: function Fibonacci(n)
3:     if n > 2 ∧ F[n] = NIL then
4:         F[n] ← Fibonacci(n − 1) + Fibonacci(n − 2)
5:     return F[n]
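
This memoized definition translates directly to Python (a small sketch;
fib_tab is a hypothetical name):

fib_tab = {1: 1, 2: 1}

def fibonacci(n):
    if n not in fib_tab:            # compute only once per value of n
        fib_tab[n] = fibonacci(n - 1) + fibonacci(n - 2)
    return fib_tab[n]

print(fibonacci(8))   # 21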
By using the similar idea, we can develop a new top-down change-making
solution. We use a table T to maintain the best changes; it is initialized with all
empty coin lists. During the top-down recursive computation, we look up this
table for smaller changing values. Whenever an intermediate value is calculated,
it is stored in the table.

1: T ← {∅, ∅, ...}
2: function Change(X)
3:     if X > 0 ∧ T[X] = ∅ then
4:         for c ∈ C do
5:             if c ≤ X then
6:                 C_m ← {c} ∪ Change(X − c)
7:                 if T[X] = ∅ ∨ |C_m| < |T[X]| then
8:                     T[X] ← C_m
9:     return T[X]
The solution to change 0 money is definitely empty ∅; otherwise, we look
up T[X] to retrieve the solution to change X money. If it is empty, we need to
recursively calculate it. We examine all coins in the coin system C which are not
greater than X; for each coin c, the sub-problem is making change for the money
X − c. The candidate with the minimum number of coins, plus one coin of c, is
finally stored in T[X] as the result.
The following example Python program implements this algorithm; it takes
just about 8000 ms to give the answer for changing 1.42 dollar in the US coin
system.
tab = [[] for _ in range(1000)]

def change(x, cs):
    if x > 0 and tab[x] == []:
        for s in [[c] + change(x - c, cs) for c in cs if c <= x]:
            if tab[x] == [] or len(s) < len(tab[x]):
                tab[x] = s
    return tab[x]
Another way to calculate Fibonacci numbers is to compute them in the order
of F_1, F_2, F_3, ..., F_n. This is quite natural when people write down the Fibonacci
series.

1: function Fibo(n)
2:     F ← {1, 1, NIL, NIL, ...}
3:     for i ← 3 to n do
4:         F[i] ← F[i − 1] + F[i − 2]
5:     return F[n]
We can use a quite similar idea to solve the change-making problem. Starting
from zero money, which can be changed with an empty list of coins, we next
figure out how to change money of value 1. In the US coin system, for example,
a coin of 1 cent can be used; the next values 2, 3, and 4 can be changed with two,
three, and four coins of 1 cent. At this stage, the solution table looks like below.

value    0    1      2        3            4
coins    ∅   {1}  {1, 1}  {1, 1, 1}  {1, 1, 1, 1}

The interesting case happens when changing value 5. There are two options:
use another coin of 1 cent, which needs 5 coins in total; or use 1 coin of
5 cents, which uses fewer coins than the former. So the solution table
can be extended to this.

value    0    1      2        3            4        5
coins    ∅   {1}  {1, 1}  {1, 1, 1}  {1, 1, 1, 1}  {5}
For the next change value 6, since there are two types of coin, 1 cent and 5
cents, not greater than this value, we need to examine both of them.

- If we choose the 1 cent coin, we next need to make changes for 5. Since we've
  already known that the best solution to change 5 is {5}, which only needs
  one coin of 5 cents, by looking up the solution table we have one candidate
  solution to change 6: {5, 1}.
- The other option is to choose the 5 cent coin; we next need to make changes
  for 1. By looking up the solution table we've filled so far, the optimal
  solution to change 1 is {1}. Thus we get another candidate solution to
  change 6: {1, 5}.

It happens that both options yield a solution of two coins; we can select
either of them as the best solution. Generally speaking, the candidate with the
fewest coins is selected as the solution, and filled into the table.
At any iteration, when we are trying to change the value i (from 1 up to X),
we examine all the types of coins. For any coin c not greater than i, we look
up the solution table to fetch the sub-solution T[i − c]. The number of coins
in this sub-solution plus the one coin of c gives the total number of coins in this
candidate solution. The candidate with the fewest coins is then selected and
updated into the solution table.
The following algorithm realizes this bottom-up idea.
1: function Change(X)
2:     T ← {∅, ∅, ...}
3:     for i ← 1 to X do
4:         for c ∈ C, c ≤ i do
5:             if T[i] = ∅ ∨ 1 + |T[i − c]| < |T[i]| then
6:                 T[i] ← {c} ∪ T[i − c]
7:     return T[X]
This algorithm can be directly translated to an imperative program, in Python
for example.

def changemk(x, cs):
    s = [[] for _ in range(x + 1)]
    for i in range(1, x + 1):
        for c in cs:
            if c <= i and (s[i] == [] or 1 + len(s[i - c]) < len(s[i])):
                s[i] = [c] + s[i - c]
    return s[x]
Observing the solution table, it's easy to find that much duplicated content
is stored.

value    6        7           8              9               10   ...
coins  {1, 5}  {1, 1, 5}  {1, 1, 1, 5}  {1, 1, 1, 1, 5}  {5, 5}  ...

This is because the optimal sub-solutions are completely copied and saved
in the parent solutions. In order to use less space, we can record only the 'delta'
part relative to the sub-optimal solution. In the change-making problem, it means
that we only need to record the coin being selected for each value i.
1: function Change(X)
2:     T ← {0, ∞, ∞, ...}
3:     S ← {NIL, NIL, ...}
4:     for i ← 1 to X do
5:         for c ∈ C, c ≤ i do
6:             if 1 + T[i − c] < T[i] then
7:                 T[i] ← 1 + T[i − c]
8:                 S[i] ← c
9:     while X > 0 do
10:        Print(S[X])
11:        X ← X − S[X]
Instead of recording the complete solution list of coins, this new algorithm
uses two tables T and S. T holds the minimum number of coins needed for
changing values 0, 1, 2, ...; while S holds the first coin being selected for the
optimal solution. For the complete coin list changing money X, the first coin
is thus S[X]; the sub-optimal solution is for changing money X' = X − S[X]. We
can look up table S[X'] for the next coin. The coins for the sub-optimal solutions
are repeatedly looked up like this till the beginning of the table. The below Python
example program implements this algorithm.
def chgmk(x, cs):
    cnt = [0] + [x + 1] * x
    s = [0]
    for i in range(1, x + 1):
        coin = 0
        for c in cs:
            if c <= i and 1 + cnt[i - c] < cnt[i]:
                cnt[i] = 1 + cnt[i - c]
                coin = c
        s.append(coin)
    r = []
    while x > 0:
        r.append(s[x])
        x = x - s[x]
    return r
This change-making solution loops n times for a given amount of money n. It
examines at most the full coin system in each iteration. The time is bound to
Θ(nk), where k is the number of coins in the coin system. This last algorithm
adds O(n) space to record the sub-optimal solutions with tables T and S.
In purely functional settings, there is no way to mutate the solution
table and look it up in constant time. One alternative is to use the finger tree as
we mentioned in the previous chapter[11]. We can store the minimum number of
coins, together with the coin that leads to the sub-optimal solution, in pairs.
The solution table, which is a finger tree, is initialized as T = {(0, 0)}; it
means that changing 0 money needs no coin. We can fold on the list {1, 2, ..., X},
starting from this table, with a binary function change(T, i). The folding builds
the solution table, and we can construct the coin list from this table with function
make(X, T).
makeChange(X) = make(X, fold(change, {(0, 0)}, {1, 2, ..., X})) (14.94)
In function change(T, i), all the coins not greater than i are examined to
select the one that leads to the best result. The fewest number of coins and the
selected coin form a pair. This pair is appended to the finger tree, so
that a new solution table is returned.

change(T, i) = insert(T, fold(sel, (∞, 0), {c | c ∈ C, c ≤ i}))    (14.95)
[11] Some purely functional programming environments, Haskell for instance, provide built-in
arrays; while some other almost pure ones, such as ML, provide mutable arrays.
Again, folding is used to select the candidate with the minimum number of
coins. This folding starts with the initial value (∞, 0), over all valid coins.
Function sel((n, c), c') accepts two arguments: one is a pair of length and coin,
which is the best solution so far; the other is a candidate coin. It examines
whether this candidate can make a better solution.
sel((n, c), c') =
    (1 + n', c') : 1 + n' < n, where (n', c'') = T[i − c']
    (n, c) : otherwise    (14.96)
After the solution table is built, the coins needed can be generated from it.

make(X, T) =
    ∅ : X = 0
    {c} ∪ make(X − c, T) : otherwise, where (n, c) = T[X]    (14.97)
The following example Haskell program uses Data.Sequence, the library of
finger trees, to implement the change making solution.

import Data.Sequence (Seq, singleton, index, (|>))

changemk x cs = makeChange x $ foldl change (singleton (0, 0)) [1..x] where
    change tab i = let sel c = min (1 + fst (index tab (i - c)), c)
                   in tab |> (foldr sel (x + 1, 0) $ filter (<= i) cs)
    makeChange 0 _ = []
    makeChange x tab = let c = snd $ index tab x in c : makeChange (x - c) tab
It's necessary to memorize the optimal solutions to the sub-problems, no matter
whether the top-down or the bottom-up approach is used. This is because a sub-
problem is used many times when computing the overall optimal solution. This
property is called overlapping sub-problems.
Properties of dynamic programming

Dynamic programming was originally named by Richard Bellman in the 1940s. It is
a powerful tool to search for the optimal solution for problems with two properties.

- Optimal substructure. The problem can be broken down into smaller
  problems, and the optimal solution can be constructed efficiently from the
  solutions of these sub-problems;
- Overlapping sub-problems. The problem can be broken down into sub-
  problems which are reused several times in finding the overall solution.

The change-making problem, as we've explained, has both optimal substruc-
ture and overlapping sub-problems.
Longest common subsequence problem

The longest common subsequence problem is different from the longest com-
mon substring problem. We've shown how to solve the latter in the chapter of
suffix trees. A longest common subsequence needn't be a consecutive part of the
original sequence. For example, the longest common substring of 'Mississippi' and
'Missunderstanding' is 'Miss', while the longest common subsequence of them
is 'Misssi'. This is shown in figure 14.49.

Figure 14.49: The longest common subsequence

If we rotate the figure vertically, and consider the two texts as two pieces of
source code, it turns out to be a 'diff' result between them. Most modern version
control tools need to calculate the differences among versions; the
longest common subsequence problem plays a very important role there.
If either of the two sequences X and Y is empty, the longest common subse-
quence LCS(X, Y) is definitely empty; otherwise, denote X = {x_1, x_2, ..., x_n},
Y = {y_1, y_2, ..., y_m}. If the first elements x_1 and y_1 are the same, we can
recursively find the longest common subsequence of X' = {x_2, x_3, ..., x_n} and
Y' = {y_2, y_3, ..., y_m}, and the final result LCS(X, Y) can be constructed by
concatenating x_1 with LCS(X', Y'); otherwise, if x_1 ≠ y_1, we need to recursively
find the longest common subsequences LCS(X, Y') and LCS(X', Y), and pick the
longer one as the final result. Summarizing these cases gives the below definition.
LCS(X, Y) =
    ∅ : X = ∅ ∨ Y = ∅
    {x_1} ∪ LCS(X', Y') : x_1 = y_1
    longer(LCS(X, Y'), LCS(X', Y)) : otherwise    (14.98)
Note that this definition clearly shows the optimal substructure: the
longest common subsequence problem can be broken into smaller problems, and each
sub-problem is ensured to be at least one element shorter than the original one.
It's also clear that there are overlapping sub-problems: the longest common
subsequences of the substrings are used multiple times in finding the overall
optimal solution.
The existence of these two properties, the optimal substructure and the
overlapping sub-problems, indicates that dynamic programming can be used to
solve this problem.
A 2-dimensional table can be used to record the solutions to the sub-problems.
The rows and columns represent the substrings of X and Y respectively.
a n t e n n a
1 2 3 4 5 6 7
b 1
a 2
n 3
a 4
n 5
a 6
This table shows an example of finding the longest common subsequence of
the strings 'antenna' and 'banana', whose lengths are 7 and 6. The bottom-
right corner of this table is looked up first. Since it's empty, we compare the
7th element in 'antenna' and the 6th in 'banana'; they are both 'a', thus we
next recursively look up the cell at row 5, column 6. It's still empty, and
we repeat this till either we reach the trivial case that one substring becomes
empty, or the cell we are looking up has been filled before. Similar to the change-
making problem, whenever the optimal solution for a sub-problem is found, it is
recorded in the cell for further reuse. Note that this process proceeds in the
reversed order compared to the recursive equation given above: we start from the
right-most element of each string.
Considering that the longest common subsequence of any empty string is
still empty, we can extend the solution table so that the first row and column
hold the empty strings.
      ∅  a  n  t  e  n  n  a
  ∅
  b
  a
  n
  a
  n
  a
Below algorithm realizes the top-down recursive dynamic programming so-
lution with such a table.

1: T ← NIL
2: function LCS(X, Y)
3:     m ← |X|, n ← |Y|
4:     m' ← m + 1, n' ← n + 1
5:     if T = NIL then
6:         T ← {{∅, ∅, ..., ∅}, {∅, NIL, NIL, ...}, ...}    ▷ m' × n'
7:     if X ≠ ∅ ∧ Y ≠ ∅ ∧ T[m'][n'] = NIL then
8:         if X[m] = Y[n] then
9:             T[m'][n'] ← Append(LCS(X[1...m−1], Y[1...n−1]), X[m])
10:        else
11:            T[m'][n'] ← Longer(LCS(X, Y[1...n−1]), LCS(X[1...m−1], Y))
12:    return T[m'][n']
The table is first initialized with the first row and column filled with empty
strings; the rest are all NIL values. Unless either string is empty, or the cell
content isn't NIL, the last two elements of the strings are compared, and the
longest common subsequence is recursively computed from the substrings. The
following Python example program implements this algorithm.
tab = None

def lcs(xs, ys):
    global tab
    m = len(xs)
    n = len(ys)
    if tab is None:
        tab = [[""] * (n + 1)] + [[""] + [None] * n for _ in xrange(m)]
    if m != 0 and n != 0 and tab[m][n] is None:
        if xs[-1] == ys[-1]:
            tab[m][n] = lcs(xs[:-1], ys[:-1]) + xs[-1]
        else:
            (a, b) = (lcs(xs, ys[:-1]), lcs(xs[:-1], ys))
            tab[m][n] = a if len(b) < len(a) else b
    return tab[m][n]
The longest common subsequence can also be found in a bottom-up manner,
as we did with the change-making problem. Besides that, instead of recording
the whole sequences in the table, we can just store the lengths of the
longest subsequences, and later construct the subsequence with this table
and the two strings. This time, the table is initialized with all values set to 0.
1: function LCS(X, Y)
2:     m ← |X|, n ← |Y|
3:     T ← {{0, 0, ...}, {0, 0, ...}, ...}    ▷ (m + 1) × (n + 1)
4:     for i ← 1 to m do
5:         for j ← 1 to n do
6:             if X[i] = Y[j] then
7:                 T[i + 1][j + 1] ← T[i][j] + 1
8:             else
9:                 T[i + 1][j + 1] ← Max(T[i][j + 1], T[i + 1][j])
10:    return Get(T, X, Y, m, n)

11: function Get(T, X, Y, i, j)
12:    if i = 0 ∨ j = 0 then
13:        return ∅
14:    else if X[i] = Y[j] then
15:        return Append(Get(T, X, Y, i − 1, j − 1), X[i])
16:    else if T[i − 1][j] > T[i][j − 1] then
17:        return Get(T, X, Y, i − 1, j)
18:    else
19:        return Get(T, X, Y, i, j − 1)
In the bottom-up approach, we start from the cell at the second row and
the second column, which corresponds to the first elements of both X and Y.
If they are the same, the length of the longest common subsequence so far
is 1. This is yielded by increasing, by one, the length of the empty sequence
stored in the top-left cell. Otherwise, we pick the maximum value from
the upper cell and the left cell. The table is repeatedly filled in this manner.
After that, a back-track is performed to construct the longest common sub-
sequence. This time we start from the bottom-right corner of the table. If the
last elements of X and Y are the same, we put this element as the last one of the
result, and go on looking up the cell along the diagonal line; otherwise, we
compare the values in the left cell and the upper cell, and go on looking up the
cell with the bigger value.
The following example Python program implements this algorithm.

def lcs(xs, ys):
    m = len(xs)
    n = len(ys)
    c = [[0] * (n + 1) for _ in xrange(m + 1)]
    for i in xrange(1, m + 1):
        for j in xrange(1, n + 1):
            if xs[i - 1] == ys[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return get(c, xs, ys, m, n)

def get(c, xs, ys, i, j):
    if i == 0 or j == 0:
        return []
    elif xs[i - 1] == ys[j - 1]:
        return get(c, xs, ys, i - 1, j - 1) + [xs[i - 1]]
    elif c[i - 1][j] > c[i][j - 1]:
        return get(c, xs, ys, i - 1, j)
    else:
        return get(c, xs, ys, i, j - 1)
The bottom-up dynamic programming solution can also be defined in a purely
functional way. A finger tree can be used as the table. The first row is filled
with n + 1 zero values. The table is built by folding on sequence X; then
the longest common subsequence is constructed from the table.

LCS(X, Y) = construct(fold(f, {{0, 0, ..., 0}}, zip({1, 2, ...}, X)))    (14.99)
Note that, since the table needs to be looked up by index, X is zipped with
the natural numbers. Function f creates a new row of this table by folding on
sequence Y, and records the lengths of the longest common subsequences for all
the possible cases so far.
f(T, (i, x)) = insert(T, fold(longest, {0}, zip({1, 2, ...}, Y ))) (14.100)
Function longest takes the intermediate partially filled row, and a pair of index
and element in Y. It compares whether this element is the same as the one in X,
then fills the new cell with the length of the longest subsequence.
longest(R, (j, y)) =
    insert(R, 1 + T[i − 1][j − 1]) : x = y
    insert(R, max(T[i − 1][j], T[i][j − 1])) : otherwise    (14.101)
After the table is built, the longest common subsequence can be con-
structed recursively by looking up this table. We can pass the reversed sequences
X̄ and Ȳ, together with their lengths m and n, for efficient construction.

construct(T) = get((X̄, m), (Ȳ, n))    (14.102)
If the sequences are not empty, denote the first elements of X̄ and Ȳ as x and y,
with the rest elements held in X̄' and Ȳ' respectively. The function get can be
defined as the following.
get((X̄, i), (Ȳ, j)) =
    ∅ : X̄ = ∅ ∨ Ȳ = ∅
    get((X̄', i − 1), (Ȳ', j − 1)) ∪ {x} : x = y
    get((X̄', i − 1), (Ȳ, j)) : T[i − 1][j] > T[i][j − 1]
    get((X̄, i), (Ȳ', j − 1)) : otherwise    (14.103)
Below Haskell example program implements this solution.

lcs xs ys = construct $ foldl f (singleton $ fromList $ replicate (n+1) 0)
                              (zip [1..] xs) where
    (m, n) = (length xs, length ys)
    f tab (i, x) = tab |> (foldl longer (singleton 0) (zip [1..] ys)) where
        longer r (j, y) = r |> if x == y
                               then 1 + (tab `index` (i-1) `index` (j-1))
                               else max (tab `index` (i-1) `index` j) (r `index` (j-1))
    construct tab = get (reverse xs, m) (reverse ys, n) where
        get ([], 0) ([], 0) = []
        get ((x:xs), i) ((y:ys), j)
            | x == y = get (xs, i-1) (ys, j-1) ++ [x]
            | (tab `index` (i-1) `index` j) > (tab `index` i `index` (j-1)) =
                get (xs, i-1) ((y:ys), j)
            | otherwise = get ((x:xs), i) (ys, j-1)
Subset sum problem

Dynamic programming is not limited to optimization problems; it can also
solve some more general searching problems. The subset sum problem is such an
example. Given a set of integers, is there a non-empty subset that sums to zero?
For example, there are two subsets of {11, 64, −82, −68, 86, 55, −88, 21, 51} which
both sum to zero: one is {64, −82, 55, −88, 51}, the other is {64, −82, −68, 86}.
Of course summing to zero is a special case; sometimes, people want
to find a subset whose sum is a given value s. Here we are going to develop a
method to find all the candidate subsets.
There is obvious a brute-force exhausting search solution. For every element,
we can either pick it or not. So there are total 2
n
options for set with n elements.
Because for every selection, we need check if it sums to s. This is a linear
operation. The overall complexity is bound to O(n2
n
). This is the exponential
algorithm, which takes very huge time if the set is big.
There is a recursive solution to the subset sum problem. If the set is empty, there is definitely no solution; otherwise, let the set be X = {x_1, x_2, ...}. If x_1 = s, then the subset {x_1} is a solution, and we next search the subsets of X′ = {x_2, x_3, ...} for those summing to s. Otherwise, for x_1 ≠ s, there are two different kinds of possibilities: we need to search X′ both for sum s and for sum s - x_1. For any subset summing to s - x_1, we can add x_1 to it to form a new set as a solution. The following equation defines this algorithm.

solve(X, s) = | ∅                                                       : X = ∅
              | {{x_1}} ∪ solve(X′, s)                                  : x_1 = s
              | solve(X′, s) ∪ {{x_1} ∪ S | S ∈ solve(X′, s - x_1)}     : otherwise    (14.104)
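To make the recursion concrete, below is a minimal Haskell sketch transcribing equation (14.104). The function name solve and the list-of-lists representation of the family of subsets are our own illustrative choices, not taken from the original programs.

-- A direct transcription of equation (14.104).
-- Subsets are represented as lists; the result is the list of all solutions.
solve :: [Int] -> Int -> [[Int]]
solve [] _ = []
solve (x:xs) s
  | x == s    = [x] : solve xs s                      -- {x} is a solution; keep searching X' for s
  | otherwise = solve xs s                            -- subsets of X' summing to s
                ++ [x:ys | ys <- solve xs (s - x)]    -- add x to subsets of X' summing to s - x

For instance, solve [11, 64, -82, -68, 86, 55, -88, -21, 51] 0 enumerates the two zero-sum subsets given above.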
There are clear sub-structures in this definition, although they are not optimal sub-structures in the usual sense. There are also overlapping sub-problems. This indicates that the problem can be solved by dynamic programming, with a table to memorize the solutions to sub-problems.

Instead of developing a solution that outputs all the subsets directly, let's first consider how to answer the existence question: output 'yes' if there exists some subset summing to s, and 'no' otherwise.

One fact is that the upper and lower limits for all possible sums can be calculated in one scan. If the given sum s doesn't belong to this range, there is obviously no solution.
s_l = Σ {x ∈ X, x < 0}
s_u = Σ {x ∈ X, x > 0}    (14.105)
Otherwise, for s_l ≤ s ≤ s_u, since the values are all integers, we can use a table with s_u - s_l + 1 columns. Each column represents a possible value in this range, from s_l to s_u. The value of a cell is either true or false, representing whether there exists a subset summing to that value. All cells are initialized as false. Starting with the first element x_1 in X, the set {x_1} definitely sums to x_1, so the cell representing this value in the first row can be filled as true.
      s_l   s_l + 1   ...   x_1   ...   s_u
x_1    F      F       ...    T    ...    F
With the next element x_2, there are three kinds of possible sums. Similar to the first row, {x_2} sums to x_2; all the possible sums in the previous row can also be achieved without x_2, so the cell below x_1 should also be filled as true; and by adding x_2 to all the possible sums so far, we can get some new values, so the cell representing x_1 + x_2 should be true.
      s_l   s_l + 1   ...   x_1   ...   x_2   ...   x_1 + x_2   ...   s_u
x_1    F      F       ...    T    ...    F    ...       F       ...    F
x_2    F      F       ...    T    ...    T    ...       T       ...    F
Generally speaking, when filling the i-th row, all the possible sums constructed with {x_1, x_2, ..., x_{i-1}} so far can also be achieved with x_i, so the cells that were previously true should also be true in this new row. The cell representing the value x_i should also be true, since the singleton set {x_i} sums to it. And we can also add x_i to all the previously constructed sums to get new results; the cells representing these new sums should be filled as true as well.

When all the elements are processed like this, a table with |X| rows has been built. Looking up the cell representing s in the last row tells whether some subset sums to this value. As mentioned above, there is no solution if s < s_l or s_u < s. We skip handling this case for the sake of brevity.
1: function Subset-Sum(X, s)
2:     s_l ← Σ {x ∈ X, x < 0}
3:     s_u ← Σ {x ∈ X, x > 0}
4:     n ← |X|
5:     T ← {{False, False, ...}, {False, False, ...}, ...}    ▷ n rows, s_u - s_l + 1 columns
6:     for i ← 1 to n do
7:         for j ← s_l to s_u do
8:             if X[i] = j then
9:                 T[i][j] ← True
10:            if i > 1 then
11:                T[i][j] ← T[i][j] ∨ T[i-1][j]
12:                j′ ← j - X[i]
13:                if s_l ≤ j′ ≤ s_u then
14:                    T[i][j] ← T[i][j] ∨ T[i-1][j′]
15:    return T[n][s]
Note that the index to the columns of the table doesn't range from 1 to s_u - s_l + 1, but maps directly from s_l to s_u. Since most programming environments don't support negative indices, this can be dealt with by accessing T[i][j - s_l] instead. The following example Python program utilizes Python's native support for negative indexing.
def solve(xs, s):
    low = sum([x for x in xs if x < 0])
    up = sum([x for x in xs if x > 0])
    tab = [[False]*(up-low+1) for _ in xs]
    for i in xrange(0, len(xs)):
        for j in xrange(low, up+1):
            tab[i][j] = (xs[i] == j)
            j1 = j - xs[i]
            tab[i][j] = (tab[i][j] or tab[i-1][j] or
                         (low <= j1 and j1 <= up and tab[i-1][j1]))
    return tab[-1][s]
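With the example set from the beginning of this section, solve([11, 64, -82, -68, 86, 55, -88, -21, 51], 0) returns True, since the zero-sum subsets exist.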
Note that this program doesn't use different branches for i = 0 and i = 1, 2, ..., n - 1. This is because when i = 0, the row index i - 1 = -1 refers to the last row in the table, whose cells are all false at that point. This simplifies the logic by one more step.
With this table built, it's easy to construct all the subsets summing to s. The method is to look up the last row for the cell representing s. If the last element x_n = s, then {x_n} definitely is a candidate. We next look up the previous row for s, and recursively construct all the possible subsets summing to s with {x_1, x_2, x_3, ..., x_{n-1}}. Finally, we look up the second last row for the cell representing s - x_n; for every subset summing to this value, we add the element x_n to construct a new subset, which sums to s.
1: function Get(X, s, T, n)
2:     S ← ∅
3:     if X[n] = s then
4:         S ← S ∪ {{X[n]}}
5:     if n > 1 then
6:         if T[n-1][s] then
7:             S ← S ∪ Get(X, s, T, n-1)
8:         if T[n-1][s - X[n]] then
9:             S ← S ∪ {{X[n]} ∪ S′ | S′ ∈ Get(X, s - X[n], T, n-1)}
10:    return S
The following Python example program translates this algorithm.
def get(xs, s, tab, n):
    r = []
    if xs[n] == s:
        r.append([xs[n]])
    if n > 0:
        if tab[n-1][s]:
            r = r + get(xs, s, tab, n-1)
        if tab[n-1][s - xs[n]]:
            r = r + [[xs[n]] + ys for ys in get(xs, s - xs[n], tab, n-1)]
    return r
This dynamic programming solution to the subset sum problem loops O(n(s_u - s_l + 1)) times to build the table, and recursively uses O(n) time to construct the final solution from this table. The space used is also bound to O(n(s_u - s_l + 1)).

Instead of using a table with n rows, a vector can be used alternatively. For every cell representing a possible sum, the list of subsets achieving it is stored. This vector is initialized to contain all empty sets. For every element in X, we update the vector, so that it records all the possible sums which can be built so far. When all the elements have been considered, the cell corresponding to s contains the final result.
1: function Subset-Sum(X, s)
2:     s_l ← Σ {x ∈ X, x < 0}
3:     s_u ← Σ {x ∈ X, x > 0}
4:     T ← {∅, ∅, ...}    ▷ s_u - s_l + 1 cells
5:     for each x ∈ X do
6:         T′ ← Duplicate(T)
7:         for j ← s_l to s_u do
8:             j′ ← j - x
9:             if x = j then
10:                T′[j] ← T′[j] ∪ {{x}}
11:            if s_l ≤ j′ ≤ s_u ∧ T[j′] ≠ ∅ then
12:                T′[j] ← T′[j] ∪ {{x} ∪ S | S ∈ T[j′]}
13:        T ← T′
14:    return T[s]
The corresponding Python example program is given as below. Note that every cell is copied when the vector is duplicated, so that updating tab1 doesn't affect the lookups into tab within the same pass.

def subsetsum(xs, s):
    low = sum([x for x in xs if x < 0])
    up = sum([x for x in xs if x > 0])
    tab = [[] for _ in xrange(low, up+1)]
    for x in xs:
        tab1 = [cell[:] for cell in tab]
        for j in xrange(low, up+1):
            if x == j:
                tab1[j].append([x])
            j1 = j - x
            if low <= j1 and j1 <= up and tab[j1] != []:
                tab1[j] = tab1[j] + [[x] + ys for ys in tab[j1]]
        tab = tab1
    return tab[s]
This imperative algorithm shows a clear structure: the solution table is built by looping over every element. This can be realized in a purely functional way by folding. A finger tree can be used to represent the vector spanning from s_l to s_u. It is initialized with all empty values, as in the following equation.

subsetsum(X, s) = fold(build, {∅, ∅, ..., ∅}, X)[s]    (14.106)

After folding, the solution table is built, and the answer is looked up at the cell for s. (Again, we skip the error handling for the case that s < s_l or s > s_u; there is no solution if s is out of that range.)
For every element x ∈ X, the function build folds over the list {s_l, s_l + 1, ..., s_u}. For every value j, it checks whether j equals x, and appends the singleton set {x} to the j-th cell if so. Note that the cells are indexed from s_l here, not from 0. If the cell corresponding to j - x is not empty, the candidate solutions stored in that place are duplicated, and the element x is added to every one of them.

build(T, x) = fold(f, T, {s_l, s_l + 1, ..., s_u})    (14.107)
f(T, j) = | T′[j] ← T′[j] ∪ {{x} ∪ Y | Y ∈ T[j′]}  : s_l ≤ j′ ≤ s_u ∧ T[j′] ≠ ∅, where j′ = j - x
          | T′                                      : otherwise    (14.108)

Here the adjustment is applied on T′, which is itself an adjustment of T, as shown below.

T′ = | T[j] ← {{x}} ∪ T[j]  : x = j
     | T                    : otherwise    (14.109)

Note that the first clause in both equations (14.108) and (14.109) returns a new table with the given cell updated.
The following Haskell example program implements this algorithm; fromList, index, and adjust come from Data.Sequence.

import Data.Sequence (fromList, index, adjust)

subsetsum xs s = foldl build (fromList [[] | _ <- [l..u]]) xs `idx` s where
  l = sum $ filter (< 0) xs
  u = sum $ filter (> 0) xs
  idx t i = index t (i - l)
  build tab x = foldl (\t j -> let j' = j - x in
                  adjustIf (l <= j' && j' <= u && tab `idx` j' /= [])
                           (++ [(x:ys) | ys <- tab `idx` j']) j
                           (adjustIf (x == j) ([x]:) j t)) tab [l..u]
  adjustIf pred f i seq = if pred then adjust f (i - l) seq else seq
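As a quick check, subsetsum [11, 64, -82, -68, 86, 55, -88, -21, 51] 0 should yield the two zero-sum subsets listed at the beginning of this section.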
Some materials like [16] provide common structures to abstract dynamic programming, so that problems can be solved with a generic solution, by customizing the precondition, the comparison of candidate solutions for the better choice, and the merging method for sub-solutions. However, the variety of problems makes things complex in practice. It's important to study the properties of the problem carefully.
Exercise 14.3
- Realize a maze solver using the stack approach, which can find all the possible paths.

- There are 92 distinct solutions for the 8 queens puzzle. For any one solution, rotating it by 90°, 180°, or 270° gives solutions too. Also, flipping it vertically and horizontally generates solutions. Some solutions are symmetric, so that rotation or flipping gives the same one; there are 12 unique solutions in this sense. Modify the program to find the 12 unique solutions. Improve the program, so that the 92 distinct solutions can be found with less searching.

- Make the 8 queens puzzle solution generic, so that it can solve the n queens puzzle.

- Make the functional solution to the leap frogs puzzle generic, so that it can solve the n frogs case.

- Modify the wolf, goat, and cabbage puzzle algorithm, so that it can find all possible solutions.

- Give the complete algorithm definition to solve the 2 water jugs puzzle with the extended Euclid algorithm.

- We don't actually need the exact linear combination information x and y. After we know the puzzle is solvable by testing with the GCD, we can blindly execute the process: fill A, pour A into B, and whenever B is full, empty it, till there is the expected volume in one jug. Realize this solution. Can it find a faster solution than the original version?

- Compared to the extended Euclid method, the DFS approach is a kind of brute-force searching. Improve the extended Euclid approach by finding the best linear combination, the one which minimizes |x| + |y|.

- Realize the imperative Huffman code table generating algorithm.

- One option to realize the bottom-up solution for the longest common subsequence problem is to record the direction in the table. Thus, instead of storing the length information, three values like 'N' for north, 'W' for west, and 'NW' for northwest are used to indicate how to construct the final result. We start from the bottom-right corner of the table: if the cell value is 'NW', we go along the diagonal by moving to the cell in the upper-left; if it's 'N', we move vertically to the upper row; and we move horizontally if it's 'W'. Implement this approach in your favorite programming language.
14.4 Short summary

This chapter introduces the elementary methods of searching. Some of them instruct the computer to scan for interesting information among the data; they often maintain some structure that can be updated during the scan, which can be considered a special case of the information reusing approach. The other commonly used strategy is divide and conquer, in which the scale of the search domain keeps decreasing till some obvious result is reached. This chapter also explains methods to search for solutions among domains. The solutions typically are not the elements being searched; they can be a series of decisions or some arrangement of operations. If there are multiple solutions, people sometimes want to find the optimal one. For some special cases, there exist simplified approaches such as the greedy methods. And dynamic programming can be used for a wider range of problems, as long as they show optimal substructures.
Bibliography

[1] Donald E. Knuth. "The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)". Addison-Wesley Professional; 2nd edition (May 4, 1998). ISBN-10: 0201896850; ISBN-13: 978-0201896855

[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. "Introduction to Algorithms, Second Edition". The MIT Press, 2001. ISBN: 0262032937

[3] M. Blum, R.W. Floyd, V. Pratt, R. Rivest and R. Tarjan. "Time bounds for selection". J. Comput. System Sci. 7 (1973), pp. 448-461.

[4] Jon Bentley. "Programming Pearls, Second Edition". Addison-Wesley Professional, 1999. ISBN-13: 978-0201657883

[5] Richard Bird. "Pearls of Functional Algorithm Design". Chapter 3. Cambridge University Press, 2010. ISBN: 9781139490603

[6] Edsger W. Dijkstra. "The saddleback search". EWD-934, 1985. http://www.cs.utexas.edu/users/EWD/index09xx.html

[7] Robert Boyer and Strother Moore. "MJRTY - A Fast Majority Vote Algorithm". Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning Series, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991, pp. 105-117.

[8] Graham Cormode and S. Muthukrishnan. "An Improved Data Stream Summary: The Count-Min Sketch and its Applications". J. Algorithms 55 (2004): 29-38.

[9] Donald Knuth, James H. Morris, Jr., and Vaughan Pratt. "Fast pattern matching in strings". SIAM Journal on Computing 6 (2): 323-350. 1977.

[10] Robert Boyer and Strother Moore. "A Fast String Searching Algorithm". Comm. ACM (New York, NY, USA: Association for Computing Machinery) 20 (10): 762-772. 1977.

[11] R. N. Horspool. "Practical fast searching in strings". Software - Practice & Experience 10 (6): 501-506. 1980.

[12] Wikipedia. "Boyer-Moore string search algorithm". http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm

[13] Wikipedia. "Eight queens puzzle". http://en.wikipedia.org/wiki/Eight_queens_puzzle

[14] George Polya. "How to Solve It: A New Aspect of Mathematical Method". Princeton University Press (April 25, 2004). ISBN-13: 978-0691119663

[15] Wikipedia. "David A. Huffman". http://en.wikipedia.org/wiki/David_A._Huffman

[16] Fethi Rabhi and Guy Lapalme. "Algorithms: A Functional Programming Approach". Second edition. Addison-Wesley.
Part V

Appendix
Appendix A
Lists
A.1 Introduction

This book intensively uses recursive list manipulation in purely functional settings. The list can be treated as the counterpart to the array in imperative settings; both are bricks for building many algorithms and data structures.

For the readers who are not familiar with functional list manipulation, this appendix provides a quick reference. All the operations listed in this appendix are not only described in equations, but also implemented in both functional and imperative programming languages as examples. We also provide, in the next appendix, a special implementation in C++ template metaprogramming similar to [3], for the interested reader.

Besides the elementary list operations, this appendix also explains some higher-order function concepts, such as mapping and folding.
A.2 List Definition

Like arrays in imperative settings, lists play a critical role in functional settings.¹ Lists are built-in in some programming languages, like the Lisp families and ML families, so there is no need to define them explicitly in those environments.

A list, or more precisely a singly linked list, is a data structure that can be described as below.

- A list is either empty;
- Or it contains an element and a sub-list.

Note that this definition is recursive. Figure A.1 illustrates a list with N nodes. Each node contains two parts: a key element and a sub-list. The sub-list contained in the last node is empty, denoted as NIL.

¹Some readers may argue that lambda calculus plays the most critical role. Lambda calculus is somewhat like an assembly language of the computation world, worth studying from the essence of the computation model to practical programs. However, we don't dive into that topic in this book; readers can refer to [4] for details.
Figure A.1: A list contains N nodes. Each node holds a key (key[1], key[2], ..., key[N]) and a reference to the next node; the next reference of the last node is NIL.
This data structure can be explicitly defined in programming languages supporting the record (or compound type) concept. The following ISO C++ code defines the list.²

template<typename T>
struct List {
    T key;
    List* next;
};

²We only use templates to parameterize the type of the element in this chapter. Except for this point, all the imperative source code is in ANSI C style, avoiding language specific features.
A.2.1 Empty list

It is worth mentioning the empty list in a bit more detail. In environments supporting the nil concept, for example C or Java like programming languages, an empty list can have two different representations. One is the trivial NIL (or null, or 0, which varies between languages); the other is a non-NIL empty list such as {}, which is typically allocated with memory but filled with nothing. In Lisp dialects, the empty list is commonly written as '(). In ML families, it's written as []. We use ∅ to denote the empty list in equations, and sometimes use NIL in pseudo code, to describe algorithms in this book.
A.2.2 Access the element and the sub-list

Given a list L, two functions can be defined to access the element stored in it and the sub-list respectively. They are typically denoted as first(L) and rest(L), or head(L) and tail(L) with the same meaning. These two functions are named car and cdr in Lisp, for historic reasons relating to the design of machine registers [5]. In languages supporting pattern matching (e.g. ML families, Prolog, Erlang, etc.), these two functions are commonly realized by matching the cons, which we'll introduce later, for example in the following Haskell program:

head (x:xs) = x
tail (x:xs) = xs

If the list is defined in record syntax like what we did above, these two functions can be realized by accessing the record fields.³

template<typename T>
T first(List<T>* xs) { return xs->key; }

template<typename T>
List<T>* rest(List<T>* xs) { return xs->next; }

³They can also be named key and next, or be defined as class methods.
In this book, L′ is sometimes used to denote rest(L), and l_1 is used to represent first(L), in contexts where the list is literally given in the form L = {l_1, l_2, ..., l_N}.
More interestingly, as far as the environment supports recursion, we can define the list. The following example defines a list of integers at C++ compile time.

struct Empty;

template<int x, typename T> struct List {
    static const int first = x;
    typedef T rest;
};

This line constructs the list {1, 2, 3, 4, 5} at compile time.

typedef List<1, List<2, List<3, List<4, List<5, Empty> > > > > A;
A.3 Basic list manipulation

A.3.1 Construction

The last C++ template metaprogramming example actually shows the literal construction of a list. A list can be constructed from an element together with a sub-list, where the sub-list can be empty. We denote the function cons(x, L) as the constructor. This name is used in most Lisp dialects. In ML families, the cons operator is defined as :: (in Haskell it's :).

We can define cons to create a record, as defined above, in ISO C++ for example.⁴

template<typename T>
List<T>* cons(T x, List<T>* xs) {
    List<T>* lst = new List<T>;
    lst->key = x;
    lst->next = xs;
    return lst;
}

⁴It is often defined as a constructor method of the class template. However, we define it as a standalone function for illustration purpose.
A.3.2 Empty testing and length calculating

It is trivial to test whether a list is empty. If the environment contains the nil concept, the testing should also handle the nil case. Both Lisp dialects and ML families provide null testing functions. Empty testing can also be realized by pattern matching with the empty list if possible. The following Haskell program shows such an example.

null [] = True
null _ = False

In this book we will either use empty(L) or L = ∅ where empty testing happens.

With empty testing defined, it's possible to calculate the length of a list. In imperative settings, Length is often implemented like the following.

function Length(L)
    n ← 0
    while L ≠ NIL do
        n ← n + 1
        L ← Next(L)
    return n

This ISO C++ code translates the algorithm to a real program.

template<typename T>
int length(List<T>* xs) {
    int n = 0;
    for (; xs; ++n, xs = xs->next);
    return n;
}
However, in a purely functional setting, we can't mutate a counter variable. The idea is that, if the list is empty, then its length is zero; otherwise, we can recursively calculate the length of the sub-list, then add one to it to get the length of this list.

length(L) = | 0                 : L = ∅
            | 1 + length(L′)    : otherwise    (A.1)

Here L′ = rest(L) as mentioned above; it is {l_2, l_3, ..., l_N} for a list containing N elements. Note that both L and L′ can be the empty list ∅. In this equation, we also use = ∅ to test whether list L is empty. In order to know the length of a list, we need to traverse all the elements from head to end, so this algorithm is proportional to the number of elements stored in the list. It is a linear algorithm bound to O(N) time.

Below are two programs, in Haskell and in Scheme/Lisp, realizing this recursive algorithm.

length [] = 0
length (x:xs) = 1 + length xs

(define (length lst)
  (if (null? lst) 0 (+ 1 (length (cdr lst)))))
How to test whether two lists are identical is left as an exercise to the reader.
A.3.3 Indexing

One big difference between the array and the list (the singly linked list, to be accurate) is that the array supports random access. Many programming languages support using x[i] to access the i-th element stored in an array in constant O(1) time. The index typically starts from 0, but that's not always the case: some programming languages use 1 as the first index. In this appendix, we treat indices as starting from 0. For a list, however, we must traverse i steps to reach the target element. The traversal is quite similar to the length calculation, and is commonly expressed as below in imperative settings.

function Get-At(L, i)
    while i ≠ 0 do
        L ← Next(L)
        i ← i - 1
    return First(L)

Note that this algorithm doesn't handle the error case in which the index isn't within the bounds of the list. We assume that 0 ≤ i < |L|, where |L| = length(L). The error handling is left as an exercise to the reader. The following ISO C++ code is a line-by-line translation of this algorithm.

template<typename T>
T getAt(List<T>* lst, int n) {
    while (n--)
        lst = lst->next;
    return lst->key;
}
However, in purely functional settings, we turn to recursive traversal instead of a while-loop.

getAt(L, i) = | First(L)                : i = 0
              | getAt(Rest(L), i - 1)   : otherwise    (A.2)

In order to get the i-th element, the algorithm does the following:

- if i is 0, then we are done; the result is the first element in the list;
- otherwise, the result is to get the (i - 1)-th element from the sub-list.

This algorithm can be translated to the following Haskell code.

getAt i (x:xs) = if i == 0 then x else getAt (i-1) xs

Note that we are using pattern matching to ensure the list isn't empty, which actually handles all the out-of-bound cases with an un-matched pattern error. Thus if i > |L|, we finally arrive at an edge case where the index is i - |L| while the list is empty. On the other hand, if i < 0, decreasing it by one makes it even farther away from 0, and we finally end at the same error: the index is negative while the list is empty.

The indexing algorithm takes time proportional to the value of the index, which is bound to O(N) linear time. This section only addresses the read semantics; how to mutate the element at a given position is explained in a later section.
A.3.4 Access the last element

Although accessing the first element and the rest list L′ is trivial, the opposite operations, retrieving the last element and the initial sub-list, need linear time unless a tail pointer is used. If the list isn't empty, we need to traverse it till the tail to get these two components. Below are their imperative descriptions.

function Last(L)
    x ← NIL
    while L ≠ NIL do
        x ← First(L)
        L ← Rest(L)
    return x

function Init(L)
    L′ ← NIL
    while Rest(L) ≠ NIL do
        L′ ← Append(L′, First(L))
        L ← Rest(L)
    return L′

The algorithms assume that the input list isn't empty, so the error handling is skipped. Note that Init() uses the appending algorithm, which will be defined later.

Below are the corresponding ISO C++ implementations. The optimized version utilizing a tail pointer is left as an exercise.

template<typename T>
T last(List<T>* xs) {
    T x; /* Can be set to a special value to indicate the empty list error. */
    for (; xs; xs = xs->next)
        x = xs->key;
    return x;
}

template<typename T>
List<T>* init(List<T>* xs) {
    List<T>* ys = NULL;
    for (; xs->next; xs = xs->next)
        ys = append(ys, xs->key);
    return ys;
}

These two algorithms can be implemented in a purely recursive manner as well. When we want to access the last element:

- If the list contains only one element (the rest sub-list is empty), the result is this very element;
- Otherwise, the result is the last element of the rest sub-list.

last(L) = | First(L)         : Rest(L) = ∅
          | last(Rest(L))    : otherwise    (A.3)
A similar approach can be used to get a list containing all the elements except for the last one.

- The edge case: if the list contains only one element, the result is an empty list;
- Otherwise, we can first get the list containing all elements but the last one from the rest sub-list, then construct the final result from the first element and this intermediate result.

init(L) = | ∅                    : L′ = ∅
          | cons(l_1, init(L′))  : otherwise    (A.4)

Here we denote l_1 as the first element of L, and L′ is the rest sub-list. This recursive algorithm needn't use appending; it actually constructs the final result list from right to left. We'll introduce a high-level concept of such kind of computation later in this appendix.

Below are Haskell programs implementing the last() and init() algorithms with pattern matching.

last [x] = x
last (_:xs) = last xs

init [x] = []
init (x:xs) = x : init xs

Here [x] matches the singleton list containing only one element, while (_:xs) matches any non-empty list, and the underscore (_) indicates that we don't care about that component. For the details of pattern matching, readers can refer to any Haskell tutorial material, such as [8].
A.3.5 Reverse indexing

Reverse indexing is a general case of last(): finding the i-th element from the end of a singly linked list with minimized memory space is interesting, and this problem is often used in technical interviews by some companies. A naive implementation takes two rounds of traversal: the first round determines the length N of the list; then the left-hand index is calculated as N - i - 1; finally, a second round of traversal accesses the element at that left-hand index. This idea can be given as the following equation.

getAtR(L, i) = getAt(L, length(L) - i - 1)
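As a quick illustration, this naive two-pass idea is a one-liner in Haskell (the name atR' is ours; !! and length are the standard indexing and length functions):

-- Naive reverse indexing: one traversal for the length, another for the access.
atR' :: [a] -> Int -> a
atR' xs i = xs !! (length xs - i - 1)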
There exists a better imperative solution. For illustration purpose, we omit the error cases such as out-of-bound indices etc. The idea is to keep two pointers p_1, p_2 with a distance of i between them, such that rest^i(p_2) = p_1, where rest^i(p_2) means repeatedly applying the rest() function i times; in other words, advancing i steps from p_2 arrives at p_1. We can start p_2 from the head of the list and advance the two pointers in parallel till one of them (p_1) arrives at the end of the list. At that time, pointer p_2 points exactly to the i-th element from right. Figure A.2 illustrates this idea.

It is straightforward to realize the imperative algorithm based on this double pointers solution.
Figure A.2: Double pointers solution to reverse indexing. (a) p_2 starts from the head, i steps behind p_1. (b) When p_1 reaches the end, p_2 points to the i-th element from right.
function Get-At-R(L, i)
    p ← L
    while i ≠ 0 do
        L ← Rest(L)
        i ← i - 1
    while Rest(L) ≠ NIL do
        L ← Rest(L)
        p ← Rest(p)
    return First(p)

The following ISO C++ code implements the double pointers right indexing algorithm.

template<typename T>
T getAtR(List<T>* xs, int i) {
    List<T>* p = xs;
    while (i--)
        xs = xs->next;
    for (; xs->next; xs = xs->next, p = p->next);
    return p->key;
}
The same idea can be realized recursively as well. If we want to access the i-th element from right of list L, we can examine the two lists L and S = {l_{i+1}, l_{i+2}, ..., l_N} simultaneously, where S is the sub-list of L without the first i elements.

- The edge case: if S is a singleton list, then the i-th element from right is the first element in L;
- Otherwise, we drop the first element from both L and S, and recursively examine L′ and S′.

This algorithm description can be formalized as the following equations.

getAtR(L, i) = examine(L, drop(i, L))    (A.5)

where the function examine(L, S) is defined as below.

examine(L, S) = | first(L)                     : |S| = 1
                | examine(rest(L), rest(S))    : otherwise    (A.6)

We'll explain the details of the drop() function in a later section about list mutating operations. Here it can be implemented by repeatedly calling rest() the specified number of times.

drop(n, L) = | L                      : n = 0
             | drop(n - 1, rest(L))   : otherwise

Translating the equations to Haskell yields this example program.

atR :: [a] -> Int -> a
atR xs i = get xs (drop i xs) where
  get (x:_) [_] = x
  get (_:xs) (_:ys) = get xs ys
  drop n as@(_:as') = if n == 0 then as else drop (n-1) as'

Here we use the dummy variable _ as a placeholder for components we don't care about.
A.3.6 Mutating

Strictly speaking, we can't mutate the list at all in purely functional settings. Unlike in imperative settings, mutating is actually realized by creating a new list. Almost all functional environments support garbage collection, so the original list may either be persisted for reuse, or released (dropped) at some point (see Chapter 2 in [6]).

Appending

The function cons can be viewed as building a list by always inserting the element at the head. By chaining multiple cons operations, we can repeatedly construct a list from right to left. Appending, on the other hand, is an operation adding an element to the tail. Compared to cons, which is a trivial constant time O(1) operation, we must traverse the whole list to locate the appending position, so appending is bound to O(N), where N is the length of the list. In order to speed up the appending, imperative implementations typically use a field (variable) to record the tail position of a list, so that the traversal can be avoided. However, in purely functional settings we can't use such a tail pointer, and the appending has to be realized in a recursive manner.

append(L, x) = | {x}                                    : L = ∅
               | cons(first(L), append(rest(L), x))     : otherwise    (A.7)

The algorithm handles two different appending cases:
- If the list is empty, the result is a singleton list containing x, the element to be appended. The singleton list notion {x} = cons(x, ∅) is a simplified form of consing the element with the empty list ∅;
- Otherwise, for a non-empty list, the result can be achieved by first appending the element x to the rest sub-list, then consing the first element of L onto this recursive appending result.

For the non-trivial case, if we denote L = {l_1, l_2, ...} and L′ = {l_2, l_3, ...}, the equation can be written as:

append(L, x) = | {x}                         : L = ∅
               | cons(l_1, append(L′, x))    : otherwise    (A.8)

We'll use both forms in the rest of this appendix.

The following Scheme/Lisp program implements this algorithm.

(define (append lst x)
  (if (null? lst)
    (list x)
    (cons (car lst) (append (cdr lst) x))))
Even without the tail pointer, it's possible to traverse the list imperatively and append the element at the end.

function Append(L, x)
    if L = NIL then
        return Cons(x, NIL)
    H ← L
    while Rest(L) ≠ NIL do
        L ← Rest(L)
    Rest(L) ← Cons(x, NIL)
    return H

The following ISO C++ program implements this algorithm. How to utilize a tail field to speed up the appending is left as an exercise to the interested reader.

template<typename T>
List<T>* append(List<T>* xs, T x) {
    List<T> *tail, *head;
    for (head = tail = xs; xs; xs = xs->next)
        tail = xs;
    if (!head)
        head = cons<T>(x, NULL);
    else
        tail->next = cons<T>(x, NULL);
    return head;
}
Mutate element at a given position

Although we have defined the random access algorithm getAt(L, i), we can't just mutate the element returned by this function in the sense of purely functional settings. It is quite common to provide reference semantics in imperative programming languages and in some almost-functional environments; readers can refer to [4] for details. For example, the following ISO C++ code returns a reference instead of a value from the indexing program.

template<typename T>
T& getAt(List<T>* xs, int n) {
    while (n--)
        xs = xs->next;
    return xs->key;
}

So we can use this function to mutate the 2nd element as below.

List<int>* xs = cons(1, cons(2, cons<int>(3, NULL)));
getAt(xs, 1) = 4;

In an impure functional environment, such as Scheme/Lisp, setting the i-th element to a given value can be implemented by mutating the referenced cell directly as well.

(define (set-at! lst i x)
  (if (= i 0)
    (set-car! lst x)
    (set-at! (cdr lst) (- i 1) x)))

This program first checks whether the index i is zero; if so, it mutates the first element of the list to the given value x; otherwise, it decreases the index i by one, and tries to mutate the rest of the list at this new index with the value x. This function doesn't return a meaningful value; it is used for its side effect. For instance, the following code mutates the 2nd element in a list.

(define lst (list 1 2 3 4 5))
(set-at! lst 1 4)
(display lst)
(1 4 3 4 5)
In order to realize a purely functional setAt(L, i, x) algorithm, we need to avoid directly mutating the cell, and create a new one instead:

- Edge case: if we want to set the value of the first element (i = 0), we construct a new list with the new value and the sub-list of the previous one;
- Otherwise, we construct a new list with the previous first element, and a new sub-list which has its (i - 1)-th element set to the new value.

This recursive description can be formalized by the following equation.

setAt(L, i, x) = | cons(x, L′)                      : i = 0
                 | cons(l_1, setAt(L′, i - 1, x))   : otherwise    (A.9)

Comparing the below Scheme/Lisp implementation to the previous one reveals the difference from imperative mutating.

(define (set-at lst i x)
  (if (= i 0)
    (cons x (cdr lst))
    (cons (car lst) (set-at (cdr lst) (- i 1) x))))

Here we skip the error handling for out-of-bound errors etc. Again, similar to the random access algorithm, the performance is bound to linear time, as traversal is needed to locate the position to set the value.
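A Haskell transcription of equation (A.9) looks like below (the function name setAt is our own choice):

-- Rebuild the prefix on the way down; share the unchanged tail.
setAt :: [a] -> Int -> a -> [a]
setAt (_:xs) 0 x = x : xs
setAt (y:xs) i x = y : setAt xs (i-1) x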
Insertion

There are two semantics of list insertion. One is to insert an element at a given position, which can be denoted as insert(L, i, x); the algorithm is close to setAt(L, i, x). The other is to insert an element into a sorted list, so that the resulting list is still sorted.

Let's first consider how to insert an element x at a given position i. Obviously, we need to first traverse i elements to get to the position; the rest of the work is to construct a new sub-list with x being its head. Finally, we construct the whole result by attaching this new sub-list to the end of the first i elements.

The algorithm can be described accordingly. If we want to insert an element x into a list L at position i:

- Edge case: if i is zero, the insertion turns out to be a trivial cons operation: cons(x, L);
- Otherwise, we recursively insert x into the sub-list L′ at position i - 1, then cons the first element onto this result.

The below equation formalizes the insertion algorithm.

insert(L, i, x) = | cons(x, L)                       : i = 0
                  | cons(l_1, insert(L′, i - 1, x))  : otherwise    (A.10)

The following Haskell program implements this algorithm.

insert xs 0 y = y:xs
insert (x:xs) i y = x : insert xs (i-1) y

This algorithm doesn't handle the out-of-bound error. However, we can interpret the case where the position i exceeds the length of the list as appending; readers can consider it in the exercise of this section.

The algorithm can also be designed imperatively: if the position is zero, just construct the new list with the element to be inserted as the first one; otherwise, we record the head of the list, then start traversing the list i steps. We also need an extra variable to memorize the previous position for the later linking operation. Below is the pseudo code.

function Insert(L, i, x)
    if i = 0 then
        return Cons(x, L)
    H ← L
    p ← L
    while i ≠ 0 do
        p ← L
        L ← Rest(L)
        i ← i - 1
    Rest(p) ← Cons(x, L)
    return H

And the ISO C++ example program is given by translating this algorithm.

template<typename T>
List<T>* insert(List<T>* xs, int i, T x) {
    List<T> *head, *prev;
    if (i == 0)
        return cons(x, xs);
    for (head = xs; i; --i, xs = xs->next)
        prev = xs;
    prev->next = cons(x, xs);
    return head;
}
If the list L is sorted, that is, for any positions 1 ≤ i ≤ j ≤ N we have l_i ≤ l_j, we can design an algorithm which inserts a new element x into the list, so that the resulting list is still sorted.

insert(x, L) = | cons(x, ∅)                : L = ∅
               | cons(x, L)                : x < l_1
               | cons(l_1, insert(x, L′))  : otherwise    (A.11)

The idea is that, to insert an element x into a sorted list L:

- If either L is empty or x is less than the first element in L, we just put x in front of L to construct the result;
- Otherwise, we recursively insert x into the sub-list L′.

The following Haskell program implements this algorithm. Note that we use ≤ to determine the ordering. Actually this constraint can be loosened to the strict less-than (<): if elements can be compared in terms of <, we can design a program to insert an element so that the resulting list is still sorted. Readers can refer to the chapters about sorting in this book for details about ordering.

insert y [] = [y]
insert y xs@(x:xs') = if y <= x then y : xs else x : insert y xs'

Since the algorithm needs to compare the elements one by one, it's also a linear time algorithm. Note that here we use the 'as' notation for pattern matching in Haskell. Readers can refer to [8] and [7] for details.

This ordered insertion algorithm can be designed in an imperative manner as well, for example like the following pseudo code. (Readers can refer to the chapter 'The evolution of insertion sort' in this book for a minor different version.)

function Insert(x, L)
    if L = ∅ ∨ x < First(L) then
        return Cons(x, L)
    H ← L
    while Rest(L) ≠ ∅ ∧ First(Rest(L)) < x do
        L ← Rest(L)
    Rest(L) ← Cons(x, Rest(L))
    return H

If either the list is empty, or the new element to be inserted is less than the first element in the list, we can just put this element as the new first one. Otherwise, we record the head, then traverse the list till a position where x is less than the rest of the sub-list, and put x in that position. Compared to the insert-at algorithm shown previously, the variable p used to point to the previous position during traversal is omitted, by examining the sub-list instead of the current list. The following ISO C++ program implements this algorithm.

template<typename T>
List<T>* insert(T x, List<T>* xs) {
    List<T>* head;
    if (!xs || x < xs->key)
        return cons(x, xs);
    for (head = xs; xs->next && xs->next->key < x; xs = xs->next);
    xs->next = cons(x, xs->next);
    return head;
}
With this linear time ordered insertion defined, it's possible to implement a quadratic time insertion sort, by repeatedly inserting elements into an empty list, as formalized in this equation.

sort(L) = | ∅                        : L = ∅
          | insert(l_1, sort(L′))    : otherwise    (A.12)

This equation says that if the list to be sorted is empty, the result is also empty; otherwise, we first recursively sort all the elements except for the first one, then ordered-insert the first element into this intermediate result. The corresponding Haskell program is given as below.

isort [] = []
isort (x:xs) = insert x (isort xs)

The imperative linked-list based insertion sort is described in the following: we initialize the result list as empty, then take the elements one by one from the list to be sorted, and ordered-insert them into the result list.

function Sort(L)
    L′ ← ∅
    while L ≠ ∅ do
        L′ ← Insert(First(L), L′)
        L ← Rest(L)
    return L′

Note that, at any time during the loop, the result list is kept sorted. There is a major difference between the recursive algorithm (formalized by the equation) and the procedural one (described by the pseudo code): the former processes the list from the right, while the latter from the left. We'll see in a later section about tail recursion how to eliminate this difference.

The ISO C++ version of the linked-list insertion sort looks like this.

template<typename T>
List<T>* isort(List<T>* xs) {
    List<T>* ys = NULL;
    for (; xs; xs = xs->next)
        ys = insert(xs->key, ys);
    return ys;
}

There is also a dedicated chapter discussing insertion sort in this book. Please refer to that chapter for more details, including performance analysis and fine-tuning.
Deletion

In purely functional settings, there is no deletion at all in terms of mutating: the data is persistent, and what the semantic deletion means is actually to create a new list with all the elements of the previous one except for the element being deleted.

Similar to insertion, there are also two deletion semantics. One is to delete the element at a given position; the other is to find and delete elements of a given value. The first can be expressed as delete(L, i), while the second is delete(L, x).

In order to design the algorithm delete(L, i) (delete-at), we can use an idea quite similar to random access and insertion: we first traverse the list to the specified position, then construct the result list with the elements we have traversed, and all the others except for the next one we haven't traversed yet.

The strategy can be realized in a recursive manner: in order to delete the i-th element from list L,

- If i is zero, that is, we are going to delete the first element of the list, the result is obviously the rest of the list;
- If the list from which the element is to be removed is empty, the result is empty anyway;
- Otherwise, we can recursively delete the (i - 1)-th element from the sub-list L′, then construct the final result from the first element of L and this intermediate result.

Note there are two edge cases; the second one is mainly used for error handling. This algorithm can be formalized with the following equation.

delete(L, i) = | L′                           : i = 0
               | ∅                            : L = ∅
               | cons(l_1, delete(L′, i - 1)) : otherwise    (A.13)

where L′ = rest(L), l_1 = first(L). The corresponding Haskell example program is given below.

del (_:xs) 0 = xs
del [] _ = []
del (x:xs) i = x : del xs (i-1)

This is a linear time algorithm as well, and there are also alternatives for the implementation: for example, we can first split the list at position i into two sub-lists L_1 and L_2, then concatenate L_1 and L_2′, as sketched below.
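For instance, a Haskell sketch of this splitting alternative could be as below (the name delAt is ours; splitAt is the standard function splitting a list at a given position):

-- Delete the i-th element by splitting: keep the first i elements,
-- then skip one element and keep the rest.
delAt :: Int -> [a] -> [a]
delAt i xs = let (l1, l2) = splitAt i xs in l1 ++ drop 1 l2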
The delete-at algorithm can also be realized imperatively, by traversing to the position with a loop:

function Delete(L, i)
    if i = 0 then
        return Rest(L)
    H ← L
    p ← L
    while i ≠ 0 do
        i ← i - 1
        p ← L
        L ← Rest(L)
    Rest(p) ← Rest(L)
    return H

Different from the recursive approach, the error handling for out-of-bound is skipped here. Besides that, the algorithm also skips the handling of resource releasing, which is necessary in environments without GC (garbage collection). The below ISO C++ code, for example, explicitly releases the node to be deleted.

template<typename T>
List<T>* del(List<T>* xs, int i) {
    List<T> *head, *prev;
    if (i == 0)
        head = xs->next;
    else {
        for (head = xs; i; --i, xs = xs->next)
            prev = xs;
        prev->next = xs->next;
    }
    xs->next = NULL;
    delete xs;
    return head;
}

Note that the statement xs->next = NULL is necessary if the destructor is designed to release the whole linked list recursively.
The find-and-delete semantics can further be represented in two ways: one is to find just the first occurrence of a given value and delete this element from the list; the other is to find ALL occurrences of this value and delete those elements. The latter is the more general case, and it can be achieved by a minor modification of the former. We leave the find-all-and-delete algorithm as an exercise to the reader.

The algorithm is designed exactly as the term 'find and delete', not 'find then delete': the finding and deleting are processed in one pass of traversal.

- If the list to be dealt with is empty, the result is obviously empty;
- If the list isn't empty, we examine the first element of the list; if it is identical to the given value, the result is the sub-list;
- Otherwise, we keep the first element, and recursively find and delete the element with the given value in the sub-list. The final result is a list constructed with the kept first element and the recursive deleting result.

This algorithm can be formalized by the following equation.

delete(L, x) = | ∅                         : L = ∅
               | L′                        : l_1 = x
               | cons(l_1, delete(L′, x))  : otherwise    (A.14)

This algorithm is bound to linear time, as it traverses the list to find and delete the element. Translating this equation to Haskell yields the below code; note that the first edge case is handled by pattern matching the empty list, while the other two cases are further processed by an if-else expression.

del [] _ = []
del (x:xs) y = if x == y then xs else x : del xs y
Different from the above imperative algorithms, which skip the error handling in most cases, the imperative find-and-delete realization must deal with the problem that the given value doesn't exist.

function Delete(L, x)
    if L = ∅ then    ▷ Empty list
        return ∅
    if First(L) = x then
        H ← Rest(L)
    else
        H ← L
        while L ≠ ∅ ∧ First(L) ≠ x do    ▷ List isn't empty
            p ← L
            L ← Rest(L)
        if L ≠ ∅ then    ▷ Found
            Rest(p) ← Rest(L)
    return H

If the list is empty, the result is empty anyway; otherwise, the algorithm traverses the list till it either finds an element identical to the given value or reaches the end of the list. If the element is found, it is removed from the list. The following ISO C++ program implements the algorithm. Note that there is code to release the memory explicitly.

template<typename T>
List<T>* del(List<T>* xs, T x) {
    List<T> *head, *prev;
    if (!xs)
        return xs;
    if (xs->key == x)
        head = xs->next;
    else {
        for (head = xs; xs && xs->key != x; xs = xs->next)
            prev = xs;
        if (xs)
            prev->next = xs->next;
    }
    if (xs) {
        xs->next = NULL;
        delete xs;
    }
    return head;
}
Concatenation

Concatenation can be considered as a general case of appending: appending adds only one extra element to the end of the list, while concatenation adds multiple ones.

However, it leads to a quadratic algorithm if concatenation is implemented naively by appending, which performs poorly. Consider the following equation.

concat(L_1, L_2) = | L_1                                               : L_2 = ∅
                   | concat(append(L_1, first(L_2)), rest(L_2))        : otherwise

Each appending needs to traverse to the end of the list, which is proportional to the length of L_1, and we need to do this linear time appending work |L_2| times, so the total performance is O(|L_1| + (|L_1| + 1) + ... + (|L_1| + |L_2|)) = O(|L_1||L_2| + |L_2|^2).

The key point is that the linking operation of a linked list is fast (constant O(1) time): we can traverse to the end of L_1 only once, and link the second list to the tail of L_1.

concat(L_1, L_2) = | L_2                                           : L_1 = ∅
                   | cons(first(L_1), concat(rest(L_1), L_2))      : otherwise    (A.15)

This algorithm traverses the first list only once to get the tail of L_1, then links the second list to this tail, so it is bound to linear O(|L_1|) time.

The algorithm is described as the following.

- If the first list is empty, the concatenation result is the second list;
- Otherwise, we concatenate the second list with the sub-list of the first one, and construct the final result with the first element and this intermediate result.

Most functional languages provide built-in functions or operators for list concatenation; for example, ++ is used for this purpose in ML families.

[] ++ ys = ys
xs ++ [] = xs
(x:xs) ++ ys = x : xs ++ ys

Note that we add another edge case: if the second list is empty, we needn't traverse to the end of the first one and perform the linking; the result is merely the first list.

In imperative settings, concatenation can be realized in constant O(1) time with an augmented tail record. We skip the detailed implementation of this method; readers can refer to the source code which can be downloaded along with this appendix.
The imperative algorithm without the augmented tail record can be described as below.

function Concat(L_1, L_2)
    if L_1 = ∅ then
        return L_2
    if L_2 = ∅ then
        return L_1
    H ← L_1
    while Rest(L_1) ≠ ∅ do
        L_1 ← Rest(L_1)
    Rest(L_1) ← L_2
    return H

And the corresponding ISO C++ example code is given like this.

template<typename T>
List<T>* concat(List<T>* xs, List<T>* ys) {
    List<T>* head;
    if (!xs)
        return ys;
    if (!ys)
        return xs;
    for (head = xs; xs->next; xs = xs->next);
    xs->next = ys;
    return head;
}
A.3.7 Sum and product

Recursive sum and product

It is common to calculate the sum or product of a list of numbers. They are quite similar in terms of algorithm structure; we'll see how to abstract such structure in a later section.

In order to calculate the sum of a list:

- If the list is empty, the result is zero;
- Otherwise, the result is the first element plus the sum of the rest of the list.

Formalizing this description gives the following equation.

sum(L) = | 0               : L = ∅
         | l_1 + sum(L′)   : otherwise    (A.16)

However, we can't merely replace plus with times in this equation to achieve the product algorithm, because it would always return zero. We can define the product of the empty list as 1 to solve this problem.

product(L) = | 1                   : L = ∅
             | l_1 · product(L′)   : otherwise    (A.17)

The following Haskell program implements sum and product.

sum [] = 0
sum (x:xs) = x + sum xs

product [] = 1
product (x:xs) = x * product xs

Both algorithms traverse the whole list during the calculation, so they are bound to O(N) linear time.
Tail call recursion

Note that both the sum and product algorithms actually compute the result from right to left. We can change them to the normal way, calculating the accumulated result from left to right. For example with sum, the result is accumulated from 0, and elements are added one by one to this accumulated result till the whole list is consumed. Such an approach can be described as the following. When accumulating the result of a list by summing:

- If the list is empty, we are done and return the accumulated result;
- Otherwise, we take the first element from the list, accumulate it to the result by summing, and go on processing the rest of the list.

Formalizing this idea yields another version of the sum algorithm.

sum′(A, L) = | A                     : L = ∅
             | sum′(A + l_1, L′)     : otherwise    (A.18)

And sum can be implemented by calling this function, passing the start value 0 and the list as arguments.

sum(L) = sum′(0, L)    (A.19)

The interesting point of this approach is that, besides calculating the result in the normal order from left to right, by observing the equation of sum′(A, L), we find it needn't remember any intermediate results or states when performing the recursion. All such states are either passed as arguments (A for example) or can be dropped (the previous elements of the list for example). So in a practical implementation, such kind of recursive function can be optimized by eliminating the recursion altogether.

We call such kind of function tail recursive (or a tail call), and the optimization of removing the recursion in this case tail recursion optimization [10], because the recursion happens as the final action in such a function. The advantage of tail recursion optimization is that the performance can be greatly improved, so that we can avoid stack overflow in deeply recursive algorithms such as sum and product.

Changing the sum and product Haskell programs to the tail recursive manner gives the following modified programs.

sum = sum' 0 where
  sum' acc [] = acc
  sum' acc (x:xs) = sum' (acc + x) xs

product = product' 1 where
  product' acc [] = acc
  product' acc (x:xs) = product' (acc * x) xs
In the previous section about insertion sort, we mentioned that the functional version sorts the elements from the right; this can also be modified into a tail recursive realization.

sort′(A, L) = | A                            : L = ∅
              | sort′(insert(l_1, A), L′)    : otherwise    (A.20)

The sorting algorithm just calls this function, passing the empty list as the accumulator argument.

sort(L) = sort′(∅, L)    (A.21)

Implementing this tail recursive algorithm as a real program is left as an exercise to the reader.
At the end of this sub-section, let's consider an interesting problem: how to design an algorithm to compute b^N efficiently? (Refer to problem 1.16 in [5].)

A naive brute-force solution is to repeatedly multiply by b, N times, starting from 1, which leads to a linear O(N) algorithm.

function Pow(b, N)
    x ← 1
    loop N times
        x ← x · b
    return x

Actually, the solution can be greatly improved. Consider how we compute b^8. By the first 2 iterations of the above naive algorithm, we get x = b^2. At this stage, we needn't multiply x by b to get b^3; we can directly calculate x^2, which yields b^4. And doing this once again, we get (b^4)^2 = b^8. Thus we only need to loop 3 times, not 8 times.

The algorithm based on this idea, computing b^N when N = 2^M for some non-negative integer M, can be shown in the following equation.

pow(b, N) = | b               : N = 1
            | pow(b, N/2)^2   : otherwise

It is easy to extend this divide and conquer algorithm so that N can be any non-negative integer.

- For the trivial case, where N is zero, the result is 1;
- If N is an even number, we can halve N and compute b^(N/2) first, then calculate the square of this result;
- Otherwise, N is odd. Since N - 1 is even, we can first recursively compute b^(N-1), then multiply by b one more time.
The below equation formalizes this description.

pow(b, N) = | 1                  : N = 0
            | pow(b, N/2)^2      : 2|N
            | b · pow(b, N - 1)  : otherwise    (A.22)
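Before turning to the tail recursive form, here is a direct Haskell rendering of equation (A.22), assuming a non-negative integer exponent:

-- Divide and conquer power, following equation (A.22); not yet tail recursive.
pow :: Integer -> Integer -> Integer
pow _ 0 = 1
pow b n | even n    = let x = pow b (n `div` 2) in x * x  -- square the half power
        | otherwise = b * pow b (n - 1)                   -- reduce the odd case to even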
However, it's hard to turn this algorithm into a tail recursive one, mainly because of the 2nd clause. In fact, the 2nd clause can alternatively be realized by squaring the base number and halving the exponent.

pow(b, N) = | 1                  : N = 0
            | pow(b^2, N/2)      : 2|N
            | b · pow(b, N - 1)  : otherwise    (A.23)

With this change, it's easy to get a tail recursive algorithm as the following, so that b^N = pow′(b, N, 1).

pow′(b, N, A) = | A                        : N = 0
                | pow′(b^2, N/2, A)        : 2|N
                | pow′(b, N - 1, A · b)    : otherwise    (A.24)
Compared to the naive brute-force algorithm, we have improved the performance to O(lg N). Actually, this algorithm can be improved one more step. Observe that if we represent N in binary format, N = (a_m a_{m-1} ... a_1 a_0)_2, then computing b^(2^i) is necessary only if a_i = 1. This is quite similar to the idea of the Binomial heap (readers can refer to the chapter of binomial heap in this book). Thus we can calculate the final result by multiplying together the powers for the bits with value 1.

For instance, when we compute b^11: as 11 = (1011)_2 = 2^3 + 2 + 1, we have b^11 = b^(2^3) · b^2 · b. We can get the result by the following steps.

1. Calculate b^1, which is b;
2. Get b^2 from the previous result;
3. Get b^(2^2) from step 2;
4. Get b^(2^3) from step 3.

Finally, we multiply the results of steps 1, 2, and 4, which yields b^11.

Summarizing this idea, we can improve the algorithm as below.

pow′(b, N, A) = | A                            : N = 0
                | pow′(b^2, ⌊N/2⌋, A)          : 2|N
                | pow′(b^2, ⌊N/2⌋, A · b)      : otherwise    (A.25)
Summarize this idea, we can improve the algorithm as below.
pow

(b, N, A) =
_
_
_
A : N = 0
pow

(b
2
,
N
2
, A) : 2|N
pow

(b
2
,
N
2
, Ab) : otherwise
(A.25)
This algorithm essentially shift N to right for 1 bit each time (by dividing N
by 2). If the LSB (Least Signicant Bit, which is the lowest bit) is 0, it means
N is even. It goes on computing the square of the base, without accumulating
the nal product (Just like the 3rd step in above example); If the LSB is 1, it
means N is odd. It squares the base and accumulates it to the product A; The
edge case is when N is zero, which means we exhaust all the bits in N, thus the
nal result is the accumulator A. At any time, the updated base number b

, the
shifted exponent number N

, and the accumulator A satisfy the invariant that


b
N
= b
N

A.
This algorithm can be implemented in Haskell like the following.
A.3. BASIC LIST MANIPULATION 701
pow b n = pow' b n 1 where
  pow' b n acc | n == 0    = acc
               | even n    = pow' (b * b) (n `div` 2) acc
               | otherwise = pow' (b * b) (n `div` 2) (acc * b)
Compared to the previous algorithm, which decreases N by one to make it even when N is odd, this one halves N every time. It runs exactly m rounds, where m is the number of bits of N. However, the performance is still bound to O(lg N). How to implement this algorithm imperatively is left as an exercise to the reader.
Imperative sum and product

The imperative sum and product just apply plus and times while traversing the list.

function Sum(L)
  s ← 0
  while L ≠ ∅ do
    s ← s + First(L)
    L ← Rest(L)
  return s

function Product(L)
  p ← 1
  while L ≠ ∅ do
    p ← p · First(L)
    L ← Rest(L)
  return p
The corresponding ISO C++ example programs are listed as the following.

template<typename T>
T sum(List<T>* xs) {
    T s;
    for (s = 0; xs; xs = xs->next)
        s += xs->key;
    return s;
}

template<typename T>
T product(List<T>* xs) {
    T p;
    for (p = 1; xs; xs = xs->next)
        p *= xs->key;
    return p;
}
One interesting use of the product algorithm is to calculate the factorial of N as the product of {1, 2, ..., N}, that is N! = product([1..N]).
A.3.8 maximum and minimum

Another very useful use case is to get the minimum or maximum element of a list. We'll see that their algorithm structures are quite similar again; we'll generalize this kind of feature and introduce higher-level abstractions in a later section. For both the maximum and minimum algorithms, we assume that the given list isn't empty.

In order to find the minimum element in a list:

- If the list contains only one element (a singleton list), the minimum element is this one;
- Otherwise, we first find the minimum element of the rest of the list, then compare the first element with this intermediate result to determine the final minimum value.

This algorithm can be formalized by the following equation.
$$min(L) = \begin{cases} l_1 & : L = \{l_1\} \\ l_1 & : l_1 \le min(L') \\ min(L') & : \text{otherwise} \end{cases} \tag{A.26}$$
In order to get the maximum element instead of the minimum one, we can simply replace the comparison with ≥ in the above equation.

$$max(L) = \begin{cases} l_1 & : L = \{l_1\} \\ l_1 & : l_1 \ge max(L') \\ max(L') & : \text{otherwise} \end{cases} \tag{A.27}$$
Note that both maximum and minimum actually process the list from right to left. This reminds us of tail recursion. We can modify them so that the list is processed from left to right. What's more, the tail-recursive version gives us an on-line algorithm: at any time, we hold the minimum or maximum of the part of the list examined so far.

$$min'(L, a) = \begin{cases} a & : L = \emptyset \\ min'(L', l_1) & : l_1 < a \\ min'(L', a) & : \text{otherwise} \end{cases} \tag{A.28}$$
$$max'(L, a) = \begin{cases} a & : L = \emptyset \\ max'(L', l_1) & : a < l_1 \\ max'(L', a) & : \text{otherwise} \end{cases} \tag{A.29}$$
Different from the tail-recursive sum and product, we can't pass a constant initial value to $min'$ or $max'$ in practice. In theory we would have to pass infinity ($min'(L, \infty)$) or negative infinity ($max'(L, -\infty)$), but in a real machine neither of them can be represented, since the word length is limited.

Actually, there is a workaround: we can instead pass the first element of the list, so that the algorithms become applicable.

$$min(L) = min'(L', l_1) \qquad max(L) = max'(L', l_1) \tag{A.30}$$

The corresponding real programs are given as the following. We skip the non-tail-recursive programs, as they are intuitive enough; the reader can take them as interesting exercises.
min (x:xs) = min' xs x where
  min' [] a = a
  min' (x:xs) a = if x < a then min' xs x else min' xs a

max (x:xs) = max' xs x where
  max' [] a = a
  max' (x:xs) a = if a < x then max' xs x else max' xs a
The tail-call version can be easily translated to imperative min/max algorithms.

function Min(L)
  m ← First(L)
  L ← Rest(L)
  while L ≠ ∅ do
    if First(L) < m then
      m ← First(L)
    L ← Rest(L)
  return m

function Max(L)
  m ← First(L)
  L ← Rest(L)
  while L ≠ ∅ do
    if m < First(L) then
      m ← First(L)
    L ← Rest(L)
  return m
The corresponding ISO C++ programs are given as below.

template<typename T>
T min(List<T>* xs) {
    T x;
    for (x = xs->key; xs; xs = xs->next)
        if (xs->key < x)
            x = xs->key;
    return x;
}

template<typename T>
T max(List<T>* xs) {
    T x;
    for (x = xs->key; xs; xs = xs->next)
        if (x < xs->key)
            x = xs->key;
    return x;
}
Another method to achieve a tail-call maximum (and minimum) algorithm is to discard the smaller element each time. The edge case is the same as before; for the recursive case, since there are at least two elements in the list, we can take the first two for comparison, then drop one and go on processing the rest. For a list with more than two elements, denote $L''$ as $rest(rest(L)) = \{l_3, l_4, ...\}$; we have the following equations.

$$max(L) = \begin{cases} l_1 & : |L| = 1 \\ max(cons(l_1, L'')) & : l_2 < l_1 \\ max(L') & : \text{otherwise} \end{cases} \tag{A.31}$$

$$min(L) = \begin{cases} l_1 & : |L| = 1 \\ min(cons(l_1, L'')) & : l_1 < l_2 \\ min(L') & : \text{otherwise} \end{cases} \tag{A.32}$$
The corresponding example Haskell programs are given as below.

min [x] = x
min (x:y:xs) = if x < y then min (x:xs) else min (y:xs)

max [x] = x
max (x:y:xs) = if x < y then max (y:xs) else max (x:xs)
Exercise A.1

- Given two lists $L_1$ and $L_2$, design an algorithm $eq(L_1, L_2)$ to test if they are equal to each other. Here equality means the lengths are the same and, at the same time, the elements in both lists are pairwise identical.
- Consider various options to handle the out-of-bound error case when randomly accessing an element in a list. Realize them in both imperative and functional programming languages. Compare the solutions based on exceptions and on error codes.
- Augment the list with a tail field, so that the appending algorithm can be realized in constant O(1) time instead of linear O(N) time. Feel free to choose your favorite imperative programming language. Please don't refer to the example source code along with this book before you try it.
- With the tail field augmented to the list, for which list operations must this field be updated? How does it affect the performance?
- Handle the out-of-bound case in the insertion algorithm by treating it as appending.
- Write the insertion sort algorithm by only using less than (<).
- Design and implement an algorithm that finds all occurrences of a given value and deletes them from the list.
- Re-implement the algorithm to calculate the length of a list in tail-recursive manner.
- Implement the insertion sort in tail-recursive manner.
- Implement the O(lg N) algorithm to calculate $b^N$ in your favorite imperative programming language. Note that we only need to accumulate the intermediate result when the bit is not zero.
A.4 Transformation

In the previous section, we listed some basic operations for linked lists. In this section, we focus on transformation algorithms for lists. Some of them are cornerstones of abstraction in functional programming. We'll show how to use list transformation to solve some interesting problems.
A.4.1 mapping and for-each

It is an everyday programming routine that we need to output something as a readable string. Suppose we have a list of numbers, and we want to print the list to the console like '3 1 2 5 4'. One option is to convert the numbers to strings, so that we can feed them to the printing function. One such trivial conversion program may look like this.

$$toStr(L) = \begin{cases} \emptyset & : L = \emptyset \\ cons(str(l_1), toStr(L')) & : \text{otherwise} \end{cases} \tag{A.33}$$
The other example is that we have a dictionary, which is actually a list of words grouped by their initial letters, for example: [[a, an, another, ...], [bat, bath, bool, bus, ...], ..., [zero, zoo, ...]]. We want to know the frequency of these words in English, so we process some English text, for example Hamlet or the Bible, and augment each word with its number of occurrences in these texts. Now we have a list like this:

[[(a, 1041), (an, 432), (another, 802), ...],
 [(bat, 5), (bath, 34), (bool, 11), (bus, 0), ...],
 ...,
 [(zero, 12), (zoo, 0), ...]]

If we want to find which word in each initial group is used most, how do we write a program to work this problem out? The output is a list of words, each having the most occurrences in its group categorized by initial, something like [a, but, can, ...]. We actually need a program which can transform a list of groups of augmented words into a list of words.
Let's work it out step by step. First, we need to define a function which takes a list of word-number pairs, and finds the word with the biggest number augmented. Sorting is overkill; what we need is just a special $max'()$ function. Note that the max() function developed in the previous section can't be used directly. Suppose for a pair of values p = (a, b), the functions fst(p) = a and snd(p) = b are accessors to extract the values; $max'()$ can be defined as the following.

$$max'(L) = \begin{cases} l_1 & : |L| = 1 \\ l_1 & : snd(max'(L')) < snd(l_1) \\ max'(L') & : \text{otherwise} \end{cases} \tag{A.34}$$
Alternatively, we can define a dedicated function to compare word-number-of-occurrence pairs, and generalize the max() function by passing a compare function.

$$less(p_1, p_2) = snd(p_1) < snd(p_2) \tag{A.35}$$

$$maxBy(cmp, L) = \begin{cases} l_1 & : |L| = 1 \\ l_1 & : cmp(maxBy(cmp, L'), l_1) \\ maxBy(cmp, L') & : \text{otherwise} \end{cases} \tag{A.36}$$

Then $max'()$ is just a special case of maxBy() with the compare function comparing on the second value in a pair.

$$max'(L) = maxBy(less, L) \tag{A.37}$$
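The following Haskell fragment is a minimal sketch of equations (A.35)-(A.37); the names maxBy, less, and max' simply mirror the notation above, and the list is assumed non-empty.

-- Keep the later element on ties; pick x when the max of the rest compares less.
maxBy _ [x] = x
maxBy cmp (x:xs) = let m = maxBy cmp xs in if cmp m x then x else m

less p1 p2 = snd p1 < snd p2

max' = maxBy less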
Here we write all functions in a purely recursive way; they can be modified into tail-call manner. This is left as an exercise to the reader.

With the $max'()$ function defined, it's possible to complete the solution by processing the whole list.

$$solve(L) = \begin{cases} \emptyset & : L = \emptyset \\ cons(fst(max'(l_1)), solve(L')) & : \text{otherwise} \end{cases} \tag{A.38}$$
Map

Comparing the solve() function in (A.38) and the toStr() function in (A.33) reveals a very similar algorithm structure, although they target very different problems, and one is trivial while the other is a bit complex.

The structure of toStr() applies the function str(), which turns a number into a string, on every element in the list; while solve() first applies the $max'()$ function to every element (which is actually a list of pairs), then applies the fst() function, essentially turning a list of groups of pairs into a list of words. It is not hard to abstract such a common structure into the following equation, which is called mapping.

$$map(f, L) = \begin{cases} \emptyset & : L = \emptyset \\ cons(f(l_1), map(f, L')) & : \text{otherwise} \end{cases} \tag{A.39}$$
Because map takes a converter function f as an argument, it's called a kind of higher-order function. In a functional programming environment such as Haskell, mapping can be implemented just like the above equation.

map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs
The two concrete cases we discussed above can both be represented by higher-order mapping.

toStr = map str
solve = map (fst ∘ max')

Where f ∘ g means function composition: we first apply g, then apply f. For instance, the function h(x) = f(g(x)) can be represented as h = f ∘ g, read as 'function h is composed of f and g'. Note that we use Curried form to omit the argument L for brevity. Informally speaking, if we feed a function which needs 2 arguments, for instance f(x, y) = z, with only 1 argument, the result turns out to be a function which needs 1 argument. For instance, if we feed f with only the argument x, it becomes a new function taking one argument y, defined as g(y) = f(x, y), or g = f x. Note that x isn't a free variable any more, as it is bound to a value. The reader can refer to any book about functional programming for details about function composition and Currying.
Mapping can also be understood from the domain theory point of view. Consider a function y = f(x); it actually defines a mapping from the domain of variable x to the domain of value y (x and y can have different types). If the domains can be represented as sets X and Y, we have the following relation.

$$Y = \{f(x) | x \in X\} \tag{A.40}$$

This type of set definition is called Zermelo-Frankel set abstraction (also known as ZF expression) [7]. The difference is that here the mapping is from a list to another list, so there can be duplicated elements. In languages supporting list comprehension, for example Haskell and Python (note that the Python list is a built-in type, not the linked list we discuss in this appendix), mapping can be implemented as a special case of list comprehension.

map f xs = [ f x | x <- xs ]
List comprehension is a powerful tool. Here is another example that realizes the permutation algorithm in list comprehension. Many textbooks introduce how to implement all-permutations for a list, such as [7] and [9]. It is possible to design a more general version perm(L, r): if the length of the list L is N, this algorithm permutes r elements out of the total N elements. We know that there are $P_N^r = \frac{N!}{(N-r)!}$ solutions.

$$perm(L, r) = \begin{cases} \{\emptyset\} & : r = 0 \lor |L| < r \\ \{\{l\} \cup P \ | \ l \in L, P \in perm(L - \{l\}, r - 1)\} & : \text{otherwise} \end{cases} \tag{A.41}$$

In this equation, $\{l\} \cup P$ means cons(l, P), and $L - \{l\}$ denotes delete(L, l), which was defined in the previous section. If we take zero elements for permutation, or there are too few elements (fewer than r), the result is a list containing an empty list; otherwise, for the non-trivial case, the algorithm picks one element l from the list, recursively permutes the rest N − 1 elements by picking r − 1 of them, and then puts each possible l in front of all the possible (r − 1)-permutations. Here is the Haskell implementation of this algorithm.

perm _ 0 = [[]]
perm xs r | length xs < r = [[]]
          | otherwise = [ x:ys | x <- xs, ys <- perm (delete x xs) (r-1)]
We'll go back to list comprehension later, in the section about filtering.

Mapping can also be realized imperatively: we apply the function while traversing the list, and construct the new list from left to right. Since each new element is appended to the result list, we can track the tail position to achieve constant-time appending, so the mapping algorithm is linear, provided the passed-in function takes constant time.

function Map(f, L)
  L' ← ∅
  p ← ∅
  while L ≠ ∅ do
    if p = ∅ then
      p ← Cons(f(First(L)), ∅)
      L' ← p
    else
      Next(p) ← Cons(f(First(L)), ∅)
      p ← Next(p)
    L ← Next(L)
  return L'
Because it is a bit complex to annotate the type of the passed-in function in ISO C++, as it involves some detailed language-specific features (see [11] for detail), we skip the C++ version. In fact, ISO C++ provides the very same mapping concept as std::transform. However, it requires knowledge of function objects, iterators, etc., which are out of the scope of this book. The reader can refer to any ISO C++ STL material for detail.

For brevity, we switch to the Python programming language for the example code, so that compile-time type annotation can be avoided. The definition of a simple singly linked list in Python is given as the following.

class List:
    def __init__(self, x = None, xs = None):
        self.key = x
        self.next = xs

def cons(x, xs):
    return List(x, xs)
The mapping program takes a function and a linked list, and applies the function to every element as described in the above algorithm.

def mapL(f, xs):
    ys = prev = List()
    while xs is not None:
        prev.next = List(f(xs.key))
        prev = prev.next
        xs = xs.next
    return ys.next
Different from the pseudo code, this program uses a dummy node as the head of the resulting list, so it needn't test whether the variable storing the last appending position is NIL. This small trick makes the program compact. We only need to drop the dummy node before returning the result.
For each

For a trivial task such as printing a list of elements, it's quite OK to just print each element without converting the whole list to a list of strings. We can actually simplify the program.

function Print(L)
  while L ≠ ∅ do
    print First(L)
    L ← Rest(L)

More generally, we can pass a procedure, such as printing, to this list traversal, so the procedure is performed for each element.

function For-Each(L, P)
  while L ≠ ∅ do
    P(First(L))
    L ← Rest(L)
The for-each algorithm can be formalized in recursive approach as well.

$$foreach(L, p) = \begin{cases} u & : L = \emptyset \\ do(p(l_1), foreach(L', p)) & : \text{otherwise} \end{cases} \tag{A.42}$$

Here u means unit; it can be understood as doing nothing. Its type is similar to the void concept in C or Java-like programming languages. The do() function evaluates all its arguments, discards all the results except for the last one, and returns the last result as its final value. It is equivalent to (begin ...) in Lisp families, and to the do block in Haskell in some sense. For details about the unit type, please refer to [4].
Note that the for-each algorithm is just a simplified mapping; there are only two minor differences:

- It needn't form a result list; we care about the side effect rather than the returned value;
- For-each focuses more on traversing, while mapping focuses more on applying the function, thus the order of arguments is typically arranged as map(f, L) and foreach(L, p).

Some functional programming facilities provide options for both returning the result list and discarding it. For example, the Haskell Monad library provides both mapM, mapM_ and forM, forM_. Readers can refer to language-specific materials for detail.
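As an illustration only, here is a minimal recursive Haskell sketch of equation (A.42), where the unit u corresponds to return () in IO, and do() corresponds to sequencing with (>>).

foreach :: [a] -> (a -> IO ()) -> IO ()
foreach [] _ = return ()                -- the trivial edge case: do nothing
foreach (x:xs) p = p x >> foreach xs p  -- perform p on the head, then recurse

For example, foreach [1, 2, 3] print prints each number on its own line.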
Examples for mapping

We'll show how to use mapping with an example, which is a problem from ACM/ICPC [12]. For the sake of brevity, we modify the problem description a bit. Suppose there are N lights in a room, all of them off. We execute the following process N times:

1. We switch all the lights in the room, so that they are all on;
2. We switch the 2nd, 4th, 6th, ... lights, so that every other light is switched; if a light is on, it will be turned off, and it will be turned on if its previous state was off;
3. We switch every third light, that is, the 3rd, 6th, 9th, ... are switched;
4. ...

At the last round, only the last light (the N-th light) is switched. The question is: how many lights are on in the end?

Before we show the best answer to this puzzle, let's first work out a naive brute-force solution. Suppose there are N lights, represented as a list of 0-1 numbers, where 0 means the light is off and 1 means on. The initial state is a list of N zeros: {0, 0, ..., 0}.
We can label the lights from 1 to N. A mapping can help us to turn the above list into a labeled list⁶.

$$map(\lambda_i \ (i, 0), \{1, 2, 3, ..., N\})$$

This mapping augments each natural number with a zero; the result is a list of pairs: L = {(1, 0), (2, 0), ..., (N, 0)}.
Next we operate on this list of pairs N times, from 1 to N. At each round i, we switch the second value in a pair if its first label is divisible by i. Considering the fact that 1 − 0 = 1 and 1 − 1 = 0, we can realize the switching of a 0-1 value x as 1 − x. At the i-th operation, for light (j, x), if i|j (that is, j mod i = 0), we perform the switch; otherwise, we leave the light untouched.

$$switch(i, (j, x)) = \begin{cases} (j, 1 - x) & : j \bmod i = 0 \\ (j, x) & : \text{otherwise} \end{cases} \tag{A.43}$$
The i-th operation on all lights can be realized as mapping again:

$$map(switch(i), L) \tag{A.44}$$

Note that here we use the Curried form of the switch() function, which is equivalent to

$$map(\lambda_{(j,x)} \ switch(i, (j, x)), L)$$

Next we need to define a function proc(), which performs the above mapping on L over and over, N times. One option is to realize it in a purely recursive way as the following, so that we can call it like proc({1, 2, ..., N}, L)⁷.

$$proc(I, L) = \begin{cases} L & : I = \emptyset \\ proc(I', map(switch(i_1), L)) & : \text{otherwise} \end{cases} \tag{A.45}$$

Where $I = cons(i_1, I')$ if I isn't empty.
At this stage, we can sum the second value of each pair in list L to get the answer. The sum function has been defined in the previous section, so the only thing left is mapping.

$$solve(N) = sum(map(snd, proc(\{1, 2, ..., N\}, L))) \tag{A.46}$$

Translating this naive brute-force solution to Haskell yields the below program.

solve = sum . map snd . proc where
  proc n = operate [1..n] $ map (\i -> (i, 0)) [1..n]
  operate [] xs = xs
  operate (i:is) xs = operate is (map (switch i) xs)
  switch i (j, x) = if j `mod` i == 0 then (j, 1 - x) else (j, x)
Let's see what the answers are for 1, 2, ..., 100 lights.

[1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,
8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,10]
⁶ Readers who are familiar with functional programming may use zipping to achieve this. We'll explain zipping in a later section.
⁷ This can also be realized by folding, which will be explained in a later section.
This result is interesting:

- the first 3 answers are 1;
- the 4th to the 8th answers are 2;
- the 9th to the 15th answers are 3;
- ...

It seems that the $i^2$-th to the $((i + 1)^2 - 1)$-th answers are all i. Actually, we can prove this fact as follows.
Proof. Given N lights labeled from 1 to N, consider which lights are on in the end. Since the initial state of every light is off, a light is on exactly when it has been switched an odd number of times. Light i is switched at round j if j divides i (denoted as j|i). So only the lights which have an odd number of factors are on at the end.

So the key point to solving this puzzle is to find all numbers which have an odd number of factors. For any positive integer N, denote by S the set of all factors of N. S is initialized to ∅. If p is a factor of N, there must exist a positive integer q such that N = pq, which means q is also a factor of N. So we add 2 different factors to the set S if and only if p ≠ q, which keeps |S| even all the time, unless p = q. In that case, N is a perfect square number, and we add only 1 factor to the set S, which leads to an odd number of factors.
At this stage, we can design a fast solution by counting the perfect square numbers not greater than N.

$$solve(N) = \lfloor \sqrt{N} \rfloor \tag{A.47}$$
The next Haskell command verifies that the answers for 1, 2, ..., 100 lights are the same as above.

map (floor . sqrt) [1..100]
[1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,
8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,10]
Mapping is a generic concept: it is not limited to linked lists, but can be applied to many complex data structures. The chapter about binary search trees in this book explains how to map over trees. As long as we can traverse a data structure in some order, and the empty data structure can be identified, we can use the same mapping idea. We'll return to this kind of higher-order concept in the section on folding later.
A.4.2 reverse

How to reverse a singly linked list with minimum space is a popular technical interview problem in some companies. The pointer manipulation must be arranged carefully in imperative programming languages such as ANSI C. However, we'll show that there exists an easy way to write this program:

1. First, write a purely recursive straightforward solution;
2. Then, transform the purely recursive solution into tail-call manner;
3. Finally, translate the tail-call solution into pure imperative pointer operations.

The purely recursive solution is simple enough that we can write it out immediately. In order to reverse a list L:

- If L is empty, the reversed result is empty. This is the trivial edge case;
- Otherwise, we can first reverse the rest of the sub-list, then append the first element to the end.
This idea can be formalized in the below equation.

$$reverse(L) = \begin{cases} \emptyset & : L = \emptyset \\ append(reverse(L'), l_1) & : \text{otherwise} \end{cases} \tag{A.48}$$
Translating it to Haskell yields the below program.

reverse [] = []
reverse (x:xs) = reverse xs ++ [x]

However, this solution doesn't perform well, as appending has to traverse to the end of the list, which leads to a quadratic time algorithm. It is not hard to improve this program by changing it to tail-call manner: we can use an accumulator to store the intermediate reversed result, initializing the accumulated result as empty. The algorithm is then formalized as $reverse(L) = reverse'(L, \emptyset)$.

$$reverse'(L, A) = \begin{cases} A & : L = \emptyset \\ reverse'(L', \{l_1\} \cup A) & : \text{otherwise} \end{cases} \tag{A.49}$$
Where $\{l_1\} \cup A$ means $cons(l_1, A)$. Different from appending, it's a constant O(1) time operation. The core idea is that we repeatedly take elements one by one from the head of the original list, and put them in front of the accumulated result. This is just like storing all the elements in a stack, then popping them out. This is a linear time algorithm.

The below Haskell program implements this tail-call version.

reverse' [] acc = acc
reverse' (x:xs) acc = reverse' xs (x:acc)
Since tail-recursive calls by nature needn't book-keep any context (typically kept on a stack), most modern compilers are able to optimize them into a pure imperative loop, reusing the current context, stack, etc. Let's manually do this optimization so that we can get an imperative algorithm.

function Reverse(L)
  A ← ∅
  while L ≠ ∅ do
    A ← Cons(First(L), A)
    L ← Rest(L)
  return A
However, because we translated it directly from a functional solution, this algorithm actually produces a new reversed list, and does not mutate the original one. It is not hard to change it into an in-place solution by reusing L. For example, the following ISO C++ program implements the in-place algorithm. It takes O(1) memory space, and reverses the list in O(N) time.

template<typename T>
List<T>* reverse(List<T>* xs) {
    List<T> *p, *ys = NULL;
    while (xs) {
        p = xs;
        xs = xs->next;
        p->next = ys;
        ys = p;
    }
    return ys;
}
Exercise A.2

- Implement the algorithm to find the maximum element in a list of pairs in tail-call approach in your favorite programming language.
A.5 Extract sub-lists

Different from arrays, which can slice a continuous segment quickly and easily, it needs more work to extract sub-lists from a singly linked list. Such operations are typically linear algorithms.
A.5.1 take, drop, and split-at

Taking the first N elements from a list is semantically similar to extracting a sub-list from the very left, like sublist(L, 1, N), where the second and third arguments to sublist are the positions where the sub-list starts and ends. For the trivial edge case, where either N is zero or the list is empty, the sub-list is empty; otherwise, we can recursively take the first N − 1 elements from the rest of the list, and put the first element in front.

$$take(N, L) = \begin{cases} \emptyset & : L = \emptyset \lor N = 0 \\ cons(l_1, take(N - 1, L')) & : \text{otherwise} \end{cases} \tag{A.50}$$
Note that the edge cases actually handle the out-of-bound error. The following Haskell program implements this algorithm.

take _ [] = []
take 0 _ = []
take n (x:xs) = x : take (n-1) xs
Dropping, on the other hand, discards the first N elements and returns the rest as the result. It is equivalent to getting the sub-list from the right, like sublist(L, N + 1, |L|), where |L| is the length of the list. Dropping can be designed quite similarly to taking, by discarding the first element in the recursive case.

$$drop(N, L) = \begin{cases} \emptyset & : L = \emptyset \\ L & : N = 0 \\ drop(N - 1, L') & : \text{otherwise} \end{cases} \tag{A.51}$$
Translating the algorithm to Haskell gives the below example program.

drop _ [] = []
drop 0 xs = xs
drop n (x:xs) = drop (n-1) xs
The imperative taking and dropping are quite straightforward; they are left as exercises to the reader.

With taking and dropping defined, extracting a sub-list of arbitrary length at an arbitrary position can be realized by calling them.

$$sublist(L, from, count) = take(count, drop(from - 1, L)) \tag{A.52}$$

or, in another semantics, by providing left and right boundaries:

$$sublist(L, from, to) = drop(from - 1, take(to, L)) \tag{A.53}$$

Note that the elements in the range [from, to] are returned by this function, with both ends included. All the above algorithms perform in linear time.
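As a small illustration of (A.52) and (A.53), here is one possible Haskell sketch; the names sublist and slice are ours for this example, not standard library functions.

-- Take `count` elements starting at 1-based position `from`, as in (A.52).
sublist from count xs = take count (drop (from - 1) xs)

-- Take the elements between positions `from` and `to`, both included, as in (A.53).
slice from to xs = drop (from - 1) (take to xs)

For example, sublist 2 3 [1..5] and slice 2 4 [1..5] both yield [2,3,4].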
take-while and drop-while

Compared to taking and dropping, there is another type of operation, where we keep taking or dropping elements as long as a certain condition is met. The taking and dropping algorithms can be viewed as special cases of take-while and drop-while.

Take-while examines elements one by one, keeping them as long as the condition is satisfied, and once it fails, ignores all the rest of the elements, even if some of them satisfy the condition. This is the difference from filtering, which we'll explain in a later section: take-while stops once the condition test fails, while filtering traverses the whole list.
$$takeWhile(p, L) = \begin{cases} \emptyset & : L = \emptyset \\ \emptyset & : \neg p(l_1) \\ cons(l_1, takeWhile(p, L')) & : \text{otherwise} \end{cases} \tag{A.54}$$
Take-while accepts two arguments: one is the predicate function p, which can be applied to an element in the list and returns a Boolean value as the result; the other argument is the list to be processed. It is easy to define drop-while symmetrically.

$$dropWhile(p, L) = \begin{cases} \emptyset & : L = \emptyset \\ L & : \neg p(l_1) \\ dropWhile(p, L') & : \text{otherwise} \end{cases} \tag{A.55}$$
The corresponding Haskell example programs are given as below.

takeWhile _ [] = []
takeWhile p (x:xs) = if p x then x : takeWhile p xs else []

dropWhile _ [] = []
dropWhile p xs@(x:xs') = if p x then dropWhile p xs' else xs
split-at

With taking and dropping defined, splitting-at can be realized trivially by calling them.

$$splitAt(i, L) = (take(i, L), drop(i, L)) \tag{A.56}$$
A.5.2 breaking and grouping

breaking

Breaking can be considered a general form of splitting. Instead of splitting at a given position, breaking examines every element against a certain predicate, and finds the longest prefix of the list for that condition. The result is a pair of sub-lists: one is that longest prefix, the other is the rest.

There are two different breaking semantics: one is to pick elements satisfying the predicate as long as possible; the other is to pick those that don't satisfy it. The former is typically defined as span, the latter as break.

Span can be described, for example, in the following recursive manner. In order to span a list L for predicate p:

- If the list is empty, the result for this trivial edge case is a pair of empty lists $(\emptyset, \emptyset)$;
- Otherwise, we test the predicate against the first element $l_1$. If $l_1$ satisfies the predicate, we denote the intermediate result of spanning the rest of the list as $(A, B) = span(p, L')$, and put $l_1$ in front of A to get the pair $(\{l_1\} \cup A, B)$; otherwise, we just return $(\emptyset, L)$ as the result.

For breaking, we just test the negation of the predicate; everything else is the same as for spanning. Alternatively, one can define breaking by using span, as in the later example program.
$$span(p, L) = \begin{cases} (\emptyset, \emptyset) & : L = \emptyset \\ (\{l_1\} \cup A, B) & : p(l_1), (A, B) = span(p, L') \\ (\emptyset, L) & : \text{otherwise} \end{cases} \tag{A.57}$$

$$break(p, L) = \begin{cases} (\emptyset, \emptyset) & : L = \emptyset \\ (\{l_1\} \cup A, B) & : \neg p(l_1), (A, B) = break(p, L') \\ (\emptyset, L) & : \text{otherwise} \end{cases} \tag{A.58}$$

Note that both functions only find the longest prefix; they stop immediately when the condition test fails, even if some elements in the rest of the list meet the predicate (or don't). Translating them to Haskell gives the following example program.
span _ [] = ([], [])
span p xs@(x:xs') = if p x then let (as, bs) = span p xs' in (x:as, bs) else ([], xs)

break p = span (not . p)
Span and break can also be realized imperatively as the following.

function Span(p, L)
  A ← ∅
  while L ≠ ∅ ∧ p(First(L)) do
    A ← Append(A, First(L))
    L ← Rest(L)
  return (A, L)

function Break(p, L)
  return Span(¬p, L)
This algorithm creates a new list to hold the longest prefix; another option is to turn it into an in-place algorithm that reuses the nodes, as in the following Python example.

def span(p, xs):
    ys = xs
    last = None
    while xs is not None and p(xs.key):
        last = xs
        xs = xs.next
    if last is None:
        return (None, xs)
    last.next = None
    return (ys, xs)
Note that both span and break need to traverse the list to test the predicate, so they are linear algorithms bound to O(N).
grouping

Grouping is a commonly used operation for problems where we need to divide a list into several small groups. For example, suppose we want to group the string "Mississippi", which is actually a list of characters {'M', 'i', 's', 's', 'i', 's', 's', 'i', 'p', 'p', 'i'}, into several small lists in sequence, each containing consecutive identical characters. The grouping operation is expected to be:

group("Mississippi") = {"M", "i", "ss", "i", "ss", "i", "pp", "i"}

Another example: we have a list of numbers

L = {15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1, 4, 8, 3, 14, 2}

and we want to divide it into several small lists, each of which is ordered descending. The grouping operation is expected to be:

group(L) = {{15, 9, 0}, {12, 11, 7}, {10, 5}, {6}, {13, 1}, {4}, {8, 3}, {14, 2}}
Both cases play very important roles in real algorithms. The string grouping is used in creating the Trie/Patricia data structures, which are powerful tools in the string searching area; the ordered sub-list grouping can be used in natural merge sort. There are dedicated chapters in this book explaining the details of these algorithms.

It is obvious that we need to abstract the grouping condition, so that we know where to break the original list into small ones. This predicate can be passed to the algorithm as an argument, like group(p, L), where the predicate p accepts two consecutive elements and tests if the condition matches.

The first idea to solve the grouping problem is to traverse the list taking two elements at a time: if the predicate test succeeds, both elements are put into the same group; otherwise, only the first one is put into the current group, and the second one starts a new group. Denote the first two elements (if they exist) as $l_1, l_2$, and the sub-list without the first element as $L'$. The result is a list of lists $G = \{g_1, g_2, ...\}$, denoted as G = group(p, L).
$$group(p, L) = \begin{cases} \{\emptyset\} & : L = \emptyset \\ \{\{l_1\}\} & : |L| = 1 \\ \{\{l_1\} \cup g'_1, g'_2, ...\} & : p(l_1, l_2), group(p, L') = \{g'_1, g'_2, ...\} \\ \{\{l_1\}, g'_1, g'_2, ...\} & : \text{otherwise} \end{cases} \tag{A.59}$$
Note that $\{l_1\} \cup g'_1$ actually means $cons(l_1, g'_1)$, which performs in constant time. This is a linear algorithm, performing proportionally to the length of the list; it traverses the list in one pass, which is bound to O(N). Translating this program to Haskell gives the below example code.

group _ [] = [[]]
group _ [x] = [[x]]
group p (x:xs@(x':_)) | p x x' = (x:ys):yss
                      | otherwise = [x]:r
  where
    r@(ys:yss) = group p xs
It is possible to implement this algorithm in imperative approach: we initialize the result groups as $\{\{l_1\}\}$ if L isn't empty, then traverse the list from the second element, appending to the last group if two consecutive elements satisfy the predicate, and starting a new group otherwise.

function Group(p, L)
  if L = ∅ then
    return {∅}
  x ← First(L)
  L ← Rest(L)
  g ← {x}
  G ← {g}
  while L ≠ ∅ do
    y ← First(L)
    if p(x, y) then
      g ← Append(g, y)
    else
      g ← {y}
      G ← Append(G, g)
    x ← y
    L ← Next(L)
  return G
However, different from the recursive algorithm, this program performs in quadratic time if the appending function isn't optimized by storing the tail position.
The corresponding Python program is given as below.

def group(p, xs):
    if xs is None:
        return List(None)
    (x, xs) = (xs.key, xs.next)
    g = List(x)
    G = List(g)
    while xs is not None:
        y = xs.key
        if p(x, y):
            g = append(g, y)
        else:
            g = List(y)
            G = append(G, g)
        x = y
        xs = xs.next
    return G
With the grouping function defined, the two example cases mentioned at the beginning of this section can be realized by passing different predicates.

group(=, {'M', 'i', 's', 's', 'i', 's', 's', 'i', 'p', 'p', 'i'}) = {"M", "i", "ss", "i", "ss", "i", "pp", "i"}

group(≥, {15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1, 4, 8, 3, 14, 2})
= {{15, 9, 0}, {12, 11, 7}, {10, 5}, {6}, {13, 1}, {4}, {8, 3}, {14, 2}}
Another solution is to use the span function we have defined to realize grouping. We pass a predicate to span, which breaks the list into two parts: the first part is the longest sub-list satisfying the condition. We can repeatedly apply span with the same predicate to the second part, until it becomes empty.

However, the predicate function we pass to span is a unary function: it takes one element as argument and tests whether it satisfies the condition, while in the grouping algorithm the predicate function is binary, taking two adjacent elements for testing. The solution is that we can use currying: pass the first element to the binary predicate, and use the resulting unary predicate to test the rest of the elements.
$$group(p, L) = \begin{cases} \{\emptyset\} & : L = \emptyset \\ cons(\{l_1\} \cup A, group(p, B)) & : \text{otherwise} \end{cases} \tag{A.60}$$

Where $(A, B) = span(\lambda_x \ p(l_1, x), L')$ is the result of spanning on the rest sub-list of L.
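A minimal Haskell sketch of equation (A.60) may look as below. Note two assumptions of ours: we return [] (rather than the {∅} in the equation) for the empty list, to avoid a trailing empty group; and this groupBy shadows the one in Data.List.

groupBy _ [] = []
groupBy p (x:xs) = (x:as) : groupBy p bs
  where (as, bs) = span (p x) xs  -- currying: fix x as the left argument of p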
Although this newly defined grouping function generates the correct result for the first case, as in the following Haskell code snippet:

groupBy (==) "Mississippi"
["M","i","ss","i","ss","i","pp","i"]

it seems that this algorithm can't group the list of numbers into ordered sub-lists.

groupBy (>=) [15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1, 4, 8, 3, 14, 2]
[[15,9,0,12,11,7,10,5,6,13,1,4,8,3,14,2]]
The reason is that the first element, 15, is used as the left parameter to the ≥ operator for span, and since 15 is the maximum value in this list, the span function ends up putting all elements into A, leaving B empty. This might seem like a defect, but it is actually the correct behavior if the semantics is to group equal elements together.

Strictly speaking, an equality predicate must satisfy three properties: reflexive, transitive, and symmetric. They are specified as the following.

- Reflexive. x = x, which says that any element is equal to itself;
- Transitive. x = y, y = z ⇒ x = z, which says that if two elements are equal, and one of them is equal to a third, then all three are equal;
- Symmetric. x = y ⇔ y = x, which says that the order of comparing two equal elements doesn't affect the result.

When we group the character list "Mississippi", the equal (=) operator is used, which obviously conforms to these three properties, so it generates the correct grouping result. However, when passing (≥) as the equality predicate to group a list of numbers, it violates the symmetric property; that is the reason why we get the wrong grouping result.

This fact means that the second algorithm we designed by using span limits the semantics to strict equality, while the first one does not: it just tests the condition for every two adjacent elements, which is much weaker than equality.
Exercise A.3

1. Implement the in-place imperative taking and dropping algorithms in your favorite programming language; note that the out-of-bound cases should be handled. Please try languages both with and without GC (Garbage Collection) support.

2. Implement take-while and drop-while in your favorite imperative programming language. Please try both a dynamically typed language and a statically typed language (with and without type inference). How do you specify the type of the predicate function as generically as possible in a static type system?

3. Consider the following definition of span.

$$span(p, L) = \begin{cases} (\emptyset, \emptyset) & : L = \emptyset \\ (\{l_1\} \cup A, B) & : p(l_1), (A, B) = span(p, L') \\ (A, \{l_1\} \cup B) & : \text{otherwise} \end{cases}$$

What's the difference between this algorithm and the one we've shown in this section?

4. Implement the grouping algorithm by using span, in an imperative way, in your favorite programming language.
A.6 Folding

We are ready to introduce one of the most critical concepts in higher-order programming: folding. It is such a powerful tool that almost all the algorithms shown so far in this appendix can be realized by folding. Folding is sometimes named reducing (the abstracted concept is identical, in some sense, to the 'map-reduce' buzzword in cloud computing). For example, both STL and Python provide a reduce function which realizes a partial form of folding.
A.6.1 folding from right

Recall the sum and product definitions in the previous section; they are actually quite similar.

$$sum(L) = \begin{cases} 0 & : L = \emptyset \\ l_1 + sum(L') & : \text{otherwise} \end{cases}$$

$$product(L) = \begin{cases} 1 & : L = \emptyset \\ l_1 \cdot product(L') & : \text{otherwise} \end{cases}$$

It is obvious that they have the same structure. What's more, if we list the insertion sort definition, we can find that it also shares this structure.

$$sort(L) = \begin{cases} \emptyset & : L = \emptyset \\ insert(l_1, sort(L')) & : \text{otherwise} \end{cases}$$
This hints that we can abstract this essential common structure, so that we needn't repeat it again and again. Observing sum, product, and sort, there are two points where they differ, which we can parameterize:

- The result of the trivial edge case varies. It is zero for sum, 1 for product, and the empty list for sorting;
- The function applied to the first element and the intermediate result varies. It is plus for sum, multiply for product, and ordered insertion for sorting.

If we parameterize the result of the trivial edge case as an initial value z (standing for an abstract 'zero' concept), and the function applied in the recursive case as f (which takes two parameters: the first element of the list, and the recursive result for the rest of the list), this common structure can be defined as something like the following.

$$proc(f, z, L) = \begin{cases} z & : L = \emptyset \\ f(l_1, proc(f, z, L')) & : \text{otherwise} \end{cases}$$

That's it, and we should give this common structure a better name than the meaningless 'proc'. Let's first examine its characteristics. For a list $L = \{x_1, x_2, ..., x_N\}$, we can expand the computation like the following.
$$\begin{aligned} proc(f, z, L) &= f(x_1, proc(f, z, L')) \\ &= f(x_1, f(x_2, proc(f, z, L''))) \\ &\ \ \vdots \\ &= f(x_1, f(x_2, f(..., f(x_N, proc(f, z, \emptyset))...))) \\ &= f(x_1, f(x_2, f(..., f(x_N, z))...)) \end{aligned}$$
Since f takes two parameters, it's a binary function; thus we can write it in infix form. The infix form is defined as below.

$$x \oplus_f y = f(x, y) \tag{A.61}$$

The above expanded result is equivalent to the following in infix notation.

$$proc(f, z, L) = x_1 \oplus_f (x_2 \oplus_f (...(x_N \oplus_f z))...)$$
Note that the parentheses are necessary, because the computation starts from the right-most $(x_N \oplus_f z)$ and repeatedly folds to the left towards $x_1$. This is quite similar to folding a Chinese hand-fan, as illustrated in the photos of Figure A.3. A Chinese hand-fan is made of bamboo and paper. Multiple bamboo frames are stuck together with an axis at one end. The arc-shaped paper is fully expanded by these frames, as shown in Figure A.3 (a); the fan can be closed by folding the paper. Figure A.3 (b) shows the fan partly folded from the right. When the folding is finished, the fan becomes a stick, as shown in Figure A.3 (c).

We can consider each bamboo frame, along with the paper on it, as an element, so these frames form a list. A unit step in closing the fan is to rotate a frame by a certain angle, so that it lies on top of the collapsed part. When we start closing the fan, the initial collapsed result is the first bamboo frame. The closing process folds from one end, repeatedly applying the unit step, until all the frames are rotated and the result is a fully closed stick.

Actually, the sum and product algorithms do exactly the same thing as closing the fan.
Actually, the sum and product algorithms exactly do the same thing as
closing the fan.
sum({1, 2, 3, 4, 5}) = 1 + (2 + (3 + (4 + 5)))
= 1 + (2 + (3 + 9))
= 1 + (2 + 12)
= 1 + 14
= 15
product({1, 2, 3, 4, 5}) = 1 (2 (3 (4 5)))
= 1 (2 (3 20))
= 1 (2 60)
= 1 120
= 120
In functional programming, we name this process folding; particularly, since the execution starts from the innermost structure, i.e. from the right-most element, this type of folding is named folding right.

$$foldr(f, z, L) = \begin{cases} z & : L = \emptyset \\ f(l_1, foldr(f, z, L')) & : \text{otherwise} \end{cases} \tag{A.62}$$
Figure A.3: Folding a Chinese hand-fan. (a) A folding fan fully opened; (b) the fan partly folded on the right; (c) the fan fully folded, closed to a stick.
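A direct Haskell transcription of equation (A.62) is one line per clause; note that this sketch shadows the Prelude's foldr, which behaves the same way.

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)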
Let's see how to use fold-right to realize sum and product.

$$\sum_{i=1}^{N} x_i = x_1 + (x_2 + (x_3 + ... + (x_{N-1} + x_N))...) = foldr(+, 0, \{x_1, x_2, ..., x_N\}) \tag{A.63}$$

$$\prod_{i=1}^{N} x_i = x_1 \times (x_2 \times (x_3 \times ... \times (x_{N-1} \times x_N))...) = foldr(\times, 1, \{x_1, x_2, ..., x_N\}) \tag{A.64}$$

The insertion-sort algorithm can also be defined by using folding right.

$$sort(L) = foldr(insert, \emptyset, L) \tag{A.65}$$
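In Haskell these become one-liners; this is just a sketch (shadowing Prelude names), where insert is the ordered insertion from the insertion-sort section.

sum = foldr (+) 0
product = foldr (*) 1
sort = foldr insert []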
A.6.2 folding from left

As mentioned in the section on tail recursion, both the purely recursive sum and product compute from right to left, and they must book-keep all the intermediate results and contexts. Since we abstracted fold-right from that very structure, folding from right does the book-keeping as well. This is expensive if the list is very long.

Since we were able to change sum and product into tail-recursive manner, it is quite possible that we can provide another folding algorithm, which processes the list from left to right in normal order, enabling tail-call optimization by reusing the same context.

Instead of deriving it from sum, product, and insertion, we can directly change folding right into tail-call form. Observe that the initial value z actually represents the intermediate result at any time, so we can use it as the accumulator.

$$foldl(f, z, L) = \begin{cases} z & : L = \emptyset \\ foldl(f, f(z, l_1), L') & : \text{otherwise} \end{cases} \tag{A.66}$$
Whenever the list isn't empty, we take the first element, and apply function f on the accumulator z and $l_1$ to get a new accumulator $z' = f(z, l_1)$. After that we can repeatedly fold with the very same function f, the updated accumulator $z'$, and the list $L'$.
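Again as a sketch, equation (A.66) transcribes directly to Haskell (shadowing the Prelude's foldl):

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl _ z [] = z
foldl f z (x:xs) = foldl f (f z x) xs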
Let's verify that this tail-call algorithm actually folds from the left.

$$\begin{aligned} \sum_{i=1}^{5} i &= foldl(+, 0, \{1, 2, 3, 4, 5\}) \\ &= foldl(+, 0 + 1, \{2, 3, 4, 5\}) \\ &= foldl(+, (0 + 1) + 2, \{3, 4, 5\}) \\ &= foldl(+, ((0 + 1) + 2) + 3, \{4, 5\}) \\ &= foldl(+, (((0 + 1) + 2) + 3) + 4, \{5\}) \\ &= foldl(+, ((((0 + 1) + 2) + 3) + 4) + 5, \emptyset) \\ &= 0 + 1 + 2 + 3 + 4 + 5 \end{aligned}$$

Note that we actually delayed the evaluation of $f(z, l_1)$ in every step. (This is the exact behavior in systems supporting lazy evaluation, for instance Haskell. However, in strict systems such as Standard ML, it's not the case.) In a strict system, the accumulators will be evaluated in the sequence {1, 3, 6, 10, 15} across the calls.
Generally, folding left can be expanded in the form

$$foldl(f, z, L) = f(f(...(f(f(z, l_1), l_2), ...), l_N) \tag{A.67}$$

or in infix manner as

$$foldl(f, z, L) = ((...(z \oplus_f l_1) \oplus_f l_2) \oplus_f ...) \oplus_f l_N \tag{A.68}$$

With folding from left defined, sum, product, and insertion sort can be transparently implemented by calling foldl: sum(L) = foldl(+, 0, L), product(L) = foldl(×, 1, L), and sort(L) = foldl(insert, ∅, L). Compared with the folding-right versions, they look almost the same at first glance; however, the internal implementations differ.
Imperative folding and generic folding concept

The tail-call nature of the folding-left algorithm is quite friendly for imperative settings: even if the compiler isn't equipped with tail-call optimization, we can implement folding in a while-loop manually.

function Fold(f, z, L)
  while L ≠ ∅ do
    z ← f(z, First(L))
    L ← Rest(L)
  return z

Translating this algorithm to Python yields the following example program.

def fold(f, z, xs):
    for x in xs:
        z = f(z, x)
    return z
Actually, Python provides the built-in function reduce, which does the very same thing (ISO C++ provides a similar accumulate algorithm in the STL). Almost no imperative environment provides a folding-right function, because it would cause a stack overflow problem if the list is too long. However, there still exist cases where the folding-from-right semantics is necessary. For example, suppose one defines a container which only provides an insertion function to the head of the container, with no appending method, so that we want a fromList tool.

$$fromList(L) = foldr(insertHead, empty, L)$$

Calling fromList with the insertion function, as well as an empty initialized container, can turn a list into the special container. Actually, the singly linked list is exactly such a container: it performs well on insertion to the head, but poorly (linear time) when appending to the tail. Folding from right is quite natural when duplicating a linked list while keeping the element ordering, while folding from left would generate a reversed list.
In such cases, there exists an alternative way to implement imperative folding right: first reverse the list, and then fold the reversed one from the left.

function Fold-Right(f, z, L)
  return Fold(λ(z, x) : f(x, z), z, Reverse(L))

Note that we must use the tail-call version of reversing, or the stack overflow issue still exists. Note also that the arguments of f are swapped in the passed-in lambda, since Fold applies the accumulator as the first argument, while folding right applies it as the second.
One may think that folding left should be chosen over folding right in most cases, because it's friendly for tail-call optimization, suitable for both functional and imperative settings, and it's an online algorithm. However, folding right plays a critical role when the input list is infinite and the binary function f is lazy. For example, the below Haskell program wraps every element of an infinite list into a singleton, and returns the first 10 results.

take 10 $ foldr (\x xs -> [x]:xs) [] [1..]
[[1],[2],[3],[4],[5],[6],[7],[8],[9],[10]]

This can't be achieved by folding left, because the outermost evaluation can't finish until the whole list is processed. The details are specific to the lazy evaluation feature, which is out of the scope of this book. Readers can refer to [13] for details.
Although the main topic of this appendix is singly linked list related algorithms, the folding concept itself is generic: it is not limited to lists, but can also be applied to other data structures.

We can fold a tree, a queue, or even more complicated data structures, as long as we have the following:

- The empty data structure can be identified for the trivial edge case (e.g. an empty tree);
- We can traverse the data structure in some order (e.g. traverse the tree in pre-order).

Some languages provide support for this high-level concept; for example, Haskell achieves it via monoids. Readers can refer to [8] for detail. Many chapters in this book use this widened concept of folding.
A.6.3 folding in practice

We have seen that max, min, and insertion sort can all be realized by folding. The brute-force solution for the drunk jailer puzzle shown in the mapping section can also be designed with a mixed use of mapping and folding.

Recall that we create a list of pairs, each containing the number of a light and its on-off state. After that we process the rounds from 1 to N, switching a light if its number is divisible by the round number. The whole process can be viewed as folding.

$$fold(step, \{(1, 0), (2, 0), ..., (N, 0)\}, \{1, 2, ..., N\})$$

The initial value is the very first state, where all the lights are off. The list to be folded over contains the operations from 1 to N. The function step takes two arguments: one is the list of light-state pairs, the other is the round number i. It then maps over all lights and performs the switching. We can thus substitute step with mapping.

$$fold(\lambda_{L,i} \ map(switch(i), L), \{(1, 0), (2, 0), ..., (N, 0)\}, \{1, 2, ..., N\})$$

We'll simplify the notation and directly write map(switch(i), L) for brevity. The result of this folding is the list of final state pairs; we then take the second value of each pair via mapping, and calculate the summation.

$$sum(map(snd, fold(map(switch(i), L), \{(1, 0), (2, 0), ..., (N, 0)\}, \{1, 2, ..., N\}))) \tag{A.69}$$
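Putting equation (A.69) to work, here is one possible Haskell sketch of the brute-force solver; the names solve' and switch are ours for illustration.

-- Fold the rounds 1..n over the initial all-off state, then count the lights on.
solve' n = sum $ map snd $ foldl step [(j, 0) | j <- [1..n]] [1..n]
  where
    step ls i = map (switch i) ls
    switch i (j, x) = if j `mod` i == 0 then (j, 1 - x) else (j, x)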
There are materials providing plenty of good examples of using folding, especially [1], where folding together with the fusion law are well explained.
concatenate a list of lists

In the previous section A.3.6 about concatenation, we explained how to concatenate two lists. Actually, concatenation of lists can be considered equivalent to summation of numbers. Thus we can design a general algorithm which concatenates multiple lists into one big list. What's more, we can realize this general concatenation by using folding. As sum can be represented as sum(L) = foldr(+, 0, L), it's straightforward to write the following equation.

$$concats(L) = foldr(concat, \emptyset, L) \tag{A.70}$$

Where L is a list of lists, for example {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, ...}. The function $concat(L_1, L_2)$ is what we defined in section A.3.6.

In some environments which support lazy evaluation, such as Haskell, this algorithm is capable of concatenating an infinite list of lists, as the binary function ++ is lazy.
Exercise A.4

- What's the performance of the concats algorithm? Is it linear or quadratic?
- Design another linear time concats algorithm without using folding.
- Realize the mapping algorithm by using folding.
A.7 Searching and matching

Searching and matching are very important algorithms. They are not limited to linked lists, but are applicable to a wide range of data structures. We just scratch the surface of searching and matching in this appendix; there are dedicated chapters explaining them in this book.
A.7.1 Existence testing

The simplest searching case is to test whether a given element exists in a list. A linear time traversal can solve this problem. In order to determine whether element x exists in list L:

- If the list is empty, it's obvious that the element doesn't exist in L;
- If the first element in the list equals x, we know that x exists;
- Otherwise, we need to recursively test whether x exists in the rest sub-list $L'$.
This simple description can be directly formalized into the following equation.

$$x \in L = \begin{cases} False & : L = \emptyset \\ True & : l_1 = x \\ x \in L' & : \text{otherwise} \end{cases} \tag{A.71}$$
This is definitely a linear algorithm bound to O(N) time. The best case happens in the two trivial clauses: either the list is empty, or the first element is what we are looking for. The worst case happens when the element doesn't exist at all, or is the last element; in both cases, we need to traverse the whole list. If the probability is equal for all positions, the average case takes about $\frac{N+1}{2}$ steps of traversal.

This algorithm is so trivial that we leave the implementation as an exercise to the reader. If the list is ordered, one may expect to improve the algorithm to logarithmic time rather than linear. However, as we discussed, since a list doesn't support constant-time random access, binary search can't be applied here. There is a dedicated chapter in this book discussing how to evolve the linked list into a binary tree to achieve quick searching.
A.7.2 Looking up

One step beyond existence testing is to find the interesting information stored in the list. There are two typical methods to augment the elements with extra data. Since the linked list is a chain of nodes, we can store satellite data in the node, then provide key(n) to access the key of node n, rest(n) for the rest sub-list, and value(n) for the augmented data. The other method is to pair the key and data, for example {(1, hello), (2, world), (3, foo), ...}. We'll introduce how to form such a pairing list in a later section.

The algorithm is almost the same as existence testing: it traverses the list, examining the keys one by one. Whenever it finds a node with the same key as the one we are looking up, it stops and returns the augmented data. This is obviously a linear strategy. If the satellite data is augmented directly to the node, the algorithm can be defined as the following.
$$lookup(x, L) = \begin{cases} \emptyset & : L = \emptyset \\ value(l_1) & : key(l_1) = x \\ lookup(x, L') & : \text{otherwise} \end{cases} \tag{A.72}$$
In this algorithm, L is a list of nodes augmented with satellite data. Note that the first case actually means lookup failure, so the result is empty. Some functional programming languages, such as Haskell, provide a Maybe type to handle the possibility of failure. This algorithm can be slightly modified to handle the key-value pair list as well.
$$lookup(x, L) = \begin{cases} \emptyset & : L = \emptyset \\ snd(l_1) & : fst(l_1) = x \\ lookup(x, L') & : \text{otherwise} \end{cases} \tag{A.73}$$
Here L is a list of pairs; the functions fst(p) and snd(p) access the first part and the second part of the pair respectively.
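For instance, here is a minimal Haskell sketch of equation (A.73), using Maybe to signal failure; the name lookup' is ours (it mirrors the standard lookup in the Prelude).

lookup' :: Eq k => k -> [(k, v)] -> Maybe v
lookup' _ [] = Nothing
lookup' x ((k, v):ps) = if x == k then Just v else lookup' x ps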
Both algorithms are in tail-call manner; they can be transformed into imperative loops easily. We leave this as an exercise to the reader.
A.7.3 finding and filtering

Let's take one more step ahead. The lookup algorithm performs a linear search by comparing whether the key of an element equals the given value. A more general case is to find an element matching a certain predicate. We can abstract this matching condition as a parameter of a generic linear finding algorithm.

$$find(p, L) = \begin{cases} \emptyset & : L = \emptyset \\ l_1 & : p(l_1) \\ find(p, L') & : \text{otherwise} \end{cases} \tag{A.74}$$
The algorithm traverses the list, examining whether each element satisfies the predicate p. It fails if the list becomes empty while nothing has been found; this is handled in the first trivial edge case. If the first element in the list satisfies the condition, the algorithm returns the whole element (node), and the user can further handle it as he likes (extract the satellite data, or whatever); otherwise, the algorithm recursively performs finding on the rest of the sub-list. Below is the corresponding Haskell example program.

find _ [] = Nothing
find p (x:xs) = if p x then Just x else find p xs
Translating this to an imperative algorithm is straightforward. Here we use NIL to represent the failure case.

function Find(p, L)
  while L ≠ ∅ do
    if p(First(L)) then
      return First(L)
    L ← Rest(L)
  return NIL

And here is the Python example of finding.

def find(p, xs):
    while xs is not None:
        if p(xs.key):
            return xs
        xs = xs.next
    return None
It is quite possible that multiple elements in the list satisfy the precondition. The finding algorithm designed so far just picks the first one it meets, and stops immediately. It can be considered a special case of finding all elements under a certain condition.

Another viewpoint on finding all elements with a given predicate is to treat the finding algorithm as a black box: the input to this box is a list, while the output is another list containing all elements satisfying the predicate. This is usually called filtering, as shown in Figure A.4.

The figure can be formalized in the style of set enumeration; however, we actually enumerate over a list instead of a set.

$$filter(p, L) = \{x | x \in L \land p(x)\} \tag{A.75}$$
Figure A.4: The input is the original list $\{x_1, x_2, ..., x_N\}$; the output is a list $\{x'_1, x'_2, ..., x'_M\}$, such that for every $x'_i$ the predicate $p(x'_i)$ is satisfied.
filter p xs = [ x | x xs, p x]
And in Python for built-in list as
def filter(p, xs):
return [x for x in xs if p(x)]
Note that the Python built-in list isn't the singly linked list described in
this appendix.
In order to modify the finding algorithm to realize filtering, the found
elements are collected into a result list; and instead of stopping the traversal,
all the remaining elements are examined with the predicate.
filter(p, L) = ∅ : L = ∅
               cons(l₁, filter(p, L′)) : p(l₁)
               filter(p, L′) : otherwise        (A.76)
This algorithm returns an empty result if the list is empty (the trivial edge case).
For a non-empty list, suppose the recursive result of filtering the rest of the
list is A; the algorithm examines whether the first element satisfies the predicate,
and if so, the element is put in front of A by a cons operation (O(1) time).
The corresponding Haskell program is given below.
filter _ [] = []
filter p (x:xs) = if p x then x : filter p xs else filter p xs
Although we spoke of the next found element being appended to the
result list, this algorithm actually constructs the result list from right
to left, so appending is avoided, which ensures the linear O(N) performance.
Comparing this algorithm with the following imperative quadratic realization
reveals the difference.
function Filter(p, L)
    L′ ← ∅
    while L ≠ ∅ do
        if p(First(L)) then
            L′ ← Append(L′, First(L))    ▷ Linear operation
        L ← Rest(L)
    return L′
As the comment on the appending statement points out, appending typically takes
time proportional to the length of the result list if the tail position isn't
memorized. This fact indicates that directly transforming the recursive filter
algorithm into tail-call form will downgrade the performance from O(N) to O(N²).
As shown in the equation below, filter(p, L) = filter′(p, L, ∅) performs as
poorly as the imperative one.
filter′(p, L, A) = A : L = ∅
                   filter′(p, L′, A ∪ {l₁}) : p(l₁)
                   filter′(p, L′, A) : otherwise        (A.77)
One way to achieve linear time performance imperatively is to construct
the result list in reverse order, and then perform the O(N) reversion (refer to
the section above) to get the final result. This is left as an exercise to the reader.
The fact that the result list is constructed from right to left suggests the
possibility of realizing filtering with the fold-right concept. We need to design
some combinator function f such that filter(p, L) = foldr(f, ∅, L). Function
f takes two arguments: one is the element iterated over the list; the
other is the intermediate result constructed from the right. f(x, A) can be defined
so that it tests the predicate against x: if the test succeeds, the result is updated to
cons(x, A); otherwise, A is kept unchanged.
f(x, A) = cons(x, A) : p(x)
          A : otherwise        (A.78)
However, the predicate must be passed to function f as well. This can
be achieved by currying, so f actually has the prototype f(p, x, A), and
filtering is defined as follows.
filter(p, L) = foldr(λ_{x,A} f(p, x, A), ∅, L)        (A.79)
This can be simplified by η-conversion; for a detailed definition of η-conversion,
readers can refer to [2].
filter(p, L) = foldr(f(p), ∅, L)        (A.80)
The following Haskell example program implements this equation.
filter p = foldr f [] where
f x xs = if p x then x : xs else xs
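For instance, with this definition filter even [1, 2, 3, 4, 5] evaluates to
[2, 4], identical to the result of the directly recursive version.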
Similar to mapping and folding, filtering is actually a generic concept: we
can apply a predicate to any traversable data structure to extract what we
are interested in. Readers can refer to the topic about monoids in [8] for further
reading.
A.7.4 Matching
Matching generally means finding a given pattern within some data structure.
In this section, we limit the topic to lists. Even this limitation leads
to a very wide and deep topic; there are dedicated chapters in this book
introducing matching algorithms. So here we only select the algorithm that tests
whether a given list occurs in another (typically longer) list.
Before diving into the algorithm of finding the sub-list at any position, two
special edge cases are used for warm-up: the algorithms testing whether a given
list is either a prefix or a suffix of another.
In the section about span, we have seen how to find a prefix under a certain
condition. Prefix matching can be considered a special case in some sense:
we compare the elements of the two lists one by one from the beginning,
until we meet any different elements or pass the end of one list. Define P ⊑ L
to mean that P is a prefix of L.
P ⊑ L = True : P = ∅
        False : p₁ ≠ l₁
        P′ ⊑ L′ : otherwise        (A.81)
This is obviously a linear algorithm. However, we can't use the very same
approach to test whether a list is a suffix of another, because it isn't cheap to
start from the end of a linked list and iterate backwards. Arrays, on the other
hand, support random access and can easily be traversed backwards.
As we only need a yes-no result, one way to realize a linear suffix
testing algorithm is to reverse both lists (which takes linear time), and use prefix
testing instead. Define L ⊒ P to mean that P is a suffix of L.

L ⊒ P = reverse(P) ⊑ reverse(L)        (A.82)
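As a minimal Haskell sketch of equations (A.81) and (A.82) (the names isPrefix
and isSuffix are our own, chosen to avoid clashing with Data.List):

-- P is a prefix of L: compare element by element from the head.
isPrefix :: Eq a => [a] -> [a] -> Bool
isPrefix [] _ = True
isPrefix _ [] = False
isPrefix (p:ps) (l:ls) = p == l && isPrefix ps ls

-- P is a suffix of L: reverse both lists (linear), then test the prefix.
isSuffix :: Eq a => [a] -> [a] -> Bool
isSuffix p l = isPrefix (reverse p) (reverse l)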
With ⊑ defined, we can test whether a list is an infix of another. The idea is
to traverse the target list, repeatedly applying the prefix test until it
succeeds or we arrive at the end.
function Is-Infix(P, L)
    while L ≠ ∅ do
        if P ⊑ L then
            return TRUE
        L ← Rest(L)
    return FALSE
Formalizing this algorithm as a recursive equation leads to the definition below.
infix?(P, L) = True : P ⊑ L
               False : L = ∅
               infix?(P, L′) : otherwise        (A.83)
Note that there is a tricky implicit constraint in this equation. If the pattern
P is empty, it is definitely an infix of any target list. This case is actually
covered by the first condition, because the empty list is also
a prefix of any list. In most programming languages supporting pattern matching,
we can't move the second clause up as the first edge case, or the definition would
return False for infix?(∅, ∅). (One exception is Prolog, but this is a language-specific
feature which we won't cover in this book.)
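A direct Haskell sketch of equation (A.83), reusing the isPrefix sketch above;
the prefix test is placed first so that the empty pattern is reported as an infix
of any list, including the empty one.

isInfix :: Eq a => [a] -> [a] -> Bool
isInfix p l | isPrefix p l = True   -- covers the empty pattern as well
isInfix _ [] = False
isInfix p (_:ls) = isInfix p ls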
Since prefix testing is linear and is called while traversing the list, this
algorithm is quadratic, O(NM), where N and M are the lengths of the pattern
and target lists respectively. There is no trivial way to improve this position-by-position
scanning algorithm to linear time, even if the data structure changes from
a linked list to a randomly accessible array.
There are chapters in this book introducing several approaches for fast matching,
including the suffix tree with Ukkonen's algorithm, the Knuth-Morris-Pratt
algorithm, and the Boyer-Moore algorithm.
Alternatively, we can enumerate all suffixes of the target list, and check
whether the pattern is a prefix of any of these suffixes. This can be expressed
as follows.

infix?(P, L) = ∃S ∈ suffixes(L) · P ⊑ S        (A.84)
This can be expressed as a list comprehension, for example in the Haskell
program below.
isInfixOf x y = (not . null) [s | s <- tails y, x `isPrefixOf` s]
Here function isPrefixOf is the prefix testing function defined according
to our previous design, and function tails generates all the suffixes of a list.
The implementation of tails is left as an exercise to the reader.
Exercise A.5
- Implement the linear existence testing algorithm in both functional and
  imperative approaches in your favorite programming languages.
- Implement the lookup algorithm in your favorite imperative programming
  language.
- Realize the linear time filtering algorithm by first building the result
  list in reverse order, and finally reversing it to recover the normal order.
  Implement this algorithm with both an imperative loop and a functional
  tail-recursive call.
- Implement the imperative algorithm of prefix testing in your favorite
  programming language.
- Implement the algorithm to enumerate all suffixes of a list.
A.8 Zipping and unzipping
It is quite common to construct a list of paired elements. For example, in the
naive brute-force solution to the 'Drunk jailer' puzzle shown in the section
about mapping, we need to represent the states of all the lights, initialized as
{(1, 0), (2, 0), ..., (N, 0)}. Another example is building a key-value list, such as
{(1, a), (2, an), (3, another), ...}.
In the Drunk jailer example, the list of pairs is built as

map(λ_i (i, 0), {1, 2, ..., N})
The more general case is that two lists have already been prepared, and
what we need is a handy zipping method.
zip(A, B) = ∅ : A = ∅ ∨ B = ∅
            cons((a₁, b₁), zip(A′, B′)) : otherwise        (A.85)
Note that this algorithm is capable of handling the case where the two lists
being zipped have different lengths: the result list of pairs aligns with the
shorter one. It is even possible to zip an infinite list with a finite one in
an environment supporting lazy evaluation. For example, with this auxiliary
function defined, we can initialize the light states as

zip({0, 0, ...}, {1, 2, ..., N})
In languages that support list enumeration, such as Haskell (Python provides
a similar range function, but it manipulates the built-in list, which isn't
actually a linked list), this can be expressed as zip (repeat 0) [1..n]. Given a list
of words, we can also index them with consecutive numbers as

zip({1, 2, ...}, {a, an, another, ...})
Note that the zipping algorithm is linear, as it uses the constant-time cons
operation in each recursive call. However, directly translating zip into the
imperative manner would downgrade the performance to quadratic, unless the
linked list is optimized with a tail-position cache or we modify one of the
passed-in lists in place.
function Zip(A, B)
    C ← ∅
    while A ≠ ∅ ∧ B ≠ ∅ do
        C ← Append(C, (First(A), First(B)))
        A ← Rest(A)
        B ← Rest(B)
    return C
Note that the appending operation takes time proportional to the length of the
result list C, so the loop gets slower and slower as it traverses. There are three
ways to improve this algorithm to linear time. The first is to use a
similar approach as in filtering: construct the result list of
pairs in reverse order by always inserting the paired elements at the head, then
perform a linear reverse operation before returning the final result. The second
is to modify one passed-in list, for example A, in place while traversing,
turning it from a list of elements into a list of pairs. The third is to remember
the last appending position. Please try these solutions as an exercise.
The key point of linear-time zipping is that the result list is actually built
from right to left, just as in the filtering algorithm. So it's quite
possible to provide a fold-right realization. This is left as an exercise to the
reader.
It is natural to extend the zipping algorithm so that multiple lists can be
zipped into one list of tuples; for example, the Haskell standard library
provides zip, zip3, zip4, ..., up to zip7. Another typical extension
is that, sometimes, we don't want a list of pairs (or tuples more generally);
instead, we want to apply some combinator function to each pair of elements.
For example, consider the case where we have a list of unit prices for each
fruit: apple, orange, banana, ..., as {1.00, 0.80, 10.05, ...}, all in
dollars; and the customer's cart holds a list of purchased quantities, for instance
{3, 1, 0, ...}, meaning the customer put 3 apples and an orange in the cart, and
didn't take any bananas, so the quantity of bananas is zero. We want to generate a
list of costs for the customer, containing how much to pay for apples, oranges,
bananas, ... respectively.
The program can be written from scratch as below.
paylist(U, Q) = ∅ : U = ∅ ∨ Q = ∅
                cons(u₁ × q₁, paylist(U′, Q′)) : otherwise
Comparing this equation with the zipping algorithm, it is easy to find the
common structure of the two. We can parameterize the combinator function
as f, so that the generic zipping algorithm can be defined as follows.
zipWith(f, A, B) = ∅ : A = ∅ ∨ B = ∅
                   cons(f(a₁, b₁), zipWith(f, A′, B′)) : otherwise        (A.86)
Here is an example that defines the inner product (or dot product) [14] using
zipWith.

A · B = sum(zipWith(×, A, B))        (A.87)
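As a sketch, equation (A.86) and the dot product (A.87) in Haskell (again the
primed name avoids clashing with the Prelude's zipWith):

zipWith' :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith' f (a:as) (b:bs) = f a b : zipWith' f as bs
zipWith' _ _ _ = []

dot :: Num a => [a] -> [a] -> a
dot xs ys = sum (zipWith' (*) xs ys)   -- e.g. dot [1,2,3] [4,5,6] == 32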
It is also necessary to realize the inverse operation of zipping, which converts a
list of pairs into two separate lists of elements. Back to the purchasing example:
it is quite possible that the unit price information is stored in an association
list like U = {(apple, 1.00), (orange, 0.80), (banana, 10.05), ...}, so that it's
convenient to look up the price of a given product name, for instance
lookup(melon, U). Similarly, the cart can also be represented in this manner,
for example Q = {(apple, 3), (orange, 1), (banana, 0), ...}.
Given such a product/unit-price list and a product/quantity list, how do we
calculate the total payment?
One straightforward idea derived from the previous solution is to extract the
unit price list and the purchased quantity list, then calculate their inner
product.
pay = sum(zipWith(×, snd(unzip(U)), snd(unzip(Q))))        (A.88)
Although the definition of unzip can be directly written as the inverse of
zip, here we give a realization based on fold-right.
unzip(L) = foldr(λ_{((a,b),(A,B))} (cons(a, A), cons(b, B)), (∅, ∅), L)        (A.89)
The initial result is a pair of empty lists. During the folding process, the
head of the list, which is a pair of elements, together with the intermediate result,
is passed to the combinator function. This combinator function is given as a
lambda expression: it extracts the paired elements and puts them in front
of the two intermediate lists respectively. Note that we use implicit pattern
matching to extract the elements from the pairs. Alternatively this can be done
explicitly with the fst and snd functions, as

λ_{(p,P)} (cons(fst(p), fst(P)), cons(snd(p), snd(P)))
The following Haskell example code implements the unzip algorithm.

unzip = foldr (\(a, b) (as, bs) -> (a:as, b:bs)) ([], [])
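As a quick usage sketch of equation (A.88), with made-up association lists for
the unit prices and quantities:

pay :: Double
pay = sum (zipWith (*) prices quantities)
  where (_, prices)     = unzip [("apple", 1.00), ("orange", 0.80), ("banana", 10.05)]
        (_, quantities) = unzip [("apple", 3), ("orange", 1), ("banana", 0)]
-- pay == 3.80: three apples and one orange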
The zip and unzip concepts can be extended more generally, beyond the linked
list. It is quite useful, for instance, to zip two lists into a tree, where the
data stored in the tree are paired elements from both lists. General zip and
unzip can also be used to track the traversal path of a collection, mimicking the
'parent pointer' of imperative implementations. Please refer to the last chapter
of [8] for a good treatment.
Exercise A.6
- Design and implement the iota (ι) algorithm, which enumerates a list from
  some given parameters. For example:
  iota(..., N) = {1, 2, 3, ..., N};
  iota(M, N) = {M, M + 1, M + 2, ..., N}, where M ≤ N;
  iota(M, M + a, ..., N) = {M, M + a, M + 2a, ..., N};
  iota(M, M, ...) = repeat(M) = {M, M, M, ...};
  iota(M, ...) = {M, M + 1, M + 2, ...}.
  Note that the last two cases essentially demand generating an infinite list.
  Consider how to represent an infinite list; you may refer to the streaming and
  lazy evaluation materials such as [5] and [8].
- Design and implement a linear time imperative zipping algorithm.
- Realize the zipping algorithm with the fold-right approach.
- For the purchase payment example, suppose the quantity association list
  only contains items whose quantity isn't zero: instead of
  Q = {(apple, 3), (banana, 0), (orange, 1), ...}, it holds
  Q = {(apple, 3), (orange, 1), ...}. The banana entry is absent because
  the customer doesn't pick any bananas. Write a program that takes the unit-price
  association list and this kind of quantity list, and calculates the total
  payment.
A.9 Notes and short summary
In this appendix, we give a quick introduction to building, manipulating,
transforming, and searching singly linked lists, in both purely functional and
imperative approaches. Most modern programming environments are
equipped with tools to handle such elementary data structures; however, these
tools are designed for general-purpose cases, and serious programming shouldn't
treat them as black boxes.
The linked list is critical: it forms the corner stone of almost
all functional programming environments, just as the array does for imperative
settings. That is why we take this topic as an appendix to the book. It is quite
fine for the reader to start with the first chapter about the binary search tree,
which is a kind of 'hello world' topic, and refer to this appendix on meeting any
unfamiliar list operation.
Bibliography
[1] Richard Bird. "Pearls of Functional Algorithm Design". Cambridge University Press; 1 edition (November 1, 2010). ISBN: 978-0521513388
[2] Simon L. Peyton Jones. "The Implementation of Functional Programming Languages". Prentice-Hall International Series in Computer Science. Prentice Hall (May 1987). ISBN: 978-0134533339
[3] Andrei Alexandrescu. "Modern C++ Design: Generic Programming and Design Patterns Applied". Addison Wesley, February 1, 2001. ISBN: 0-201-70431-5
[4] Benjamin C. Pierce. "Types and Programming Languages". The MIT Press, 2002. ISBN: 0262162091
[5] Harold Abelson, Gerald Jay Sussman, Julie Sussman. "Structure and Interpretation of Computer Programs", 2nd Edition. MIT Press, 1996. ISBN: 0-262-51087-1
[6] Chris Okasaki. "Purely Functional Data Structures". Cambridge University Press (July 1, 1999). ISBN-13: 978-0521663502
[7] Fethi Rabhi, Guy Lapalme. "Algorithms: A Functional Programming Approach". Second edition. Addison-Wesley, 1999. ISBN: 0201-59604-0
[8] Miran Lipovaca. "Learn You a Haskell for Great Good! A Beginner's Guide". No Starch Press; 1 edition, April 2011, 400 pp. ISBN: 978-1-59327-283-8
[9] Joe Armstrong. "Programming Erlang: Software for a Concurrent World". Pragmatic Bookshelf; 1 edition (July 18, 2007). ISBN-13: 978-1934356005
[10] Wikipedia. "Tail call". https://en.wikipedia.org/wiki/Tail_call
[11] SGI. "transform". http://www.sgi.com/tech/stl/transform.html
[12] ACM/ICPC. "The drunk jailer". Peking University judge online for ACM/ICPC. http://poj.org/problem?id=1218
[13] Haskell wiki. "Haskell programming tips. 4.4 Choose the appropriate fold." http://www.haskell.org/haskellwiki/Haskell_programming_tips
[14] Wikipedia. "Dot product". http://en.wikipedia.org/wiki/Dot_product
GNU Free Documentation License
Version 1.3, 3 November 2008
Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
http://fsf.org/
Everyone is permitted to copy and distribute verbatim copies of this license
document, but changing it is not allowed.
Preamble
The purpose of this License is to make a manual, textbook, or other functional
and useful document "free" in the sense of freedom: to assure everyone
the effective freedom to copy and redistribute it, with or without modifying it,
either commercially or noncommercially. Secondarily, this License preserves for
the author and publisher a way to get credit for their work, while not being
considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the
document must themselves be free in the same sense. It complements the GNU
General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software,
because free software needs free documentation: a free program should come
with manuals providing the same freedoms that the software does. But this
License is not limited to software manuals; it can be used for any textual work,
regardless of subject matter or whether it is published as a printed book. We
recommend this License principally for works whose purpose is instruction or
reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be distributed
under the terms of this License. Such a notice grants a world-wide, royalty-free
license, unlimited in duration, to use that work under the conditions stated
herein. The "Document", below, refers to any such manual or work. Any
member of the public is a licensee, and is addressed as "you". You accept the
license if you copy, modify or distribute the work in a way requiring permission
under copyright law.
A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with modifications
and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the publishers
or authors of the Document to the Document's overall subject (or to related
matters) and contains nothing that could fall directly within that overall subject.
(Thus, if the Document is in part a textbook of mathematics, a Secondary
Section may not explain any mathematics.) The relationship could be a matter
of historical connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are
designated, as being those of Invariant Sections, in the notice that says that
the Document is released under this License. If a section does not fit the above
definition of Secondary then it is not allowed to be designated as Invariant.
The Document may contain zero Invariant Sections. If the Document does not
identify any Invariant Sections then there are none.
The "Cover Texts" are certain short passages of text that are listed, as
Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document
is released under this License. A Front-Cover Text may be at most 5
words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the general public,
that is suitable for revising the document straightforwardly with generic text
editors or (for images composed of pixels) generic paint programs or (for drawings)
some widely available drawing editor, and that is suitable for input to
text formatters or for automatic translation to a variety of formats suitable for
input to text formatters. A copy made in an otherwise Transparent file format
whose markup, or absence of markup, has been arranged to thwart or discourage
subsequent modification by readers is not Transparent. An image format is
not Transparent if used for any substantial amount of text. A copy that is not
Transparent is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII
without markup, Texinfo input format, LaTeX input format, SGML or XML using
a publicly available DTD, and standard-conforming simple HTML, PostScript
or PDF designed for human modification. Examples of transparent image formats
include PNG, XCF and JPG. Opaque formats include proprietary formats
that can be read and edited only by proprietary word processors, SGML or
XML for which the DTD and/or processing tools are not generally available,
and the machine-generated HTML, PostScript or PDF produced by some word
processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such
following pages as are needed to hold, legibly, the material this License requires
to appear in the title page. For works in formats which do not have any title
page as such, "Title Page" means the text near the most prominent appearance
of the work's title, preceding the beginning of the body of the text.
The "publisher" means any person or entity that distributes copies of the
Document to the public.
A section "Entitled XYZ" means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following text
that translates XYZ in another language. (Here XYZ stands for a specific section
name mentioned below, such as "Acknowledgements", "Dedications",
"Endorsements", or "History".) To "Preserve the Title" of such a section
when you modify the Document means that it remains a section "Entitled
XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document. These Warranty Disclaimers
are considered to be included by reference in this License, but only as regards
disclaiming warranties: any other implication that these Warranty Disclaimers
may have is void and has no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commer-
cially or noncommercially, provided that this License, the copyright notices, and
the license notice saying this License applies to the Document are reproduced
in all copies, and that you add no other conditions whatsoever to those of this
License. You may not use technical measures to obstruct or control the reading
or further copying of the copies you make or distribute. However, you may
accept compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you
may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed
covers) of the Document, numbering more than 100, and the Document's license
notice requires Cover Texts, you must enclose the copies in covers that carry,
clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover,
and Back-Cover Texts on the back cover. Both covers must also clearly and
legibly identify you as the publisher of these copies. The front cover must
present the full title with all words of the title equally prominent and visible.
You may add other material on the covers in addition. Copying with changes
limited to the covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you
should put the first ones listed (as many as fit reasonably) on the actual cover,
and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more
than 100, you must either include a machine-readable Transparent copy along
with each Opaque copy, or state in or with each Opaque copy a computer-
network location from which the general network-using public has access to
download using public-standard network protocols a complete Transparent copy
of the Document, free of added material. If you use the latter option, you must
take reasonably prudent steps, when you begin distribution of Opaque copies
in quantity, to ensure that this Transparent copy will remain thus accessible at
the stated location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that edition to the
public.
It is requested, but not required, that you contact the authors of the Doc-
ument well before redistributing any large number of copies, to give them a
chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the
conditions of sections 2 and 3 above, provided that you release the Modified
Version under precisely this License, with the Modified Version filling the role
of the Document, thus licensing distribution and modification of the Modified
Version to whoever possesses a copy of it. In addition, you must do these things
in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that
of the Document, and from those of previous versions (which should, if
there were any, be listed in the "History" section of the Document). You
may use the same title as a previous version if the original publisher of
that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible
for authorship of the modifications in the Modified Version, together
with at least five of the principal authors of the Document (all of its principal
authors, if it has fewer than five), unless they release you from this
requirement.
C. State on the Title page the name of the publisher of the Modified Version,
as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to
the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving
the public permission to use the Modified Version under the terms of this
License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and
required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add to it
an item stating at least the title, year, new authors, and publisher of the
Modified Version as given on the Title Page. If there is no section Entitled
"History" in the Document, create one stating the title, year, authors, and
publisher of the Document as given on its Title Page, then add an item
describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public
access to a Transparent copy of the Document, and likewise the network
locations given in the Document for previous versions it was based on.
These may be placed in the History section. You may omit a network
location for a work that was published at least four years before the Doc-
ument itself, or if the original publisher of the version it refers to gives
permission.
K. For any section Entitled "Acknowledgements" or "Dedications", Preserve
the Title of the section, and preserve in the section all the substance and
tone of each of the contributor acknowledgements and/or dedications given
therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text
and in their titles. Section numbers or the equivalent are not considered
part of the section titles.
M. Delete any section Entitled "Endorsements". Such a section may not be
included in the Modified Version.
N. Do not retitle any existing section to be Entitled "Endorsements" or to
conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or appendices
that qualify as Secondary Sections and contain no material copied from the
Document, you may at your option designate some or all of these sections as
invariant. To do this, add their titles to the list of Invariant Sections in the
Modified Version's license notice. These titles must be distinct from any other
section titles.
You may add a section Entitled "Endorsements", provided it contains nothing
but endorsements of your Modified Version by various parties, for example,
statements of peer review or that the text has been approved by an organization
as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover
Texts in the Modified Version. Only one passage of Front-Cover Text and one
of Back-Cover Text may be added by (or through arrangements made by) any
one entity. If the Document already includes a cover text for the same cover,
previously added by you or by arrangement made by the same entity you are
acting on behalf of, you may not add another; but you may replace the old one,
on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give
permission to use their names for publicity for or to assert or imply endorsement
of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified versions, provided
that you include in the combination all of the Invariant Sections of all
of the original documents, unmodified, and list them all as Invariant Sections
of your combined work in its license notice, and that you preserve all their
Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple
identical Invariant Sections may be replaced with a single copy. If there are
multiple Invariant Sections with the same name but different contents, make
the title of each such section unique by adding at the end of it, in parentheses,
the name of the original author or publisher of that section if known, or else a
unique number. Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled "History" in
the various original documents, forming one section Entitled "History"; likewise
combine any sections Entitled "Acknowledgements", and any sections Entitled
"Dedications". You must delete all sections Entitled "Endorsements".
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this License in
the various documents with a single copy that is included in the collection,
provided that you follow the rules of this License for verbatim copying of each
of the documents in all other respects.
You may extract a single document from such a collection, and distribute it
individually under this License, provided you insert a copy of this License into
the extracted document, and follow this License in all other respects regarding
verbatim copying of that document.
7. AGGREGATION WITH INDEPENDENT
WORKS
A compilation of the Document or its derivatives with other separate and
independent documents or works, in or on a volume of a storage or distribution
medium, is called an "aggregate" if the copyright resulting from the compilation
is not used to limit the legal rights of the compilation's users beyond what the
individual works permit. When the Document is included in an aggregate,
this License does not apply to the other works in the aggregate which are not
themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the
Document, then if the Document is less than one half of the entire aggregate, the
Document's Cover Texts may be placed on covers that bracket the Document
within the aggregate, or the electronic equivalent of covers if the Document is
in electronic form. Otherwise they must appear on printed covers that bracket
the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations
of the Document under the terms of section 4. Replacing Invariant Sections
with translations requires special permission from their copyright holders,
but you may include translations of some or all Invariant Sections in addition to
the original versions of these Invariant Sections. You may include a translation
of this License, and all the license notices in the Document, and any Warranty
Disclaimers, provided that you also include the original English version of this
License and the original versions of those notices and disclaimers. In case of a
disagreement between the translation and the original version of this License or
a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled "Acknowledgements", "Dedications",
or "History", the requirement (section 4) to Preserve its Title (section 1)
will typically require changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as
expressly provided under this License. Any attempt otherwise to copy, modify,
sublicense, or distribute it is void, and will automatically terminate your rights
under this License.
However, if you cease all violation of this License, then your license from
a particular copyright holder is reinstated (a) provisionally, unless and until
the copyright holder explicitly and finally terminates your license, and (b) permanently,
if the copyright holder fails to notify you of the violation by some
reasonable means prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is reinstated permanently
if the copyright holder notifies you of the violation by some reasonable
means, this is the first time you have received notice of violation of this License
(for any work) from that copyright holder, and you cure the violation prior to
30 days after your receipt of the notice.
Termination of your rights under this section does not terminate the licenses
of parties who have received copies or rights from you under this License. If
your rights have been terminated and not permanently reinstated, receipt of a
copy of some or all of the same material does not give you any rights to use it.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the
GNU Free Documentation License from time to time. Such new versions will be
similar in spirit to the present version, but may differ in detail to address new
problems or concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If
the Document specifies that a particular numbered version of this License "or
any later version" applies to it, you have the option of following the terms and
conditions either of that specified version or of any later version that has been
published (not as a draft) by the Free Software Foundation. If the Document
does not specify a version number of this License, you may choose any version
ever published (not as a draft) by the Free Software Foundation. If the Document
specifies that a proxy can decide which future versions of this License can
be used, that proxy's public statement of acceptance of a version permanently
authorizes you to choose that version for the Document.
11. RELICENSING
"Massive Multiauthor Collaboration Site" (or "MMC Site") means any World
Wide Web server that publishes copyrightable works and also provides prominent
facilities for anybody to edit those works. A public wiki that anybody can
edit is an example of such a server. A "Massive Multiauthor Collaboration"
(or "MMC") contained in the site means any set of copyrightable works thus
published on the MMC site.
"CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 license
published by Creative Commons Corporation, a not-for-profit corporation
with a principal place of business in San Francisco, California, as well as future
copyleft versions of that license published by that same organization.
"Incorporate" means to publish or republish a Document, in whole or in
part, as part of another Document.
An MMC is "eligible for relicensing" if it is licensed under this License, and
if all works that were first published under this License somewhere other than
this MMC, and subsequently incorporated in whole or in part into the MMC,
(1) had no cover texts or invariant sections, and (2) were thus incorporated prior
to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the site
under CC-BY-SA on the same site at any time before August 1, 2009, provided
the MMC is eligible for relicensing.
ADDENDUM: How to use this License for your
documents
To use this License in a document you have written, include a copy of the
License in the document and put the following copyright and license notices just
after the title page:
Copyright © YEAR YOUR NAME. Permission is granted to copy,
distribute and/or modify this document under the terms of the GNU
Free Documentation License, Version 1.3 or any later version published
by the Free Software Foundation; with no Invariant Sections,
no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
is included in the section entitled "GNU Free Documentation
License".
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the "with ... Texts." line with this:
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being
LIST.
If you have Invariant Sections without Cover Texts, or some other combina-
tion of the three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recom-
mend releasing these examples in parallel under your choice of free software
license, such as the GNU General Public License, to permit their use in free
software.
Index
8 queens puzzle, 611
AVL tree, 83
balancing, 88
definition, 83
deletion, 94
imperative insertion, 94
insertion, 86
verification, 93
BFS, 640
Binary Random Access List
Definition, 446
Insertion, 448
Random access, 451
Remove from head, 449
Binary search, 562
binary search tree, 29
data layout, 31
delete, 40
insertion, 33
looking up, 37
min/max, 38
randomly build, 44
search, 37
succ/pred, 38
traverse, 34
binary tree, 30
Binomial Heap
Linking, 376
Binomial heap, 371
definition, 374
insertion, 378
pop, 383
Binomial tree, 371
merge, 380
Boyer-Moore majority number, 578
Boyer-Moore algorithm, 597
Breadth-first search, 640
Change-making problem, 652
Cocktail sort, 355
Depth-first search, 605
DFS, 605
Dynamic programming, 654
Fibonacci Heap, 386
decrease key, 399
delete min, 390
insert, 388
merge, 389
pop, 390
Finger Tree
Imperative splitting, 494
Finger tree
Append to tail, 478
Concatenate, 481
Definition, 467
Ill-formed tree, 473
Imperative random access, 492
Insert to head, 469
Random access, 486, 492
Remove from head, 472
Remove from tail, 479
Size augmentation, 486
splitting, 490
folding, 720
Greedy algorithm, 641
Huffman coding, 641
in-order traverse, 35
Insertion sort
binary search, 52
binary search tree, 55
linked-list setting, 53
insertion sort, 49
insertion, 50
Klotski puzzle, 633
KMP, 583
Knuth-Morris-Pratt algorithm, 583
LCS, 659
left child, right sibling, 375
List
append, 687
break, 715
concat, 696
concats, 726
cons, 681
Construction, 681
definition, 679
delete, 693
delete at, 693
drop, 713
drop while, 714
elem, 726
empty, 680
empty testing, 682
existence testing, 726
Extract sub-list, 713
filter, 728
find, 728
fold from left, 723
fold from right, 720
foldl, 723
foldr, 720
for each, 708
get at, 683
group, 716
head, 680
index, 683
infix, 730
init, 684
insert, 690
insert at, 690
last, 684
length, 682
lookup, 727
map, 705, 706
matching, 730
maximum, 701
minimum, 701
mutate, 687
prefix, 730
product, 697
reverse, 711
Reverse index, 685
rindex, 685
set at, 688
span, 715
split at, 713, 715
suffix, 730
sum, 697
tail, 680
take, 713
take while, 714
Transformation, 705
unzip, 732
zip, 732
Longest common subsequence problem,
659
Maximum sum problem, 582
Maze problem, 605
Merge Sort, 529
Basic version, 530
Bottom-up merge sort, 551
In-place merge sort, 537
In-place working area, 538
Linked-list merge sort, 543
Merge, 530
Naive in-place merge, 537
Nature merge sort, 545
Performance analysis, 533
Work area allocation, 534
minimum free number, 10
MTF, 498
Paired-array list
Definition, 459
Insertion and appending, 460
Random access, 460
Removing and balancing, 461
Pairing heap, 403
definition, 404
delete min, 406
find min, 404
insert, 404
pop, 406
top, 404
pairing heap
decrease key, 406
delete, 410
Parallel merge sort, 553
Parallel quick sort, 553
Peg puzzle, 614
post-order traverse, 35
pre-order traverse, 35
Queue
Balance Queue, 430
Circular buer, 423
Incremental concatenate, 434
Incremental reverse, 432
Lazy real-time queue, 439
Paired-array queue, 429
Paired-list queue, 426
Real-time Queue, 432
Singly linked-list, 420
Quick Sort
2-way partition, 519
3-way partition, 521
Accumulated partition, 511
Accumulated quick sort, 512
Average case analysis, 514
Basic version, 506
Engineering improvement, 517
Handle duplicated elements, 517
Insertion sort fall-back, 528
One pass functional partition, 511
Performance analysis, 513
Strict weak ordering, 507
Quick sort, 505
partition, 508
range traverse, 40
red-black tree, 59, 64
deletion, 69
imperative insertion, 77
insertion, 65
red-black properties, 64
Saddleback search, 567
Selection algorithm, 558
selection sort, 347
minimum finding, 349
parameterize the comparator, 353
tail-recursive call minimum finding, 351
Sequence
Binary random access list, 446
Concatenate-able list, 463
finger tree, 467
Imperative binary access list, 456
numeric representation for binary
access list, 453
Paired-array list, 459
Subset sum problem, 664
Tail call, 698
Tail recursion, 698
Tail recursive call, 698
The wolf, goat, and cabbage puzzle,
620
Tournament knock out, 359
explicit infinity, 364
tree reconstruction, 36
tree rotation, 62
Water jugs puzzle, 624
word counter, 29