
COMPETITIVE PLAY IN STRATEGO

A.F.C. Arts
Master Thesis DKE 10-05

Thesis submitted in partial fulfilment of the requirements for the degree of Master of Science of Artificial Intelligence at the Faculty of Humanities and Sciences of Maastricht University

Thesis committee: Dr. M.H.M. Winands Dr. ir. J.W.H.M. Uiterwijk M.P.D. Schadd, M.Sc D.L. St-Pierre, M.Sc

Maastricht University Department of Knowledge Engineering Maastricht, The Netherlands March 2010


In loving memory of my father


Preface
This master thesis was written at the Department of Knowledge Engineering at Maastricht University. The subject is the board game Stratego. In this thesis we investigate the relatively unknown algorithms Star1, Star2 and StarETC.

I would like to thank the following people for their help and support. First of all, my supervisors Dr. Mark Winands and Maarten Schadd, MSc., for their guidance, insight and support; they have helped me with ideas, with problems and with writing this thesis. Second, the people at Airglow Studios for providing the idea of a thesis about Stratego. I also wish to thank the other thesis committee members for reading this thesis. Finally, I would like to thank my family and friends for their support.

Sander Arts
Maastricht, March 2010


Summary
The focus of this thesis is the analysis and implementation of an artificial player for the game Stratego. Stratego is a deterministic imperfect-information board game for two players. The problem statement of this thesis is: How can we develop informed-search methods in such a way that programs significantly improve their performance in Stratego?

To answer this problem statement, the state-space and game-tree complexity are calculated. For the development of the program, the following backward pruning techniques are implemented: αβ pruning, iterative deepening, the history heuristic, transposition tables and the *-minimax algorithms Star1, Star2 and StarETC. A forward pruning technique called multi-cut is tested. Additionally, the evaluation function uses features as described by De Boer (2007).

It is shown that Stratego is a complex game when compared to other games, such as chess and Hex. The game-tree complexity of 10^535 exceeds the game-tree complexity of Go. The state-space complexity equals 10^115. The use of Star1, Star2 and StarETC improves the reachable search depth compared to the expectimax algorithm that is normally used in games with non-determinism. Expectimax with the history heuristic, αβ pruning and transposition tables is able to prune 74.06% of the nodes. The results show that Star1 and Star2 can improve the node reduction by 75.18% compared to expectimax with αβ pruning, the history heuristic and transposition tables. The *-minimax algorithms get their advantage from pruning in chance nodes. The performance gain gives the artificial player the possibility to search deeper than an artificial player without a *-minimax algorithm. The multi-cut algorithm does not perform significantly better: its best performance is a score percentage of 53.3% in self-play, and in the worst case the score percentage is 46.69%.

The conclusion of this thesis is that informed-search techniques improve the game play of an artificial player for Stratego. However, because of the high complexity of the game, an intermediate or expert level of Stratego is hard to realize.


Samenvatting
Het doel van deze thesis is de analyse en implementatie van een computerspeler voor het spel Stratego. Stratego is een deterministisch onvolledige-informatie bordspel voor twee spelers. De hoofdvraag van deze thesis is: Hoe kunnen we geïnformeerde zoektechnieken ontwikkelen zodat de prestatie van een programma significant verbetert in Stratego?

Om deze vraag te beantwoorden worden de toestandsruimte-complexiteit en de zoekboom-complexiteit berekend. Voor de ontwikkeling van het programma zijn meerdere backward-pruningtechnieken gebruikt: αβ-snoeiing, iterative deepening, de history heuristiek, transpositietabellen en de *-minimax-algoritmen zoals Star1, Star2 en StarETC. Er is een forward-pruningtechniek gebruikt, namelijk multi-cut. Ook zijn er evaluatiefunctie-eigenschappen getest zoals beschreven door De Boer (2007).

Het blijkt dat Stratego een complex spel is in vergelijking met andere spellen, zoals schaak en Hex. De zoekboom-complexiteit van 10^535 is groter dan die van Go. De toestandsruimte-complexiteit is 10^115. Het gebruik van Star1, Star2 en StarETC verbetert de zoekdiepte vergeleken met het expectimax-algoritme dat normaal gebruikt wordt in spellen met onvolledige informatie of kansknopen. Expectimax met de history heuristiek, αβ-snoeiing en transpositietabellen kan tot 74.06% van de knopen snoeien. De resultaten laten zien dat Star1 met Star2 de reductie kan verbeteren met 74.18% in vergelijking met expectimax met αβ-snoeiing, history heuristiek en transpositietabellen. De *-minimax-algoritmen halen hun voordeel uit het snoeien van kansknopen. De prestatiewinst geeft de computerspeler de kans dieper te zoeken dan een computerspeler zonder *-minimax. Het programma met multi-cut presteert niet significant beter. Het beste resultaat, in spellen tegen zichzelf, is een scoringspercentage van 53.3%. In het slechtste geval is het scoringspercentage 46.69%.

De conclusie van deze thesis is dat geïnformeerde zoektechnieken het spel van een computerspeler voor Stratego verbeteren. Door de hoge complexiteit van het spel is het echter lastig om een speler van gemiddeld of hoog niveau voor Stratego te maken.


List of Algorithms

1 Minimax algorithm
2 Negamax algorithm
3 Negamax algorithm with αβ pruning
4 The negamax formulation of the expectimax method
5 The negamax formulation of the Star1 algorithm with uniform chances
6 The negamax formulation of the Star2 algorithm with uniform chances
7 Negamax with transposition tables
8 The negamax formulation of the StarETC algorithm


List of Figures

2.1 Stratego board with coordinates
2.2 Notation of moves in Stratego
3.1 Average branching factor of Stratego per move number
4.1 Game tree generated by the minimax algorithm with corresponding values of the nodes
4.2 Iterative deepening until ply 3
4.3 αβ pruning in action. The principal variation is red
4.4 An expectimax tree with uniform chances
4.5 A chance node
4.6 A chance node
4.7 The Star1 algorithm in action
4.8 The Star2 algorithm in action
4.9 Multi-cut pruning
5.1 Example where the rank of the highest piece is not important
5.2 Example where the Red player has a Miner and the Marshal left. If the Blue player attacks the Marshal with the Spy he wins, otherwise the Red player wins
5.3 Example where the Red player controls all three lanes
6.1 Example positions used in the test set
A.1 Board set-up #1
A.2 Board set-up #2
A.3 Board set-up #3
A.4 Board set-up #4
A.5 Board set-up #5
A.6 Board set-up #6
A.7 Board set-up #7
A.8 Board set-up #8
A.9 Board set-up #9
A.10 Board set-up #10
A.11 Board set-up #11

List of Tables

2.1 Available Stratego pieces for each player
3.1 Complexity of several games
5.1 Default values of the evaluation function for each piece
6.1 Overhead of Iterative Deepening
6.2 Node reduction by History Heuristic
6.3 Node reduction by a Transposition Table
6.4 Node reduction by HH & TT
6.5 Node reduction using HH & TT with and without Iterative Deepening
6.6 Node reduction by Star1
6.7 Node reduction using Star2 with a probing factor of 1
6.8 Node reduction by Star1 and Star2 with a probe factor of 1
6.9 Node reduction using different probe factor values for Star2
6.10 Node reduction using StarETC
6.11 Node reduction using StarETC
6.12 Number of nodes searched on 5-ply search
6.13 Node reduction using evaluation function features
6.14 Comparison of different evaluation function features
6.15 Self play results using Multi-Cut

Contents

Preface
Summary
Samenvatting
List of Algorithms
List of Figures
List of Tables
Contents

1 Introduction
  1.1 Games and AI
  1.2 Stratego
  1.3 Related Work
  1.4 Problem Statement and Research Questions
  1.5 Outline of the thesis

2 The Game of Stratego
  2.1 Rules
  2.2 Notation
  2.3 Strategies and Knowledge
    2.3.1 Flag Position
    2.3.2 Miner
    2.3.3 The Spy
  2.4 Other Versions

3 Complexity Analysis
  3.1 Board-setup Complexity
  3.2 State-space Complexity
  3.3 Game-tree Complexity
  3.4 Stratego Compared to Other Games

4 Search
  4.1 Introduction
  4.2 Minimax
  4.3 Iterative Deepening Search
  4.4 Negamax
  4.5 αβ Search
  4.6 Expectimax
  4.7 *-Minimax Algorithms
    4.7.1 Star1
    4.7.2 Star2
  4.8 Transposition Tables
    4.8.1 Zobrist Hashing
    4.8.2 StarETC
  4.9 Move Ordering
    4.9.1 Static Ordering
    4.9.2 History Heuristic
    4.9.3 Transposition Table Ordering
  4.10 Multi-Cut

5 Evaluation Function
  5.1 Piece Value
  5.2 Value of Information
  5.3 Strategic Positions
  5.4 Implementation

6 Experimental Results
  6.1 Experimental Design
  6.2 Pruning in MIN/MAX Nodes
    6.2.1 Iterative Deepening
    6.2.2 History Heuristic
    6.2.3 Transposition Table
    6.2.4 History Heuristic and Transposition Table
  6.3 Pruning in Chance Nodes
    6.3.1 Star1
    6.3.2 Star2
    6.3.3 StarETC
  6.4 Evaluation Function Features
  6.5 Multi-cut

7 Conclusions
  7.1 Answering the Research Questions
  7.2 Answering the Problem Statement
  7.3 Future Research

References

A Board Setups

Chapter 1

Introduction
This chapter introduces the subject of the thesis, Stratego. A historical overview of the game is given in Section 1.2. Section 1.3 discusses related work in this area. The problem statement and research questions of this thesis are presented in Section 1.4. Finally, Section 1.5 gives an outline of this thesis.

1.1 Games and AI

Games in different forms have fascinated people all over the world since civilization emerged. Board games come in many different kinds, varying, for example, in whether chance is involved and in how much information the players have. Games make it possible for people to challenge their intellect against other humans. With increasing computational power, humans attempted to let computers play games. Chess was one of the first games that received attention from science; the first persons to describe a possible chess program were Shannon (1950) and Turing (1953). Research on abstract games such as chess has led to the chess engine Deep Blue, which defeated the World Champion of chess, Kasparov, in 1997 (Hsu, 2002). Computers have improved their play in several games since the first chess program appeared (Van den Herik, Uiterwijk, and Van Rijswijck, 2002). Some games have already been completely solved, such as Connect Four (Allis, 1988), Checkers (Schaeffer, 2007) and Fanorona (Schadd et al., 2008). Moreover, researchers have also investigated less deterministic games: stochastic (non-deterministic) games such as Backgammon, which includes an element of chance, and Poker, a game which includes both chance and imperfect information.

1.2 Stratego¹

The game of Stratego is a board game for two players. Each player has 40 pieces with military ranks that remain hidden from the opponent. The pieces can move around the board and attack the opponent's pieces. The player whose Flag is captured loses the game. The rank of a piece is only revealed when it attacks or is attacked.

Stratego was created by Mogendorff during World War II. It was registered as a trademark in 1942 by the Dutch company Van Perlestein & Roeper Bosch NV. The first version of Stratego was distributed by Smeets and Schippers in 1946 (United States Court, 2005). In 1958 the license was granted to Hausemann & Hötte. In 1961 the game was sublicensed to Milton Bradley, which was acquired by Hasbro in 1984, and first published in the United States in 1961. The game is popular in the Netherlands, Belgium, Germany, Britain, the United States of America and Ukraine. Since 1997 there has been a yearly world championship of Stratego; in 2007, 44 people participated. Since 2007 the USA Stratego Federation also holds the Computer Stratego World Championship (USA Stratego Federation, 2009).

¹ STRATEGO is a registered trademark of Hausemann & Hötte.


1.3 Related Work

Some research has been done on the game of Stratego. There are several publications which describe research on this topic. The first publication is by Stengård (2006), who investigated a minimax-based approach to Stratego and created an algorithm called P-E-minimax. This algorithm implements a null-move-like technique to calculate whether a move results in a similar position. The evaluation function used is unknown. The second publication about Stratego is by De Boer (2007), a former (human) World Champion of Stratego. De Boer focuses on Stratego board set-ups and uses plans to play the game instead of search techniques. Like most current Stratego bots, this implementation does not use minimax-based algorithms but is based on a more human-like approach (USA Stratego Federation, 2009). By selecting a strategy (attacking a piece, defending a piece, etc.) the algorithm creates short-term plans to realise its goals. However, this implementation depends heavily on the evaluation function. A multi-agent implementation has been done by Treijtel (2000) and Ismail (2004). These publications approach Stratego using a rule-based agent system which assigns scores to the generated moves of the pieces. Their main focus lies on the implementation of the agents and their behaviour. Furthermore, there is a B.Sc. thesis about opponent modelling in Stratego by Stankiewicz (2009) and Stankiewicz and Schadd (2009). This work models the opponent by examining the opponent's moves and tries to discover the ranks of the pieces from these observations. Another B.Sc. thesis, by Mohnen (2009), implements domain-dependent knowledge of Stratego in the evaluation function, which is used to see whether better domain knowledge results in a higher number of αβ cutoffs. The technique ChanceProbCut is applied to Stratego by Schadd, Winands, and Uiterwijk (2009). This paper describes a forward-pruning technique in expectimax which reduces the size of the search tree significantly. The final paper on Stratego is by Schadd and Winands (2009), where an adapted version of Quiescence Search is applied.

A game similar to Stratego is Si Guo Da Zhan (The Great Battle of Four Countries) (Xia, Zhu, and Lu, 2007). This game is intended for four players; each player has an army with pieces of different ranks, similar to Stratego. Kriegspiel (Ciancarini and Favini, 2007) is a chess variant where the location of the opponent's pieces is unknown. The standard chess set-up stays unchanged, but the player needs to guess or estimate the location of the opponent's pieces.

1.4 Problem Statement and Research Questions

In this thesis we investigate the relevance of informed search for improving computer game-play. The following problem statement guides the research:

Problem Statement. How can we develop informed-search methods in such a way that programs significantly improve their performance in Stratego?

To understand the complexity of Stratego we can use the state-space and game-tree complexity as measures and position Stratego in the game-space (Allis, 1994). The position of Stratego in the game-space indicates the solvability of the game and informs us whether we can use a knowledge-based or a search-based approach (Van den Herik et al., 2002).

Research Question 1. What is the complexity of Stratego?

Several informed-search techniques exist to improve game play. The most famous one is the minimax-based αβ search technique. Unfortunately, αβ is only applicable to deterministic perfect-information games. A derivative of minimax, called expectimax (Michie, 1966), has been developed to address stochastic games. It has been successfully applied in the stochastic game of Backgammon (Hauk, Buro, and Schaeffer, 2004). A direct successor of expectimax is the group of algorithms called *-minimax (Ballard, 1983), which is able to prune substantial parts of the search tree.
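To make the idea behind expectimax concrete, the following is a minimal sketch (illustrative only, not the thesis implementation, which uses a negamax formulation): max and min nodes propagate extremes, while chance nodes propagate the probability-weighted average of their children.

```python
# Minimal expectimax sketch (illustrative only, not the thesis code).
# A node is ('max', children), ('min', children),
# ('chance', [(probability, child), ...]), or a leaf number.

def expectimax(node):
    if isinstance(node, (int, float)):   # leaf: static evaluation value
        return node
    kind, children = node
    if kind == 'max':
        return max(expectimax(c) for c in children)
    if kind == 'min':
        return min(expectimax(c) for c in children)
    # chance node: probability-weighted average of the child values
    return sum(p * expectimax(c) for p, c in children)

# A max node choosing between a sure 3 and a 50/50 gamble on 0 or 10:
tree = ('max', [3, ('chance', [(0.5, 0), (0.5, 10)])])
print(expectimax(tree))  # 5.0: the gamble's expected value beats the sure 3
```

Because a chance node's value is an average, its exact value is only known after all children are searched; the *-minimax algorithms exploit value bounds to cut this work short.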


The characteristics of Stratego indicate that *-minimax might be a successful approach for Stratego. This leads to our second research question:

Research Question 2. To what extent can *-minimax techniques be utilized in order to realize competitive game play in Stratego?

1.5 Outline of the thesis

Chapter 1 explains the subject of the thesis and presents the research questions. Chapter 2 introduces the basics of Stratego, such as the pieces, the game board, the rules and strategies for the game. Chapter 3 gives insight into the complexity of Stratego and answers the first research question; it also compares the complexity of Stratego with that of other games. Chapter 4 explains the search techniques and enhancements used in the implementation. Chapter 5 gives insight into the different aspects of the evaluation function, such as piece value, information value and strategic positions. Chapter 6 lists the tests and their results; these results answer the second research question. Chapter 7 concludes the thesis and answers the research questions.


Chapter 2

The Game of Stratego


In Section 2.1 the rules of Stratego are discussed. In Section 2.2 the notation used in this thesis is introduced. Strategies and knowledge in Stratego are examined in Section 2.3.

2.1 Rules

The rules of the game are described by Hasbro (2002). The goal of the game is to capture the opponent's Flag. The detailed rules are given below.

Start of the Game


Stratego is played on a 10×10 board with two lakes of size 2×2 in the middle of the board. The players place their 40 pieces with the backs of the pieces toward the opponent in a 4×10 area. One player takes the blue pieces, called the Blue player, and the other player takes the red pieces, called the Red player. The movable pieces are divided into ranks (from the lowest to the highest): Spy, Scout, Miner, Sergeant, Lieutenant, Captain, Colonel, Major, General, Marshal. Each player also has two unmovable piece types, the Flag and the Bomb. Table 2.1 shows the number of pieces for each player. After the pieces of both players are placed on the board, the Red player makes the first move. A player can either choose to move a piece or attack an opponent's piece.

  1 Flag           6 Bombs
  1 Spy            8 Scouts
  5 Miners         4 Sergeants
  4 Lieutenants    4 Captains
  3 Colonels       2 Majors
  1 General        1 Marshal

Table 2.1: Available Stratego pieces for each player
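The counts in Table 2.1 can be sanity-checked in a few lines of Python (a sketch; the dictionary encoding is my own, not taken from the thesis):

```python
# Piece counts from Table 2.1; each player fields 40 pieces in total.
PIECES = {
    'Flag': 1, 'Bombs': 6, 'Spy': 1, 'Scouts': 8,
    'Miners': 5, 'Sergeants': 4, 'Lieutenants': 4, 'Captains': 4,
    'Colonels': 3, 'Majors': 2, 'General': 1, 'Marshal': 1,
}
assert sum(PIECES.values()) == 40  # 40 pieces per player, as the rules state

# The Flag and the Bombs cannot move, leaving 33 movable pieces.
movable = sum(n for piece, n in PIECES.items() if piece not in ('Flag', 'Bombs'))
print(movable)  # 33
```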

Movement
Alternately the Red player and the Blue player move. Pieces are moved one square at a time to orthogonally-adjacent vacant squares. The Scout is an exception to this rule, as explained below. The lakes in the center of the board contain no squares, so a piece can neither move into nor cross the lakes. Only one piece can occupy a square, and each turn only one piece may be moved. Scouts may be moved across any number of vacant squares, either forward, backward or sideways. A player must move or attack in his turn.
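As a hedged illustration of the movement rules above (the board encoding is my own assumption, not the thesis representation), destination generation might look like this, with the Scout handled by sliding:

```python
# Sketch of destination generation under the movement rules above.
# board[r][c] is '.' for a vacant square, '~' for a lake square,
# or any other marker for an occupied square. Moves are orthogonal only.

DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def destinations(board, r, c, is_scout):
    """Vacant squares the piece at (r, c) may move to."""
    out = []
    for dr, dc in DIRS:
        nr, nc = r + dr, c + dc
        # A Scout keeps sliding over vacant squares in the same direction;
        # any other piece stops after one step.
        while 0 <= nr < 10 and 0 <= nc < 10 and board[nr][nc] == '.':
            out.append((nr, nc))
            if not is_scout:
                break
            nr, nc = nr + dr, nc + dc
    return out

empty = [['.'] * 10 for _ in range(10)]
print(len(destinations(empty, 4, 4, False)))  # 4: one step in each direction
print(len(destinations(empty, 4, 4, True)))   # 18: Scout reaches every vacant square in line
```

Lake squares ('~') and occupied squares block the slide, which also matches the attack rule that a Scout needs a clear line to its target.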

Attack
A player can attack if one of his pieces is orthogonally adjacent to a square occupied by an opponent's piece. A Scout can attack from any distance in a straight line, provided that the squares between itself and the attacked piece are vacant. A player is not required to attack an adjacent opponent's piece. The piece with the lowest rank loses the attack and is removed from the board. If the attacking piece wins, it moves onto the square it attacked; if it loses, no piece is moved. If a piece attacks an opponent's piece of the same rank, both pieces are removed from the board. The Spy defeats the Marshal if it attacks the Marshal; otherwise the Spy loses the attack. Every movable piece except the Miner that attacks a Bomb loses the attack and is removed from the board. When a Miner attacks a Bomb, the Bomb loses and is removed from the board, and the Miner moves to the square the Bomb occupied. The Bomb and the Flag cannot attack.
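The attack rules above, including the three special cases (Spy versus Marshal, Miner versus Bomb, equal ranks), can be sketched as a single resolution function. This is my own illustration, not thesis code; the rank strengths follow the low-to-high ordering given in this chapter.

```python
# Attack resolution per the rules above (a sketch). Returns the surviving
# rank, or None when equal ranks remove both pieces. Strengths follow the
# thesis's low-to-high ordering (note: Major above Colonel here).

STRENGTH = {'Spy': 1, 'Scout': 2, 'Miner': 3, 'Sergeant': 4, 'Lieutenant': 5,
            'Captain': 6, 'Colonel': 7, 'Major': 8, 'General': 9, 'Marshal': 10}

def resolve_attack(attacker, defender):
    if defender == 'Bomb':                       # only the Miner defuses a Bomb
        return attacker if attacker == 'Miner' else 'Bomb'
    if defender == 'Flag':                       # capturing the Flag ends the game
        return attacker
    if attacker == 'Spy' and defender == 'Marshal':
        return 'Spy'                             # special case: Spy defeats Marshal
    if STRENGTH[attacker] == STRENGTH[defender]:
        return None                              # equal ranks: both removed
    return attacker if STRENGTH[attacker] > STRENGTH[defender] else defender

print(resolve_attack('Spy', 'Marshal'))   # Spy
print(resolve_attack('Marshal', 'Spy'))   # Marshal (the Spy only wins when attacking)
print(resolve_attack('Scout', 'Bomb'))    # Bomb
print(resolve_attack('Miner', 'Bomb'))    # Miner
```

Note the asymmetry: the Spy versus Marshal case depends on who attacks, which is why the function takes attacker and defender rather than an unordered pair.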

Two-squares rule
To avoid repetition, a player's piece may not move back and forth between the same two squares more than five times in a row. This is called the two-squares rule. If the player moves a new piece, the two-squares rule is interrupted and starts again for the new piece. The actions of the opponent do not influence the two-squares rule for the player. If the player moves to a square other than the previous square, the two-squares rule is interrupted and then applies to the current and previous square. For the Scout the rule is extended: the first of the five moves sets the range in which the two-squares rule applies. Where a normal piece may only move five times between two squares, the Scout may only move five times within the range of its first move.
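A simplified checker for the basic rule might look like the sketch below (my own construction; it tracks only one player's own moves and omits the Scout range extension):

```python
# Simplified two-squares check (my own sketch; the Scout extension is
# omitted). A piece may not shuttle between the same two squares more
# than five times in a row; moving another piece interrupts the rule.

def violates_two_squares(own_moves, piece_id, move):
    """own_moves: this player's past moves as (piece_id, from_sq, to_sq),
    oldest first. move: the proposed (from_sq, to_sq) for piece_id."""
    pair = frozenset(move)
    count = 0
    for pid, frm, to in reversed(own_moves):
        if pid != piece_id or frozenset((frm, to)) != pair:
            break          # another piece or another square interrupts the rule
        count += 1
    return count >= 5      # a sixth consecutive shuttle move is illegal

# Five shuttle moves between d8 and e8 have been played; a sixth is illegal.
hist = [('A', 'd8', 'e8'), ('A', 'e8', 'd8')] * 2 + [('A', 'd8', 'e8')]
print(violates_two_squares(hist, 'A', ('e8', 'd8')))  # True
```

Using a `frozenset` of the two squares makes the direction of each shuttle move irrelevant, which matches the rule counting moves "back and forth" on the same pair of squares.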

Chasing rule
An additional tournament rule defined by the International Stratego Federation (2009) is the chasing rule. This rule prevents the players from ending up in a continuous chase of two pieces by non-stop threatening and evading. The threatening piece may not play a move such that the game ends up in a state that has already been visited during the chase. The only exception occurs when the threatening piece moves back to the previous square, as long as it does not violate the two-squares rule.

End of the game


The game ends when the Flag of one of the players is captured. The player whose Flag is captured loses the game. If one player is not able to move or attack, this player forfeits the game. If both players cannot move or attack, the game is a draw.

2.2 Notation

There is no official notation for the game of Stratego. Several notations are available, but they rely on perfect information about the initial piece set-up of both players. We will use a chess-like notation for Stratego.


Figure 2.1: Stratego board with coordinates

We first define the game board in a chess form, see Figure 2.1, where the ten files are labeled A to K. To prevent confusion between the letters I and J, the letter I has been skipped in the notation. The ten ranks are numbered starting at 1; the last rank is labeled 0 instead of 10. A move is indicated by the origin and the destination of the piece, separated by a hyphen. E.g., a move of a piece from square d8 to square e8 is notated as d8-e8, see Figure 2.2a. Attack moves are denoted with an x instead of a hyphen. An additional feature of Stratego is the revealing of pieces. For convenience, the ranks are notated in numerical form from high to low, starting with the number 1 for the Marshal. Only the Spy is written down as S, the Bomb as B and the Flag as F. The rank of a piece is placed after the square and a #. An attack of the Marshal at square d8 on a piece at square e8 occupied by a Colonel is written down as d8#1xe8#4, see Figure 2.2b. Moves of pieces whose rank is known indicate the rank in a similar form; e.g., a Marshal moving from e8 to e7 is notated as e8#1-e7, see Figure 2.2c.
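The notation described above is simple enough to generate mechanically. The helper below is a sketch of my own (the thesis defines the format but no implementation):

```python
# Helper producing the chess-like notation described above (a sketch).
# Ranks are 1..10 from Marshal down; Spy = 'S', Bomb = 'B', Flag = 'F'.

def notate(frm, to, is_attack=False, frm_rank=None, to_rank=None):
    """Build a move string, e.g. 'd8-e8' or 'd8#1xe8#4'."""
    a = frm + ('#' + str(frm_rank) if frm_rank else '')   # origin, rank if known
    b = to + ('#' + str(to_rank) if to_rank else '')      # destination likewise
    return a + ('x' if is_attack else '-') + b

print(notate('d8', 'e8'))                  # d8-e8
print(notate('d8', 'e8', True, 1, 4))      # d8#1xe8#4  (Marshal takes Colonel)
print(notate('e8', 'e7', frm_rank=1))      # e8#1-e7    (known Marshal moves)
```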

(a) Normal move: d8-e8
(b) Attack move: d8#1xe8#4
(c) Normal move with known piece: e8#1-e7

Figure 2.2: Notation of moves in Stratego


2.3

Strategies and Knowledge

This section gives an overview of human strategies and knowledge in the game of Stratego. The setup of the pieces and the play of the pieces have a great influence on the game.

2.3.1

Flag Position

The goal of the game is to capture the opponent's Flag. Therefore it is important to guard one's own Flag against the opponent's pieces. The main strategy is to place the Flag adjacent to the edge or in a corner. In this way the Flag can only be reached from two or three directions. When a player places Bombs around the Flag, the opponent can only reach the Flag by using a Miner. A player can create a diversion by placing a similar structure around a non-important piece along the back line. This forces the opponent to play more moves to find the Flag.

2.3.2

Miner

When the opponent's Flag is surrounded by Bombs, the player must make sure he is still able to capture the Flag by removing the Bombs with his Miners. To prevent a loss beforehand, some of the player's Miners have to remain on the board until the possible location of the Flag has been discovered.

2.3.3

The Spy

The only movable piece that the Spy is able to defeat is the Marshal, and the Marshal is the only movable piece that can defeat the General. If the General is defeated by the Marshal, the Marshal is known and the Spy can try to defeat the Marshal. To defeat the Marshal as quickly as possible, the Spy needs to be close to the General, preferably on an adjacent square, so that it can attack immediately after the defeat of the General. However, if the General travels around the board with an adjacent unknown piece, this piece is most likely the Spy.

2.4

Other Versions

There are a few variations on Stratego. There is a Stratego game made for 4 players, where each player has 21 pieces and all 4 players play on a 9×9 board, excluding the setup squares. A similar variant exists for 3 and 6 players. Hertog Jan Stratego, specially developed for the beer brewery Hertog Jan, is played on a 10×10 board with similar rules, but it has 12 custom pieces, such as the Archer, a piece that can attack over a small distance, and the Duke, which represents the Flag but can move. A 3D variant, called Stratego Fortress, includes multiple board levels and trap doors and is set in a medieval atmosphere; players can build a custom fortress where they need to hide a treasure instead of the Flag. These days Stratego comes in different forms: besides the original Stratego, there have been themed variations, such as a Star Wars theme, a Lord of the Rings theme and many more.

Chapter 3

Complexity Analysis
In this chapter we investigate different complexity measures for Stratego and compare the results to other games. In Section 3.1 we look at the complexity of choosing a board setup, in Section 3.2 we analyze the state-space complexity of the game and in Section 3.3 we analyze the game-tree complexity (Allis, 1994).

3.1

Board-setup Complexity

A player is able to place his own 40 pieces on the board, in a 4×10 area. We can calculate the number of possible board-setup configurations in Stratego by counting all the combinations of placing the 40 pieces:

    (40 choose 1, 1, 2, 3, 4, 4, 4, 5, 8, 1, 6, 1) = 40! / ((1!)^4 · 2! · 3! · (4!)^3 · 5! · 8! · 6!)    (3.1)

By excluding mirrored setups the number of different board setups equals 533 · 10^23. Compared to the complexity of other games in Table 3.1, we can see that the number of initial possible game states of Stratego exceeds the complexity of games such as checkers.
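Formula 3.1 can be evaluated exactly; the following sketch (variable names are illustrative) uses the standard piece multiplicities of one Stratego army.

```python
from math import factorial, prod

# Multiplicities of the twelve piece types in one army (40 pieces in total):
# Flag, Spy, General, Marshal x1; Colonels x2; Majors x3; Captains,
# Lieutenants, Sergeants x4; Miners x5; Bombs x6; Scouts x8.
counts = [1, 1, 1, 1, 2, 3, 4, 4, 4, 5, 6, 8]
assert sum(counts) == 40

# Formula 3.1: 40! divided by the product of the factorials of the multiplicities.
setups = factorial(40) // prod(factorial(c) for c in counts)
print(f"{setups:.3e}")
```

The multinomial is an exact integer; the division above therefore has no remainder.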

3.2

State-space Complexity

The state-space complexity is the collection of all reachable board configurations a game can have. This number is difficult to compute exactly; therefore we compute an upper bound by counting all the legal positions in Stratego, which also includes unreachable positions. There is a difference between the concept of an illegal configuration and an unreachable position. An illegal configuration in Stratego would be a board where both Flags are missing, or a board where the two Flags are placed next to each other. A legal but unreachable position is a board configuration where all the lanes are blocked by Bombs but a piece has passed the Bombs. As a result of variable empty spaces and removed pieces we can use a similar but more complex formula than Formula 3.1 for the computation of the state-space complexity. To calculate this upper bound we need to summate over 24 variables. Every variable is one of the twelve either blue or red piece types and can vary between zero and the maximum number of pieces of that type in the game. For every possible combination we need to evaluate a multinomial similar to Formula 3.1:

    (92 choose # of free positions, # red Flags, # red Bombs, ..., # blue Flags, ..., # blue Marshals)    (3.2)

However, because we are dealing with variables in this formula, we need to summate over the different variables as indicated by Formula 3.3. This formula takes into account that Bombs may only be placed within the 4×10 start area and the fact that both players need their Flag to be on the game board. The computation was done using hashed values to decrease the computation time.


Formula 3.3 sums, over the possible numbers of pieces of each type, a product of binomial coefficients: a factor 40 · 40 for the placement of the two Flags, factors

    (39! / ((39 − y)! · y!)) · (39! / ((39 − z)! · z!))

for placing y red and z blue Bombs on the remaining squares of the two 4×10 start areas, and a chain of factors

    ((90 − y − z)! / ((90 − y − z − i)! · i!)) · ((90 − y − z − i)! / ((90 − y − z − i − j)! · j!)) ⋯ ((90 − y − z − i − j − ⋯ − w)! / ((90 − y − z − i − j − ⋯ − w − x)! · x!))    (3.3)

for the remaining piece types, where i, j, ..., w, x run over the counts of the red Spies, red Scouts, ..., blue Generals and blue Marshals. The number of free squares depends on the number of pieces on the board: there are 92 available squares (2 of which are always occupied by the Flags) minus the number of pieces on the board. The upper bound of the state-space of Stratego is 10^115.

3.3

Game-tree Complexity

Based on approximately 4,500 games from the Gravon Stratego (2009) database that did not end in a forfeit, the average number of moves in a Stratego game was calculated. This average is 381.134 ply, where a ply is a turn of one player.

Figure 3.1: Average branching factor of Stratego per move number

The branching factor at the beginning of a Stratego game is approximately 23 and slowly decreases to 15 towards the end of the game. Figure 3.1 shows the average branching factor per move, based on the same games from Gravon Stratego. At around 150 moves the branching factor reaches its peak with an average of 25. Then the branching factor decreases almost linearly towards 15. After 700 moves the branching factor increases again and fluctuates; because there are fewer games that last longer than 700 moves, the significance of these branching factors decreases. The average branching factor of Stratego is approximately 21.739, and the game has an average of 30.363 chance nodes, where each chance node has an average branching factor of 6.823. The game-tree upper bound is the branching factor to the power of the average game length, multiplied by the contribution of the chance nodes:

    21.739^381.134 · 6.823^30.363 ≈ 10^535

The game-tree complexity is independent of the starting position.
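The order-of-magnitude estimate can be reproduced in a few lines (a sketch; the constants are the averages quoted above).

```python
from math import log10

branching = 21.739        # average branching factor of decision nodes
length = 381.134          # average game length in ply
chance_branching = 6.823  # average branching factor of chance nodes
chance_nodes = 30.363     # average number of chance nodes per game

# Exponent of b^d * c^e, the game-tree upper bound, computed in log space
# to avoid overflowing floating-point arithmetic.
exponent = length * log10(branching) + chance_nodes * log10(chance_branching)
print(round(exponent))  # -> 535
```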

3.4

Stratego Compared to Other Games

In Section 3.2 and 3.3 we computed the state-space complexity (10^115) and the game-tree complexity (10^535) of Stratego. In Table 3.1 Stratego is compared to other games (Van den Herik et al., 2002) and it is clear that Stratego is a complex game. The game-tree complexity is large because of the high game length (381 ply). The high game-tree complexity requires us to use search-based approaches to play the game of Stratego.

Table 3.1: Complexity of several games

    Game                   State-space   Game-tree
    Nine Men's Morris      10^10         10^50
    Pentominoes            10^12         10^18
    Awari                  10^12         10^32
    Kalah(6,4)             10^13         10^18
    Connect-Four           10^14         10^21
    Domineering (8×8)      10^15         10^27
    Dakon-6                10^15         10^33
    Checkers               10^21         10^31
    Othello                10^28         10^58
    Qubic                  10^30         10^34
    Draughts               10^30         10^54
    Chess                  10^46         10^123
    Chinese Chess          10^48         10^150
    Hex (11×11)            10^57         10^98
    Shogi                  10^71         10^226
    Renju (15×15)          10^105        10^70
    Go-Moku (15×15)        10^105        10^70
    Stratego               10^115        10^535
    Go (19×19)             10^172        10^360


Chapter 4

Search
This chapter introduces the search techniques used to find the best possible move in Stratego. Section 4.1 through Section 4.6 introduce basic techniques such as minimax, iterative deepening search, α-β pruning and negamax. Section 4.7 introduces the family of *-minimax algorithms developed by Ballard (1983). Furthermore, the transposition table and the enhancements for *-minimax are discussed in Section 4.8. Finally, we discuss move-ordering techniques in Section 4.9 and the forward-pruning technique multi-cut in Section 4.10.

4.1

Introduction

The decisions that a player can take during the course of a game can be represented as a tree, where the current position is the root node. When the player has multiple moves available, the tree branches. If a node succeeds another node, that node is called a child node. Terminal positions (i.e., positions where the game has ended) have no branches and are called leaf nodes. The computer can back up from a leaf node until it finds a branch that was not visited yet. In this way the computer creates a game tree with branches and leaf nodes. When the player finds the result of a leaf node (win or loss), it can back up the value to the parent branch and eventually to the root node. It is now possible to create a path to a leaf node with the best result. This becomes more complicated when two players are involved, because the second player also wants to win and will therefore try to make the first player lose the game. This means that, in the game tree, the second player will choose a branch that results in a loss for the first player. However, in practice it is infeasible to create the whole game tree because of its size. To compensate for this, the players evaluate their position in the game tree. This evaluation function, depending on its correctness, can forecast whether the current branch is heading towards a win, loss or draw.

4.2

Minimax

Computers can use the minimax algorithm (Von Neumann and Morgenstern, 1944) to deal with a game tree. The algorithm searches the game tree to find the best possible move. The first player tries to maximize the score and is called the MAX player, while the second player, the MIN player, tries to minimize the score. The algorithm recursively alternates between the MAX player and the MIN player until it reaches a terminal node. The game tree is searched in a depth-first way, meaning that the algorithm searches the deepest unvisited branch until a terminal node is reached. The value of the terminal node is then returned to the parent node. When all the branches of a node have been visited, the player can choose the best branch from that node and return its value to the parent node, and so on up to the root node. As stated before, the size of the game tree is usually too large to search completely. Instead of searching the game tree until the minimax algorithm finds a terminal node, it is possible to end the search when it has reached a certain depth. This is called a depth-limited search. If the algorithm has reached a certain depth without encountering a terminal node it cannot back up a terminal value


(win, loss or draw). Therefore it needs to evaluate its current position using an evaluation function that estimates the value of the reached node. This value is then backed up instead of the terminal value. The optimal game or principal variation is the game where the MAX player always selects the move with the highest evaluation value and the MIN player the move with the lowest evaluation value. An example of such a search tree limited to depth 3 can be seen in Figure 4.1. The top node is the root node and the bottom nodes are the leaf nodes. The MAX player explores the leftmost branch. Now the MIN player explores a branch, again the leftmost one. This continues until a leaf node is found. The MAX player finds a leaf node with a value of 6. Next, the MAX player explores the second branch and finds the value −7. The MAX player chooses the move with the highest value, max(6, −7) = 6. The value of 6 is returned to the MIN player and the MIN player explores its second branch. Again, the MAX player chooses the highest value of the two leaf nodes, max(7, −8) = 7. The MIN player chooses the node with the lowest value, min(6, 7) = 6. This process is repeated until all the children of the root node are explored. Eventually the best move gives a value of 6. The pseudo code of minimax can be found in Algorithm 1.

Algorithm 1 Minimax algorithm
function minimax(depth, board)
    value = max-value(depth, board)
    return value

function max-value(depth, board)
    if depth = 0 or terminal then
        return evaluate()
    end if
    max = −∞
    for all moves do
        doMove()
        value = min-value(depth − 1, board)
        undoMove()
        if value > max then
            max = value
        end if
    end for
    return max

function min-value(depth, board)
    if depth = 0 or terminal then
        return evaluate()
    end if
    min = +∞
    for all moves do
        doMove()
        value = max-value(depth − 1, board)
        undoMove()
        if value < min then
            min = value
        end if
    end for
    return min
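A minimal executable version of the minimax recursion can be sketched in Python; the nested-list tree below is a hypothetical depth-3 example in the shape of Figure 4.1 (lists are internal nodes, integers are leaf evaluations).

```python
def minimax(node, maximizing):
    """Depth-first minimax: leaves are integers, internal nodes are lists."""
    if isinstance(node, int):
        return node  # leaf node: return its evaluation value
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX at the root, MIN one level below, MAX above the leaves.
tree = [[[6, -7], [7, -8]], [[-2, 1], [10]]]
root_value = minimax(tree, True)
print(root_value)  # -> 6
```

The left branch yields min(max(6, −7), max(7, −8)) = 6 and the right branch yields 1, so the root picks 6, matching the worked example.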

4.3

Iterative Deepening Search

In advance it is not known how much time the search will take. Iterative deepening search (Russell and Norvig, 1995), used in combination with depth-first search, makes it possible to find the best search


Figure 4.1: Game tree generated by the minimax algorithm with corresponding values of the nodes

depth within a predetermined amount of time. The algorithm increases the search depth until it runs out of time. It starts a search with a depth limit of 1. If the search is finished and there is time left, the limit is increased by a certain amount (e.g., one). When there is no time left for the search, the best move found so far is returned. Iterative deepening is also known as an anytime algorithm because it has a best move available at any time. At first this search method seems wasteful, because it visits several states multiple times. However, the majority of the nodes are at the maximum search depth. The nodes generated at depths lower than the maximum search depth are a small fraction of the total number of nodes generated during the search. The nodes at the maximum search depth d are only generated once during a search. All the nodes with a depth lower than d are generated multiple times: nodes at depth d − 1 are generated twice, first when the maximum depth is d − 1 and second when the maximum depth equals d. Nodes at depth d − 2 are generated three times, and this continues up to the root, which is generated d times. The total number of nodes generated at depth d with a branching factor of b, as defined by Russell and Norvig (1995), is (d)b + (d − 1)b^2 + ⋯ + (1)b^d. For d = 5 and b = 10 the total number of nodes generated is 50 + 400 + 3,000 + 20,000 + 100,000 = 123,450. Compare this to a depth-first search with a complexity of b^d, which generates a total of 10 + 100 + 1,000 + 10,000 + 100,000 = 111,110 nodes. Without search enhancements such as ordering of the nodes (Section 4.9) and pruning (Section 4.5), iterative deepening performs worse than a depth-first search. With search enhancements, iterative deepening is able to select the optimal path found so far to continue its search at a new depth.
This eventually leads to nodes that do not need to be visited anymore and a lower number of generated nodes. A visual representation of iterative deepening can be seen in Figure 4.2. Figure 4.2a depicts the situation where the algorithm has finished the first iteration and only the root node is visited. In Figure 4.2b the search method has finished the second iteration; the algorithm can now sort the child nodes of the root node and possibly prune a child node. The algorithm continues until it runs out of time or until all the nodes are visited.
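The node counts above are easy to verify (a sketch; `b` is the branching factor and `d` the maximum search depth).

```python
def iterative_deepening_nodes(b, d):
    # Level k is regenerated once per iteration whose limit is >= k,
    # i.e. d - k + 1 times, giving (d)b + (d-1)b^2 + ... + (1)b^d.
    return sum((d - k + 1) * b**k for k in range(1, d + 1))

def depth_first_nodes(b, d):
    # A single depth-limited search generates b + b^2 + ... + b^d nodes.
    return sum(b**k for k in range(1, d + 1))

print(iterative_deepening_nodes(10, 5))  # -> 123450
print(depth_first_nodes(10, 5))          # -> 111110
```

The overhead of plain iterative deepening is thus only about 11% in this case.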

4.4

Negamax

A variation on the minimax algorithm is the negamax (Knuth and Moore, 1975) algorithm. Both the MIN player and the MAX player choose their optimal line of play. The MIN and MAX player can be interchanged with respect to the line of play when the values are negated. Suppose the MAX player starts the search. The MIN player will make a move on the second turn. If we interchange the MIN player and MAX player on ply two we only have to negate the values to achieve the same result. However, the original MAX player has become the MIN player now. On the next ply we

Figure 4.2: Iterative deepening until ply 3 — (a) depth 0, (b) depth 1, (c) depth 2, (d) depth 3

can interchange the players and negate the values again to compensate for this. This process can be repeated until the end of the search. In practice this is a simplified version of the minimax algorithm where both players use the same code to choose the optimal line of play. For coding purposes this is a simpler and more efficient way to implement the minimax algorithm. The pseudo code is shown in Algorithm 2. By negating the outcome for each player we reduce the amount of code needed to retrieve the same result, which leads to better maintainable code.

Algorithm 2 Negamax algorithm
function negamax(depth, board)
    if depth = 0 or terminal then
        return evaluate()
    end if
    max = −∞
    for all moves do
        doMove()
        value = −negamax(depth − 1, board)
        undoMove()
        if value > max then
            max = value
        end if
    end for
    return max
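The negation trick can be sketched on the same hypothetical tree used for minimax; leaf values are stored from the MAX player's point of view, so a leaf is returned as `color * value`.

```python
def negamax(node, color):
    """Negamax: color is +1 for the MAX player, -1 for the MIN player."""
    if isinstance(node, int):
        return color * node  # leaf: evaluation from the mover's perspective
    # Each player simply maximizes the negated value of the opponent.
    return max(-negamax(child, -color) for child in node)

tree = [[[6, -7], [7, -8]], [[-2, 1], [10]]]
root_value = negamax(tree, 1)
print(root_value)  # -> 6, identical to plain minimax
```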


Figure 4.3: α-β pruning in action; the principal variation is shown in red

4.5

α-β Search

A widely used technique for backward-pruning nodes that are not interesting for the player is the α-β enhancement for negamax (Knuth and Moore, 1975). If we take another look at the game tree in Figure 4.1, it is possible to prune some nodes that do not influence the result of the game tree. The fourth leaf node, with a value of −8, is irrelevant for the result. We will show why: when the third leaf node is found, the MAX player can play a move with value 7. Now we substitute the value of the fourth leaf node with x. The value of the MIN node can be computed by min(6, max(7, x)). This means that the MAX player will play a move with a value of at least 7. The value of 6 that the MIN player found by exploring the first branch is then always better for the MIN player. Therefore the MIN player does not need to know the value of node x to determine his value, and this branch can be pruned. A similar situation occurs at the second branch of the root node. The MAX player chooses to play the leaf node with the value 1. This value is returned to the MIN player before the second branch of the MIN player is searched. Again we substitute the value of the second branch with a variable, y. The value of the root node can be computed by max(6, min(1, y)). The MIN player will play a move with a value of at most 1. However, the MAX player can already play a move with a value of 6 at the root. Therefore the second branch of the MIN player does not need to be explored and can be pruned. A visual representation can be seen in Figure 4.3. In practice the minimum feasible value for the MAX player is stored in α and the maximum feasible value for the MIN player is stored in β. These values are passed down in the search method. The pseudo code of the negamax implementation of α-β pruning is shown in Algorithm 3. A similar variant exists for minimax; however, it takes more code to produce the same result.
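Both prunes described above can be demonstrated on the hypothetical tree used earlier; the sketch records which leaves are actually evaluated.

```python
import math

def alphabeta(node, alpha, beta, maximizing, visited):
    if isinstance(node, int):
        visited.append(node)  # record every leaf that is evaluated
        return node
    value = -math.inf if maximizing else math.inf
    for child in node:
        v = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:  # remaining siblings cannot change the result
            break
    return value

tree = [[[6, -7], [7, -8]], [[-2, 1], [10]]]
visited = []
result = alphabeta(tree, -math.inf, math.inf, True, visited)
print(result, visited)  # -> 6 [6, -7, 7, -2, 1]
```

Exactly the two leaves discussed in the text, −8 and 10, are never evaluated.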

4.6

Expectimax

To deal with chance nodes we cannot use the standard negamax algorithm; instead the expectimax (Michie, 1966) algorithm can be used. The method is similar to negamax in the sense that it visits all nodes in the game tree. The difference is that expectimax deals with chance nodes in addition to MIN and MAX nodes. To define the value of a chance node we cannot simply select the highest or lowest value of one of the successors. Instead an expected value has to be calculated. Every child node has a weight equal to the probability that the child node occurs. The expectimax value of a node is the weighted sum of the successors' values:

    E(x) = Σ_i P_i · E(i)    (4.1)

Algorithm 3 Negamax algorithm with α-β pruning
function negamax(depth, board, α, β)
    if depth = 0 or terminal then
        return evaluate()
    end if
    for all moves do
        doMove()
        α = max(α, −negamax(depth − 1, board, −β, −α))
        undoMove()
        if α ≥ β then
            return β    {beta cutoff}
        end if
    end for
    return α

Algorithm 4 The negamax formulation of the expectimax method
function negamax(depth, board)
    if depth = 0 or terminal then
        return evaluate()
    end if
    max = −∞
    for all moves do
        doMove()
        if chanceEvent() then
            value = expectimax(depth − 1, board)
        else
            value = −negamax(depth − 1, board)
        end if
        undoMove()
        if value > max then
            max = value
        end if
    end for
    return max

function expectimax(depth, board)
    if depth = 0 or terminal then
        return evaluate()
    end if
    sum = 0
    for all chance events do
        doChanceEvent(board, event)
        sum += probability(event) · search(depth, board)
        undoChanceEvent(board, event)
    end for
    return sum


where E(i) is the evaluation value of node i and P_i is the probability of node i. Algorithm 4 shows an example of expectimax used in a negamax implementation. Figure 4.4 shows an example of the expectimax algorithm (Hauk, 2004). The MAX player encounters a chance node with six (uniform) outcomes, nodes B to G. The MIN player chooses the node with the lowest value; therefore node B gets the value −5, node C gets −10, etc. In the next step we multiply the probability of every node, 1/6, with the value of the node and summate

the outcomes. The result is that the value of node A equals −3.5.

Figure 4.4: An expectimax tree with uniform chances
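The expectimax recursion can be sketched as follows; the toy tree below is hypothetical (it is not the exact tree of Figure 4.4) and carries explicit probabilities on the chance branches.

```python
def expectimax(node):
    """node: numeric leaf, ('max'|'min', children), or ('chance', [(p, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectimax(c) for c in children)
    if kind == 'min':
        return min(expectimax(c) for c in children)
    # chance node: probability-weighted sum of the successors (Formula 4.1)
    return sum(p * expectimax(c) for p, c in children)

# MAX chooses between two chance nodes whose outcomes lead to MIN nodes.
tree = ('max', [('chance', [(0.5, ('min', [-5, 3])),
                            (0.5, ('min', [2, 8]))]),
                ('chance', [(1.0, ('min', [-4, 0]))])])
print(expectimax(tree))  # -> -1.5
```

The first chance node is worth 0.5 · (−5) + 0.5 · 2 = −1.5, the second −4, so MAX picks −1.5.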

4.7

*-Minimax Algorithms

Expectimax is a suitable algorithm to handle chance events. However, the negamax algorithm is able to prune a great number of nodes using the α-β enhancement. We would like a similar technique for expectimax, but we cannot know the value of a chance node before we have visited all its children. However, if the upper and lower bound on the game values are known and the α and β values of the parent node are passed down, it is possible to calculate whether the theoretically possible value of the chance node falls outside the bounds. Ballard (1983) introduced an α-β enhancement for expectimax called *-minimax; he refers to these chance nodes as *-nodes. The procedure for the standard MIN and MAX nodes stays unchanged. Ballard proposed two versions of this algorithm, called Star1 and Star2, which are discussed in Sections 4.7.1 and 4.7.2. An extended version called Star2.5, a refined version of the Star2 algorithm, is discussed in Section 4.8.2.

4.7.1

Star1

This section introduces the idea of α-β pruning in chance nodes. With every new piece of information in a chance node, the theoretical bounds of the chance node can be narrowed until a cutoff occurs or all child nodes are visited. The Star1 algorithm is first explained using uniform chances; afterwards non-uniform chances are introduced. As an example, consider an evaluation function which is bounded by the interval [−10, 10]. The current chance node has four children (with equal probability), of which two have been visited with values 5 and 3. This situation is depicted in Figure 4.5. We can calculate the lower bound of the chance node, namely ¼(5 + 3 − 10 − 10) = −3. The upper bound of the chance node is ¼(5 + 3 + 10 + 10) = 7. If an α value ≥ 7 was passed down to the chance node, a pruning can take place: even in the best possible case the value of the complete chance node will not influence the principal variation. Similarly, if a β value ≤ −3 was passed down, a pruning occurs. The lower and upper bound of the evaluation function are denoted as L and U. V_1, V_2, ..., V_N are the values of the N successors of the current chance node, whose i-th successor is being searched. The lower bound of the chance node is

    ((V_1 + V_2 + ⋯ + V_{i−1}) + V_i + L·(N − i)) / N    (4.2)

and the upper bound is

    ((V_1 + V_2 + ⋯ + V_{i−1}) + V_i + U·(N − i)) / N    (4.3)
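In code, the running example (bounds [−10, 10], four equally likely children, two already searched with values 5 and 3) gives:

```python
L, U = -10, 10   # bounds of the evaluation function
N = 4            # number of equally likely children
seen = [5, 3]    # values of the two children searched so far
unseen = N - len(seen)

lower = (sum(seen) + L * unseen) / N  # worst case for the unseen children
upper = (sum(seen) + U * unseen) / N  # best case for the unseen children
print(lower, upper)  # -> -3.0 7.0
```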


Figure 4.5: A chance node

When the α and β values are passed down from the parent node it is possible to get a pruning. Like in the α-β enhancement for negamax, a cutoff occurs when the upper bound of the chance node is less than or equal to α,

    ((V_1 + V_2 + ⋯ + V_{i−1}) + V_i + U·(N − i)) / N ≤ α    (4.4)

or when the lower bound is larger than or equal to β,

    ((V_1 + V_2 + ⋯ + V_{i−1}) + V_i + L·(N − i)) / N ≥ β    (4.5)

This creates the possibility of an α-β-like cutoff in the chance node. Star1 also gives the possibility to calculate a new α and β for the child nodes of a chance node, and this method can be generalized for all chance nodes. This cutoff will not influence the principal variation. By rewriting Formulas 4.4 and 4.5, a new window for the child node is calculated which can trigger a cutoff in the child node. If we denote A_i as the lower bound of the i-th successor, then

    A_i = N·α − (V_1 + V_2 + ⋯ + V_{i−1}) − U·(N − i)    (4.6)

where α is the same as passed down to the chance node. For the upper bound, denoted as B_i, we have

    B_i = N·β − (V_1 + V_2 + ⋯ + V_{i−1}) − L·(N − i)    (4.7)

Again β stays unchanged. Initially the value of A_1 is N·α − U·(N − 1), or N·(α − U) + U, and the value of B_1 equals N·β − L·(N − 1), or N·(β − L) + L. The subsequent values of A and B can be computed using Formula 4.6 and Formula 4.7. Ballard uses an iterative method to update the values of A and B after calculating the initial values:

    A_{i+1} = A_i + U − V_i    (4.8)
    B_{i+1} = B_i + L − V_i    (4.9)

The implementation of these formulas in pseudo code can be seen in Algorithm 5. The values A and B are reused every iteration and therefore have no index i. AX and BX represent the values that are passed down to the children; these values are limited by the bounds of the evaluation function. Consider the example in Figure 4.6. The node A is entered with a lower bound (α) of 3 and an upper bound (β) of 4; we write this as [3, 4]. The game values vary between L = −10 and U = 10. When the first child of A is searched we calculate its window: A_1 equals 3·(3 − U) + U = −11, which lies below L, and B_1 equals 3·(4 − L) + L = 32, which lies above U. This means that the bounds for child B are [−10, 10]. After searching node B, we find a value of 2. If we move on to the second child, C, we can calculate the range of values for node A: a lower bound of ⅓(2 + 2L) = ⅓ · (−18) = −6 and an upper bound of ⅓(2 + 2U) = ⅓ · 22 = 7⅓. This does not result in a cutoff, because −6 is not greater than or equal


Algorithm 5 The negamax formulation of the Star1 algorithm with uniform chances
function search(depth, board, α, β)
    if depth = 0 or terminal then
        return evaluate()
    end if
    for all moves do
        doMove()
        if chanceEvent() then
            α = max(α, star1(depth − 1, board, α, β))
        else
            α = max(α, −negamax(depth − 1, board, −β, −α))
        end if
        undoMove()
        if α ≥ β then
            return β
        end if
    end for
    return α

function star1(depth, board, α, β)
    if depth = 0 or terminal then
        return evaluate()
    end if
    N = generateChanceEvents()
    A = N·(α − U) + U
    B = N·(β − L) + L
    vsum = 0
    for i = 1, i ≤ N, i++ do
        AX = max(A, L)
        BX = min(B, U)
        doChanceEvent(board, event)
        v = −search(depth, board, −BX, −AX)
        undoChanceEvent(board, event)
        vsum += v
        if v ≤ A then
            vsum += U·(N − i)
            return vsum/N
        end if
        if v ≥ B then
            vsum += L·(N − i)
            return vsum/N
        end if
        A += U − v
        B += L − v
    end for
    return vsum/N
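To make the control flow concrete, here is a hypothetical Python sketch of Star1 in a plain minimax (MAX/MIN) formulation rather than the negamax form above; the tree, node names and leaf values are illustrative, chosen to mirror the example of Figure 4.6.

```python
import math

L, U = -10, 10  # assumed bounds of the evaluation function

def search(node, alpha, beta):
    """Alpha-beta over ('max'|'min'|'chance', children) tuples; leaves are numbers."""
    if isinstance(node, (int, float)):
        return node
    kind, kids = node
    if kind == 'chance':
        return star1(kids, alpha, beta)
    value = -math.inf if kind == 'max' else math.inf
    for child in kids:
        v = search(child, alpha, beta)
        if kind == 'max':
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:  # regular alpha-beta cutoff
            break
    return value

def star1(kids, alpha, beta):
    """Star1 for a uniform chance node, following Formulas 4.6-4.9."""
    n = len(kids)
    A = n * (alpha - U) + U  # A1
    B = n * (beta - L) + L   # B1
    vsum = 0.0
    for i, child in enumerate(kids, start=1):
        ax, bx = max(A, L), min(B, U)  # clip the child window to [L, U]
        v = search(child, ax, bx)
        vsum += v
        if v <= A:  # fail low: upper bound of the chance node <= alpha
            return (vsum + U * (n - i)) / n
        if v >= B:  # fail high: lower bound of the chance node >= beta
            return (vsum + L * (n - i)) / n
        A += U - v  # Formula 4.8
        B += L - v  # Formula 4.9
    return vsum / n

# Chance node A of the Figure 4.6 example: child B is a leaf worth 2, child C is
# a MIN node whose first successor E is worth -8; child D (a dummy leaf) is cut off.
node_a = ('chance', [2, ('min', [-8, 5, 6]), 7])
value = search(node_a, 3, 4)
print(value)  # approximately 1.333, the 1 1/3 of Formula 4.12
```

Child C fails low inside its clipped window [−3, 10], and the early return yields (2 − 8 + 10)/3 = 1⅓ without ever examining child D.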

to 4, and 7⅓ is not smaller than or equal to 3. Before visiting node C we calculate the new α and β values that we pass down, using Formula 4.6 and Formula 4.7:

    A_2 = 3·3 − 2 − 10 = −3    (4.10)
    B_2 = 3·4 − 2 + 10 = 20    (4.11)

The value for node C is 20; however, the highest possible value of a leaf node is 10, therefore the value is set to 10 instead of 20. Node C is passed down a window of [−3, 10]. The first node visited is node E, which returns a value of −8. Since −8 ≤ −3, this causes a cutoff at node C. In node A we check again whether a cutoff has occurred by using Formula 4.4 and Formula 4.5:

    (2 − 8 + 10) / 3 = 1⅓    (4.12)

It is clear that 1⅓ ≤ 3, so there is also a cutoff at node A. The lower bound is not calculated because it is not important anymore. We cannot return the exact value of node A because node D is pruned; the original α is returned as the value of node A. This ensures that node A will automatically be pruned at a lower depth.

Figure 4.6: A chance node

Figure 4.7: The Star1 algorithm in action


Figure 4.7 depicts the expectimax tree example of Figure 4.4, now visited by the Star1 algorithm. The evaluation function is still bounded by [−10, 10]. The node A is entered with a window of [−2, 2]. The bounds for the first child, node B, are computed using Formulas 4.6 and 4.7: A_1 = 6·(−2) − 10·(6 − 1) = −62 and B_1 = 6·2 + 10·(6 − 1) = 62. The values of A_1 and B_1 are limited to the bounds of the evaluation function, therefore node B is entered with the bounds [−10, 10]. Node B returns the value −5 and the new window for node C can be calculated using Formulas 4.8 and 4.9: A_2 = −62 + 10 + 5 = −47 and B_2 = 62 − 10 + 5 = 57. Again the window is bounded by the evaluation function. When the search continues with the other children, the windows are updated for every child accordingly. Up until node F the windows fall outside the bounds of the evaluation function. Node F is entered with a window of [−8, 36]. When node P is visited, the nodes Q and R are pruned because the value of node P, −10, falls outside the window [−8, 36]. This also triggers a cutoff in node A, which has a window of [−2, 2], and therefore node G is pruned as well. In Figure 4.4 it was clear that node A has the value −3.5, which indeed falls outside the window [−2, 2].

Up to now the theory and examples assume that the probabilities of the children are uniform. In practice this is not always the case; therefore non-uniform probabilities are introduced into the Star1 algorithm. Formula 4.2 and Formula 4.3 can be rewritten to calculate the new lower and upper bound of the chance node, where P_i is the probability of node i occurring:

    (P_1·V_1 + P_2·V_2 + ⋯ + P_{i−1}·V_{i−1}) + P_i·V_i + L·(1 − P_1 − P_2 − ⋯ − P_i)    (4.13)

and

    (P_1·V_1 + P_2·V_2 + ⋯ + P_{i−1}·V_{i−1}) + P_i·V_i + U·(1 − P_1 − P_2 − ⋯ − P_i)    (4.14)

Formula 4.6 and Formula 4.7 can be rewritten to calculate the values for A_i and B_i:

    A_i = (α − U·(1 − P_1 − P_2 − ⋯ − P_i) − (P_1·V_1 + P_2·V_2 + ⋯ + P_{i−1}·V_{i−1})) / P_i    (4.15)

where α is the same as passed down to the chance node. For the upper bound, denoted as B_i,

    B_i = (β − L·(1 − P_1 − P_2 − ⋯ − P_i) − (P_1·V_1 + P_2·V_2 + ⋯ + P_{i−1}·V_{i−1})) / P_i    (4.16)

Again these formulas can be computed incrementally. To simplify the equations, Y_i = 1 − P_1 − P_2 − ⋯ − P_i is substituted, where Y can be computed incrementally as Y_{i+1} = Y_i − P_{i+1} with Y_1 = 1 − P_1. Secondly, X_i = P_1·V_1 + P_2·V_2 + ⋯ + P_{i−1}·V_{i−1} is substituted, which can also be computed incrementally, X_{i+1} = X_i + P_i·V_i with X_1 = 0. Using X and Y it is possible to write the incremental update of A and B compactly:

    A_i = (α − U·Y_i − X_i) / P_i ,   B_i = (β − L·Y_i − X_i) / P_i    (4.17)
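A quick numerical check, with made-up probabilities and child values, that the incremental bookkeeping of Formula 4.17 matches the direct sums defining X_i and Y_i:

```python
L, U = -10, 10
alpha, beta = -2, 2
P = [0.1, 0.3, 0.2, 0.4]  # hypothetical child probabilities (they sum to 1)
V = [4, -6, 1, 3]         # hypothetical child values

X, Y = 0.0, 1 - P[0]      # X_1 = 0 and Y_1 = 1 - P_1
for i in range(len(P)):
    # Direct evaluation of the sums that define X_i and Y_i
    X_direct = sum(P[k] * V[k] for k in range(i))
    Y_direct = 1 - sum(P[k] for k in range(i + 1))
    assert abs(X - X_direct) < 1e-12 and abs(Y - Y_direct) < 1e-12
    A_i = (alpha - U * Y - X) / P[i]  # Formula 4.17, equal to Formula 4.15
    B_i = (beta - L * Y - X) / P[i]   # Formula 4.17, equal to Formula 4.16
    if i + 1 < len(P):                # incremental updates
        X += P[i] * V[i]              # X_{i+1} = X_i + P_i * V_i
        Y -= P[i + 1]                 # Y_{i+1} = Y_i - P_{i+1}
print("incremental X/Y match the direct sums")
```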

4.7.2

Star2

The Star1 algorithm is an enhancement of expectimax, but it still needs to visit a large number of nodes to achieve a cutoff. This is a result of the fact that Star1 assumes the worst-case scenario about the values of unseen children; it does not and cannot use any information about the structure of the game tree. Therefore Ballard (1983) also introduced the Star2 algorithm. In most search trees of board games the order of play is always the same: a MIN player follows a MAX player and vice versa. Games involving chance events can add an additional step between the MIN player and the MAX player, a chance event. Ballard (1983) refers to this type of tree as a regular *-minimax tree. In negamax a MIN player follows a MAX player. Star2 can use this information to improve the lower bound before searching the children. Instead of assuming that a child has a value equal to the lower or upper bound of the evaluation function, a more accurate lower bound can be calculated by searching k successors of the child node. This improves the search window, because it is not likely that all the values of the child's successors equal the lower and upper bound values, the

terminal values of the evaluation function. This is called the probing phase of the algorithm. If k equals 0 then Star2 has no effect. The effectiveness of the probing phase depends on the selection of the successor that will be probed. This can be done in several ways, e.g. selecting a random move or the first generated move. Using move ordering as described in Section 4.9 will most likely return the best move to probe. However, even a random move can reduce the search window, because it has a high probability of not being equal to the lower or upper bound of the evaluation function. In the worst case Star2 causes a search overhead. The probed value for each child node is stored in the array W and is used to check for cutoffs and to calculate the new initial bounds. For simplicity it is assumed that the probabilities are uniform. Formula 4.5 is modified in such a way that instead of using the lower bound L, the probed values are used:

((V1 + V2 + ⋯ + Vi−1) + Vi + (Wi+1 + ⋯ + WN)) / N. (4.18)

The values of W1 to Wi−1 can be substituted by the real values of the corresponding nodes. In practice it is easier to retain the values W1, …, Wi−1 and replace them with the values V1, …, Vi−1. Because of the probing phase it is not possible to calculate the values for B using Formula 4.3. Therefore the value of B is calculated using the following formula:

Bi = N · β − (V1 + V2 + ⋯ + Vi−1) − (Wi+1 + ⋯ + WN). (4.19)

The value of B is initialized in a similar manner as in Star1 by adjusting Formula 4.19: B1 = N · β − (W1 + W2 + ⋯ + WN), and it can be updated incrementally by Bi+1 = Bi + Wi − Vi. Algorithm 6 shows the pseudo-code of the Star2 algorithm. The probe method is a simple method that selects k children at random or by means of move ordering and searches them. Another possibility is to adapt the standard search method to deal with probing. In Figure 4.8 (Hauk, 2004) the Star2 algorithm visits the search tree from Figure 4.4. By only visiting the probed nodes, Star2 creates a cutoff before searching all the nodes of the child. Eventually the search window becomes [−8, 62], or [−8, 10] when the window is bound to the lower and upper bound. When node F returns the value −10, this value falls outside the window [−8, 62] and a cutoff occurs.
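The incremental B computation of Formula 4.19 for the main search phase of Star2 (uniform chance events) can be sketched as follows; the helper name and interface are assumed for illustration:

```python
def star2_upper_bounds(probes, values, beta):
    """B_i for each child in Star2's main search (uniform chance events):
    B_i = N*beta - (V_1 + ... + V_{i-1}) - (W_{i+1} + ... + W_N).

    probes: probed lower bounds W_1..W_N, one per chance event
    values: exact child values V_i, consumed as the search proceeds
    """
    N = len(probes)
    B = N * beta - sum(probes)  # start with every child at its probed value
    bounds = []
    for W, V in zip(probes, values):
        B += W                  # child i is searched now: drop its probe
        bounds.append(B)        # child i is searched with upper bound min(B, U)
        B -= V                  # replace the probe with the exact value
    return bounds
```

Because the probed values W are usually tighter than the evaluation bound L, the bounds shrink faster than in Star1, which is exactly the gain of the probing phase.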

Figure 4.8: The Star2 algorithm in action (the successive child windows are [−62, 62], [−47, 62], [−27, 62], [−17, 62] and [−8, 62])

For the probing phase only the calculation for B needs to be updated. This can be done iteratively:

Bi = (β − Wi − Xi) / Pi (4.20)

where Wi denotes Pi+1 · Wi+1 + ⋯ + PN · WN, the probability-weighted sum of the probed values for the children that have not yet been searched.


Algorithm 6 The negamax formulation of the Star2 algorithm with uniform chances

function star2(depth, board, α, β)
  if depth = 0 or terminal then
    return evaluate()
  end if
  N = generateChanceEvents(); W[N] = array()
  A = N · (α − U) + U
  B = N · (β − L) + L
  AX = max(A, L)
  vsum = 0
  for i = 1, i ≤ N, i++ do
    BX = min(B, U)
    value = −probe(depth, board, −BX, −AX)
    vsum += value
    W[i] = value
    if value ≥ B then
      vsum += L · (N − i)
      return vsum / N
    end if
    B += L − value
  end for
  for i = 1, i ≤ N, i++ do
    B += W[i]
    AX = max(A, L)
    BX = min(B, U)
    doChanceEvent(board, event[i])
    value = −search(depth, board, −BX, −AX)
    undoChanceEvent(board, event[i])
    vsum += value
    if value ≤ A then
      vsum += U · (N − i)
      return vsum / N
    end if
    if value ≥ B then
      vsum += L · (N − i)
      return vsum / N
    end if
    A += U − value
    B −= value
  end for
  return vsum / N


4.8 Transposition Tables

During a search identical board positions can occur, called transpositions. Instead of searching such a board position again, it is possible to store the result of the previous search in a so-called transposition table (Greenblatt, Eastlake, and Crocker, 1967; Slate and Atkin, 1977). The transposition table stores, under a key generated from the board, information about the position such as the best move and the value of the best move. To avoid storing the complete board, the transposition table uses a 64-bit hash value of the board position computed with Zobrist hashing (Zobrist, 1970). This is discussed in Section 4.8.1. In Section 4.8.2 an improved version of the Star2 algorithm using transposition tables is presented.

4.8.1 Zobrist Hashing

Zobrist hashing (Zobrist, 1970) is a method that generates a hash value for a game board. For each piece at each square the hash method needs a unique value. In Stratego there are 12 ranks and 3 possible states for every piece: not known and not moved; not known and moved; and known. There are 92 reachable squares on the game board. This gives a total of 12 × 3 × 92 = 3,312 unique 64-bit values. These values are generated at the initialization of the computer player using a random generator. The game hash can change on four occasions: piece placement, piece movement, piece removal and change of player.

1. On piece placement, when players add the pieces to the game board, the new hash value is computed by a XOR operation of the current hash with the hash value of a certain piece at a certain square.
2. On piece removal the same procedure is executed: the hash value is computed by a XOR operation of the hash value of a certain piece at a certain position with the current hash value.
3. When a piece moves from one square to another, the piece is first removed from the old position and placed at the new position. It is a piece removal followed by a piece placement.
4. Every time the players change turns the hash is updated, thus keeping track of each player's turn.

Initially the empty board hash has the value 0. The advantage of the XOR operation is that it is a bitwise operation and therefore fast for a machine. The computed hash value now serves as the key for the associated board position in the transposition table. Instead of using the complete 64-bit key, the first n bits of the hash value are used as the key into the table, the hash index. The remaining bits, the hash key, are used to discriminate between different positions mapping to the same hash index.
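The scheme above can be sketched in a few lines (illustrative Python, not the thesis code; the helper names are assumptions):

```python
import random

RANKS, STATES, SQUARES = 12, 3, 92      # 12 ranks x 3 states x 92 squares

rng = random.Random(0)                  # fixed seed: same keys every run
PIECE_KEYS = [[[rng.getrandbits(64) for _ in range(SQUARES)]
               for _ in range(STATES)] for _ in range(RANKS)]
TURN_KEY = rng.getrandbits(64)          # XORed in whenever the turn changes

def place(h, rank, state, square):
    """Placement and removal are the same operation: XOR is its own inverse."""
    return h ^ PIECE_KEYS[rank][state][square]

def move(h, rank, state, src, dst):
    """A move is a removal from src followed by a placement on dst."""
    return place(place(h, rank, state, src), rank, state, dst)

# starting from the empty board (hash 0) the updates are incremental:
h = place(0, 5, 0, 10)      # put a piece down
h = move(h, 5, 0, 10, 11)   # slide it one square
h = place(h, 5, 0, 11)      # remove it again: back to the empty board
assert h == 0
```

The self-inverse property of XOR is what makes every update a constant-time operation, independent of the number of pieces on the board.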

Components of the table

A transposition table needs the following components to be effective (Breuker, 1998):

key: the remaining bits of the hash value
move: the best move found so far for the current position
score: the score associated with the best move
flag: defines the type of score: lower bound, upper bound or an exact value
depth: the relative depth at which the best move was found

During a search, positions are retrieved from the transposition table before a board position is searched, and afterwards the information of the position is stored in the transposition table. If the position exists in the transposition table before it is searched, the algorithm can check whether the information is useful; this depends on the flag and the depth.


Algorithm 7 Negamax with transposition tables

function negamax(depth, board, α, β)
  αold = α
  retrieve(board, depth) {if the position is not found, TTdepth will be −1}
  if TTdepth ≥ depth then
    if flag = EXACTVALUE then
      return TTvalue
    else if flag = LOWERBOUND then
      α = max(α, TTvalue)
    else if flag = UPPERBOUND then
      β = min(β, TTvalue)
    end if
    if α ≥ β then
      return TTvalue
    end if
  end if
  if depth = 0 or terminal then
    return evaluate()
  end if
  bestValue = −∞
  for all moves do
    doMove()
    value = −negamax(depth − 1, board, −β, −α)
    undoMove()
    α = max(α, value)
    if value ≥ bestValue then
      bestValue = value
    end if
    if α ≥ β then
      break
    end if
  end for
  if bestValue ≤ αold then
    flag = UPPERBOUND
  else if bestValue ≥ β then
    flag = LOWERBOUND
  else
    flag = EXACTVALUE
  end if
  store(board, depth, bestMove, bestValue, flag)
  return bestValue


1. When the remaining search depth is lower than the search depth of the transposition table entry, the information can be used, because it is guaranteed that the information in the transposition table gives at least the same information as a search would. If the value is an exact value, this value can be returned immediately and the position does not have to be searched anymore.
2. When the remaining search depth is lower than the search depth of the transposition table entry and the value is an upper or lower bound, the value can be used to narrow the α-β window. The value of α can be adjusted if the stored value is a lower bound and larger than α. The same holds for β when the stored value is an upper bound and smaller than β. Furthermore, a cutoff can occur when α is larger than or equal to β.
3. When the remaining search depth is higher than the search depth of the transposition table entry, the information in the transposition table is not useful, except for the best move. The best move can be searched first and could possibly lead to an early cutoff. See also Section 4.9 for more information about move ordering.

After searching the position the information found can be stored in the transposition table. When the transposition table does not have an entry with the same key, the information can be stored directly. However, when the entry already exists, called a collision, a choice has to be made about which information is more important to preserve. Breuker, Uiterwijk, and Van den Herik (1994) have proposed several replacement schemes:

1. Deep. In case of a collision, this scheme preserves the board position with the deepest subtree. The scheme assumes that a deeper search has visited more nodes than a shallower search and therefore gives a better estimate of the utility of the board position.
2. New. In this case the transposition table always replaces the old entry with the information of the current search. This is based upon the observation that changes occur locally.
3. Old. This scheme works in a similar way as the New replacement scheme; however, it never replaces any entry in the transposition table.
4. Big. When the board position contains many forced moves or many cutoffs, in case of good move ordering, the depth of the search is not a good indicator of the amount of search done on the board position. Thus, instead of keeping track of the depth, one can keep track of the number of nodes visited during the search. This scheme preserves the entry with the largest subtree. A drawback is that the number of nodes visited in the subtree also has to be stored in the transposition table.

The Deep scheme was implemented based upon the experiments by Breuker et al. (1994).
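A minimal sketch of the Deep scheme, combined with the index/key split of Section 4.8.1 (the dictionary-based table and the 20-bit index are assumptions for illustration, not the thesis implementation):

```python
def store(table, hash64, depth, best_move, score, flag, index_bits=20):
    """Deep replacement: on a collision, keep the entry searched deepest."""
    index = hash64 & ((1 << index_bits) - 1)   # first n bits: the table slot
    key = hash64 >> index_bits                 # remaining bits: verification
    entry = table.get(index)
    if entry is None or depth >= entry["depth"]:
        table[index] = {"key": key, "move": best_move,
                        "score": score, "flag": flag, "depth": depth}
```

On retrieval, the stored key must be compared against the remaining bits of the probing position's hash to detect two different positions mapping to the same index.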

4.8.2 StarETC

Ballard has proposed several enhancements for the Star2 algorithm, among them the use of a transposition table in *-minimax (Ballard, 1983). The transposition table can be used in the process of probing for values (Veness, 2006). StarETC is the stochastic variant of the Enhanced Transposition Cutoff (ETC). The values for the children of a chance node may be retrieved from the transposition table and stored in the array W as described in Section 4.7.2. Instead of searching a child node's successors, a bound or exact value can be retrieved from the transposition table. However, the information in the transposition table is only used when the relative depth of the table entry is at least d − 1. The probe phase normally returns an exact value of the child's successor. The transposition table does not only store exact values but also lower and upper bounds. To use this information, Star2 also needs to keep track of the separate lower and upper bounds. The array W, as described in Section 4.7.2, is extended to two dimensions, one for each bound. When the value of the table entry is an exact value, the upper and lower bound in W are both set equal to the entry value.


The second advantage of the transposition table is the chance of a preliminary cutoff. Every time a useful table entry is found and W is updated, the lower bound of the board position is calculated using Σ Pi · Wi,lower and the upper bound using Σ Pi · Wi,upper, where P is an array containing the probabilities of each chance event. When the lower bound is larger than or equal to β, or the upper bound is smaller than or equal to α, a cutoff occurs. Finally, it is possible to store extra information in the transposition table when Star1 creates a cutoff. In the case of a fail low the position is stored as an upper bound with a value equal to A. Algorithm 8 shows the pseudo-code for the probing enhancement for Star2. It makes use of two extra functions, lowerbound() and upperbound(), to calculate the lower and upper bound of the probed or retrieved values for the nodes using Formula 4.19 and a similar formula for the lower bound. There are also two functions to calculate the successor values. Depending on the implementation of these four functions, the algorithm can handle uniform and non-uniform probabilities. The array of stored values is expanded with a separate lower and upper bound to deal with the information from the transposition table.
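The preliminary cutoff test can be sketched as follows (function name and interface assumed; entries without table information simply carry L and U as their bounds):

```python
def preliminary_cutoff(P, W, alpha, beta):
    """Bound a chance node with the probability-weighted lower and upper
    bounds collected in W; W[i] is a (lower, upper) pair per chance event."""
    lower = sum(p * lo for p, (lo, hi) in zip(P, W))
    upper = sum(p * hi for p, (lo, hi) in zip(P, W))
    if lower >= beta:
        return "fail high"   # node value is at least beta: cutoff
    if upper <= alpha:
        return "fail low"    # node value is at most alpha: cutoff
    return None              # no preliminary cutoff: search normally
```

The test costs only two weighted sums, so it is cheap to re-run after every table hit that tightens an entry in W.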

4.9 Move Ordering

As stated before, move ordering is important in α-β based search: it can reduce the search tree significantly. When searching, it is preferable to search the best moves first because they are more likely to create a cutoff. The moves can be ordered using two approaches, static or dynamic. Static move ordering happens according to predefined rules, mostly based on human strategies and knowledge of the game. The History Heuristic (Schaeffer, 1989) and transposition table ordering are dynamic ways of ordering the moves: the move ordering depends on the positions visited so far. The moves are weighted depending on the ordering; the presumed best move gets the highest value and the worst move the lowest value. Moves suggested by dynamic ordering have the highest value because they have proven to be good moves in previous searches. A move that is in the transposition table will be searched first. Winning moves or capturing moves are increased with a large value ensuring a win or a capture of a piece. Next, the value of the move is incremented with the value of the history heuristic. Finally, the value is slightly increased or decreased with values determined by the static ordering.

4.9.1 Static Ordering

Static ordering uses information about the type of piece and the goal and destination of the move. In Stratego we have ordered the moves as follows:

1. A move that results in a direct win, capturing the opponent's Flag.
2. An attacking move with a sure win.
3. A move attacking an unknown piece.
4. A move towards the opponent.
5. A move sideways.
6. A move backwards.
7. An attacking move with a sure loss.

The moves that result in a win or attack an unknown piece have a large positive influence in the move ordering. Non-capturing moves towards the opponent cause a slight increase of the value; moving away from the opponent is penalized. Finally, a clear loss results in a large decrease of the move's value.
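The weighting scheme of Section 4.9 combined with these static categories can be sketched as follows; the concrete weights and the Move record are invented for the example and are not the thesis values:

```python
from collections import namedtuple

Move = namedtuple("Move", "key static_bonus")

# assumed weights, chosen only so the categories cannot overtake each other
TT_MOVE, WIN_OR_CAPTURE = 2_000_000, 1_000_000
FORWARD, SIDEWAYS, BACKWARD, SURE_LOSS = 3, 2, 1, -1_000_000

def order_moves(moves, tt_move, history):
    """Transposition-table move first, then static bonus plus history value."""
    def score(m):
        s = TT_MOVE if m == tt_move else 0
        s += m.static_bonus              # one of the category weights above
        s += history.get(m.key, 0)       # dynamic part: history heuristic
        return s
    return sorted(moves, key=score, reverse=True)
```

Keeping the dynamic weights far smaller than the capture and table weights preserves the priority order described in the text while still letting the history heuristic break ties among quiet moves.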

Algorithm 8 The negamax formulation of the StarETC algorithm

function starETC(depth, board, α, β)
  if depth = 0 or terminal then
    return evaluate()
  end if
  if retrieve(board, depth) = success then
    if entry = EXACTVALUE then
      return entry.value
    end if
    if entry = LOWERBOUND then
      if entry.value ≥ β then
        return entry.value
      end if
      α = max(α, entry.value)
    end if
    if entry = UPPERBOUND then
      if entry.value ≤ α then
        return entry.value
      end if
      β = min(β, entry.value)
    end if
  end if
  N = generateChanceEvents(); W[N] = array()
  for i = 1, i ≤ N, i++ do
    if retrieve(event[i], depth − 1) = success then
      if entry = LOWERBOUND or EXACTVALUE then
        W[i].lower = entry.value
        if lowerbound(W) ≥ β then
          store(board, depth, LOWERBOUND, lowerbound(W))
          return lowerbound(W)
        end if
      else if entry = UPPERBOUND or EXACTVALUE then
        W[i].upper = entry.value
        if upperbound(W) ≤ α then
          store(board, depth, UPPERBOUND, upperbound(W))
          return upperbound(W)
        end if
      end if
      if entry = LOWERBOUND then
        if entry.value ≥ β then
          return entry.value
        end if
        α = max(α, entry.value)
      end if
      if entry = UPPERBOUND then
        if entry.value ≤ α then
          return entry.value
        end if
        β = min(β, entry.value)
      end if
    end if
  end for
  for i = 1, i ≤ N, i++ do
    B = successorMax(W, β)
    BX = min(B, U)
    value = −probe(depth − 1, board, −BX, −W[i].lower)
    W[i].lower = value
    if value ≥ B then
      store(board, depth, LOWERBOUND, lowerbound(W))
      return lowerbound(W)
    end if
  end for
  for i = 1, i ≤ N, i++ do
    A = successorMin(W, α)
    B = successorMax(W, β)
    AX = max(A, L)
    BX = min(B, U)
    doChanceEvent(board, event[i])
    value = −search(depth, board, −BX, −AX)
    undoChanceEvent(board, event[i])
    W[i].lower = value
    W[i].upper = value
    if value ≤ A then
      store(board, depth, UPPERBOUND, upperbound(W))
      return upperbound(W)
    end if
    if value ≥ B then
      store(board, depth, LOWERBOUND, lowerbound(W))
      return lowerbound(W)
    end if
  end for
  store(board, depth, EXACTVALUE, lowerbound(W))
  return lowerbound(W)


4.9.2 History Heuristic

Schaeffer (1989) introduced this technique to increase the effectiveness of move ordering. When a certain move m is the best move in a certain position, possibly creating a cutoff, it has the possibility of being the best move in another, similar position. Therefore it is interesting to start the search with move m. The History Heuristic stores all these moves in a history table, where each entry is a move with an associated value. Every time a move creates a cutoff or is the best move after searching the tree, the table entry of the move is incremented with a value, e.g. 2^d, d or 1 (Winands et al., 2006), where d is the depth of the search tree so far. The history table is emptied every time the search is initialized. Based upon the literature, our implementation of Stratego updates the values in the history table with the value d.
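A minimal history table following this description (the class and method names are illustrative):

```python
from collections import defaultdict

class HistoryHeuristic:
    """Counter per move key; rewarded by depth d on a cutoff or best move."""
    def __init__(self):
        self.table = defaultdict(int)

    def reward(self, move_key, depth):
        self.table[move_key] += depth    # the thesis uses the increment d

    def value(self, move_key):
        return self.table[move_key]      # used as a move-ordering weight

    def clear(self):
        self.table.clear()               # emptied when a new search starts
```

Indexing by a move key rather than by position is what lets the heuristic transfer information between similar positions.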

4.9.3 Transposition Table Ordering

When the transposition table has an entry for the current position with a higher depth, the information in the entry is used to return an exact value or to adjust the lower and upper bound for the search. When the entry in the transposition table is not an exact value, the move from the entry can be searched first and might lead to a cutoff. If the search depth of the entry is lower than the remaining search depth for the current position, this information is not useful for updating the lower and upper bounds, but the best move of the entry may still be the best move when searched deeper, possibly creating a cutoff (Breuker, 1998). Therefore the search algorithm visits the transposition table move first. When the transposition table entry is useful, the corresponding move will not be visited again.

Figure 4.9: Multi-cut pruning

4.10 Multi-Cut

Multi-Cut (Björnsson and Marsland, 2001) is a forward-pruning technique which may improve the playing strength of a program. It is used in combination with α-β pruning. The nodes of a tree, using a negamax implementation, can be identified as follows:

1. The pv-node is a node that is part of the principal variation, e.g. the root node. At least one child of the pv-node must have the minimax value and is the next pv-node; all remaining children are then cut-nodes.


2. A child of a cut-node which has a value higher than the value of the principal variation becomes an all-node.
3. Every child of an all-node is a cut-node.

In α-β it is possible that the first move of a cut-node creates a cutoff and the other children do not have to be searched anymore. The idea of multi-cut is that a cut-node has many successors that cause a cutoff and can therefore be pruned. Multi-cut visits some of the successors at a reduced search depth to determine whether the node can be pruned in advance. Björnsson and Marsland (2001) combine Multi-Cut with a principal variation search and search the principal variation first before applying multi-cut. Winands et al. (2005) created a variant of Multi-Cut that is able to prune at all-nodes, and show that it is safe to forward prune at all-nodes and that it gives a reduction of the search. This version of multi-cut is implemented in our Stratego player. In Multi-Cut, M children of a possible cut-node are searched with a depth of d − 1 − R, where d is the remaining search depth and R is a search reduction. When C of the total M moves return a value larger than β, the node is pruned in advance, where 1 ≤ C ≤ M. This is depicted in Figure 4.9 (Björnsson and Marsland, 2001). The dotted area below n1 and the children after nM are the savings that multi-cut creates when a successful pruning occurs. If a cutoff occurs, the node returns the β value. Otherwise, the node is searched at full depth. When no cutoff occurs, the nodes visited by Multi-Cut are extra overhead, but they can provide move ordering information for the remaining search.
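The pruning test can be sketched as follows (interface assumed; `search` stands for the reduced-depth search of a successor):

```python
def multi_cut(moves, depth, beta, search, M=6, C=3, R=2):
    """Prune a node in advance when C of its first M moves already return a
    value >= beta at the reduced depth depth - 1 - R; M, C and R are the
    tunable parameters from the text, set here to arbitrary example values."""
    cutoffs = 0
    for move in moves[:M]:
        if search(move, depth - 1 - R) >= beta:
            cutoffs += 1
            if cutoffs >= C:
                return True    # prune: skip the full-depth search
    return False               # no multi-cut: search at full depth
```

When the test fails, the reduced-depth results are not wasted entirely, since they can seed the move ordering for the subsequent full-depth search.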


Chapter 5

Evaluation Function
A game with a full search-based approach would not need a complex evaluation function, because the only outcomes at the leaf nodes are win, loss or draw. However, when a game is too complex, such as Stratego, a search-based approach alone does not suffice. Therefore a combination of search and knowledge, in the form of an evaluation function, is needed. An evaluation function can implement human knowledge and strategies of the game, or computer-generated information such as an endgame database. An endgame database for Stratego would be too large to create. The evaluation function for Stratego therefore mostly depends on human knowledge.

5.1 Piece Value

An evaluation function returns a numerical value to express the utility of the current position. Therefore, a value is assigned to each position. A way to do this is by counting the material on the board. A piece value can be static, a predetermined value that does not change over time, or dynamic, a value that changes over time. In Stratego the importance of the pieces changes over time. For example, the Spy is important when the opponent still has its Marshal in play. When the Marshal is removed from the board, the Spy is only useful to capture the Flag or to be sacrificed to discover the identity of an unknown piece. The Spy then loses a large deal of its value, because it cannot use its special ability anymore, capturing the Marshal. De Boer (2007) suggested a formula for the relative values of pieces in Stratego:

1. The Flag has the highest value of all.
2. The Spy's value is half of the value of the Marshal.
3. If the opponent has the Spy in play, the value of the Marshal is multiplied by 0.8.
4. When the player has fewer than three Miners, the value of the Miners is multiplied by 4 / #left. The same holds for the Scouts.
5. Bombs have half the value of the strongest piece of the opponent on the board.
6. The value of every piece is incremented with 1 / #left to increase the importance of the piece when only a few are left.

In the situation where the opponent has a Sergeant as highest rank, it does not matter for the player whether he has a General or a Colonel as highest rank, because both defeat all the remaining pieces of the opponent. The value of the General or Colonel should be relative to the number of pieces they can capture. Figure 5.1 depicts such a situation. The blue General (the 2) is the highest-ranked piece on the board. However, it would be just as valuable if it were, for example, a Marshal or a Major, because it would still defeat the same number of pieces. Similar is the situation when the opponent loses his Marshal and the player's General becomes the highest rank on the game board. Initially the General has a low value because it can be defeated


by the Marshal. When the Marshal is removed from the game board, the General becomes undefeatable and its value needs to be changed accordingly. A player loses if his Flag is captured; therefore this piece is assigned a static terminal value, larger than the bounds of the evaluation function, to ensure a cutoff in case of a win or loss.
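Two of the dynamic piece-value rules above, sketched as functions (the function names are illustrative and the base values are placeholders for the defaults of Table 5.1):

```python
def miner_value(base, miners_left):
    """With fewer than three Miners left, each Miner is worth base * 4 / #left."""
    return base * 4 / miners_left if 0 < miners_left < 3 else base

def marshal_value(base, opponent_has_spy):
    """The Marshal is worth only 0.8 of its base value while the opposing
    Spy, its only weak counter, is still in play."""
    return base * 0.8 if opponent_has_spy else base
```

Both rules make the value a function of the current board rather than a constant, which is exactly the static/dynamic distinction drawn above.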


Figure 5.1: Example where the rank of the highest piece is not important

5.2 Value of Information

Initially the pieces of the opponent are unknown to the player and vice versa. Most Stratego players try to keep the identity of their pieces unknown as long as possible (De Boer, 2007). With imperfect information the players can mislead the opponent into making a bad move; this prevents the opponent from playing the best move possible. Consider, for example, a situation where the opponent has an unknown Marshal and a few low-ranked unknown pieces left. If the opponent knew which piece is the Marshal, his General could safely attack the remaining unknown pieces. In Figure 5.2 the Red player has two unknown pieces left: one is the Marshal and the other is a Miner. The Blue player is now in the situation where he can attack both pieces. When the Blue player uses the Spy to attack and the attacked piece turns out to be the Marshal, the game results in a definite win for Blue. However, if the blue Spy attacks the Miner, the Blue player loses his Spy and the Red player can easily capture one of the opponent's Miners. This results in a clear loss for Blue.

5.3 Strategic Positions

The game board of Stratego contains three small areas with a size of two by two connecting both sides of the game board, separated by water; these are called lanes. Every piece going to the opposite side needs to pass through these lanes. In some situations it is beneficial to control these lanes, preventing the opponent from crossing the board, especially in the endgame, when Miners need to remove the Bombs surrounding the Flag. A player can defend these lanes with only one piece per lane, thereby preventing a loss. In Figure 5.3 the Red player is outnumbered and can only hope for a draw. The Red player is defending the lanes such that the Blue player cannot use his Miners to approach the Flag. Thereby the player forces a draw.


Figure 5.2: Example where the Red player has a Miner and the Marshal left. If the Blue player attacks the Marshal with the Spy he wins, otherwise the Red player wins

Figure 5.3: Example where the Red player controls all three lanes

5.4 Implementation

The key points mentioned in the previous sections have been implemented as different independent features in an evaluation function.

1. The first feature multiplies the value of the Marshal (both the player's and the opponent's) by 0.8 if the opponent has a Spy on the game board.
2. The second feature multiplies the value of the Miners by 4 / #left if the number of Miners is less than three.
3. The third feature sets the value of the Bomb to half the value of the piece with the highest value.
4. The fourth feature divides the value of the Marshal by two if the opponent has a Spy on the board.
5. The fifth feature gives a penalty to pieces that are known to the opponent.
6. The sixth feature increases the value of a piece when the player has more pieces of the same type than the opponent.

Eventually the value of a piece is multiplied by the number of times that the piece is on the board, and summed over all the pieces. The default values of each piece can be found in Table 5.1. These values are based upon the M.Sc. thesis by Stengård (2006).

Marshal      400     Sergeant     15
General      200     Miner        25
Colonel      100     Scout        30
Major         75     Spy         200
Lieutenant    50     Bomb         20
Captain       25     Flag      10000

Table 5.1: Default values of the evaluation function for each piece
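The material term can then be sketched as follows, using the defaults of Table 5.1 (the dictionaries and function names are illustrative, not the thesis code):

```python
# default piece values from Table 5.1
VALUES = {"Marshal": 400, "General": 200, "Colonel": 100, "Major": 75,
          "Lieutenant": 50, "Captain": 25, "Sergeant": 15, "Miner": 25,
          "Scout": 30, "Spy": 200, "Bomb": 20, "Flag": 10000}

def material(counts):
    """Each piece value times the number of copies still on the board."""
    return sum(VALUES[rank] * n for rank, n in counts.items())

def evaluate(own, opp):
    """Material balance from the player's point of view."""
    return material(own) - material(opp)
```

In the full evaluation function, the static values in VALUES would first be adjusted by the six dynamic features listed above before the sums are taken.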

Chapter 6

Experimental Results
This chapter presents the experiments done in this research and their results. Section 6.1 describes the experimental design, and Section 6.2 shows the results of using different pruning techniques in MIN/MAX nodes. Section 6.3 measures the performance of the *-minimax algorithms by Ballard (1983) and the transposition table enhancement by Veness (2006). Section 6.4 gives an overview of the performance of the evaluation function features. Finally, Section 6.5 presents the experiments and results of Multi-Cut.

6.1 Experimental Design

The implemented techniques are tested using two approaches. The first approach is testing the performance of a technique with respect to the number of nodes visited. The test set consists of 300 board positions, where the first 100 board positions are taken from the opening of Stratego, 100 are midgame positions and 100 are endgame positions. See Figure 6.1 for example board configurations. The second approach is self-play, where two programs play a match against each other. Here 11 predefined setups (see Appendix A) are used, from which each player is appointed a board setup.

6.2 Pruning in MIN/MAX Nodes

This section focuses on pruning in deterministic nodes. Section 6.2.1 shows the results of Iterative Deepening. The effect of the History Heuristic is shown in Section 6.2.2. The results of the Transposition Table are described in Section 6.2.3, and Section 6.2.4 shows the result of the combination of the History Heuristic and the Transposition Table. The techniques are compared to a default program using α-β pruning, unless mentioned otherwise. For computational reasons, all experiments are performed up to 5 ply.

6.2.1 Iterative Deepening

Table 6.1 shows that Iterative Deepening (ID) without enhancements gives a little overhead, 4.10% on a 5-ply search, when compared to a fixed-depth search. However, the advantage of Iterative Deepening is that it can be stopped at any time during the search at depth n and is then able to return the best move of the completed (n − 1)-ply search.

Ply   Without ID       With ID          Overhead
1     10,344           10,344           0.00%
2     264,264          274,608          3.91%
3     5,994,388        6,268,996        4.58%
4     215,558,850      221,827,846      2.91%
5     5,408,948,866    5,630,776,712    4.10%

Table 6.1: Overhead of Iterative Deepening
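The anytime behaviour described above can be sketched as a simple driver loop (interface assumed; `search` stands for a fixed-depth search returning a best move):

```python
import time

def iterative_deepening(root, search, max_ply, deadline):
    """Search 1, 2, ..., max_ply plies; if time runs out during the
    depth-n iteration, the result of the completed (n - 1)-ply search
    is still available."""
    best = None
    for depth in range(1, max_ply + 1):
        if time.monotonic() >= deadline:
            break                    # abort: keep the last completed result
        best = search(root, depth)
    return best
```

The shallow iterations are also what feed the history heuristic and transposition table with move-ordering information, which is why the measured overhead stays small.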


(a) Begin game position. (b) Midgame position. (c) Endgame position.
Figure 6.1: Example positions used in the test set

6.2.2 History Heuristic

The History Heuristic (HH) sorts the moves dynamically based upon previous search results. Stratego is a game with an average branching factor of around 30, where some moves lead to already seen board configurations and some moves do not result in progression of the game. Static move ordering is used to make an initial ordering based on domain knowledge (see Section 4.9.1). The dynamic move ordering improves upon the static move ordering. The History Heuristic improves the performance of Iterative Deepening, as shown in Table 6.2. The overhead of Iterative Deepening is still noticeable at ply 2, where the overhead is 3.36%. The gain of the dynamic move ordering in combination with Iterative Deepening begins at ply 3. The History Heuristic improves the node reduction up to 55.46% on a 5-ply search.

Ply   Without HH & without ID   With HH & with ID   Gain
1     10,344                    10,344              0.00%
2     264,264                   273,141             -3.36%
3     5,994,388                 5,322,698           11.21%
4     215,558,850               141,060,175         34.56%
5     5,408,948,866             2,409,129,708       55.46%

Table 6.2: Node reduction by History Heuristic

The table reveals that the dynamic move ordering has a positive effect on top of the static ordering as described in Section 4.9. The performance of the History Heuristic increases every ply. By looking at the trend of the gain we can assume that the performance of the History Heuristic improves even more when searching deeper than 5 ply.


6.2.3 Transposition Table

The Transposition Table (TT) is able to reduce the number of nodes by keeping track of previously visited board configurations, the corresponding upper bound, lower bound and the best move. Every time an identical board configuration appears, the program can immediately retrieve the previously calculated upper and lower bound. These values are compared to the current α and β values, and a cutoff can occur. If these values do not provide a cutoff, the transposition table can provide a best move, which is searched first. Table 6.3 shows the gain of Transposition Tables in combination with Iterative Deepening.

Ply   Without TT & without ID   With TT & with ID   Gain
1     10,344                    10,344              0.00%
2     264,264                   227,938             13.75%
3     5,994,388                 5,001,828           16.56%
4     215,558,850               134,654,093         37.53%
5     5,408,948,866             3,375,802,619       37.59%

Table 6.3: Node reduction by a Transposition Table The performance is already noticeable on a 2-ply search. This can be explained by the advantage of move ordering in combination with Iterative Deepening. The move ordering at the second iteration uses the information from the rst iteration and is already able to prune 13.75%. By looking at the trend of the gain we can assume that the performance of the Transposition Table improves even more when searching deeper than 5 ply.
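The bound handling described above can be sketched as a table probe (an illustrative sketch; the entry layout and flag names are assumptions, not the thesis code):

```python
# Sketch of a transposition-table probe: a stored entry keeps the search
# depth, a bound flag, the value and the best move.  Entry layout and
# flag names are illustrative assumptions.

EXACT, LOWER, UPPER = 0, 1, 2

table = {}  # position key -> (depth, flag, value, best_move)

def probe(key, depth, alpha, beta):
    """Return (cutoff_value, best_move); cutoff_value is None when the
    stored bounds do not allow an immediate cutoff."""
    entry = table.get(key)
    if entry is None:
        return None, None
    e_depth, flag, value, best_move = entry
    if e_depth >= depth:
        if flag == EXACT:
            return value, best_move
        if flag == LOWER and value >= beta:    # stored lower bound beats beta
            return value, best_move
        if flag == UPPER and value <= alpha:   # stored upper bound below alpha
            return value, best_move
    # No cutoff, but the stored best move can still be searched first.
    return None, best_move

table[12345] = (5, LOWER, 80, "scout f4-f7")
print(probe(12345, depth=4, alpha=-50, beta=60))   # (80, 'scout f4-f7')
print(probe(12345, depth=4, alpha=-50, beta=100))  # (None, 'scout f4-f7')
```

The second call illustrates the move-ordering benefit measured in Table 6.3: even without a cutoff, the retrieved best move improves the ordering of the next iteration.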

6.2.4 History Heuristic and Transposition Table

In the next experiment the Transposition Table and the History Heuristic are combined. These techniques combine their strengths to perform better. The combination of Transposition Tables, the History Heuristic and Iterative Deepening is able to reduce the number of nodes by 74.06%, as shown in Table 6.4.

Ply   Without TT, HH & ID   With TT, HH & ID     Gain
 1                10,344             10,344      0.00%
 2               264,264            224,387     15.09%
 3             5,994,388          4,123,300     31.21%
 4           215,558,850         87,384,008     59.46%
 5         5,408,948,866      1,403,260,491     74.06%

Table 6.4: Node reduction by HH & TT

Table 6.5 shows the strength of Iterative Deepening. A 5-ply search with Iterative Deepening improves the node reduction by 32.65% compared to a search without Iterative Deepening. Iterative Deepening successfully uses information from lower-depth searches to improve the move ordering and hence reduces the number of nodes. The effectiveness of Iterative Deepening with move-ordering enhancements increases every ply. Again we may assume that the gain increases further when the search depth grows.
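The interaction between Iterative Deepening and the ordering tables can be sketched as a simple driver loop (illustrative; `search` stands in for a depth-limited α-β search that fills the transposition and history tables as a side effect):

```python
# Minimal iterative-deepening driver (an illustrative sketch).  Each
# iteration searches one ply deeper; the tables filled by the previous
# iteration improve the move ordering of the next, which is where the
# node reductions in Table 6.5 come from.

def iterative_deepening(root, search, max_depth):
    """Search the root at depth 1, 2, ..., max_depth and keep the last
    result; the real player stops when its one-second budget runs out."""
    best = None
    for depth in range(1, max_depth + 1):
        best = search(root, depth)  # fills TT/history as a side effect
    return best
```

The re-search of shallow depths is the overhead visible at ply 2 in Table 6.2; from ply 3 onward the improved ordering more than pays for it.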

6.3 Pruning in Chance Nodes

Section 6.3.1 looks at the results of the Star1 algorithm. In Section 6.3.2 the results of Star2 with and without Star1 are shown. The results of the StarETC algorithm are shown in Section 6.3.3. The experiments with *-minimax make use of Iterative Deepening, the History Heuristic and the Transposition Table.

Ply   With TT & HH, without ID   With TT & HH, with ID     Gain
 1                     10,344                  10,344      0.00%
 2                    262,796                 224,387     14.62%
 3                  4,658,616               4,123,300     11.49%
 4                100,656,880              87,384,008     13.19%
 5              2,083,669,650           1,403,260,491     32.65%

Table 6.5: Node reduction using HH & TT with Iterative Deepening and without Iterative Deepening

6.3.1 Star1

The Star1 algorithm is able to backward-prune chance nodes using the α-β window and information about the theoretical lower and upper bound of the evaluation function. The Star1 algorithm is able to prune 32.54% at the first ply, as can be seen in Table 6.6. This can be explained by the fact that a chance node is not counted as a ply, but the children of the chance node are. Therefore the Star1 algorithm is already able to prune on a 1-ply search. On a 5-ply search Star1 reduces the node count by 90.13%.

Ply   Default           With Star1     Gain
 1            10,344         6,978    32.54%
 2           264,264       188,423    28.70%
 3         5,994,388     1,607,955    73.18%
 4       215,558,850    61,256,939    71.58%
 5     5,408,948,866   533,925,113    90.13%

Table 6.6: Node reduction by Star1
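Ballard's Star1 rule can be sketched as follows (an illustrative sketch, not the thesis code; `search`, `L` and `U` are assumed names for a bounded child search and the evaluation bounds):

```python
# Sketch of Star1 pruning at a chance node.  L and U are the theoretical
# lower/upper bounds of the evaluation function; `search` is any bounded
# search of a deterministic child.  After each child, pessimistic and
# optimistic bounds on the expected value are checked against the
# alpha-beta window, so remaining children can be skipped.

def star1(children, probs, alpha, beta, L, U, search):
    weighted = 0.0          # sum of p_i * v_i over searched children
    remaining = 1.0         # probability mass not yet searched
    for child, p in zip(children, probs):
        # Child window: a value outside it guarantees the node's
        # expected value falls outside (alpha, beta).
        lo = (alpha - (weighted + (remaining - p) * U)) / p
        hi = (beta - (weighted + (remaining - p) * L)) / p
        v = search(child, max(lo, L), min(hi, U))
        weighted += p * v
        remaining -= p
        if weighted + remaining * L >= beta:   # fail high: prune the rest
            return weighted + remaining * L
        if weighted + remaining * U <= alpha:  # fail low: prune the rest
            return weighted + remaining * U
    return weighted

def toy_search(child, lo, hi):
    # Stands in for a bounded alpha-beta search of a deterministic child.
    return {"a": 100, "b": 100, "c": -100}[child]

# After two children the pessimistic bound already reaches beta, so the
# third child is never searched.
v = star1(["a", "b", "c"], [1/3, 1/3, 1/3],
          alpha=-30, beta=30, L=-100, U=100, search=toy_search)
```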

6.3.2 Star2

Star2 probes one or more children of a node to tighten the bounds of a chance node, which also helps Star1 to prune. With a good move ordering Star2 can be quite effective. Table 6.7 presents the results of the Star2 algorithm with a probe factor of 1. At depth 2 and 3 Star2 has no significant effect on the search. The reason for this result is that Star2 probes children of nodes which do not give good information for adjusting the bounds, and hence the probes only increase the number of nodes. However, on a 5-ply search Star2 is able to prune 93.56%.

Ply   Default           With Star2 & Star1     Gain
 1            10,344                6,978    32.54%
 2           264,264              190,389    27.95%
 3         5,994,388            1,650,350    72.47%
 4       215,558,850           36,888,491    82.89%
 5     5,408,948,866          348,235,823    93.56%

Table 6.7: Node reduction using Star2 with a probing factor of 1

In Table 6.8 the Star1 and Star2 algorithms are compared. For a 1-ply search Star2 is not usable; therefore Star2 does not provide any gain compared to Star1 there. The results show the overhead of Star2 on a shallow search, and the gain compared to Star1 on a deeper search. The gain increases to 39.78% on a 4-ply search but drops to 34.78% on a 5-ply search. A 6-ply search is likely to improve the performance of Star2 again. The Star2 algorithm is able to probe a different number of children in every node. By default Star2 probes 1 child per chance node. Depending on the quality of the move ordering, the gain increases or decreases by incrementing the probe factor.

Ply   Without Star2, with Star1   With Star2 & Star1     Gain
 1                      6,978                6,978      0.00%
 2                    188,423              190,389     -1.04%
 3                  1,607,955            1,650,350     -2.64%
 4                 61,256,939           36,888,491     39.78%
 5                533,925,113          348,235,823     34.78%

Table 6.8: Node reduction by Star1 and Star2 with a probe factor of 1

In Table 6.9 we compare different probe factors. The first probe factor of 0 equals the Star1 algorithm and therefore shows no gain. The values 1 and 2 have a positive gain when compared to Star1, 34.78% and 16.44% respectively. Higher values for the probe factor increase the node count by up to 35.59%. A probe factor of 1 gives the best reduction of nodes, which means that the move ordering (the History Heuristic and Transposition Table) performs well in combination with Star2.

Probe factor   Nodes           Gain
 0 (Star1)     533,925,113         -
 1             348,235,823    34.78%
 2             446,157,703    16.44%
 3             543,629,215    -1.82%
 4             628,570,608   -17.73%
 5             723,955,888   -35.59%

Table 6.9: Node reduction using different probe factor values for Star2
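The probing phase can be sketched as follows (an illustrative sketch under the assumption that the chance node's children are MIN nodes, so that probing a child's first few moves yields an upper bound on its value; `probe` and the parameter names are assumptions, not the thesis code):

```python
# Sketch of Star2's probing phase.  probe(child, k) searches only the k
# best-ordered moves of a MIN child, which gives an upper bound on that
# child's value; each such bound tightens the chance node's overall
# upper bound and may cause a cutoff before any child is fully searched.

def star2_probe(children, probs, alpha, U, probe_factor, probe):
    """Return (pruned, bound).  If pruned is True the whole chance node
    fails low against alpha and needs no full search."""
    upper = U  # running upper bound on the expected value
    for child, p in zip(children, probs):
        b = probe(child, probe_factor)   # upper bound for this child
        upper -= p * (U - b)             # tighten the overall bound
        if upper <= alpha:               # whole chance node fails low
            return True, upper
    return False, upper                  # fall through to a Star1 pass

pruned, bound = star2_probe(
    ["a", "b"], [0.5, 0.5], alpha=10, U=100, probe_factor=1,
    probe=lambda child, k: {"a": -20, "b": 0}[child])
# pruned is True: the probed bounds alone push the node's value to -10,
# below alpha, so no child needs a full search.
```

This also shows why the probe factor trades off as in Table 6.9: each extra probed move tightens `b` a little, but every probe costs nodes, so with a good move ordering a single probe already captures most of the bound.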

6.3.3 StarETC

The StarETC algorithm makes use of information from chance nodes. StarETC stores the lower and upper bounds computed by the Star1 and Star2 algorithms, similar to the Transposition Table. StarETC is able to reduce the number of nodes by 93.52%, as can be seen in Table 6.10.

Ply   Default           With StarETC     Gain
 1            10,344         6,978     32.54%
 2           264,264       190,389     27.95%
 3         5,994,388     1,649,015     72.49%
 4       215,558,850    36,796,717     82.93%
 5     5,408,948,866   350,754,377     93.52%

Table 6.10: Node reduction using StarETC

StarETC performs almost identically to Star2, as can be seen in Table 6.11. The relative gain of StarETC compared to Star2 is almost 0. On a 5-ply search there even seems to be a little additional overhead compared to Star2. StarETC is likely to perform better than Star2 on a deeper search.
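The enhanced-transposition-cutoff idea at a chance node can be sketched as follows (illustrative; the table layout is an assumption):

```python
# Sketch of the StarETC idea: before searching a chance node, look up
# each child in the transposition table and use any stored (lower,
# upper) bounds to tighten the node's expected-value bounds, possibly
# cutting off without searching at all.

def star_etc(children, probs, alpha, beta, L, U, tt):
    """tt maps a child position to a stored (lower, upper) pair; children
    without an entry fall back to the theoretical bounds L and U."""
    lower = sum(p * tt.get(c, (L, U))[0] for c, p in zip(children, probs))
    upper = sum(p * tt.get(c, (L, U))[1] for c, p in zip(children, probs))
    if lower >= beta:
        return lower          # immediate fail-high cutoff
    if upper <= alpha:
        return upper          # immediate fail-low cutoff
    return None               # no cutoff: fall back to Star1/Star2

tt = {"hit": (40, 60)}        # bounds stored by an earlier search
print(star_etc(["hit", "miss"], [0.8, 0.2], alpha=-10, beta=5,
               L=-100, U=100, tt=tt))  # 12.0 (fail-high cutoff)
```

The extra table lookups explain the slight 5-ply overhead in Table 6.11: when the stored bounds rarely cut off, StarETC pays for its probes without saving nodes.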

Overview

The results of the node reduction experiments are summarized in Table 6.12. The table compares the techniques on a 5-ply search. Each technique makes use of Iterative Deepening, and every new experiment is extended with the previously mentioned techniques. The gain of the mean can be found in the tables mentioned above. Worth noting is the fact that each technique improves the worst case. Star1 and Star2 improve the node reduction compared to α-β pruning with the History Heuristic and Transposition Table by an additional 75.18%.

Ply   Star2           StarETC         Gain
 1          6,978           6,978    0.00%
 2        190,389         190,389    0.00%
 3      1,650,350       1,649,015    0.08%
 4     36,888,491      36,796,717    0.25%
 5    348,235,823     350,754,377   -0.72%

Table 6.11: Node reduction using StarETC

                      Total           Mean         Median        Std. Dev.       Min   Max
Alpha-Beta            5,630,776,712   18,769,256   2,243,814.5   53,356,557.85   34    559,492,043
History Heuristic     2,409,129,708    8,030,432     716,116.0   24,486,486.38   34    229,001,460
Transposition Table   1,403,260,491    4,677,535     573,530.5   14,574,204.75   34    127,506,016
Star1                   533,925,113    1,779,750     294,901.0    5,144,435.40   31     45,979,029
Star2                   348,235,823    1,160,786     230,650.5    3,640,867.21   31     34,938,971
StarETC                 350,754,377    1,169,181     231,281.0    3,670,205.21   31     33,607,267

Table 6.12: Number of nodes searched on a 5-ply search

6.4 Evaluation Function Features

This section looks at the performance of the features presented in Section 5.4. First, the node reduction is compared and second, the strength of the evaluation features is compared to the default evaluation function. The search function is extended with all previously mentioned techniques: Iterative Deepening, History Heuristic, Transposition Table, Star1, Star2 and StarETC. In self-play the program plays 121 games of Stratego. Each program has a search time of one second to compute a move. There are 11 predefined board setups, which can be found in Appendix A. Every game starts with a unique combination of two setups (resulting in a total of 11² = 121 games). The game ends by the game rules or after 1,200 plies.

First, we compare the node reduction of the different features against the default version in Table 6.13. It shows that the third feature can reduce the nodes visited with the current evaluation function by 7.84%. The fifth feature decreases the gain by 7.92%. The other features have no significant positive or negative impact on the performance.

Extension     Nodes           Gain
None          350,754,377        -
1st feature   344,586,412    1.76%
2nd feature   351,781,359   -0.29%
3rd feature   323,241,276    7.84%
4th feature   350,270,127    0.14%
5th feature   378,522,770   -7.92%
6th feature   344,157,660    1.88%

Table 6.13: Node reduction using evaluation function features

Second, we take a look at the performance of these features in self-play. Every feature has been tested individually against a program without any features. The results are shown in Table 6.14. The column Score percentage is the sum of the win percentage and half of the draw percentage. The features do not provide much improvement compared to the default evaluation function. The second feature, where the weight of the Miners increases as their number on the board decreases, has the best performance with a 44.44% win rate and a 33.33% loss rate. With the current number of tests it is not possible to state that this feature is significantly better than the default evaluation function.

Extension     Win      Loss     Draw     Score percentage
1st feature   38.89%   47.22%   13.89%   45.84%
2nd feature   44.44%   33.33%   22.22%   55.55%
3rd feature   36.11%   47.22%   16.67%   44.45%
4th feature   27.78%   52.78%   19.44%   37.50%
5th feature   25.00%   38.89%   36.11%   43.06%
6th feature   25.00%   52.78%   22.22%   36.11%

Table 6.14: Comparison of different evaluation function features
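As a quick check of the score measure, the win-plus-half-draw rule can be written out; the numbers reproduce the 2nd feature's row:

```python
# Score percentage as used in Tables 6.14 and 6.15: win percentage plus
# half of the draw percentage.

def score_percentage(win, draw):
    return win + draw / 2

print(round(score_percentage(44.44, 22.22), 2))  # 55.55
```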

6.5 Multi-cut

In this section the program plays 121 games in the same setup as described in Section 6.4. Again the search time has been set to one second. In this experiment the search function, with all pruning techniques active and without any evaluation function features, is extended with Multi-Cut. M is the maximum number of moves that Multi-Cut considers, C is the number of moves that must result in a cutoff before Multi-Cut prunes the complete subtree, and R is the search depth reduction.

 M    C   R   Wins     Loss     Draw     Score percentage
 8    2   2   38.84%   42.15%   19.01%   48.36%
 8    3   2   45.45%   40.15%   14.05%   52.48%
 8    4   2   36.36%   42.98%   20.66%   46.69%
10    2   2   42.15%   35.54%   22.31%   53.31%
10    3   2   45.45%   39.67%   14.88%   52.89%
10    4   2   37.19%   42.98%   19.83%   47.11%
12    2   2   40.50%   39.67%   19.83%   50.42%
12    3   2   41.32%   37.19%   21.49%   52.07%
12    4   2   41.32%   41.32%   17.36%   50.00%
 8    3   3   42.98%   40.40%   16.53%   51.25%
10    3   3   37.19%   42.98%   19.83%   47.11%
12    3   3   41.32%   40.50%   18.18%   50.41%

Table 6.15: Self-play results using Multi-Cut

Table 6.15 shows the results. The column Score percentage is the sum of the win percentage and half of the draw percentage. Multi-Cut shows some improvement over the default search method, but not enough to state that Multi-Cut is an essential improvement. The best-performing parameter settings (M, C, R) are (8, 3, 2), (10, 2, 2), (10, 3, 2) and (12, 3, 2).
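The Multi-Cut test with the parameters M, C and R can be sketched as follows (illustrative names; `search` stands in for a null-window α-β search from the opponent's point of view, and `make`/`undo` apply and retract a move):

```python
# Sketch of the Multi-Cut forward-pruning test.  At most M moves get a
# reduced-depth null-window probe; if C of them fail high against beta,
# the whole subtree is pruned at the full depth.

def multi_cut(moves, depth, beta, M, C, R, make, undo, search):
    """Return True if this node's subtree may be forward-pruned."""
    cutoffs = 0
    for move in moves[:M]:                     # examine at most M moves
        make(move)
        value = -search(depth - 1 - R, -beta)  # probe R plies shallower
        undo(move)
        if value >= beta:
            cutoffs += 1
            if cutoffs >= C:                   # C shallow cutoffs: prune
                return True
    return False
```

With one of the best settings found above, (M, C, R) = (10, 2, 2), at most ten probes are made two plies shallower than the regular search, and two fail-highs suffice to prune.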


Chapter 7

Conclusions
In this chapter the Research Questions and the Problem Statement, as stated in Chapter 1, are answered based upon the research done in this thesis. Section 7.1 gives answers to the Research Questions, Section 7.2 answers the Problem Statement, and future research ideas are given in Section 7.3.

7.1 Answering the Research Questions

The first Research Question that was stated is:

Research Question 1. What is the complexity of Stratego?

The state-space complexity and the game-tree complexity of Stratego have been calculated in Chapter 3. First, the complexity of the board setup is 10^23. Second, the calculated upper bound of the state-space complexity is 10^115 and the game-tree complexity equals 10^534. The state-space complexity is an upper bound because some positions are counted that would never be reached in practice. The game-tree complexity is calculated based upon human-played Stratego games. The complexity of Stratego is among the highest in board games (see Table 3.1). Based upon these numbers it is not likely that Stratego will be solved in the near future, so a search-based approach with an evaluation function has to be used.

The second Research Question was:

Research Question 2. To what extent can *-minimax techniques be utilized in order to realize competitive game play in Stratego?

Expectimax with α-β pruning, the History Heuristic and transposition tables is able to prune 74.06% of the nodes. Additionally, Star1 and Star2 give a node reduction of 75.18%. The *-minimax algorithms get their advantage from pruning in chance nodes. Chapter 3 shows that Stratego has a relatively low number of chance nodes; nevertheless, the improvement in performance is quite drastic. The performance gain gives the artificial player the possibility to search deeper than a computer player without a *-minimax algorithm.

7.2 Answering the Problem Statement

After answering our Research Questions it is possible to answer the Problem Statement:

Problem Statement. How can we develop informed-search methods in such a way that programs significantly improve their performance in Stratego?

The answer to Research Question 1 shows that Stratego is a complex game. Therefore, it is not possible to solve the game. The program has to use informed-search techniques to be able to play Stratego.


The answer to Research Question 2 shows that informed-search methods decrease the number of nodes that need to be visited compared to expectimax. This allows the player to search deeper in a given time frame and thus increases its potential to better evaluate search branches. However, because of the relatively high complexity, intermediate or even expert-level game play is hard to realize.

7.3 Future Research

Most research in this thesis is based upon well-known techniques such as expectimax and α-β pruning. Although the *-minimax algorithms were developed more than 20 years ago, there is still little research available on this topic. More research in different areas is needed to increase the understanding of these algorithms. Perhaps it is possible to develop new techniques based upon the current research.

There is little research in the field of Stratego itself; the current papers available do not provide enough scientific insight into Stratego. Stratego is a very complex game in which humans still have the upper hand, which suggests that humans play games very differently than artificial players. For future research one could think of a different approach to search. Moreover, more research could be done on the evaluation function for Stratego: not only finding better weights and features, but perhaps also taking into account a short-term plan or the structure in the game.

References
Allis, L.V. (1988). A knowledge-based approach to connect-four. The game is solved: White wins. M.Sc. thesis, Vrije Universiteit Amsterdam, The Netherlands. [1]

Allis, L.V. (1994). Searching for Solutions in Games and Artificial Intelligence. Ph.D. thesis, Rijksuniversiteit Limburg, The Netherlands. [2, 9]

Ballard, B.W. (1983). The *-minimax search procedure for trees containing chance nodes. Artificial Intelligence, Vol. 21, No. 3, pp. 327–350. [2, 13, 19, 23, 28, 39]

Björnsson, Y. and Marsland, T.A. (2001). Multi-cut alpha-beta-pruning in game-tree search. Theoretical Computer Science, Vol. 252, Nos. 1–2, pp. 177–196. [32, 33]

Boer, V. de (2007). Invincible. A Stratego Bot. M.Sc. thesis, Technische Universiteit Delft, The Netherlands. [vii, ix, 2, 35, 36]

Breuker, D.M. (1998). Memory versus Search in Games. Ph.D. thesis, Universiteit Maastricht, The Netherlands. [26, 32]

Breuker, D.M., Uiterwijk, J.W.H.M., and Herik, H.J. van den (1994). Replacement Schemes for Transposition Tables. ICCA Journal, Vol. 17, pp. 183–193. [28]

Ciancarini, P. and Favini, G.P. (2007). A Program to Play Kriegspiel. ICGA Journal, Vol. 30, No. 1, pp. 3–24. [2]

Gravon (2009). Project Gravon. http://www.gravon.de. [10]

Greenblatt, D., Eastlake, D.E., and Crocker, S.D. (1967). The Greenblatt Chess Program. Chess Skills in Man and Machine, pp. 82–118, ACM, New York, NY, USA. [26]

Hasbro (2002). Stratego Rules. [5]

Hauk, T.G. (2004). Search in Trees with Chance Nodes. M.Sc. thesis, University of Alberta, Canada. [18, 24]

Hauk, T., Buro, M., and Schaeffer, J. (2004). Rediscovering *-Minimax Search. Computers and Games (eds. H.J. van den Herik, Y. Björnsson, and N.S. Netanyahu), Vol. 3846 of Lecture Notes in Computer Science, pp. 35–50, Springer. [2]

Herik, H.J. van den, Uiterwijk, J.W.H.M., and Rijswijck, J. van (2002). Games solved: Now and in the future. Artificial Intelligence, Vol. 134, Nos. 1–2, pp. 277–311. ISSN 0004-3702. [1, 2, 10]

Hsu, F.-H. (2002). Behind Deep Blue: Building the Computer that Defeated the World Chess Champion. Princeton University Press, Princeton, NJ, USA. ISBN 0691090653. [1]

International Stratego Federation (2009). Stratego Game Rules. http://www.isfstratego.com/images/isfgamerules.pdf. [6]

Ismail, M. (2004). Multi-agent Stratego. B.Sc. thesis, Rotterdam University, The Netherlands. [2]

Knuth, D.E. and Moore, R.W. (1975). An Analysis of Alpha-Beta Pruning. Artificial Intelligence, Vol. 6, No. 4, pp. 293–326. [15, 17]

Michie, D. (1966). Game-Playing and Game-Learning Automata. Advances in Programming and Non-Numerical Computation, pp. 183–200. [2, 17]

Mohnen, J. (2009). Using Domain-Dependent Knowledge in Stratego. B.Sc. thesis, Maastricht University, The Netherlands. [2]

Neumann, J. von and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton, New Jersey, USA, first edition. [13]

Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ. [14, 15]

Schadd, M.P.D. and Winands, M.H.M. (2009). Quiescence Search for Stratego. BNAIC 2009 (eds. T. Calders, K. Tuyls, and M. Pechenizkiy), pp. 225–232, Eindhoven, The Netherlands. [2]

Schadd, M.P.D., Winands, M.H.M., Uiterwijk, J.W.H.M., Herik, H.J. van den, and Bergsma, M.H.J. (2008). Best Play in Fanorona leads to Draw. New Mathematics and Natural Computation, Vol. 4, No. 3, pp. 369–387. [1]

Schadd, M.P.D., Winands, M.H.M., and Uiterwijk, J.W.H.M. (2009). ChanceProbCut: Forward Pruning in Chance Nodes. IEEE Symposium on Computational Intelligence and Games (CIG 2009) (ed. P.L. Lanzi), pp. 178–185. [2]

Schaeffer, J. (1989). The History Heuristic and Alpha-Beta Search Enhancements in Practice. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 11, pp. 1203–1212. ISSN 0162-8828. [29, 32]

Schaeffer, J. (2007). Checkers Is Solved. Science, Vol. 317, No. 5844, pp. 1518–1522. [1]

Shannon, C.E. (1950). Programming a computer for playing chess. Philosophical Magazine, Vol. 41, No. 1, pp. 256–275. [1]

Slate, J.D. and Atkin, L.R. (1977). CHESS 4.5: The Northwestern University Chess Program. Chess Skill in Man and Machine (ed. P.W. Frey), pp. 82–118, Springer-Verlag, New York, USA. [26]

Stankiewicz, J.A. (2009). Opponent Modeling in Stratego. B.Sc. thesis, Maastricht University, The Netherlands. [2]

Stankiewicz, J.A. and Schadd, M.P.D. (2009). Opponent Modeling in Stratego. BNAIC 2009 (eds. T. Calders, K. Tuyls, and M. Pechenizkiy), pp. 233–240, Eindhoven, The Netherlands. [2]

Stengård, K. (2006). Utveckling av minimax-baserad agent för strategispelet Stratego (in Swedish). M.Sc. thesis, Lund University, Sweden. [2, 38]

Treijtel, C. (2000). Multi-Agent Stratego. M.Sc. thesis, Delft University of Technology, The Netherlands. [2]

Turing, A.M. (1953). Chess. Faster than Thought: A Symposium on Digital Computing Machines (ed. B.V. Bowden), pp. 286–295. Pitman, London, UK. [1]

United States Court, District of Oregon (2005). Case No. 04-1344-KI, Estate of Gunter Sigmund Elkan vs. Hasbro Inc. [1]

USA Stratego Federation (2009). Computer Stratego World Championship. http://www.strategousa.org. [1, 2]

Veness, J. (2006). Expectimax Enhancements for Stochastic Game Players. M.Sc. thesis, University of New South Wales, Australia. [28, 39]

Winands, M.H.M., Herik, H.J. van den, Uiterwijk, J.W.H.M., and Werf, E.C.D. van der (2005). Enhanced forward pruning. Information Sciences, Vol. 175, No. 4, pp. 315–329. [33]

Winands, M.H.M., Werf, E.C.D. van der, Herik, H.J. van den, and Uiterwijk, J.W.H.M. (2006). The Relative History Heuristic. Computers and Games, Lecture Notes in Computer Science, pp. 262–272. [32]

Xia, Z., Zhu, Y., and Lu, H. (2007). Using the Loopy Belief Propagation in Siguo. ICGA Journal, Vol. 30, No. 4, pp. 209–220. [2]

Zobrist, A.L. (1970). A New Hashing Method with Application for Game Playing. Technical report, Computer Science Department, University of Wisconsin, Madison, WI, USA. [26]

Appendix A

Board Setups
The board set-ups are used in the self-play experiments of Chapter 6.

10 9 8 7 6 5 4 8 3 3 2 B 1 F
1 6 2 B ~ ~ ~ ~ 4 S 8 7 3 5 B 4 B 4 8 7 0 6 B 8 ~ ~ ~ ~ 8 7 5 8 6 B 7 3 2 5 6 8 8 4 5 7

a b c d e f g h j Figure A.1: Board set-up #1

10 9 8 7 6 5 4 0 3 6 2 8 1 7
3 4 8 4 ~ ~ ~ ~ 7 B 8 B 2 5 B 6 8 8 3 B 6 2 B F ~ ~ ~ ~ 4 5 4 B 5 7 S 7 6 8 8 5 1 3 8 7

a b c d e f g h j Figure A.2: Board set-up #2

54

Appendix A: Board Setups

10 9 8 7 6 5 4 1 3 B 2 3 1 F
2 S B 4 ~ ~ ~ ~ 7 4 8 B 3 5 6 7 8 B 6 3 8 2 8 7 ~ ~ ~ ~ 4 6 B 5 7 B 6 5 8 8 4 8 0 5 8 7

a b c d e f g h j Figure A.3: Board set-up #3

10 9 8 7 6 5 4 8 3 7 2 6 1 4
6 8 B 5 ~ ~ ~ ~ 8 S 7 3 5 0 4 B 8 7 B F 6 8 7 B ~ ~ ~ ~ 2 B 6 3 4 1 B 5 8 5 8 2 8 7 4 3

a b c d e f g h j Figure A.4: Board set-up #4

10 9 8 7 6 5 4 6 3 8 2 2 1 4
B 5 3 B ~ ~ ~ ~ 8 6 B F 7 8 7 B B 4 3 8 5 6 0 7 ~ ~ ~ ~ 8 4 2 5 7 8 3 8 B 1 7 8 5 S 6 4

a b c d e f g h j Figure A.5: Board set-up #5

Appendix A: Board Setups

55

10 9 8 7 6 5 4 1 3 0 2 S 1 5
2 6 B 8 ~ ~ ~ ~ 7 5 B 6 4 3 7 8 8 4 8 8 B 7 4 6 ~ ~ ~ ~ 3 B 2 5 7 F 3 8 B 5 7 6 8 4 B 8

a b c d e f g h j Figure A.6: Board set-up #6

10 9 8 7 6 5 4 6 3 B 2 F 1 2
4 B B B ~ ~ ~ ~ 2 6 1 6 6 7 5 8 7 4 8 3 3 B 5 8 ~ ~ ~ ~ 5 7 4 7 4 3 8 B 8 7 8 8 0 S 8 5

a b c d e f g h j Figure A.7: Board set-up #7

10 9 8 7 6 5 4 2 3 B 2 7 1 5
7 8 6 8 ~ ~ ~ ~ 4 3 B 6 8 B F 7 3 5 B 4 8 5 8 7 ~ ~ ~ ~ 4 B 0 6 5 3 2 8 B 8 6 4 7 S 1 8

a b c d e f g h j Figure A.8: Board set-up #8

56

Appendix A: Board Setups

10 9 8 7 6 5 4 7 3 4 2 6 1 8
8 3 B 6 ~ ~ ~ ~ 2 8 5 7 4 0 4 6 6 3 7 5 8 B 8 B ~ ~ ~ ~ 7 5 2 8 7 B F 8 1 B B S 8 3 4 5

a b c d e f g h j Figure A.9: Board set-up #9

10 9 8 7 6 5 4 5 3 6 2 B 1 F
8 4 3 B ~ ~ ~ ~ 1 B 8 7 5 0 7 4 8 4 6 8 8 3 2 7 ~ ~ ~ ~ 5 7 S 7 8 B 5 6 4 2 B 8 3 B 6 8

a b c d e f g h j k Figure A.10: Board set-up #10

10 9 8 7 6 5 4 8 3 2 2 7 1 8
1 4 5 8 ~ ~ ~ ~ S 7 5 7 3 8 5 3 8 0 6 8 B 4 6 7 ~ ~ ~ ~ 6 2 B 4 6 B F B B 3 B 7 8 4 5 8

a b c d e f g h j k Figure A.11: Board set-up #11
