ML | Monte Carlo Tree Search (MCTS)
Last Updated: 01 Aug, 2025
Monte Carlo Tree Search (MCTS) is an algorithm designed for problems with extremely large decision spaces, such as the game of Go with its roughly 10^{170} possible board states. Instead of exploring all moves, MCTS incrementally builds a search tree, using random simulations (rollouts) to guide its decisions. It balances exploration of new possibilities against exploitation of known promising paths, focusing computational effort where it matters most, which makes it efficient for complex decision-making tasks.
Consider a chess player deciding on their next move: they could either pursue a line of play they know is good (exploitation) or investigate a new, potentially superior strategy (exploration). MCTS formalizes this decision-making process through statistical sampling and tree construction.
The Four-Phase Algorithm
MCTS consists of four distinct phases that repeat iteratively until a computational budget is exhausted:
[Figure: MCTS steps]
Selection Phase: Starting from the root node, the algorithm traverses down the tree using a selection policy. The most common approach employs the Upper Confidence Bounds applied to Trees (UCT) formula, which balances exploration and exploitation by selecting child nodes based on both their average reward and uncertainty.
Expansion Phase: When the selection phase reaches a leaf node that isn't terminal, the algorithm expands the tree by adding one or more child nodes representing possible actions from that state.
Simulation Phase: From the newly added node, a random playout is performed until reaching a terminal state. During this phase, moves are chosen randomly or using simple heuristics, making the simulation computationally inexpensive.
Backpropagation Phase: The result of the simulation is propagated back up the tree to the root, updating statistics (visit counts and accumulated rewards) for every node on the path from the newly added node to the root.
The selection phase relies on the UCB1 (Upper Confidence Bound) formula to determine which child node to visit next:
\text{UCB1}(i) = \bar{X}_i + c \sqrt{\frac{\ln(N)}{n_i}}
Where:
- \bar{X}_i is the average reward of node i
- c is the exploration parameter (typically √2)
- N is the total number of visits to the parent node
- n_i is the number of visits to node i
The first term encourages exploitation of nodes with high average rewards, while the second term promotes exploration of less-visited nodes. Because n_i sits in the denominator, a child's exploration bonus shrinks as it is visited more often, while the slowly growing \ln(N) term ensures that neglected children are eventually revisited.
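To make the two terms concrete, here is a minimal sketch that evaluates the formula for a few children. The function name `ucb1`, the sample statistics, and the convention of giving unvisited nodes an infinite score are illustrative assumptions, not something the formula itself prescribes.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus.

    Unvisited nodes get +inf so that each child is tried at least once
    (a common convention, assumed here).
    """
    if visits == 0:
        return math.inf
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


# Hypothetical statistics: (total reward, visit count) per child.
children = {"A": (6.0, 10), "B": (1.0, 2), "C": (0.0, 0)}
parent_visits = 12

scores = {name: ucb1(w, n, parent_visits) for name, (w, n) in children.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

In this made-up example, child B has a lower average reward than A (0.5 versus 0.6) but a higher UCB1 score, because it has been visited only twice and its exploration bonus is therefore much larger.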
Python Implementation
Here's a comprehensive implementation of MCTS for a simple game like Tic-Tac-Toe:
1. Importing Libraries
We will start by importing required libraries:
- math : to perform mathematical operations like logarithms and square roots for UCB1 calculations.
- random : to randomly pick moves during simulations (rollouts).
Python
import math
import random
2. MCTS Node Class
We create an MCTSNode class to represent each node (game state) in the search tree. This class contains methods for:
- __init__(): Initializes board state, parent node, move taken, children, visits, wins and untried moves.
- get_actions(): Returns a list of all empty cells as possible moves.
- is_terminal(): Checks if the game is over (winner or no moves left).
- is_fully_expanded(): Checks if all possible moves have been explored.
- check_winner(): Determines if any player has won the game.
Python
class MCTSNode:
    def __init__(self, state, parent=None, action=None):
        self.state = state    # Current board
        self.parent = parent  # Parent node
        self.action = action  # Move leading to this node
        self.children = []    # List of children
        self.visits = 0       # Visit count
        self.wins = 0         # Win count
        self.untried_actions = self.get_actions()  # Available moves

    def get_actions(self):
        """Return all empty cells."""
        return [(i, j) for i in range(3) for j in range(3) if self.state[i][j] == 0]

    def is_terminal(self):
        """Check if the game has ended."""
        return self.check_winner() is not None or not self.get_actions()

    def is_fully_expanded(self):
        """Check whether every possible move already has a child node."""
        return len(self.untried_actions) == 0

    def check_winner(self):
        """Find winner (1 or 2) or None."""
        for i in range(3):
            # Row i
            if self.state[i][0] == self.state[i][1] == self.state[i][2] != 0:
                return self.state[i][0]
            # Column i
            if self.state[0][i] == self.state[1][i] == self.state[2][i] != 0:
                return self.state[0][i]
        # Diagonals
        if self.state[0][0] == self.state[1][1] == self.state[2][2] != 0:
            return self.state[0][0]
        if self.state[0][2] == self.state[1][1] == self.state[2][0] != 0:
            return self.state[0][2]
        return None
3. Expansion, Selection, Rollout and Backpropagation
We now define methods that enable the core MCTS operations:
- expand() : Adds a new child node for an untried move.
- best_child() : Selects the most promising child using the UCB1 formula, balancing exploration and exploitation.
- rollout() : Plays random moves from the current state until the game ends, simulating the outcome.
- backpropagate() : Updates the node's statistics (wins and visits) and propagates them back up to the root.
Python
    # The methods below continue the MCTSNode class from step 2.
    def expand(self):
        """Add one of the remaining actions as a child."""
        action = self.untried_actions.pop()
        new_state = [row[:] for row in self.state]
        player = self.get_current_player()
        new_state[action[0]][action[1]] = player
        child = MCTSNode(new_state, parent=self, action=action)
        self.children.append(child)
        return child

    def get_current_player(self):
        """Find whose turn it is (player 1 always moves first)."""
        x_count = sum(row.count(1) for row in self.state)
        o_count = sum(row.count(2) for row in self.state)
        return 1 if x_count == o_count else 2

    def best_child(self, c=1.4):
        """Select the child with the best UCB1 score for the player to move."""
        player = self.get_current_player()

        def ucb1(child):
            mean = child.wins / child.visits  # Average reward for player 1
            if player == 2:
                mean = 1 - mean  # Player 2 wants player 1's reward to be low
            return mean + c * math.sqrt(math.log(self.visits) / child.visits)

        return max(self.children, key=ucb1)

    def rollout(self):
        """Play random moves until the game ends; return player 1's reward."""
        state = [row[:] for row in self.state]
        player = self.get_current_player()
        while True:
            winner = self.check_winner_for_state(state)
            if winner:
                return 1 if winner == 1 else 0
            actions = [(i, j) for i in range(3) for j in range(3) if state[i][j] == 0]
            if not actions:
                return 0.5  # Draw
            move = random.choice(actions)
            state[move[0]][move[1]] = player
            player = 1 if player == 2 else 2

    def check_winner_for_state(self, state):
        """Same winner check for rollout."""
        return MCTSNode(state).check_winner()

    def backpropagate(self, result):
        """Update stats up the tree (wins are counted from player 1's perspective)."""
        self.visits += 1
        self.wins += result
        if self.parent:
            self.parent.backpropagate(result)
4. Implementing the MCTS Search
Now we implement the mcts_search() function, which performs:
- Selection : choose a promising node.
- Expansion : add new nodes for unexplored moves.
- Simulation (Rollout) : play random games.
- Backpropagation : update nodes with results.
Python
def mcts_search(root_state, iterations=500):
    root = MCTSNode(root_state)
    for _ in range(iterations):
        node = root
        # Selection
        while not node.is_terminal() and node.is_fully_expanded():
            node = node.best_child()
        # Expansion
        if not node.is_terminal():
            node = node.expand()
        # Simulation
        result = node.rollout()
        # Backpropagation
        node.backpropagate(result)
    return root.best_child(c=0).action  # Return best move
5. Play the Tic-Tac-Toe Game
We define the play_game() function, where:
- Player 1 (MCTS) chooses the best move using MCTS.
- Player 2 plays randomly for demonstration purposes.
Python
def play_game():
    board = [[0] * 3 for _ in range(3)]
    current_player = 1
    print("MCTS Tic-Tac-Toe Demo")
    print("0 = empty, 1 = X, 2 = O\n")
    for turn in range(9):
        for row in board:
            print(row)
        print()
        if current_player == 1:
            move = mcts_search(board, iterations=500)
            print(f"MCTS plays: {move}")
        else:
            empty = [(i, j) for i in range(3) for j in range(3) if board[i][j] == 0]
            move = random.choice(empty)
            print(f"Random plays: {move}")
        board[move[0]][move[1]] = current_player
        if MCTSNode(board).check_winner():
            for row in board:
                print(row)
            print(f"Player {current_player} wins!")
            return
        current_player = 1 if current_player == 2 else 2
    print("Draw!")
6. Run the game
Python
if __name__ == "__main__":
    play_game()
Output:
[Figure: sample run output]
Because the rollouts and Player 2's moves are random, the exact output varies between runs. In Tic-Tac-Toe, a few hundred to a thousand iterations per move (the demo uses 500) are typically enough for MCTS to identify winning opportunities and avoid losing positions. The quality of play generally improves as the number of iterations increases, because each move's statistics rest on more rollouts.
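The effect of the iteration count can be illustrated in isolation with a rough sketch. Here a rollout is replaced by a biased coin flip with a hypothetical true win probability of 0.6, and the helper names `estimate_win_rate` and `estimate_spread` are made up for this example. The sketch measures how much the estimated win rate fluctuates for different rollout counts.

```python
import random
import statistics


def estimate_win_rate(true_p, n_rollouts, rng):
    """Average of n_rollouts simulated win/loss outcomes (1 or 0)."""
    wins = sum(1 for _ in range(n_rollouts) if rng.random() < true_p)
    return wins / n_rollouts


def estimate_spread(true_p, n_rollouts, trials=200, seed=0):
    """Standard deviation of the estimate across repeated trials."""
    rng = random.Random(seed)  # Fixed seed so the sketch is reproducible
    estimates = [estimate_win_rate(true_p, n_rollouts, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)


# 0.6 is an assumed win probability standing in for real game rollouts.
spreads = {n: estimate_spread(0.6, n) for n in (10, 100, 1000)}
for n, s in spreads.items():
    print(f"{n:>5} rollouts: spread ~ {s:.3f}")
```

The spread shrinks roughly with the square root of the number of rollouts, which is why a node's statistics become more trustworthy as the search budget grows.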
AlphaGo, which combines MCTS with deep neural networks, achieved superhuman performance in Go by using the networks to guide the search and evaluate positions rather than relying on random rollouts alone. The strength of MCTS lies in its ability to focus computational resources on the most promising areas of the search space.
Practical Applications Beyond Games
MCTS has found applications in numerous domains outside of game playing:
1. Planning and Scheduling: The algorithm can optimize resource allocation and task scheduling in complex systems where traditional optimization methods struggle.
2. Neural Architecture Search: MCTS guides the exploration of neural network architectures, helping to discover optimal designs for specific tasks.
3. Portfolio Management: Financial applications use MCTS for portfolio optimization under uncertainty, where the algorithm balances risk and return through simulated market scenarios.
Limitations and Edge Cases
1. Sample Efficiency: The algorithm requires a lot of simulations to achieve reliable estimates, particularly in complex domains. This can be computationally expensive when quick decisions are needed.
2. High Variance: Random simulations can produce noisy, inconsistent value estimates, especially when outcomes hinge on a few critical moves. Techniques like progressive widening and RAVE (Rapid Action Value Estimation) help mitigate this issue.
3. Tactical Blindness: MCTS may miss short-term tactical opportunities due to its reliance on random playouts. In chess, for example, the algorithm might overlook a forced checkmate sequence if the simulations fail to explore the variations.
4. Exploration-Exploitation Balance: The UCB1 formula requires careful tuning of the exploration constant. Too much exploration leads to inefficient search, while too little can cause the algorithm to get trapped in local optima.
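The sensitivity to the exploration constant can be seen in a small sketch. The child statistics below are hypothetical, and `select_child` is an illustrative helper rather than part of any library; it applies the UCB1 formula with different values of c to the same statistics.

```python
import math


def select_child(stats, parent_visits, c):
    """Return the name of the child with the highest UCB1 score.

    stats maps a child name to (total_reward, visits); all visits must be > 0.
    """
    def score(item):
        _, (reward, visits) = item
        return reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

    return max(stats.items(), key=score)[0]


# Hypothetical statistics: "strong" has a high average and many visits,
# "rare" has a lower average but has barely been explored.
stats = {"strong": (70.0, 100), "rare": (2.0, 5)}
parent_visits = 105

for c in (0.0, 0.3, 1.4):
    print(f"c={c}: selects {select_child(stats, parent_visits, c)}")
```

With c = 0 or a small c, the search keeps returning to the well-explored child. With c = 1.4, the barely visited child is selected instead, so the same statistics lead to different search behavior depending on the constant.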