Bandit_Algorithms_in_Hyperparameter_Tuning
---------------------------------------
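The multi-armed bandit problem is a decision-making framework in which a gambler must choose among
multiple slot machines ("arms"), each with an unknown probability of reward. The goal is to maximize
the total reward over time by balancing:
- Exploration: trying arms whose payoff is still uncertain.
- Exploitation: pulling the arm that has paid best so far.
-------------------------------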
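In machine learning, each "arm" is a hyperparameter configuration, and the reward is its measured
performance (e.g., validation accuracy, or negative loss so that higher is better). Bandit-based
methods find good configurations efficiently by spending most of the evaluation budget on the
configurations that look most promising.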
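As a rough illustration of the arm/reward mapping, the sketch below runs an epsilon-greedy bandit over three candidate learning rates. Epsilon-greedy is only one simple bandit strategy, chosen here for brevity. The function names, the learning rates, and the "true" accuracy values are all hypothetical; the accuracies are made up to simulate a noisy evaluation.

```python
import random

def epsilon_greedy(arms, pull, n_rounds=2000, epsilon=0.1, seed=0):
    """Pick an arm each round: explore with probability epsilon, else exploit.

    arms : list of hyperparameter configurations
    pull : function (arm, rng) -> noisy reward, higher is better
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)    # number of times each arm was evaluated
    means = [0.0] * len(arms)   # running mean reward per arm

    for t in range(n_rounds):
        if t < len(arms):
            i = t                                   # try every arm once first
        elif rng.random() < epsilon:
            i = rng.randrange(len(arms))            # explore: random arm
        else:
            i = max(range(len(arms)), key=means.__getitem__)  # exploit
        reward = pull(arms[i], rng)
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental mean update

    best = max(range(len(arms)), key=means.__getitem__)
    return arms[best], counts, means


# Hypothetical setup: each arm is a learning rate. The accuracies below are
# invented for this simulation, not measured results.
TRUE_ACCURACY = {0.001: 0.70, 0.01: 0.85, 0.1: 0.60}

def noisy_accuracy(lr, rng):
    # Simulated evaluation: true accuracy plus Gaussian noise.
    return TRUE_ACCURACY[lr] + rng.gauss(0.0, 0.05)

best_lr, counts, means = epsilon_greedy(list(TRUE_ACCURACY), noisy_accuracy)
```

In a real tuning run, `pull` would train and validate a model with the given configuration, which is why the number of pulls spent on weak arms matters.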
Examples:
- Successive Halving: Evaluates many configurations on a small budget (e.g., a few epochs), drops
the poor performers early, and gives the freed budget to the survivors.
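Successive Halving can be sketched in a few lines. This is a minimal version under stated assumptions, not the implementation used by any particular library: the parameter names `min_budget` and `eta`, the `toy_score` objective, and the candidate learning rates are all hypothetical and chosen only to make the example deterministic.

```python
import math

def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Return the configuration that survives all elimination rounds.

    configs    : list of candidate configurations
    evaluate   : function (config, budget) -> score, higher is better
    min_budget : resource given to each configuration in the first round
    eta        : keep the top 1/eta of configurations after each round
    """
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget, number of configurations evaluated) per round

    while len(survivors) > 1:
        scores = [(evaluate(c, budget), c) for c in survivors]
        history.append((budget, len(survivors)))
        # Rank by score only, so configurations never need to be comparable.
        scores.sort(key=lambda pair: pair[0], reverse=True)
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scores[:keep]]
        budget *= eta  # survivors get eta times more resource next round

    return survivors[0], history


def toy_score(lr, budget):
    # Hypothetical objective: quality peaks at lr = 0.01, and a larger
    # budget reveals more of the final quality.
    quality = 1.0 - abs(math.log10(lr) + 2.0) / 3.0
    return quality * (1.0 - 0.5 ** budget)

candidates = [10 ** e for e in (-4, -3.5, -3, -2.5, -2, -1.5, -1, -0.5)]
best, history = successive_halving(candidates, toy_score)
```

Each round evaluates fewer configurations at a larger budget, so most of the total cost goes to the candidates that survived earlier rounds.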
Used in:
- Ray Tune
- Optuna
- Ax