Reinforcement Learning Basics and Beyond
The agent's objective is to learn a policy (a mapping from states to actions) that
maximizes the expected cumulative reward, also known as the return.
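The return can be made concrete with a small sketch. This computes the discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite episode; the reward values and discount factor below are hypothetical, chosen only for illustration.

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G = sum_t gamma**t * r_t by folding backwards over the episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: three rewards of 1.0 with gamma = 0.5 give 1 + 0.5 + 0.25 = 1.75
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```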
Types of RL Algorithms:
1. Model-Free vs. Model-Based:
- Model-Free: Learns directly from interactions (e.g., Q-Learning, DQN).
- Model-Based: Learns a model of the environment and plans accordingly.
2. Value-Based Methods:
- Estimate the value of actions in given states (e.g., Q-Learning).
- Deep Q-Networks (DQN) combine Q-Learning with deep neural networks.
3. Policy-Based Methods:
- Directly optimize the policy (e.g., REINFORCE, PPO).
- Well suited to continuous or high-dimensional action spaces.
4. Actor-Critic Methods:
- Combine value and policy-based approaches.
- Use an actor (policy) and a critic (value function) to stabilize learning.
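A minimal sketch of the value-based approach above is the tabular Q-Learning update, Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)). The state and action encodings below are hypothetical; only the update rule itself comes from the algorithm.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-Learning step: move Q(s,a) toward the bootstrapped target."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # max_a' Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Hypothetical transition: in state 0, action 1 yields reward 1.0, landing in state 1.
Q = defaultdict(float)
actions = [0, 1]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)
```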
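The policy-based idea can likewise be sketched with REINFORCE on a toy two-armed bandit (a hypothetical setup, not from the text): the policy is a softmax over per-action preferences, and each preference is nudged by lr * G * d log pi(a)/d theta.

```python
import math
import random

def softmax(prefs):
    """Softmax policy over action preferences (numerically stabilized)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(prefs, action, ret, lr=0.1):
    """REINFORCE step: theta_a += lr * G * ((a == action) - pi(a))."""
    probs = softmax(prefs)
    for a in range(len(prefs)):
        grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
        prefs[a] += lr * ret * grad_log_pi

# Arm 1 always pays 1.0, arm 0 pays nothing; the policy should learn to prefer arm 1.
random.seed(0)
prefs = [0.0, 0.0]
for _ in range(2000):
    probs = softmax(prefs)
    action = 0 if random.random() < probs[0] else 1
    reward = 1.0 if action == 1 else 0.0
    reinforce_update(prefs, action, reward)
```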
Challenges in RL:
- Exploration vs. exploitation dilemma
- Sample inefficiency
- Credit assignment over long time horizons
- Safety and ethical concerns in real-world deployment
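The exploration vs. exploitation dilemma listed above is most often handled with an epsilon-greedy rule, sketched here: with probability eps take a random action (explore), otherwise take the current best (exploit). The Q-values below are hypothetical.

```python
import random

def epsilon_greedy(q_values, eps=0.1, rng=random):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

# With eps = 0 this always exploits; raising eps trades reward for exploration.
greedy_action = epsilon_greedy([0.1, 0.5, 0.2], eps=0.0)
```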