Lect.2
References:
1-Reinforcement Learning : An Introduction by Sutton & Barto
2-Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions by Warren B. Powell
Grades
▪ Final exam : 50%
▪ Midterm exam : 20 %
▪ Project + Section + Quizzes : 30%
▪ TOTAL : 100%
Introduction to Reinforcement Learning (RL)
▪ Overview of reinforcement learning
✓ Machine learning and main problems solved by it
✓ Types of machine learning
✓ Reinforcement learning
✓ Reinforcement learning vs other types of machine learning
✓ Challenges of Reinforcement learning
✓ Applications of reinforcement learning
▪ Components of reinforcement learning: agents, environments, rewards
▪ Markov decision processes (MDPs)
Reinforcement Learning
▪ Goal oriented learning by interacting with an environment.
▪ It is used for interactive problems where it is often impractical to obtain examples of desired behavior that are both
correct and representative of all the situations in which the agent must act.
▪ It is also an area of machine learning concerned with how agents take actions in an environment to maximize
cumulative reward.
▪ The agent must be able to sense the state of the environment to some extent and must be able to take actions that
affect the state.
▪ The agent also must have a goal or goals relating to the state of the environment.
▪ The formulation of reinforcement learning is intended to include just these three aspects (sensation, action, and goal)
in their simplest possible forms without trivializing any of them.
▪ Reinforcement learning problems involve learning the following:
✓ What to do
✓ How to map situations to actions to maximize a numerical reward signal.
▪ The reinforcement signal is provided by the environment and used to evaluate the action (scalar signal) rather than
telling the learning system how to perform correct actions.
▪ The learner is not told which actions to take, as in many forms of machine learning, but instead must discover which
actions yield the most reward by trying them out.
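The "cumulative reward" mentioned above can be written down explicitly. The following is a minimal sketch in the notation of Sutton & Barto, assuming an episodic task that ends at a final step T; the symbols G_t (the return) and T do not appear in these slides and are introduced here only for illustration.

```latex
% Assumption: episodic task with final time step T; rewards are indexed as R_{t+1},
% i.e. the reward that follows the action taken at step t.
G_t = R_{t+1} + R_{t+2} + \dots + R_T = \sum_{k=t+1}^{T} R_k
```

Under this assumption, "maximizing cumulative reward" means choosing actions so that the expected value of G_t is as large as possible.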
Reinforcement Learning (Cont.)
▪ Features of reinforcement learning problems.
✓ Closed-loop problems in an essential way, because the learning system’s actions influence its later inputs.
✓ No direct instructions as to which actions to take.
✓ The consequences of actions, including reward signals, play out over extended time periods.
▪ Characteristics of Reinforcement Learning
✓ No supervisor: only a reward signal
✓ Delayed asynchronous feedback (Feedback is delayed, not instantaneous)
✓ Time really matters (sequential data, continual learning)
✓ Agent’s actions affect the subsequent data it receives (inherent non-stationarity)
Reinforcement Learning vs other types of learning
▪ Reinforcement learning vs Supervised learning
✓ Supervised learning is the learning from a training set of labeled examples provided by a knowledgeable external supervisor.
✓ Each example is a description of a situation together with a label of the correct action the system should take in this situation,
that is often to identify a category to which the situation belongs.
✓ The object of this type of learning is to generalize its responses to act correctly in situations not present in the training set.
✓ This is an important kind of learning, but alone it is not adequate for learning from interaction.
✓ In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the
situations in which the agent has to act. In uncharted territory, where one would expect learning to be most beneficial, an agent must be
able to learn from its own experience.
▪ Reinforcement learning vs Unsupervised learning
✓ Unsupervised learning is about finding structure hidden in collections of unlabeled data.
✓ Although one might be tempted to think of reinforcement learning as a kind of unsupervised learning because it does not rely
on examples of correct behavior, reinforcement learning is trying to maximize a reward signal instead of trying to find hidden
structure.
✓ Uncovering structure in an agent’s experience can certainly be useful in reinforcement learning, but by itself does not address
the reinforcement learning agent’s problem of maximizing a reward signal.
▪ Note:
✓ Supervised and unsupervised learning may appear to exhaustively classify machine learning paradigms, but they do not.
✓ We therefore consider reinforcement learning to be a third machine learning paradigm, alongside supervised learning,
unsupervised learning, and perhaps other paradigms as well.
Applications of Reinforcement Learning
▪ Self Driving Cars
▪ Robots (walking, navigation, manipulation)
▪ Manage portfolios (Investment portfolios)
▪ Gaming
▪ Health Care (Dynamic Treatment Regimes (DTR))
▪ NLP (Text Summarization, Question Answering, and Machine Translation)
▪ News Recommendation
▪ Industry Automation
Challenges of Reinforcement Learning
▪ Trade-off between Exploration and Exploitation
✓ Exploration : the agent's tendency to try new actions and discover new states.
✓ Exploitation : the agent's tendency to use the best-known actions and maximize the immediate rewards.
✓ Both exploration and exploitation are essential for learning, but they can also conflict with each other.
✓ Trade-off problem
• To obtain a lot of reward, the agent must prefer actions tried in the past and found to be effective in producing reward
(Exploitation).
• But to discover such actions, it must try actions that it has not selected before (Exploration).
• Neither exploration nor exploitation can be pursued exclusively without failing at the task.
• The agent must exploit what it already knows to obtain reward, but it also must explore to make better action selections
in the future (Trade-off).
✓ The agent must try a variety of actions and progressively favor those that appear to be best.
✓ Achieving balance:
• RL algorithms must use adaptive and intelligent strategies that depend on the agent's confidence,
curiosity, and goals.
▪ Stochastic environments
✓ Many real-world environments are stochastic, meaning that they are subject to randomness or uncertainty.
✓ This can make it difficult for RL agents to learn effectively.
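The exploration/exploitation trade-off in a stochastic environment can be illustrated with a small sketch. The code below uses epsilon-greedy action selection on a multi-armed bandit with noisy rewards. This is one common strategy, not a method named in these slides, and the function name `run_epsilon_greedy`, the Gaussian reward noise, and the parameter values are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy on a hypothetical Gaussian bandit.

    Returns (estimates, counts): the running-average reward estimate and
    the number of times each action was selected.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # current value estimate per action
    counts = [0] * n_actions       # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a uniformly random action.
            action = rng.randrange(n_actions)
        else:
            # Exploitation: pick the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Stochastic environment: the reward is the true mean plus noise
        # (assumed Gaussian with standard deviation 1).
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental update of the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.9])
    print("estimates:", estimates)
    print("counts:", counts)
```

With `epsilon=0` the agent only exploits and may lock onto a poor action after a few unlucky rewards; with `epsilon=1` it only explores and never uses what it has learned. Intermediate values let it try a variety of actions while progressively favoring those that appear best.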
Elements of Reinforcement learning: Agent and Environment
▪ Beyond the agent and the environment, there exist four sub-elements of a reinforcement learning system: a
policy, a reward signal, a value function, and, optionally, a model of the environment.
✓ Agent
• An independent computer program or system that is designed to perceive its environment, make decisions and take
actions to achieve a specific goal or set of goals.
• It interacts with its environment by perceiving its surroundings via sensors, then acting through actuators or effectors.
• It operates autonomously, meaning it is not directly controlled by a human operator.
• In reinforcement learning, at each step t the agent:
▪ Executes action $A_t$
▪ Receives observation $O_t$
▪ Receives scalar reward $R_t$
▪ Note: t is incremented at each environment step
✓ Environment
• Everything outside the agent that the agent perceives, senses, or interacts with; it represents the problem to be solved.
• In reinforcement learning, at each step t the environment:
▪ Receives action $A_t$
▪ Emits observation $O_{t+1}$
▪ Emits scalar reward $R_{t+1}$
▪ Note: t is incremented at each environment step
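The agent and environment steps listed above can be sketched as a single interaction loop. This is a minimal illustration, not an implementation from the course. The classes `CoinFlipEnv` and `RandomAgent`, the function `run`, and the placeholder values for the first observation and reward are all hypothetical.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a fair coin flip."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        # Initial observation O_1 (a dummy value in this toy example).
        return 0

    def step(self, action):
        # Receives A_t, emits observation O_{t+1} and scalar reward R_{t+1}.
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        return coin, reward


class RandomAgent:
    """Hypothetical agent that ignores its inputs and acts at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, observation, reward):
        # Receives O_t and R_t, executes A_t.
        return self.rng.randint(0, 1)


def run(env, agent, n_steps):
    """Run the loop for n_steps; return (total reward, list of (O_t, R_t, A_t))."""
    observation, reward = env.reset(), 0.0  # R_1 = 0.0 is an assumed placeholder
    history = []
    total_reward = 0.0
    for _ in range(n_steps):
        action = agent.act(observation, reward)         # agent side of step t
        history.append((observation, reward, action))
        observation, reward = env.step(action)          # environment side of step t
        total_reward += reward                          # t increments here
    return total_reward, history


if __name__ == "__main__":
    total, history = run(CoinFlipEnv(), RandomAgent(), n_steps=10)
    print("total reward:", total)
```

The loop makes the indexing explicit: the action chosen at step t produces the observation and reward labeled t+1, which become the agent's inputs at the next step.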
Elements of Reinforcement learning: Agent and Environment (Cont.)
✓ State and History
▪ The history is the sequence of observations, actions, rewards: $H_t = O_1, R_1, A_1, \dots, A_{t-1}, O_t, R_t$
• All observable variables up to time t
• The sensorimotor stream of a robot or embodied agent
▪ What happens next depends on the history:
• The agent selects actions
• The environment selects observations/rewards
▪ State $S_t$ is the information used to determine what happens next
• Formally, state is a function of the history: $S_t = f(H_t)$
✓ Environment State
• The environment state $S^e_t$ is the environment's private representation
• Whatever data the environment uses to pick the next observation/reward
• The environment state is not usually visible to the agent
• Even if $S^e_t$ is visible, it may contain irrelevant information
Elements of Reinforcement learning: Agent and Environment (Cont.)
✓ Agent State
▪ The agent state $S^a_t$ is the agent's internal representation
• Whatever information the agent uses to select the next action
• The agent state is the information used by reinforcement learning algorithms
▪ Generally, it can be any function of the history: $S^a_t = f(H_t)$
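To make "any function of the history" concrete, the sketch below stores a history as a list of (observation, reward, action) records and defines three possible choices of f. The `Step` record, the function names, and the example data are hypothetical and chosen only for illustration; which f is appropriate depends on the problem.

```python
from collections import namedtuple

# One record per time step: O_t, R_t, A_t (the action is None for the current
# step, because H_t ends with O_t, R_t before A_t has been chosen).
Step = namedtuple("Step", ["observation", "reward", "action"])


def last_observation_state(history):
    """f(H_t) = O_t: keep only the most recent observation."""
    return history[-1].observation


def last_k_observations_state(history, k=3):
    """f(H_t) = (O_{t-k+1}, ..., O_t): a short window of recent observations."""
    return tuple(step.observation for step in history[-k:])


def total_reward_state(history):
    """f(H_t) = sum of rewards so far: a scalar summary of the history."""
    return sum(step.reward for step in history)


# Made-up example history H_3 = O_1, R_1, A_1, O_2, R_2, A_2, O_3, R_3
history = [
    Step(observation=0, reward=0.0, action=1),
    Step(observation=1, reward=1.0, action=0),
    Step(observation=0, reward=0.0, action=None),
]

if __name__ == "__main__":
    print(last_observation_state(history))
    print(last_k_observations_state(history, k=2))
    print(total_reward_state(history))
```

Each function discards part of the history. The design question for an agent is which summary retains enough information to select good actions.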