Lect.2

The document outlines the course content for AI-318: Reinforcement Learning at Menoufia University, covering topics such as dynamic programming, Monte Carlo methods, and applications in various fields. It discusses the fundamental concepts of reinforcement learning, including the roles of agents, environments, and reward signals, as well as the challenges faced in this area. Additionally, it compares reinforcement learning with supervised and unsupervised learning, highlighting its unique characteristics and applications.


Menoufia University

Faculty of Artificial Intelligence


AI-318 : Reinforcement Learning

Presented by Dr. Marwa Sharaf El-Din


CSE Dept, Faculty of Electronic Engineering, Menoufia University.
Course Contents
▪ Introduction to Reinforcement Learning
▪ Dynamic Programming
▪ Monte Carlo Methods
▪ Temporal Difference Learning
▪ Function Approximation
▪ Policy Gradient Methods
▪ Exploration and Exploitation
▪ Multi-agent Reinforcement Learning
▪ Model-based Reinforcement Learning
▪ Hierarchical Reinforcement Learning
▪ Transfer Learning in Reinforcement Learning
▪ Applications of Reinforcement Learning
▪ Reinforcement Learning in Practice (Frameworks and Case Studies)

References:
1. Reinforcement Learning: An Introduction by Sutton & Barto
2. Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions by Warren B. Powell
Grades
▪ Final exam: 50%
▪ Midterm exam: 20%
▪ Project + Section + Quizzes: 30%
▪ TOTAL: 100%
Introduction to Reinforcement Learning (RL)
▪ Overview of reinforcement learning
✓ Machine learning and main problems solved by it
✓ Types of machine learning
✓ Reinforcement learning
✓ Reinforcement learning vs other types of machine learning
✓ Challenges of Reinforcement learning
✓ Applications of reinforcement learning
▪ Components of reinforcement learning: agents, environments, rewards
▪ Markov decision processes (MDPs)
Reinforcement Learning
▪ Goal-oriented learning by interacting with an environment.
▪ It is used for interactive problems where it is often impractical to obtain examples of desired behavior that are both
correct and representative of all the situations in which the agent must act.
▪ It is also, an area of machine learning concerned with how agents take actions in an environment to maximize
cumulative reward.
▪ The agent must be able to sense the state of the environment to some extent and must be able to take actions that
affect the state.
▪ The agent also must have a goal or goals relating to the state of the environment.
▪ The formulation of reinforcement learning is intended to include just these three aspects (sensation, action, and goal)
in their simplest possible forms without trivializing any of them.
▪ Reinforcement learning problems involve learning the following:
✓ What to do
✓ How to map situations to actions to maximize a numerical reward signal.
▪ The reinforcement signal is provided by the environment and used to evaluate the action (scalar signal) rather than
telling the learning system how to perform correct actions.
▪ The learner is not told which actions to take, as in many forms of machine learning, but instead must discover which
actions yield the most reward by trying them out.
Reinforcement Learning (Cont.)
▪ Features of reinforcement learning problems.
✓ Closed-loop problems in an essential way, because the learning system’s actions influence its later inputs.
✓ Not having direct instructions as to what actions to take.
✓ The consequences of actions, including reward signals, play out over extended time periods.
▪ Characteristics of Reinforcement Learning
✓ No supervisor: only a reward signal
✓ Delayed asynchronous feedback (Feedback is delayed, not instantaneous)
✓ Time really matters (sequential data, continual learning)
✓ Agent’s actions affect the subsequent data it receives (inherent non-stationarity)
Reinforcement Learning vs other types of learning
▪ Reinforcement learning vs Supervised learning
✓ Supervised learning is the learning from a training set of labeled examples provided by a knowledgeable external supervisor.
✓ Each example is a description of a situation together with a label of the correct action the system should take in this situation,
that is often to identify a category to which the situation belongs.
✓ The object of this type of learning is to generalize its responses to act correctly in situations not present in the training set.
✓ This is an important kind of learning, but alone it is not adequate for learning from interaction.
✓ In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and
representative of all the situations in which the agent has to act. In uncharted territory, where one would expect
learning to be most beneficial, an agent must be able to learn from its own experience.
▪ Reinforcement learning vs Unsupervised learning
✓ Unsupervised learning, is about finding structure hidden in collections of unlabeled data.
✓ Although one might be tempted to think of reinforcement learning as a kind of unsupervised learning because it does not rely
on examples of correct behavior, reinforcement learning is trying to maximize a reward signal instead of trying to find hidden
structure.
✓ Uncovering structure in an agent’s experience can certainly be useful in reinforcement learning, but by itself does not address
the reinforcement learning agent’s problem of maximizing a reward signal.
▪ Note:
✓ Supervised and Unsupervised learning appear to exhaustively classify machine learning paradigms, but they do not.
✓ We therefore consider reinforcement learning to be a third machine learning paradigm, alongside supervised learning,
unsupervised learning, and perhaps other paradigms as well.
Applications of Reinforcement Learning
▪ Self Driving Cars
▪ Robots (walking, navigation, manipulation)
▪ Portfolio Management (investment portfolios)
▪ Gaming
▪ Health Care (Dynamic Treatment Regimes (DTR))
▪ NLP (Text Summarization, Question Answering, and Machine Translation)
▪ News Recommendation
▪ Industry Automation
Challenges of Reinforcement Learning
▪ Trade-off between Exploration and Exploitation
✓ Exploration : the agent's tendency to try new actions and discover new states.
✓ Exploitation : the agent's tendency to use the best-known actions and maximize the immediate rewards.
✓ Both exploration and exploitation are essential for learning, but they can also conflict with each other.
✓ Trade-off problem
• To obtain a lot of reward, the agent must prefer actions tried in the past and found to be effective in producing reward
(Exploitation).
• But to discover such actions, it must try actions that it has not selected before (Explore).
• Neither exploration nor exploitation can be pursued exclusively without failing at the task.
• The agent must exploit what it already knows to obtain reward, but it also must explore to make better action selections
in the future (Trade-off).
✓ The agent must try a variety of actions and progressively favor those that appear to be best.
✓ To achieve this balance, RL algorithms use adaptive and intelligent strategies (for example, ε-greedy action
selection) that depend on the agent's confidence, curiosity, and goals.
▪ Stochastic environments
✓ Many real-world environments are stochastic, meaning that they are subject to randomness or uncertainty.
✓ This can make it difficult for RL agents to learn effectively.
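The exploration/exploitation trade-off can be sketched with ε-greedy action selection on a multi-armed bandit. This is a standard strategy, but the reward means below are made up for illustration and are not from the lecture: with probability ε the agent explores a random arm, otherwise it exploits the arm with the highest estimated reward.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Balance exploration and exploitation on a simple multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # how many times each arm was pulled
    estimates = [0.0] * n_arms     # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:                     # explore: try a random arm
            arm = rng.randrange(n_arms)
        else:                                          # exploit: best-known arm
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = true_means[arm] + rng.gauss(0, 1)     # noisy scalar reward
        counts[arm] += 1
        # incremental mean update: Q <- Q + (R - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.9])
print([round(q, 2) for q in estimates])  # estimates approach the true means
```

With ε = 0.1 the agent keeps sampling every arm occasionally, so its estimates converge and exploitation progressively favors the best arm, as the slide describes.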
Elements of Reinforcement learning: Agent and Environment
▪ Beyond the agent and the environment, there exist four sub-elements of a reinforcement learning system: a
policy, a reward signal, a value function, and (optionally) a model of the environment.
✓ Agent
• An independent computer program or system that is designed to perceive its environment, make decisions and take
actions to achieve a specific goal or set of goals.
• It interacts with its environment by perceiving its surroundings via sensors, then acting through actuators or effectors.
• It operates autonomously, meaning it is not directly controlled by a human operator.
• In reinforcement learning, at each step t the agent:
▪ Executes action A_t
▪ Receives observation O_t
▪ Receives scalar reward R_t
▪ Note: t increments at each environment step
✓ Environment
• Everything outside the agent that the agent perceives, senses, or interacts with; it embodies the problem being solved.
• In reinforcement learning, at each step t the environment:
▪ Receives action A_t
▪ Emits observation O_{t+1}
▪ Emits scalar reward R_{t+1}
▪ Note: t increments at each environment step
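The per-step exchange above can be sketched as a plain interaction loop. The LineEnvironment and RandomAgent below are illustrative toys, not part of the course material: the agent executes A_t, and the environment replies with O_{t+1} and R_{t+1}.

```python
import random

class LineEnvironment:
    """Toy environment: the agent walks positions 0..4 and is rewarded at 4."""
    def __init__(self):
        self.position = 0

    def step(self, action):
        # receives action A_t, emits observation O_{t+1} and scalar reward R_{t+1}
        self.position = max(0, min(4, self.position + action))
        reward = 1.0 if self.position == 4 else 0.0
        return self.position, reward

class RandomAgent:
    """Picks actions uniformly; a learning agent would use (O, R) to improve."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([-1, +1])  # move left or right

env, agent = LineEnvironment(), RandomAgent()
observation, total_reward = 0, 0.0
for t in range(100):
    action = agent.act(observation)          # agent executes A_t
    observation, reward = env.step(action)   # env emits O_{t+1}, R_{t+1}
    total_reward += reward                   # agent receives scalar reward
print(f"return over 100 steps: {total_reward}")
```

The loop makes the closed-loop nature of RL visible: the agent's actions change the environment state, which changes the observations the agent receives next.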
Elements of Reinforcement learning: Agent and Environment (Cont.)
✓ State and History
▪ The history is the sequence of observations, actions, and rewards: H_t = O_1, R_1, A_1, ..., A_{t-1}, O_t, R_t
• All observable variables up to time t
• The sensorimotor stream of a robot or embodied agent
▪ What happens next depends on the history:
• The agent selects actions
• The environment selects observations/rewards
▪ State 𝑆𝑡 is the information used to determine what happens next
• Formally, state is a function of the history: S_t = f(H_t)
✓ Environment State
• The environment state S_t^e is the environment's private representation
• Whatever data the environment uses to pick the next observation/reward
• The environment state is not usually visible to the agent
• Even if S_t^e is visible, it may contain irrelevant information
Elements of Reinforcement learning: Agent and Environment (Cont.)
✓ Agent State
▪ The agent state S_t^a is the agent's internal representation
• Whatever information the agent uses to select its next action
• The agent state is the information used by reinforcement learning algorithms
▪ In general, it can be any function of the history: S_t^a = f(H_t)

✓ Information State (Markov State)


▪ It contains all useful information from the history.
▪ Definition of the Markov property:
• A state S_t is Markov if and only if P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]
• The current state already captures all relevant information from past states.
▪ The future is independent of the past given the present:
H_{1:t} → S_t → H_{t+1:∞}
▪ Once the state is known, the history may be thrown away
• The state is a sufficient statistic of the future
▪ The environment state S_t^e is Markov
▪ The history H_t is Markov
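The Markov property can be made concrete with a small sketch; the states and transition probabilities below are illustrative, not from the lecture. Sampling the next state consults only the current state, never the earlier history, so each state is Markov by construction.

```python
import random

# P[s' | s]: the next-state distribution depends only on the current state.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sample_next(state, rng):
    """Draw the next state using only the current state (Markov property)."""
    states = list(transitions[state])
    weights = [transitions[state][s] for s in states]
    return rng.choices(states, weights=weights)[0]

rng = random.Random(0)
state, history = "sunny", []
for t in range(10):
    history.append(state)
    # knowing the full history adds nothing: only `state` is consulted
    state = sample_next(state, rng)
print(history)
```

Because `sample_next` never reads `history`, the history may be thrown away once the state is known, exactly as the slide states.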
Elements of Reinforcement learning: Policy and Model
▪ Beyond the agent and the environment, there exist four sub-elements of a reinforcement learning system: a
policy, a reward signal, a value function, and (optionally) a model of the environment.
✓ Policy
• Defines the agent’s way of behaving at a given time.
• Maps perceived states of the environment to actions to be taken when in those states.
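As a minimal sketch with hypothetical states and actions (none of these names come from the course), a deterministic policy is literally such a mapping from perceived states to actions:

```python
# A deterministic policy: each perceived state maps to one action.
policy = {
    "far_from_goal": "move_forward",
    "obstacle_ahead": "turn_left",
    "at_goal": "stop",
}

def select_action(policy, state):
    """Look up the action the policy prescribes for this state."""
    return policy[state]

print(select_action(policy, "obstacle_ahead"))  # → turn_left
```

A stochastic policy would instead map each state to a probability distribution over actions rather than a single action.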
Maze Example
Summary
Any Questions?
