
Balancing a Cart Pole Using Reinforcement Learning in OpenAI Gym Environment


Shaili Mishra
Department of Computer Science
Jaypee Institute of Information Technology
https://orcid.org/0000-0002-2628-4879
[email protected]

Anuja Arora
Department of Computer Science
Jaypee Institute of Information Technology
https://orcid.org/0000-0001-2515-1300
[email protected]
Abstract— Reinforcement Learning (RL) is a subcategory of machine learning. The feature that distinguishes reinforcement learning from other machine learning approaches is the self-training of the agent from the information and feedback it obtains from the environment. Appropriate action selection guides the agent towards a better, near-optimal solution. The agent has no prior knowledge about the environment; it has to explore each aspect of the environment based on feedback. This principal advantage of RL algorithms suits complex optimal control problems, such as the cart pole (inverted pendulum) problem and robotics, where no prior information on the system dynamics is available. In this paper, the traditional mechanical cart pole system is controlled using Q-learning models, and Mean Squared Error (MSE) and Mean Absolute Error (MAE) are applied as evaluation measures within the OpenAI Gym environment.

Keywords— Reinforcement Learning, Cart Pole, OpenAI Gym, Q Learning

I. INTRODUCTION

Reinforcement Learning is a machine learning training method that assigns rewards for desired behavior and penalties for undesired behavior. The Reinforcement Learning agent interprets its environment, takes actions, and updates what it has learned about that environment.

The agent selects a particular action according to its interaction with a robust and dynamic environment. Such robust and dynamic controllers, for example PID controllers and fuzzy controllers, are widely used in real-world problems where frequent adjustments are required for efficient performance. When RL is applied to such a system, the agent has to interact with the unknown environment and try to achieve the maximum cumulative reward [2]. Traditional methods for these mechanical systems are built on physics-based concepts and mathematical formulations, but these procedures are executed by manually tuning control parameters, which introduces various issues and errors in the operation of mechanical systems [1, 2].

According to the learning process, methods, and applications, machine learning algorithms are divided into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. In supervised machine learning, the mapping between input data and output data is already available as labeled data; using these predefined labels, the machine trains itself and then predicts the output for future inputs.

Supervised learning algorithms are suitable for classification and regression problems, while in unsupervised learning the algorithm tries to discover patterns in the unlabeled data given as input [13]. Unsupervised learning is applied to clustering and association problems.

Reinforcement Learning (RL) operates through a feedback process in which the agent interacts with the environment and performs an action in a particular state so as to optimize the reward according to requirements. This loop is a trial-and-error procedure in which the agent decides which action in a specific state helps reach the target state of the system.

After an action is performed in the current state, the next state is generated by the environment and a reward is returned to the agent as feedback. By repeating this procedure in a loop, the agent learns the best action in each state for achieving the maximum cumulative reward [11].

Mechanical and underactuated systems are particularly suitable for reinforcement learning research due to their dynamic and complex nature. In Reinforcement Learning, the agent evaluates each aspect of a real-time system to ensure its optimal performance, so reinforcement learning algorithms such as Q-learning and deep Q-learning are very popular among researchers working on real-time systems. The proficiency of RL has propelled it into various domains and solved domain-specific challenges efficiently. RL applications are mostly found in robotics, game playing, self-driving cars, resource management, drug discovery, and financial trading. In robotics, RL trains the agent to learn complex tasks such as grasping objects and obstacle detection. Similarly, in gaming, RL agents have acquired expertise comparable to human experts. For autonomous driving, the RL agent makes real-time decisions based on traffic navigation scenarios and acts accordingly. Another application of RL is resource management, where the agent optimizes resources such as inventory, traffic navigation, and energy distribution. Similarly, RL has trained agents to identify potential drug candidates and to analyze market data and, based on those data, execute strategies for maximum returns.

Reinforcement Learning computes the optimal solution, that is, the maximum result in the minimum time, for complex and dynamic problem domains. The agent comes to understand the environment by performing the same procedure repeatedly and refining its knowledge about the environment.

In previous research, several studies addressed the traditional control problem of cart pole balancing. However, most studies relied on the theoretical aspects of physics to solve it, so reinforcement learning approaches have attracted researchers' attention for such physics-based problems. In this work, Q-learning is proposed with two reward functions, Mean Squared Error (MSE) and Mean Absolute Error (MAE). The fast and stable convergence of the cart pole balancing problem obtained by Q-learning with MSE and MAE as reward functions was evaluated and performed more efficiently than the traditional physics-based approach.



The following sections cover the research problem of cart pole balancing, an overview of Reinforcement Learning, OpenAI Gym and Q-learning, and the reward functions. The cart pole system is defined in detail in Section II, Section III covers RL concepts, OpenAI Gym and Q-learning are explained in Section IV, Section V defines the different reward functions, Section VI presents the experiments and results, and Section VII concludes the paper.

II. THE CART-POLE BALANCING PROBLEM

Using reinforcement learning algorithms, the RL agent is trained to balance a pole joined to a cart at a pivot point, where the cart moves horizontally on a surface. The cart pole system mainly has two components, a simple cart and a vertical bar. The pole is fixed to the cart at a pivot point, and the cart can move in the left or right direction, as shown in Fig. 1.

In the cart pole environment, the agent explores all the possible actions and their corresponding rewards and then updates its policy towards optimal reward achievement. The goal of this problem is to find a control policy for balancing the pole in the upward direction by applying a bidirectional force to the cart [12].

Fig. 1. Cart Pole Dynamics and Control Parameters (Adapted from [1][3])

In Fig. 1, the dynamic system of the classical cart pole is shown. The cart moves horizontally on a fixed frictionless surface due to the force F, and θ is the deviation of the pole from the pivot point [3]. The state of the cart pole system is defined by the four-dimensional vector {x, ẋ, θ, θ̇}, where x is the horizontal distance traveled by the cart and ẋ is the linear velocity of the cart. The cart pole system's mathematical formulation is defined in Equation 1.

(M + m)\ddot{x} + \epsilon\dot{x} + ml\ddot{\theta}\cos\theta - ml\dot{\theta}^{2}\sin\theta = F(t)    (1)

ml\ddot{x}\cos\theta + \tfrac{4}{3}ml^{2}\ddot{\theta} - mgl\sin\theta = 0    (2)

In Equations 1 and 2, x(t) is the distance traveled by the cart on the frictionless surface from the centre point, and ẋ and ẍ represent the velocity and acceleration of the cart respectively. For the mathematical computation of the angular acceleration of the pole θ̈ and the linear acceleration of the cart ẍ, the formulas defined in Equation 3 and Equation 4 were applied.

\ddot{\theta} = \dfrac{(M + m)g\sin\theta - \cos\theta\,[F + ml\dot{\theta}^{2}\sin\theta]}{\tfrac{4}{3}(M + m)l - ml\cos^{2}\theta}    (3)

\ddot{x} = \dfrac{F + ml[\dot{\theta}^{2}\sin\theta - \ddot{\theta}\cos\theta]}{M + m}    (4)

The cart pole mechanical system is highly dynamic in nature. The agent has to control the input parameters in such a manner that the pendulum is balanced around its center of mass above the moving cart. In the simulation of the cart pole, the action space is defined as {LEFT, RIGHT}, which means the cart can move horizontally in either the left or the right direction.

The state space formulation of the nonlinear dynamics of the cart pole mechanism is defined in Equation 5 [2].

\begin{bmatrix} \dot{x} \\ \ddot{x} \\ \dot{\theta} \\ \ddot{\theta} \end{bmatrix} =
\begin{bmatrix} \dot{x} \\ \dfrac{F + ml[\dot{\theta}^{2}\sin\theta - \ddot{\theta}\cos\theta]}{M + m} \\ \dot{\theta} \\ \dfrac{(M + m)g\sin\theta - \cos\theta\,[F + ml\dot{\theta}^{2}\sin\theta]}{\tfrac{4}{3}(M + m)l - ml\cos^{2}\theta} \end{bmatrix}    (5)
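To make Equations 3-5 concrete, the following Python sketch (not part of the original paper) integrates the cart pole state with a simple Euler step; the physical constants M, m, l, and g are illustrative assumed values, and the force F would come from the agent's action.

```python
import math

# Illustrative constants (assumed values, not specified in the paper):
# cart mass M, pole mass m, pole half-length l, gravity g, and time step.
M, m, l, g = 1.0, 0.1, 0.5, 9.8
DT = 0.02

def cart_pole_derivatives(state, force):
    """Compute the accelerations of Equations 3 and 4 for the state {x, x_dot, theta, theta_dot}."""
    x, x_dot, theta, theta_dot = state
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Equation 3: angular acceleration of the pole
    theta_ddot = ((M + m) * g * sin_t - cos_t * (force + m * l * theta_dot ** 2 * sin_t)) / (
        (4.0 / 3.0) * (M + m) * l - m * l * cos_t ** 2)
    # Equation 4: linear acceleration of the cart
    x_ddot = (force + m * l * (theta_dot ** 2 * sin_t - theta_ddot * cos_t)) / (M + m)
    return x_dot, x_ddot, theta_dot, theta_ddot

def euler_step(state, force):
    """Advance the state vector of Equation 5 by one Euler integration step."""
    x, x_dot, theta, theta_dot = state
    dx, ddx, dth, ddth = cart_pole_derivatives(state, force)
    return (x + DT * dx, x_dot + DT * ddx, theta + DT * dth, theta_dot + DT * ddth)
```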



their current states and tries to learn the most effective preferences for the immediate rewards. The value of
actions to achieve their goal in the given environment. the discount factor lies between [0,1].
• Observations: The observation is information about the
environment on that particular time stamp. Observation
provides information about the current state of the agent IV. OPENAI GYM AND Q-LEARNING
on that particular time stamp, possible action space for The OpenAI gym is a standard application programming
that current state, and other environmental information. interface for solving reinforcement learning for environments
as classic control and toy text, Atari games, 2D and 3D robots.
OpenAI Gym provides the interface for several classical
control engineering environments. These interfaces test the
efficiency of reinforcement learning so that proposed
algorithms can be applied to mechanical systems such as
robots, medical fields, etc.
In this paper, For the Cart pole problem, OpenAI Gym is
used. In the environment, a pole is attached by a pivot point to
a frictionless cart. The pendulum is placed in the upward
direction and the cart moves left and right on the surface. In
Fig. 2. The reinforcement learning process (Adapted from [4]) the Cart pole, the agent trying to keep the pole upright.
Initially, the pendulum starts from an upward direction and the
The AI agent selects an action from the action spacemoves system aims to prevent the pole from falling after applying
toward a new state and receives a reward from the force on the cart. The action space of the crat pole is two
environment as feedback. After repeating these steps, the discrete values (0,1), 0 represents push the cart in the left
agent learns which action is best in a particular state to obtain direction and 1 means push the cart in the right direction
the maximum cumulative reward. As shown in Fig 2, in each according to Figure 3. After performing an action on state, the
iteration the agent receives current state s from the environment produces an observation state space which
environment, then after applying an action a the agent’s state consists cart’s position, cart’s velocity, the pole angle, and the
changes. After repeatedly performing this process, the agent angular velocity of the pole. The cart position lies between (-
learns from the obtained experience regarding state, action, 4.8, 4.8) and the termination condition is (-2.4, 2.4). The pole
and corresponding next state and reward. This knowledge angle observed between (±24°) and episodes terminates when
helps the agent to achieve the cumulative reward to achieve the pole lies outside (±12°) range. The +1 reward is assigned
the goal. The main goal of the Reinforcement learning for balancing the pole in the upward direction on the cart as
algorithm is to compute the optimal policy for the given long as possible.[8]
problem.[5, 7]
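The loop of Fig. 2 can be written as a short, environment-agnostic sketch (not from the paper); `env` here is any object exposing `reset()` and `step()` in the style popularized by OpenAI Gym, and `policy` is whatever mapping from states to actions the agent is currently learning.

```python
def run_episode(env, policy, max_steps=500):
    """One agent-environment interaction episode: state -> action -> reward -> next state."""
    state = env.reset()                                # agent receives the initial state s
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                         # agent applies action a in state s
        state, reward, done, info = env.step(action)   # environment returns s' and reward r
        total_reward += reward                         # feedback the agent uses to improve its policy
        if done:
            break
    return total_reward
```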

B. Markov Decision Process (MDP)

The Markov Decision Process (MDP) is the mathematical framework for dynamic decision-making situations in which the performance of the system is influenced by random factors and uncertain system parameters [6]. The MDP framework consists of the key terms state S, action A, transition probability P, reward R, and discount factor γ, so the MDP for the cart pole problem is the tuple ⟨S, A, P, R, γ⟩, where:

• State (S): The state space parameters in the cart pole problem represent the current status of the agent, which includes the cart position, the cart velocity, the pole angle, and the pole angular velocity.

• Action (A): The action set A contains all possible movements that control the dynamics of the cart and pole. In the cart pole environment, the cart can only move to the left or to the right.

• Transition probability (P): P is the probability distribution, given the current state, over the possible successor states.

• Reward (R): The reward R is a numerical value associated with a state-action pair that steers the agent's learning process towards the maximum cumulative sum of rewards.

• Discount factor (γ): The discount factor γ determines the influence of future rewards relative to the preference for immediate rewards. The value of the discount factor lies in [0, 1].
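As a small illustration (not from the paper), the cart pole MDP described above can be written down as a plain data structure; the numeric bounds are the ones quoted later in Section IV (±24° is roughly ±0.418 rad) and γ is taken from Table I.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CartPoleMDP:
    # S: four continuous state variables (cart position, cart velocity, pole angle, pole angular velocity)
    state_low: Tuple[float, float, float, float] = (-4.8, -float("inf"), -0.418, -float("inf"))
    state_high: Tuple[float, float, float, float] = (4.8, float("inf"), 0.418, float("inf"))
    # A: two discrete actions, push left (0) or push right (1)
    actions: List[int] = field(default_factory=lambda: [0, 1])
    # R: +1 for every step the pole stays upright
    step_reward: float = 1.0
    # gamma: discount factor (value used in Table I)
    gamma: float = 0.99
    # P: transition probabilities are implicit in the simulator dynamics of Equation 5
```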



IV. OPENAI GYM AND Q-LEARNING

OpenAI Gym is a standard application programming interface for reinforcement learning environments such as classic control and toy text, Atari games, and 2D and 3D robots. OpenAI Gym provides the interface for several classical control engineering environments. These interfaces test the efficiency of reinforcement learning so that the proposed algorithms can later be applied to mechanical systems such as robots, medical devices, etc.

In this paper, OpenAI Gym is used for the cart pole problem. In the environment, a pole is attached by a pivot point to a frictionless cart. The pendulum starts in the upright position, the cart moves left and right on the surface, and the agent tries to keep the pole upright; the goal is to prevent the pole from falling after force is applied to the cart. The action space of the cart pole consists of two discrete values (0, 1): 0 pushes the cart to the left and 1 pushes the cart to the right, as shown in Fig. 3. After an action is performed in a state, the environment produces an observation consisting of the cart's position, the cart's velocity, the pole angle, and the angular velocity of the pole. The cart position lies in (-4.8, 4.8) and the termination condition is (-2.4, 2.4). The pole angle is observed within ±24°, and an episode terminates when the pole angle lies outside the ±12° range. A reward of +1 is assigned for every step the pole is kept upright on the cart, so the agent is rewarded for balancing it as long as possible [8].

Fig. 3. Action state parameters for Cart pole mechanism [2]
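The environment just described can be instantiated directly; the snippet below is a usage sketch based on the classic Gym API from reference [8] (newer Gymnasium versions return `(observation, info)` from `reset()` and a five-element tuple from `step()`).

```python
import gym

env = gym.make("CartPole-v1")
print(env.action_space)       # Discrete(2): 0 = push left, 1 = push right
print(env.observation_space)  # Box(4,): cart position, cart velocity, pole angle, pole angular velocity

obs = env.reset()
for t in range(200):
    action = env.action_space.sample()           # random action, just to exercise the interface
    obs, reward, done, info = env.step(action)   # reward is +1 for every step the pole stays up
    if done:                                     # pole beyond the ±12° range or cart beyond ±2.4
        print(f"Episode finished after {t + 1} steps")
        break
env.close()
```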
Q-learning is a value-based reinforcement learning algorithm in which the environment is not familiar to the agent, and the agent has to figure out the best actions for obtaining an optimal solution. In the Q-learning method, the samples (S, A, R, S′) are generated by following a policy that maximizes the Q(S′, A′) values for achieving the desired target. For the formulation of the Q-value, the ε-greedy policy is applied to the samples (S, A, R, S′), as defined in Equation (6):

Q(S, A) = R(S, A) + \gamma \max_{A} Q(S', A)    (6)

where Q(S, A) is the Q-value at state S for action A; computing it requires the immediate reward R(S, A) and the maximum Q-value from the next state S′. Gamma (γ) is a discount factor that decides the importance of future rewards [7, 10]. The value of Q(S′, A) depends upon future Q-values, as defined in Equation (7):

Q(S, A) = \gamma Q(S', A) + \gamma^{2} Q(S'', A) + \cdots + \gamma^{n} Q(S''^{\ldots n}, A)    (7)

For computing the Q-value of action A_t at state S_t, the maximizing action \arg\max_{A'} Q(S', A') for state S′ is required, following the concept of exploitation. To update the Q-value, Equation (8) is defined:

Q(S_t, A_t) = Q(S_t, A_t) + \alpha\,[R_{t+1} + \gamma \max_{A_{t+1}} Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)]    (8)

In reinforcement learning, the agent has to choose whether to continue with its current knowledge about states, actions, and rewards or to explore other options. Exploration is a greedy approach in which the agent focuses on improving its knowledge about the environment for long-term benefit, while in exploitation the agent tries to obtain maximum rewards by exploiting its current knowledge rather than gathering more. So, in exploration the agent persistently gathers information to obtain optimal results, while in exploitation it optimizes its decisions based on the information currently available.
V. REWARDS

In reinforcement learning, the reward is the feedback, a numerical value, generated by the environment and received by the agent after taking an action in a particular state. The reward function helps the agent learn about the environment and update its knowledge of the system. The primary goal of the agent is to choose state-action pairs in a way that leads toward the maximum (or minimum) reward. In this paper, Mean Squared Error (MSE) and Mean Absolute Error (MAE) are applied as the performance measures of Q-learning for the cart pole problem, guiding the RL agent towards an optimal decision-making policy.

• Mean Squared Error Loss (MSE): Mean Squared Error measures the average of the squared differences between the predicted and the actual values. The mathematical formulation for computing the mean squared error is defined in Equation (9):

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^{2}    (9)

where y_i and \hat{y}_i represent the predicted value and the actual value of a particular sample and N is the total number of samples.

• Mean Absolute Error Loss (MAE): MAE evaluates the average absolute difference between the observed entities and the predicted entities. The formula is defined in Equation (10):

MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|    (10)

where N is the total number of samples and y_i and \hat{y}_i are the predicted value and the actual value of a particular sample.
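The paper does not state exactly which quantities play the roles of y_i and ŷ_i when MSE and MAE act as reward signals for the cart pole; one plausible reading, sketched below purely as an illustration, is to penalize the deviation of the observed state from an upright target state with a squared or an absolute error.

```python
import numpy as np

TARGET = np.zeros(4)  # assumed target: cart centred, pole upright, zero velocities

def mse_reward(observation):
    """Negative mean squared deviation from the target state (Equation 9 used as a penalty)."""
    err = np.asarray(observation, dtype=float) - TARGET
    return -float(np.mean(err ** 2))

def mae_reward(observation):
    """Negative mean absolute deviation from the target state (Equation 10 used as a penalty)."""
    err = np.asarray(observation, dtype=float) - TARGET
    return -float(np.mean(np.abs(err)))
```

Because the squared term grows quickly for large deviations, an MSE-style shaping weights outliers far more heavily than the MAE variant, which is consistent with the behaviour reported in Section VI.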
VI. EXPERIMENTS AND RESULTS

This section provides the details of the experiments and their outcomes after applying the reward functions described in the previous section.

A. Hyperparameters Setting

The results obtained with both reward functions, MSE and MAE, for the cart pole problem solved by the Q-learning approach are presented in this section. The training process for both reward functions was validated by varying the hyperparameters and adjusting them for better convergence. The hyperparameter details are stated in Table I.

TABLE I. Q-LEARNING PARAMETER DETAILS FOR THE CART POLE OPENAI GYM ENVIRONMENT

Parameter              Value
Gamma                  0.99
Episodes               100
Epsilon                0.99
Activation function    Tanh, linear
Learning rate          1e-2
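For completeness, the Table I settings can be collected into a single configuration object and passed into a training loop such as the one sketched in Section IV; the dictionary below simply restates the table, and the activation entry would only matter if Q(S, A) were approximated by a small neural network rather than a lookup table.

```python
# Hyperparameters from Table I.
CONFIG = {
    "gamma": 0.99,
    "episodes": 100,
    "epsilon": 0.99,
    "activation": ("tanh", "linear"),
    "learning_rate": 1e-2,
}
```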



B. Performance of Various Reward Functions in Q-Learning

To evaluate performance in the Cartpole environment, the two reward functions, Mean Squared Error and Mean Absolute Error, were implemented as reward functions in the Q-learning algorithm. In the training procedure, 100 episodes were generated, and from those samples the mean and median were computed for each reward function, as shown in Table II.

TABLE II. THE CART POLE'S REWARD FUNCTIONS

Reward Function    Mean        Max    Min    Median
MSE                26.61386    88     9      22
MAE                25.36634    85     8      20

Figure 4 and Figure 5 show the reward functions MSE and MAE for the Q-learning algorithm applied to the cart pole environment. A violin plot is a combination of a box plot and a probability density function. The white dot in the box in Figure 4 depicts the median for a specific reward function, and the distribution of the reward function is described by the violin shape. Box plots assume a uniform presentation, while the violin plot reveals the differing distributions. The violin plot in Figure 4 shows that Mean Squared Error gave weight to each outlier value while MAE ignored them. The shape of the violin plots shows that for both MSE and MAE the reward is distributed near the mean value [9].

Fig. 4. Q-learning reward plots (0 = MSE reward, 1 = MAE reward)

Figure 5(a): MSE Q-learning performance

Figure 5(b): MAE Q-learning performance

Figure 5(c): Comparative plot of MSE Q-learning and MAE Q-learning performance

In Figure 5(a), the performance of MSE is depicted. Initially, up to 20 episodes, the value of the reward function varies between 35 and 7, and the maximum reward for MSE is 88, reached at episode 72. Figure 5(b) shows the reward-versus-episode plot for the MAE reward function; this graph shows a maximum reward of 85, but MAE ignores the outlier values. Figure 5(c) is a comparative line plot of both reward functions, MSE and MAE. Table II lists the comparative details of both reward functions, including the mean, median, maximum, and minimum for MSE and MAE.
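The summary statistics in Table II and a violin plot like Figure 4 can be reproduced from the per-episode rewards; the sketch below assumes two hypothetical files, mse_returns.txt and mae_returns.txt, each holding the 100 episode rewards collected during training (these file names are not from the paper).

```python
import numpy as np
import matplotlib.pyplot as plt

mse_returns = np.loadtxt("mse_returns.txt")   # assumed location of the MSE run's episode rewards
mae_returns = np.loadtxt("mae_returns.txt")   # assumed location of the MAE run's episode rewards

for name, returns in [("MSE", mse_returns), ("MAE", mae_returns)]:
    print(name, "mean=%.5f max=%d min=%d median=%.1f" % (
        returns.mean(), returns.max(), returns.min(), np.median(returns)))

# Violin plot comparable to Figure 4: position 0 = MSE reward, 1 = MAE reward.
plt.violinplot([mse_returns, mae_returns], positions=[0, 1], showmedians=True)
plt.xticks([0, 1], ["MSE", "MAE"])
plt.ylabel("Episode reward")
plt.show()
```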
VII. CONCLUSION

Reinforcement Learning approaches provide a mathematical methodology for computing optimal solutions and determining the best decision-making strategies for agents. The agent is trained according to those strategies and seeks the best solution in a specific scenario. In this paper, Q-learning with Mean Squared Error (MSE) and Mean Absolute Error (MAE) as reward functions was applied to the cart pole system. The performance evaluation of both proposed approaches is based on balancing the pole with the maximum reward. The results show that Q-learning with MAE ignores the outlier values, while Q-learning with MSE gives importance to outlier values. In future work, more RL models can be applied to the cart pole problem and their performance compared.
REFERENCES
[1] Mishra, S., & Arora, A. (2023). A Huber reward function-driven deep reinforcement learning solution for cart-pole balancing problem. Neural Computing and Applications, 35(23), 16705-16722.
[2] Mishra, S., & Arora, A. (2022). Double Deep Q Network with Huber Reward Function for Cart-Pole Balancing Problem. International Journal of Performability Engineering, 18(9), 644.
[3] Kumar, S. (2020). Balancing a cartpole system with reinforcement learning – a tutorial. arXiv preprint arXiv:2006.04938.
[4] Sanghi, N. Deep Reinforcement Learning with Python.
[5] Samsuden, M. A., Diah, N. M., & Rahman, N. A. (2019, October). A review paper on implementing reinforcement learning technique in optimising games performance. In 2019 IEEE 9th International Conference on System Engineering and Technology (ICSET) (pp. 258-263). IEEE.
[6] Jia, J., & Wang, W. (2020, October). Review of reinforcement learning research. In 2020 35th Youth Academic Annual Conference of Chinese Association of Automation (YAC) (pp. 186-191). IEEE.
[7] Shi, Q., Lam, H. K., Xiao, B., & Tsai, S. H. (2018). Adaptive PID controller based on Q-learning algorithm. CAAI Transactions on Intelligence Technology, 3(4), 235-244.
[8] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.
[9] Ada, S. E., & Ugur, E. (2023). Meta-World Conditional Neural Processes. arXiv preprint arXiv:2302.10320.
[10] Nagendra, S., Podila, N., Ugarakhod, R., & George, K. (2017, September). Comparison of reinforcement learning algorithms applied to the cart-pole problem. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 26-32). IEEE.
[11] Ladosz, P., Weng, L., Kim, M., & Oh, H. (2022). Exploration in deep reinforcement learning: A survey. Information Fusion, 85, 1-22.
[12] Huang, X. (2022). Opponent cart-pole dynamics for reinforcement learning of competing agents. Acta Mechanica Sinica, 38(5), 521540.
[13] Mothanna, Y., & Hewahi, N. (2022, November). Review on Reinforcement Learning in CartPole Game. In 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT) (pp. 344-349). IEEE.

