UNIT V Reinforcement Learning
Reinforcement learning is applied to problems such as marketing personalization, optimization challenges, and financial predictions.
Reinforcement learning mirrors the trial-and-error way people learn from feedback. For instance, a child may discover that they receive parental praise when they help a sibling or clean up, but receive negative reactions when they throw toys or yell. Soon, the child learns which combination of activities results in the end reward.
Key concepts
Algorithm basics
Model-based RL
The agent first builds an internal model (representation) of the environment by repeating two steps:
1. It takes actions within the environment and notes the new state and reward value.
2. It associates the state-action transition with the reward value.
Once the model is complete, the agent simulates action sequences based on the
probability of optimal cumulative rewards. It then further assigns values to the
action sequences themselves. The agent thus develops different strategies within
the environment to achieve the desired end goal.
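The sketch below illustrates this model-based loop in Python on a toy, hypothetical four-state corridor (the environment, states, and reward values are made up for illustration): the agent first records observed transitions and rewards, then plans over the learned model with value iteration and reads off a strategy.

```python
import random
from collections import defaultdict

# Hypothetical 4-state corridor: the agent moves "left" or "right" and receives
# a reward of 1 only when it reaches the goal state 3 (all values are made up).
STATES, ACTIONS, GOAL = [0, 1, 2, 3], ["left", "right"], 3

def step(state, action):
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    return nxt, 1.0 if nxt == GOAL else 0.0

# 1) Take actions in the environment, note the new state and reward value,
#    and associate each (state, action) transition with what was observed.
model = {}
for _ in range(500):
    s, a = random.choice(STATES[:-1]), random.choice(ACTIONS)
    model[(s, a)] = step(s, a)          # deterministic toy env: one sample is enough

# 2) Plan over the learned model: value iteration scores each state by the
#    cumulative discounted reward the best action sequence can collect from it.
gamma, V = 0.9, defaultdict(float)
for _ in range(50):
    for s in STATES[:-1]:
        V[s] = max(r + gamma * V[s2]
                   for (s2, r) in (model[(s, a)] for a in ACTIONS if (s, a) in model))

# 3) Read off a strategy: in each state, pick the action whose modeled
#    transition leads to the highest long-term reward.
policy = {s: max((a for a in ACTIONS if (s, a) in model),
                 key=lambda a: model[(s, a)][1] + gamma * V[model[(s, a)][0]])
          for s in STATES[:-1]}
print(policy)   # expected: every non-goal state chooses "right"
```

The key point is that the planning step works against the learned model, not against the environment itself.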
Example
Consider a robot learning to navigate a new building to reach a specific room. As the robot explores, it builds an internal model (a map) of the building, noting which moves lead to which locations. Once the model is complete, it can plan a route to the target room using that map rather than by further trial and error.
Model-free RL
Model-free RL is best to use when the environment is large, complex, and not
easily describable. It’s also ideal when the environment is unknown and changing,
and environment-based testing does not come with significant downsides.
The agent doesn’t build an internal model of the environment and its dynamics.
Instead, it uses a trial-and-error approach within the environment. It scores and
notes state-action pairs—and sequences of state-action pairs—to develop a
policy.
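By contrast, a model-free agent never builds such a model. The minimal tabular Q-learning sketch below (the same hypothetical corridor as above, with made-up rewards and parameters) scores state-action pairs directly from trial and error and derives its policy from those scores.

```python
import random
from collections import defaultdict

# Same hypothetical corridor: reach state 3 to collect a reward of 1.
ACTIONS, GOAL = ["left", "right"], 3

def step(state, action):
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    return nxt, 1.0 if nxt == GOAL else 0.0

Q = defaultdict(float)                      # scores for state-action pairs
alpha, gamma, epsilon = 0.5, 0.9, 0.3       # learning rate, discount, exploration rate

for episode in range(300):
    s = 0
    while s != GOAL:
        # Trial and error: mostly exploit the best-scoring action, sometimes explore.
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s2, r = step(s, a)
        # Nudge the score of this state-action pair toward reward + best future score.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# The policy is simply the highest-scoring action in each state; no model of
# the environment's dynamics was ever built.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
print(policy)   # expected: every state chooses "right"
```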
Example
Consider a self-driving car that needs to navigate city traffic. Roads, traffic
patterns, pedestrian behavior, and countless other factors can make the
environment highly dynamic and complex. AI teams train the vehicle in a
simulated environment in the initial stages. The vehicle takes actions based on its
current state and receives rewards or penalties.
Over time, by driving millions of miles in different virtual scenarios, the vehicle
learns which actions are best for each state without explicitly modeling the entire
traffic dynamics. When introduced in the real world, the vehicle uses the learned
policy but continues to refine it with new data.
Reinforcement learning vs. supervised learning
In supervised learning, you define both the input and the expected associated output. For instance, you can provide a set of images labeled as dogs or cats, and the algorithm is then expected to identify a new animal image as a dog or a cat.
In contrast, RL has a well-defined end goal in the form of a desired result but no supervisor to label associated data in advance. During training, instead of trying to map inputs to known outputs, it maps inputs to possible outcomes. By rewarding desired behaviors, you give more weight to the best outcomes.
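The contrast can be made concrete with a short sketch. The first part assumes scikit-learn is installed and uses made-up image features; the second part shows a reward-driven value update in the spirit of the child example above, with hypothetical actions and rewards.

```python
from sklearn.linear_model import LogisticRegression  # assumes scikit-learn is available

# Supervised learning: inputs are paired with known outputs (labels) in advance.
X = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7]]   # hypothetical image features
y = ["dog", "dog", "cat", "cat"]                        # labels provided up front
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.85, 0.2]]))                       # classify a new animal image

# Reinforcement learning: no labels in advance; a reward signal weights outcomes.
# Actions that keep getting rewarded accumulate higher value estimates.
values = {"help_sibling": 0.0, "throw_toys": 0.0}
for action, reward in [("help_sibling", +1), ("throw_toys", -1), ("help_sibling", +1)]:
    values[action] += 0.5 * (reward - values[action])   # nudge the estimate toward the reward
print(max(values, key=values.get))                      # prints "help_sibling"
```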
Reinforcement learning vs. unsupervised learning
In unsupervised learning, you provide input data but no expected output. The algorithm finds hidden patterns and groupings in the data on its own, with no predefined end goal. In contrast, RL has a predetermined end goal: it explores the environment, but its explorations are continuously scored against a reward signal that steers it toward that goal.
Challenges of reinforcement learning
While reinforcement learning (RL) applications can potentially change the world, deploying these algorithms is not always easy.
Practicality
Experimenting with real-world reward and penalty systems may not be practical. Training a physical robot or drone purely by trial and error in the real world risks costly damage, which is why agents are usually trained in simulation first.
Interpretability
Like any field of science, data science also looks at conclusive research and
findings to establish standards and procedures. Data scientists prefer knowing
how a specific conclusion was reached for provability and replication.
With complex RL algorithms, the reasons why a particular sequence of steps was
taken may be difficult to ascertain. Which actions in a sequence were the ones
that led to the optimal end result? This can be difficult to deduce, which causes
implementation challenges.
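A small, hypothetical illustration of why this is hard: when only the final step of a sequence earns a reward, the discounted return spreads credit across every earlier action, so no single action stands out as the decisive one.

```python
# Hypothetical action sequence in which only the final step is rewarded.
gamma = 0.9
actions = ["merge", "slow_down", "change_lane", "brake", "arrive"]
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]

# Compute the discounted return credited to each step, working backwards.
returns, g = [], 0.0
for r in reversed(rewards):
    g = r + gamma * g
    returns.append(g)
returns.reverse()

for action, g in zip(actions, returns):
    print(f"{action:12s} credited return {g:.2f}")
# Every action receives a nonzero return, yet which one actually led to the
# optimal end result is not visible from these numbers alone.
```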
How AWS supports reinforcement learning
With Amazon SageMaker, developers and data scientists can quickly and easily develop scalable RL models. Combine a deep learning framework (such as TensorFlow or Apache MXNet), an RL toolkit (such as RL Coach or RLlib), and an environment that mimics a real-world scenario, then use this combination to create and test your model.
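As a rough illustration of combining a toolkit and an environment (not the SageMaker API itself), the sketch below assumes Ray RLlib and its built-in CartPole environment are installed; RLlib's configuration interface changes between versions, so treat this as an approximate outline rather than a definitive recipe.

```python
# A hedged sketch: Ray RLlib (an RL toolkit) training PPO on the "CartPole-v1"
# environment as a stand-in for a real-world scenario.
# Requires `pip install "ray[rllib]"`; the exact config API varies by Ray version.
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")   # choose the environment to train against
algo = config.build()                             # builds a PPO trainer on a deep learning framework

for _ in range(3):
    results = algo.train()                        # one training iteration; returns a metrics dict

algo.stop()                                       # release workers and resources
```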
With AWS RoboMaker, developers can run, scale, and automate simulations of RL algorithms for robotics without managing any infrastructure.
Get hands-on experience with AWS DeepRacer, the fully autonomous 1/18th
scale race car. It boasts a fully configured cloud environment that you can use to
train your RL models and neural network configurations.