
case

Reinforcement Learning (RL) is a machine learning technique that allows agents to learn through trial and error by receiving feedback in the form of rewards and punishments. It has various applications across industries, including autonomous vehicles, data center cooling, traffic light control, healthcare, robotic surgeries, image processing, and robotics, showcasing its ability to optimize complex decision-making processes. Key algorithms like Q-learning and SARSA are commonly used in RL, which is distinguished from other learning methods by its focus on maximizing cumulative rewards in dynamic environments.

Uploaded by

Mohammed Tawheed
Copyright
© All Rights Reserved

BMS INSTITUTE OF TECHNOLOGY & MANAGEMENT

YELAHANKA, BANGALORE-64

DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING


CASE STUDY MATERIAL IA-3
Subject: Artificial Intelligence and Machine Learning
Code: 18CS71
Reinforcement Learning

Reinforcement Learning (RL) is a type of machine learning technique that enables an agent to learn in an
interactive environment by trial and error, using feedback from its own actions and experiences.
Both supervised and reinforcement learning use a mapping between input and output. However, in supervised
learning the feedback provided to the agent is the correct set of actions for performing a task, whereas
reinforcement learning uses rewards and punishments as signals for positive and negative behavior.
Reinforcement learning also differs from unsupervised learning in terms of goals. While the goal in
unsupervised learning is to find similarities and differences between data points, the goal in reinforcement
learning is to find a suitable action model that maximizes the total cumulative reward of the
agent. The figure below illustrates the action-reward feedback loop of a generic RL model.

➔ How to formulate a basic Reinforcement Learning problem?


Some key terms that describe the basic elements of an RL problem are:

• Environment: Physical world in which the agent operates
• State: Current situation of the agent
• Reward: Feedback from the environment
• Policy: Method to map the agent's state to actions
• Value: Future reward that an agent would receive by taking an action in a particular state
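The terms above can be tied together in a minimal sketch of the agent-environment loop. The corridor environment, the reward values, and the names CorridorEnv, random_policy, and run_episode are hypothetical and chosen only for illustration; the discount factor gamma is likewise an assumption, not something the material specifies.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells, goal at the far end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clamp the position
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        # Reward: feedback from the environment (assumed values)
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def random_policy(state, rng):
    """Policy: maps the agent's state to an action (here, ignoring the state)."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, gamma=0.9, max_steps=100):
    """Run one episode and return the discounted cumulative reward."""
    state = env.reset()
    discounted_return, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state, rng)
        state, reward, done = env.step(action)
        discounted_return += discount * reward
        discount *= gamma
        if done:
            break
    return discounted_return


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), random_policy, random.Random(0)))
```

The value of a state-action pair is the expectation of this discounted return when the agent starts from that pair and then follows its policy.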
➔ What are some of the most used Reinforcement Learning algorithms?
Q-learning and SARSA (State-Action-Reward-State-Action) are two commonly used model-free RL
algorithms. They differ in terms of their exploration strategies, while their exploitation strategies are similar.
Q-learning is an off-policy method in which the agent learns the value based on the action a* derived from
another policy (the greedy one), whereas SARSA is an on-policy method in which it learns the value based on
its current action a derived from its current policy. These two methods are simple to implement but, in their
basic tabular form, lack generality, as they cannot estimate values for unseen states.
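The difference between the two algorithms shows up in a single line of their update rules. The sketch below uses the standard tabular updates; the function names, the dictionary-based Q-table, and the default step size alpha and discount gamma are assumptions made for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best next action, whatever is taken next."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action the current policy actually chose."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


def make_table():
    # Unseen (state, action) pairs default to 0.0; the table cannot generalize
    Q = defaultdict(float)
    Q[(1, "right")] = 1.0
    return Q


if __name__ == "__main__":
    actions = ["left", "right"]
    q_table, sarsa_table = make_table(), make_table()
    q_learning_update(q_table, 0, "right", 0.0, 1, actions)
    sarsa_update(sarsa_table, 0, "right", 0.0, 1, "left")
    print(q_table[(0, "right")], sarsa_table[(0, "right")])
```

With the same transition, Q-learning credits the state with the value of the best next action, while SARSA credits it with the value of the (possibly exploratory) action that was actually selected.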
Reinforcement Learning applications and how they are shaping the future of AI across
all industries:
Autonomous cars:
A vehicle driving in an open-context environment should be backed by a machine learning model trained on
all possible scenes and scenarios in the real world.
However, collecting such a variety of scenes is a complicated problem to solve. How can we ensure
that a self-driving car has already learned all possible scenarios and safely masters every situation?
One answer to this is Reinforcement Learning.
Reinforcement Learning models are trained in a dynamic environment by learning a policy from their own
experiences, following the principles of exploration and exploitation, in a way that minimizes disruption to
traffic. Self-driving cars must consider many aspects in order to make optimal decisions: driving zones,
traffic handling, maintaining the speed limit, and avoiding collisions are significant factors.
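One common way to balance exploration and exploitation is an epsilon-greedy rule: with a small probability the agent tries a random action, otherwise it picks the action it currently values most. The sketch below is a generic illustration; the driving action names and the value estimates are hypothetical, and real driving agents use far richer state and action representations.

```python
import random

# Hypothetical discrete driving actions, for illustration only
ACTIONS = ["keep_lane", "change_left", "change_right", "brake"]


def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit


if __name__ == "__main__":
    rng = random.Random(0)
    estimated_values = [0.2, 0.5, 0.1, -0.3]  # assumed value estimates
    choice = epsilon_greedy(estimated_values, epsilon=0.1, rng=rng)
    print(ACTIONS[choice])
```

In practice epsilon is often decreased over training, so the agent explores widely at first and exploits what it has learned later.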

Many simulation environments are available for testing Reinforcement Learning models for autonomous
vehicle technologies.
DeepTraffic, launched by MIT, is an open-source environment that combines the powers of Reinforcement
Learning, Deep Learning, and Computer Vision to build algorithms used for autonomous driving. It simulates
autonomous vehicles such as drones, cars, etc.

CARLA is another excellent alternative, developed to support the development, training, and
validation of autonomous driving systems. It replicates urban layouts, buildings, and vehicles to train
self-driving cars in real-time simulated environments very close to reality.
Autonomous driving uses Reinforcement Learning with the help of these synthetic environments to target the
significant problems of trajectory optimization and dynamic pathing.
Reinforcement Learning agents are trained in these dynamic environments to optimize trajectories. The agents
learn motion planning, route changing, parking decisions and positioning, speed control, etc.
Data center cooling:
We are in an era where AI can help us tackle some of the world's most challenging physical problems,
such as energy consumption. With the entire world moving toward virtualization and cloud-based applications,
large-scale commercial and industrial systems like data centers consume a large amount of energy to keep the
servers running.
Interesting Fact: Google data centers using machine learning algorithms have reduced the amount of energy
for cooling by up to 40 percent.

Researchers in this domain have shown that a few hours of exploration enables data-driven, model-based
learning.
With this approach, a Reinforcement Learning agent with little or no prior knowledge can regulate conditions
on a server floor effectively and safely, and more efficiently than the existing PID controllers. The data
collected by thousands of sensors within the data centers, with attributes like temperatures, power, and
setpoints, are used to train the deep neural networks for data center cooling.
Because the lack of varied datasets makes this problem difficult to solve directly with conventional machine
learning algorithms, Deep Q-Network (DQN)-based methods are broadly used to overcome
this challenge.
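For contrast with the learned approach, the sketch below shows the kind of PID controller the text names as the existing baseline. It is a generic textbook PID loop, not the controller used in any particular data center; the gains, the temperature setpoint, and the mapping of the output to a cooling power between 0 and 1 are all assumptions.

```python
class PIDController:
    """Generic PID loop; output is a cooling power clamped to [0, 1] (assumed)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        # Error is positive when the room is hotter than the setpoint
        error = measurement - self.setpoint
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, min(1.0, output))


if __name__ == "__main__":
    pid = PIDController(kp=0.1, ki=0.01, kd=0.0, setpoint=24.0)  # assumed gains
    for temperature in [29.0, 27.5, 26.0, 24.5]:
        print(pid.update(temperature, dt=1.0))
```

A PID loop reacts to a single error signal with fixed gains, whereas an RL agent can learn a policy over many sensor readings at once, which is the comparison the paragraph above draws.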

Traffic light control


With increasing urbanization and a growing number of cars per household, traffic congestion has
become an enormous problem, especially in metropolitan areas.
Reinforcement Learning is a trending data-driven approach for adaptive traffic signal control. These models
are trained with the objective of learning a policy, using a value function, that optimally controls the traffic
light based on the current status of the traffic.
The decision-making needs to be dynamic, depending on the arrival rate of traffic from different directions,
which tends to vary at different times of the day. The conventional way of handling traffic is limited
by this non-stationary behavior. Also, a policy π trained for an intersection with x lanes cannot be directly
reused at an intersection with y lanes.
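To make the formulation concrete, the sketch below frames a single intersection as an RL problem: the state is the queue length per direction, the action selects which pair of directions gets the green light, and the reward is the negative number of waiting cars. This is a toy model under stated assumptions; the direction names, the fixed service rate served_per_step, and the reward definition are hypothetical and not taken from a specific traffic simulator.

```python
# Hypothetical signal phases: which directions receive a green light
PHASES = {"NS": ("north", "south"), "EW": ("east", "west")}


def step(queues, action, arrivals, served_per_step=3):
    """Advance the intersection by one time step.

    queues:   dict mapping direction to number of waiting cars (the state)
    action:   "NS" or "EW", the phase that gets the green light
    arrivals: dict mapping direction to newly arriving cars
    Returns the next state and the reward (negative total queue length).
    """
    green = PHASES[action]
    next_queues = {}
    for direction, waiting in queues.items():
        if direction in green:
            waiting = max(0, waiting - served_per_step)
        next_queues[direction] = waiting + arrivals.get(direction, 0)
    reward = -sum(next_queues.values())
    return next_queues, reward


if __name__ == "__main__":
    state = {"north": 5, "south": 2, "east": 1, "west": 0}
    state, reward = step(state, "NS", {"north": 1, "east": 2})
    print(state, reward)
```

Because the state here has one entry per approach, a policy learned on it is tied to that layout, which illustrates why a policy trained for one intersection does not transfer directly to another.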
In complex urban traffic networks, however, there are some limitations in applying deep Reinforcement
Learning algorithms, such as the exploration-exploitation dilemma, multi-agent training schemes, continuous
action spaces, and signal coordination.

Healthcare:
Choosing medicines is hard. It is even more challenging when the patient has been on medication for years
and no improvement has been seen.
Recent research shows that patients suffering from chronic disease often try different medicines before
giving up. We must find the right treatments and map them to the right person.
The healthcare sector has always been an early adopter and a significant beneficiary of technological
advancements. This industry has seen a significant tilt towards Reinforcement Learning in the past few years,
especially in implementing dynamic treatment regimes (DTRs) for patients suffering from long-term
illnesses.
Reinforcement Learning has also found application in automated medical diagnosis, health resource scheduling,
drug discovery and development, and health management.
Robotic surgeries
A powerful Reinforcement Learning application in decision-making is the use of surgical robots that can
minimize errors and variations and will eventually help increase surgeons' efficiency. One such robot
is Da Vinci, which allows surgeons to perform complex procedures with greater flexibility and control than
conventional approaches.
Its critical features include aiding surgeons with advanced instruments, translating the surgeons' hand
movements in real time, and delivering a 3D high-definition view of the surgical area.

Image processing
Reinforcement Learning is data-intensive and is well suited to interacting with a dynamic and initially
unknown environment. The current solutions offered in image processing by supervised and unsupervised
neural networks focus more on the classification of the objects identified. However, they do not account for
the interdependency among different entities or the deviation from the human perception procedure.
Reinforcement Learning is used in the following subfields of image processing.
Object detection and Localization
The RL approach learns multiple search policies by maximizing the long-term reward, starting with the
entire image as a proposal, which allows the agent to discover multiple objects sequentially.
It offers more diversity in search paths, can find multiple objects in a single feed, and can generate bounding
boxes or polygons. The paper "Active Object Localization with Deep Reinforcement Learning" validates its
effectiveness.
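A reward signal for such a localization agent can be built from the overlap between the agent's current box and the ground-truth box. The sketch below rewards a box transformation when it increases the intersection-over-union (IoU) and penalizes it otherwise, in the spirit of the localization approach described above; the function names, the box format (x1, y1, x2, y2), and the example coordinates are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_reward(old_box, new_box, target_box):
    """+1 if the move brought the box closer to the target, otherwise -1."""
    improvement = iou(new_box, target_box) - iou(old_box, target_box)
    return 1.0 if improvement > 0 else -1.0


if __name__ == "__main__":
    whole_image = (0, 0, 100, 100)  # the agent starts from the entire image
    shrunk = (0, 0, 80, 80)         # hypothetical result of one "shrink" action
    target = (20, 20, 60, 60)       # hypothetical ground-truth box
    print(localization_reward(whole_image, shrunk, target))
```

Using only the sign of the improvement keeps the reward simple and tells the agent clearly which transformations move the box toward the object.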
Scene Understanding
Artificial vision systems based on deep convolutional neural networks consume large, labeled datasets to learn
functions that map sequences of images to human-generated scene descriptions. Reinforcement Learning
offers rich and generalizable simulation engines for physical scene understanding.
One paper presents a new model based on pixel-wise rewards (pixelRL) for image processing. In pixelRL, an
agent is attached to each pixel and is responsible for changing the pixel value by taking an action. It is an
effective learning method that significantly improves performance by considering the future states of its own
pixel and of neighboring pixels.
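The per-pixel idea can be sketched with a tiny grayscale image in which every pixel has its own action and receives its own reward. This is a simplified illustration rather than the pixelRL implementation: the three-action set, the clamping to the 0 to 255 range, and a reward equal to the reduction in squared error against a target image are assumptions made for this sketch.

```python
# Hypothetical per-pixel action set: action id -> change in pixel value
ACTIONS = {0: 0, 1: +1, 2: -1}


def apply_actions(image, actions):
    """Apply one action per pixel and clamp the result to the 0..255 range."""
    return [
        [max(0, min(255, pixel + ACTIONS[act])) for pixel, act in zip(row, act_row)]
        for row, act_row in zip(image, actions)
    ]


def pixel_rewards(before, after, target):
    """Per-pixel reward: how much the squared error to the target decreased."""
    return [
        [(t - b) ** 2 - (t - a) ** 2 for b, a, t in zip(b_row, a_row, t_row)]
        for b_row, a_row, t_row in zip(before, after, target)
    ]


if __name__ == "__main__":
    image = [[10, 20], [30, 40]]
    target = [[11, 20], [29, 42]]
    actions = [[1, 0], [2, 2]]  # one action per pixel
    updated = apply_actions(image, actions)
    print(updated)
    print(pixel_rewards(image, updated, target))
```

A positive reward means the pixel moved closer to its target value, and a negative one means the action made that pixel worse.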
Reinforcement learning is one of the most modern machine learning technologies, in which learning is carried
out through interaction with the environment. It is used in computer vision tasks like feature detection, image
segmentation, object recognition, and tracking.
Here are some other examples where Reinforcement Learning is used in image processing:
• Robots equipped with visual sensors from which they learn the state of the surrounding environment
• Scanners that understand text
• Image pre-processing and segmentation of medical images like CT scans
• Traffic analysis and real-time road processing by video segmentation and frame-by-frame image
processing
• CCTV cameras for traffic and crowd analytics, etc.
Robotics:
Robots operate in a highly dynamic and ever-changing environment, making it difficult to predict what will
happen next. Reinforcement Learning provides a considerable advantage in these scenarios, making robots
robust and helping them acquire complex behaviors adaptively in different scenarios.
In manufacturing, it aims to remove the need for time-consuming and tedious checks, replacing them with
computer vision systems that ensure higher levels of quality control on the production assembly line.
Robots are used in warehouse navigation mainly for part supplies, quality testing, and packaging, automating
the complete process in an environment where humans, vehicles, and devices are also involved.
All these scenarios are complex for the traditional machine learning paradigm to handle. The robot should be
intelligent and responsive enough to move through these complex environments. Supported by image
processing and computer vision, it is trained in object manipulation so that it can grasp objects of different
sizes and shapes, depending on the texture and mass of the object.
Let us quickly walk through some of the use cases in robotics that Reinforcement Learning offers
solutions for.
Product assembly
Computer vision is used by multiple manufacturers to improve their product assembly process, to
automate it completely, and to remove manual intervention from the entire flow. One central area in
product assembly is object detection and object tracking.
Defect Inspection
A deep Reinforcement learning model is trained using multimodal data to easily identify missing pieces, dents,
cracks, scratches, and overall damage, with the images spanning millions of data points.
Using V7’s software, you can train object detection, instance segmentation, and image classification models to
spot defects and anomalies.
Inventory management
Inventory management in big companies and warehouses has become automated, with advances in
the field of computer vision making it possible to track stock in real time. Deep reinforcement learning agents
can locate empty containers and ensure that restocking is fully optimised.
