
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 14, No. 11, 2023

Improving Deep Reinforcement Learning Training Convergence using Fuzzy Logic for Autonomous Mobile Robot Navigation
Abdurrahman bin Kamarulariffin, Azhar bin Mohd Ibrahim*, Alala Bahamid
Department of Mechatronics Engineering, International Islamic University Malaysia, Kuala Lumpur, Malaysia

Abstract—Autonomous robotic navigation has become a research hotspot, particularly in complex environments, where inefficient exploration can lead to inefficient navigation. Previous approaches often relied on a wide range of assumptions and prior knowledge. Adaptations of machine learning (ML) approaches, especially deep learning, play a vital role in robotic navigation, detection, and prediction applications, and further development is needed due to the fast growth of urban megacities. The main problem of training convergence time in deep reinforcement learning (DRL) for mobile robot navigation refers to the amount of time it takes for the agent to learn an optimal policy through trial and error; it is caused by the need to collect a large amount of data and by the computational demands of training deep neural networks. Meanwhile, the assumption of reward in DRL for navigation is problematic, as it can be difficult or impossible to define a clear reward function in real-world scenarios, making it challenging to train the agent to navigate effectively. This paper proposes a neuro-symbolic approach that combines the strengths of deep reinforcement learning and fuzzy logic to address these challenges of training time and of the assumption of reward, by incorporating symbolic representations to guide the learning process and to infer the underlying objectives of the task, which is expected to reduce the training convergence time.

Keywords—Autonomous navigation; deep reinforcement learning; mobile robots; neuro-symbolic; fuzzy logic

I. INTRODUCTION

Advancements in robot navigation have spurred the development of algorithms that leverage basic rules and environmental mapping to optimize path planning. Rule-based methods, such as fuzzy logic and neuro-fuzzy techniques, have been extensively explored to enhance navigation decisions and tracking performance under uncertain conditions [1], [2]. While these methods offer valuable insights, they often require extensive justification and may not fully meet the demands for efficient and accurate path planning.

To address this challenge, researchers have turned to bio-inspired approaches, such as genetic algorithms and swarm optimization, which draw inspiration from biological behavior and incorporate prior knowledge to simulate human cognitive processes [3], [4]. One particularly promising area in navigation research is reinforcement learning (RL), which enables autonomous agents to learn and make sequential decisions in complex environments. Machine learning models, including supervised, unsupervised, and reinforcement learning, have played a pivotal role in robotics research, enabling learning, adaptation, and effective detection and classification. Deep reinforcement learning (DRL), a fusion of RL and deep neural networks, has emerged as a powerful approach for decision-making tasks involving high-dimensional inputs [5], [6]. This article aims to delve into the application of RL techniques, specifically Q-learning and deep Q-networks, for mobile robot path planning. By integrating these techniques with widely used frameworks such as ROS, Gazebo, and OpenAI Gym, a robust and autonomous navigation system can be developed, leading to improved performance, optimized routes, and efficient obstacle avoidance in complex environments. The evaluation of this system will contribute to the advancement of autonomous robotics. The trial-and-error learning process inherent in RL offers immense potential for building human-level agents and has been extensively explored in various domains [7], [8]. Deep learning (DL), characterized by its ability to extract meaningful patterns and classifications from raw sensory data through deep neural networks, has revolutionized the field of machine learning. When combined with RL, in the form of DRL, this integration has shown remarkable success in tackling challenges associated with sequential decision-making [9], [10]. Notably, DRL excels in scenarios involving a vast number of states, making it an ideal candidate for addressing navigation complexities. Nevertheless, achieving optimal navigation remains an ongoing challenge, necessitating further optimization and effective handling of high-dimensional data. Reinforcement learning methods offer valuable approaches for learning and planning navigation, empowering agents to interact with their environment and make autonomous decisions. Various studies have proposed agent-based DRL approaches for navigation, successfully simulating diverse scenarios without the need for intricate rule-based systems or laborious parameter tuning. However, there is still room for improvement in terms of achieving the shortest and fastest routes. To enhance navigation performance and optimize evacuation paths, researchers have explored techniques such as look-ahead crowded estimation and Q-learning, which have demonstrated superior results compared to other RL algorithms [6]. Additionally, CNN-based robot-assisted evacuation systems have been developed to maximize pedestrian outflow by extracting specific features from high-dimensional images. Furthermore, iterative and incremental learning strategies,


like vector quantization with Q-learning (VQQL), have been proposed to expedite the learning process and optimize navigation by gradually improving interactions among agents [11], [12]. These advancements in DRL continue to show great promise in addressing the speed of agent learning and optimizing navigation processes. In the realm of task planning, the ability to find a series of steps that transform initial conditions into desired states is crucial. Task planning becomes especially important when atomic actions alone cannot accomplish a task. Neuro-symbolic task planning has emerged as an effective approach, allowing for the incorporation of restrictions, guidelines, and requirements in each activity. However, traditional task planners often rely on detailed hand-coded explanations, limiting their scalability. To overcome this limitation, a combination of deep learning and symbolic planning, known as a neuro-symbolic approach, has shown potential by leveraging visual information instead of hand-coded explanations [3], [13], [14]. However, collecting image data for neuro-symbolic models in robotic applications is a labor-intensive process that involves steps such as creating problem instances, defining initial and goal states, operating robots, and capturing scene images. The challenges associated with data collection have hindered the widespread adoption of neuro-symbolic models in robot task planning. Neuro-symbolic models excel in reasoning, providing explanations, and manipulating complex data structures. Conversely, numerical models, such as neuronal models, are preferred for pattern recognition due to their generalization and learning abilities. A unified strategy proposes that the characteristic properties of symbolic artificial intelligence can emerge from distributed local computations performed by neuronal models, spanning cognitive functions from the neuron level to the structural level of the nervous system. By integrating neuro-symbolic and numerical models, a comprehensive framework can be established to leverage the strengths of both approaches in robotics. This integrated approach holds the potential to enable efficient task planning, grounding symbols in perceptual information, and enhancing pattern recognition capabilities. Ultimately, this integration could advance cognitive functions and pave the way for the creation of more sophisticated robotic systems.

This paper is organized as follows. Section II presents the proposed method, which integrates reinforcement learning (RL) and fuzzy logic for mobile robot path planning, aiming to create a robust autonomous navigation system that optimizes routes and efficiently avoids obstacles in complex environments. Section III illustrates the simulation set-up, while Section IV provides an evaluation of the training process of the policy optimization. Finally, Section V presents the evaluation and verification of the developed policy based on the proposed method, followed by the conclusion.

II. METHODS

The methodology for this project involves the utilization of simulation tools, namely Gazebo, ROS (Robot Operating System), and OpenAI Gym. Gazebo provides a realistic environment for simulating the mobile robot path planning system, while ROS serves as a comprehensive framework for controlling the robot and interfacing with its sensors and actuators. OpenAI Gym is used to train and evaluate the reinforcement learning algorithms. The main focus of this project is to apply reinforcement learning techniques to mobile robot path planning. Unlike traditional approaches that rely on SLAM or mapping techniques, the project aims to enable the robot to learn the optimal path through a reward and punishment system. By using reinforcement learning algorithms such as Q-learning, SARSA, and DQN, the robot can learn to navigate its environment efficiently and safely. To facilitate communication between the simulation and the robot, ROS integration is implemented. This integration allows the robot to receive sensor data, send control commands, and interact with the simulation environment seamlessly. By leveraging the capabilities of ROS, the reinforcement learning algorithms can effectively interface with the robot's actions and observations [15]–[17]. The reinforcement learning algorithms receive feedback through a reward and punishment system based on the robot's performance in reaching the goal while avoiding collisions and obstacles. The training aims to optimize the robot's decision-making and path planning abilities. Performance analysis is conducted to assess the effectiveness of the trained reinforcement learning models. Metrics such as the time taken to reach the goal, collision occurrences with static and dynamic obstacles, and the number of pathing alterations are measured and analyzed. These metrics provide insights into the path planning efficiency, collision avoidance capabilities, and adaptability of the reinforcement learning approach. In summary, the methodology of this project involves using simulation tools (Gazebo, ROS, and OpenAI Gym) to evaluate the application of reinforcement learning algorithms (Q-learning, SARSA, and DQN) in mobile robot path planning. The integration of ROS ensures seamless communication between the simulation environment and the robot, while the OpenAI Gym environment provides a standardized framework for training and evaluating the algorithms. The methodology enables rigorous testing and analysis of the robot's performance in terms of path planning, collision avoidance, and adaptability to dynamic environments. The following subsections discuss the mathematical model of Q-learning with the fuzzy logic approach for navigation problems, together with the experimental setup used in this work.
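As a concrete illustration of the Gazebo/ROS/OpenAI Gym coupling described above, the following minimal Python sketch wraps ROS topics in a Gym-style environment. It is a sketch under stated assumptions: the class name, topic names (/scan, /cmd_vel), the observation discretization, and the placeholder reward are illustrative choices, not the exact implementation used in this work.

```python
# Illustrative sketch only: a Gym-style wrapper around ROS topics for a
# TurtleBot-like robot (LaserScan on /scan, velocity commands on /cmd_vel).
# Names, thresholds, and the placeholder reward are assumptions.
import gym
import numpy as np
import rospy
from geometry_msgs.msg import Twist
from sensor_msgs.msg import LaserScan


class TurtleBotMazeEnv(gym.Env):
    """Minimal environment: a downsampled laser scan as state, three discrete actions."""

    def __init__(self):
        rospy.init_node("rl_maze_env", anonymous=True)
        self.cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/scan", LaserScan, self._scan_cb)
        self.scan = None
        self.action_space = gym.spaces.Discrete(3)  # 0: forward, 1: turn left, 2: turn right
        self.observation_space = gym.spaces.Box(0.0, 10.0, shape=(24,), dtype=np.float32)

    def _scan_cb(self, msg):
        # Clip invalid/infinite readings and downsample to a fixed-size vector.
        ranges = np.clip(np.nan_to_num(np.array(msg.ranges, dtype=np.float32)), 0.0, 10.0)
        idx = np.linspace(0, len(ranges) - 1, 24).astype(int)
        self.scan = ranges[idx]

    def step(self, action):
        cmd = Twist()
        cmd.linear.x = 0.2 if action == 0 else 0.05
        cmd.angular.z = {0: 0.0, 1: 0.8, 2: -0.8}[action]
        self.cmd_pub.publish(cmd)
        rospy.sleep(0.1)                                  # let the simulator advance
        obs = self.scan
        done = bool(obs is not None and obs.min() < 0.2)  # crude collision check only
        reward = -100.0 if done else -1.0                 # placeholder; see the fuzzified reward in Section II-B
        return obs, reward, done, {}

    def reset(self):
        # A full implementation would also reset the Gazebo world / robot pose here.
        msg = rospy.wait_for_message("/scan", LaserScan)
        self._scan_cb(msg)
        return self.scan
```

A Q-learning, SARSA, or DQN agent can then interact with this wrapper exactly as with any other Gym environment, which is what allows the learning algorithm to be swapped without touching the ROS side.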
In the context of agents utilizing visual SLAM, traditional algorithms are still employed for final path planning on the map. However, RL offers numerous applications, and in mobile robot navigation it can replace the path planning part. The RL model, after training, can effectively make decisions, enabling the agent to select its path from one location to another based on interactions with the environment [18], [19]. The environment is abstracted into a grid map representation, with each position on the map corresponding to an agent's state. Transitioning from one state to another reflects the actual movement of the entity, while the agent's behavioral decision-making is represented by its state choice at each step in the RL model. The reward value plays a pivotal role in guiding path selection. Early Q-learning recorded the reward values between position states in a table, guiding the next state selection. As deep reinforcement learning emerged, the DL model was integrated, replacing the table with a neural network that provides the corresponding decision results when the state is given as input [20], [21]. The weighting parameters in the neural network influence the choice of the next state.


On the other hand, when incorporating fuzzy logic into the RL model, the decision-making process becomes more nuanced and interpretable. Fuzzy logic allows for handling uncertainties and imprecise information, enabling the agent to reason with vague input and output values. By combining RL and fuzzy logic, the agent can make more human-like decisions, considering both the environment's precise measurements and the agent's subjective understanding of the surroundings. This fusion can enhance path planning in complex and dynamic environments by considering various factors and optimizing the decision-making process.

A. Q-Learning Algorithm

RL defines any decision maker as an agent and everything outside the agent as the environment. The agent aims to maximize the accumulated reward and obtains a reward value as a feedback signal for training through interaction with the environment. Beyond the agent (which performs actions) and the environment (which is made up of states), there are three major elements of a reinforcement learning system:

• Policy π: it formalizes the agent's decisions and determines the agent's behaviour at a given time. A policy π is a function that maps a perceived state to the action taken from that state.

• Reward r: the agent receives feedback, known as the reward r_{t+1}, for each action at time step t, indicating the inherent desirability of the resulting state. The main goal of the agent is to maximize the cumulative reward over time. The total sum of the rewards (the return) is:

R_t = r_{t+1} + r_{t+2} + r_{t+3} + ... + r_T,   T: final time step

The agent-environment interaction breaks into episodes, where each episode ends in a state called the terminal state, followed by a reset to a standard starting state. In some cases the episodes continue indefinitely, the final time step would be T = ∞, and the return becomes infinite, so a discount factor γ is introduced. The discounted return is defined as:

R_t = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ... = Σ_{k=0..∞} γ^k r_{t+k+1},   0 < γ < 1

Rewards can be sparse (given only after a long sequence of actions), given at every time step, or given at the end of an episode.

• Value function: most RL algorithms are based on estimating value functions (of states or of state-action pairs). A value function estimates how good a certain state is for the agent to be in (state value function), or how good a certain action is to perform in a specific state (state-action value function). The state value function under the policy π, denoted V^π(s), is the expected return, where E_π denotes the expectation when following policy π:

V^π(s) = E_π[ R_t | s_t = s ] = E_π[ Σ_{k=0..∞} γ^k r_{t+k+1} | s_t = s ]

The state-action value function under policy π, denoted Q^π(s, a), is the expected accumulated return from state s after taking action a. Q^π is also known as the action value function and is the quantity estimated by the Q-learning algorithm:

Q^π(s, a) = E_π[ R_t | s_t = s, a_t = a ] = E_π[ Σ_{k=0..∞} γ^k r_{t+k+1} | s_t = s, a_t = a ]
Q^π(s, a) = E_π[ r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ... | s_t = s, a_t = a ]

Reinforcement learning is about finding an optimal policy that achieves a large amount of reward over the long term. A policy π is defined to be better than or equal to a policy π′ if its expected return is greater than or equal to that of π′ for all states:

π ≥ π′  if and only if  V^π(s) ≥ V^π′(s) for all states s

Optimal value functions must satisfy the conditions below:

V*(s) = max_π V^π(s), for all states
Q*(s, a) = max_π Q^π(s, a), for all states and actions

The optimal policy is obtained by solving Q*(s, a) to find the action that gives the optimal state-action value function:

π*(s) = argmax_a Q*(s, a)

The Q-learning algorithm is an off-policy, value-based RL algorithm that is very effective in unknown environments [6], [21], [22]. The value of a state-action pair can be decomposed into the immediate reward plus the discounted value of the successor state-action pair Q^π(s′, a′):

Q^π(s, a) = E_π[ r_{t+1} + γ Q^π(s_{t+1}, a_{t+1}) | s_t = s, a_t = a ]

According to the Bellman optimality equation, the optimal value function can be expressed as:

Q*(s, a) = E[ r_{t+1} + γ max_{a′} Q*(s_{t+1}, a′) | s_t = s, a_t = a ]

The value function is updated iteratively to obtain the optimal value function:

Q(s, a) ← Q(s, a) + α [ r + γ max_{a′} Q(s′, a′) - Q(s, a) ],   α: learning rate

Q(s, a) converges to Q*(s, a) as t → ∞.

Algorithm 1 illustrates the overall framework of the proposed Q-learning to generate the shortest route for navigation mapping.

Algorithm 1. Overall framework of the Q-learning
  Initialize Q(s, a) arbitrarily
  repeat for each episode:
    Initialize s
    for each step of the episode do
      Choose a from s using an ε-greedy policy
      Take action a and observe r and s′
      Q(s, a) ← Q(s, a) + α [ r + γ max_{a′} Q(s′, a′) - Q(s, a) ]
      s ← s′
    until s is terminal
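The update rule and Algorithm 1 map directly onto a few lines of Python. The sketch below assumes a discrete Gym-style environment with integer states and actions; the hyperparameter values are illustrative defaults rather than the values tuned in this work.

```python
# Sketch of Algorithm 1: tabular Q-learning with an epsilon-greedy behaviour policy.
# Assumes `env` has Discrete observation and action spaces (integer states/actions).
import numpy as np


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = np.zeros((env.observation_space.n, env.action_space.n))  # initialize Q(s, a) arbitrarily (zeros)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:                                  # for each step of the episode
            if np.random.rand() < epsilon:               # epsilon-greedy action selection
                a = env.action_space.sample()
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done, _ = env.step(a)             # take action a, observe r and s'
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next                                   # s <- s'
    return Q
```

The greedy policy π*(s) = argmax_a Q(s, a) is then read directly off the learned table; with the fuzzified reward of the next subsection, only the value of r changes, not this update loop.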


B. Fuzzified Reward Function

A fuzzy logic-based approach has been adopted to enhance the decision-making process [23], [24] of the autonomous agent navigating through a maze. This work incorporates symbolic representation by adopting fuzzy logic in the reward function to guide the learning process and to address the challenge of the computational demands of training. The proposed fuzzy reward function has three input variables and one output. The input variables are the distance to the obstacle (near, medium, far), the distance to the target (near, medium, far), and the visual range (off target, medium, on target). The output variable is the reward points (see Fig. 4).

By employing three membership levels for each input variable (see Fig. 1, 2 and 3), a comprehensive set of 27 fuzzy rules has been devised (see Table I), effectively covering all possible combinations of the environment states. These rules dictate the agent's rewards, which are categorized as punishment (least), medium, and reward (most). By leveraging the flexibility and adaptability of fuzzy logic, the agent is guided through its learning process with a more nuanced and context-aware reward system, allowing it to make more informed decisions in a variety of maze scenarios and significantly improving its learning efficiency.

Fig. 1. Visual range.
Fig. 2. Distance to target.
Fig. 3. Distance to obstacle.
Fig. 4. Output fuzzy: reward points.

Table I presents a comprehensive and systematic overview of the fuzzy logic rules governing the agent's decision-making process in the maze navigation task. The table shows the various combinations of input possibilities, encompassing the distance to obstacles, the distance to the target location, and the visual range, each categorized into appropriate linguistic variables (e.g., near, medium, far; off target, medium, on target). For every unique combination, the corresponding fuzzy logic "If/Then" rule is defined, determining the agent's reward as punish, medium, or reward. Table I highlights the agent's adaptability and versatility through the vast array of rules, capturing the intricacies of different maze scenarios. With 27 distinct rules, the fuzzy logic system can respond precisely to the agent's real-time observations, guiding it towards optimal actions that lead to successful navigation. This rich and nuanced reward system empowers the agent to learn effectively from its experiences, enabling it to avoid obstacles, approach the target, and dynamically adjust its behavior based on varying visual cues. Consequently, the If/Then fuzzy logic rule table serves as a powerful tool in understanding and implementing the complex decision-making process of the agent, fostering efficient learning and successful maze navigation.

TABLE I. IF/THEN FUZZY LOGIC RULES FOR REWARDS

Distance | Visual Range | Obstacle (Near) | Obstacle (Medium) | Obstacle (Far)
Target (Near) | Near | High Positive | High Positive | High Positive
Target (Near) | Medium | Mid Positive | Mid Positive | Mid Positive
Target (Near) | Far | Low Positive | Low Positive | Low Positive
Target (Medium) | Near | High Middle | High Middle | High Middle
Target (Medium) | Medium | Low Middle | Middle | Low Middle
Target (Medium) | Far | Low Middle | Low Middle | Low Middle
Target (Far) | Near | Low Negative | Low Negative | Low Negative
Target (Far) | Medium | Mid Negative | Mid Negative | Mid Negative
Target (Far) | Far | High Negative | High Negative | High Negative
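To make the fuzzified reward concrete, the sketch below assembles a small Mamdani-style controller with scikit-fuzzy, which is one possible library choice rather than the implementation prescribed by this work. The universes of discourse, the membership breakpoints, the interpretation of "visual range" as the angular offset of the target from the robot's heading, and the three sample rules are illustrative assumptions; a full implementation would encode all 27 rules of Table I.

```python
# Illustrative sketch (assumed library: scikit-fuzzy). Universe ranges, membership
# shapes, and the three sample rules are placeholders; a complete system would
# encode all 27 rules of Table I.
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

obstacle = ctrl.Antecedent(np.arange(0.0, 3.01, 0.01), "obstacle")  # distance to obstacle [m]
target = ctrl.Antecedent(np.arange(0.0, 10.01, 0.01), "target")     # distance to target [m]
visual = ctrl.Antecedent(np.arange(0.0, 90.5, 0.5), "visual")       # heading offset from target [deg]
reward = ctrl.Consequent(np.arange(-100.0, 100.5, 0.5), "reward")   # reward points

obstacle.automf(3, names=["near", "medium", "far"])
target.automf(3, names=["near", "medium", "far"])
visual.automf(3, names=["on_target", "medium", "off_target"])
reward["punish"] = fuzz.trimf(reward.universe, [-100, -100, 0])
reward["medium"] = fuzz.trimf(reward.universe, [-50, 0, 50])
reward["reward"] = fuzz.trimf(reward.universe, [0, 100, 100])

rules = [
    # Three of the 27 rules from Table I; the remaining rules follow the same pattern.
    ctrl.Rule(target["near"] & visual["on_target"] & obstacle["far"], reward["reward"]),
    ctrl.Rule(target["medium"] & visual["medium"] & obstacle["medium"], reward["medium"]),
    ctrl.Rule(target["far"] & visual["off_target"] & obstacle["near"], reward["punish"]),
]
sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))


def fuzzy_reward(d_obstacle, d_target, heading_error_deg):
    """Return a crisp reward in [-100, 100] for the given observation."""
    sim.input["obstacle"] = d_obstacle
    sim.input["target"] = d_target
    sim.input["visual"] = heading_error_deg
    sim.compute()
    return sim.output["reward"]


# Example: close to the target, pointing at it, obstacle far away -> strongly positive reward.
print(fuzzy_reward(d_obstacle=2.5, d_target=0.5, heading_error_deg=5.0))
```

The crisp output of fuzzy_reward() is then supplied to the Q-learning update in place of a sparse hand-crafted reward, which is the mechanism this work uses to guide the learning process.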


III. SIMULATION SETUP

To thoroughly evaluate the performance of the proposed method, this study constructed a detailed 3-D raster map model and crafted two distinct simulation maps, illustrated in Fig. 5(a) and Fig. 5(b). In this simulated environment, dynamic obstacles, symbolized by white cylinders, were strategically placed, posing challenges to a TurtleBot simulation machine car, shown in black. The machine car's laser range, portrayed in blue, scanned the surroundings as it navigated through the intricate maze. The red square pinpointed a target training point, emphasizing the complexity of the assigned tasks.

Fig. 5(a) depicts a scenario in which the TurtleBot is tasked with locating a single target location, represented by the red square, amid a set of four cylindrical obstacles. This setting simulates real-world challenges where the robot must efficiently identify and navigate towards a specific point among various hindrances. Fig. 5(b) presents a more intricate scenario in which the TurtleBot is assigned the mission of identifying two specific cylinders as target locations within a maze of block obstacles. This heightened complexity mirrors scenarios where the robot must discern and navigate through a maze-like environment to pinpoint multiple objectives. This detailed simulation environment allows for a comprehensive assessment of the proposed method's effectiveness in handling diverse and intricate navigation tasks.

Fig. 5. (a) and (b) Maze circuits in the Gazebo simulation environment used to test and evaluate the robot's path planning capabilities.

The computer used for the simulations was equipped with a 4-core Intel i5 7400 CPU running at 3.00 GHz and 8 GB of RAM, running the Ubuntu 16.04 operating system and the ROS Kinetic distribution. The parameters for the 3-D environment model were sourced from the ROS open-source community. The corresponding parameter settings are as follows:

r_goal = 100, r_obstacle = -100, ε = -100, σx = σy = 1
γ_goal = 0.9, γ_obstacle = 0.9, γ_critical = 0.8, γ_otherwise = 0.75

In the context of this work:
• r signifies a single reward.
• σx and σy represent the obstacle center coordinates.
• γ (gamma) serves as the discount factor, influencing the importance of future rewards.

Here, r represents a reward, with r_goal and r_obstacle being the specific rewards assigned to reaching the goal and to encountering obstacles, respectively. The term γ serves as the discount factor, influencing the importance of future rewards in the context of reinforcement learning. The parameters ε, σx, and σy represent the grid center coordinates, contributing to the spatial representation of the environment and the localization of obstacles.

Fig. 6 depicts the TurtleBot2, a popular mobile robot platform widely used in robotics research and applications.

Fig. 6. The TurtleBot2 robot equipped with a lidar sensor.
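For comparison with the fuzzified reward of Section II-B, a minimal sketch of how the crisp constants listed above could be applied in the pure-RL baseline is shown below; the distance thresholds and the small per-step penalty are assumptions made here for illustration, not values taken from this work.

```python
# Sketch of a crisp (non-fuzzy) reward built from the constants listed above.
# The 0.2 m collision threshold, the 0.3 m goal tolerance, and the -1 step
# penalty are assumed values for illustration only.
R_GOAL = 100.0
R_OBSTACLE = -100.0
GAMMA = 0.9          # discount factor used by the learning update


def crisp_reward(distance_to_target, distance_to_nearest_obstacle):
    if distance_to_nearest_obstacle < 0.2:   # collision / critical proximity
        return R_OBSTACLE
    if distance_to_target < 0.3:             # goal reached
        return R_GOAL
    return -1.0                              # small step penalty to favour short paths
```

In the proposed method, this sparse crisp signal is replaced by the graded output of the fuzzy inference system.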
IV. TRAINING PERFORMANCE OF THE PROPOSED METHOD

The visual representations in Fig. 5(a) and Fig. 5(b) aim to illustrate the improved method's performance across different simulation maps. While these figures may not directly demonstrate capabilities, they serve as visual aids to showcase the distinct scenarios and complexities encountered by the agent in each environment. It is essential to acknowledge that the term "validation", in the context of comparing results from the reinforcement learning (RL) and fuzzy logic (FL) approaches, refers to a qualitative assessment rather than a formal validation process.

In Fig. 5(a), the experiment showcased the performance of the improved method on the first simulation map, providing insights into how the agent navigates a specific environment. On the other hand, Fig. 5(b) illustrated the capabilities of the improved method on the second simulation map, highlighting its adaptability to different scenarios. By comparing the results from the RL and FL approaches, the effectiveness of the enhanced technique in the complex 3-D environment was qualitatively validated. These visual representations offered a valuable qualitative assessment, helping to understand the nuanced behaviors of the agent, its path planning strategies, and its obstacle avoidance mechanisms in diverse settings.


The improved method consistently demonstrated superior performance, efficiently finding optimal routes to reach the target point while navigating around obstacles effectively. The simulations offered valuable insights into the agent's behavior, path planning, and obstacle avoidance, elucidating fundamental aspects of autonomous robot navigation.

Furthermore, the deliberate choice of Fig. 5(b) as a test run was made to rigorously assess the proposed method's robustness in scenarios with increased complexity and multiple target points. This strategic selection adds an additional layer of validation, demonstrating the algorithm's efficacy in handling intricate navigation tasks.

As shown in Fig. 7, in the Q-value plot the fuzzy logic (FL) example has a faster convergence speed over the 50 K training sessions, and after approximately 25 K training sessions the Q-value of the FL algorithm still varies richly, showing that the FL approach is less likely to fall into a local optimum. This segment of the analysis offers a glimpse into the noteworthy performance attributes of the FL local search approach. As depicted in Fig. 8, which illustrates the cumulative reward plot, the FL example stands out due to its utilization of a multiple reward mechanism and a loop memory network. This distinction is most evident in the greater reward values attributed to the FL path, which correspondingly signify a reduced occurrence of repeated errors. In essence, a higher reward value in this context indicates a superior capacity to identify and follow an optimal path with fewer deviations. Turning to the loss diagram in Fig. 9, we observe a compelling trend: the loss associated with the FL example is consistently lower than that of the RL example. This finding is significant, as it underscores the model's proficiency in minimizing error during the learning process. A lower loss value reflects more accurate prediction and action selection by the model, emphasizing the effectiveness of the FL approach in optimizing path planning. To provide a closer examination of this phenomenon, Fig. 10 offers a magnified view of the loss diagram of Fig. 9. This detailed perspective reaffirms the rationality and effectiveness of the loss function employed in the model. The stability in parameter learning, particularly evident in the FL example, facilitates faster convergence to the optimal values. This not only enhances the efficiency of path planning but also showcases the model's robustness in navigating complex environments.

Fig. 7. Q-value comparison.
Fig. 8. Cumulative rewards of the proposed method vs. the pure RL algorithm.
Fig. 9. Loss comparison.
Fig. 10. Loss comparison, enlarged view.


V. EVALUATION AND VERIFICATION OF THE DEVELOPED POLICY

This work conducted three comprehensive tests to rigorously evaluate the performance of the proposed method. Each simulation aimed to assess the effectiveness of the respective algorithm in enabling the mobile robot to learn and navigate its environment autonomously.

To verify the practical performance of the model, physical tests were conducted on the robotic machine based on the Robot Operating System (ROS). The TurtleBot machine car was employed for these experiments to ensure consistency and reliability. The test environment comprised an obstacle zone constructed in the laboratory terrain, with the ideal distance from the starting point to the target point set at 8.3 meters. Fig. 5(a) and Fig. 5(b) depict the laser environment after its construction. It is important to note that the use of the TurtleBot in these experiments is not meant to directly reduce errors in the algorithm. Instead, the TurtleBot provides a standardized platform for testing, ensuring consistency and reliability across multiple trials. The choice of the TurtleBot contributes to the creation of a controlled and reproducible testing environment, minimizing potential errors arising from variations in hardware and environmental conditions. This emphasis on error reduction pertains to establishing a robust and reliable basis for evaluating the proposed method's performance in real-world scenarios rather than to directly mitigating errors in the algorithm or system. Following the integration of the trained model into the navigation function package, a meticulous series of verification tests was carried out to assess its performance. Each testing round consisted of five restarts, with three experiments conducted within each round to ensure the robustness of the evaluation process. For instance, in the first round of experiments (Table II), the robot's performance was tested through three individual trials: the first trial covered a distance of 8.8 meters in 77 seconds, the second trial covered 9.0 meters in 78 seconds, and the third trial spanned 8.6 meters in 73 seconds. By calculating the mean of these results, we obtained an average performance of 8.8 meters covered in 76 seconds.

Table III presents the detailed results of the first round with the proposed method, where the robot covered distances of 8.8 meters in 65 seconds, 8.6 meters in 53 seconds, and 8.7 meters in 56 seconds during the three tests. The calculated mean for Table III was 8.7 meters covered in 58 seconds. Notably, Table III exhibited a higher learning rate compared to Table II, indicating improved efficiency in path planning and execution.

TABLE II. THE EXAMPLE OF THE RL ALGORITHM (LENGTH/TIME)

Examples | Test 1 | Test 2 | Test 3 | Mean
First Round | 8.8 m/77 s | 9.0 m/78 s | 8.6 m/73 s | 8.8 m/76 s
Second Round | 9.3 m/86 s | 9.1 m/83 s | 8.9 m/74 s | 9.1 m/81 s
Third Round | 8.9 m/68 s | 8.6 m/63 s | 9.2 m/70 s | 8.9 m/67 s
Fourth Round | 9.1 m/78 s | 8.9 m/73 s | 8.7 m/71 s | 8.9 m/74 s
Fifth Round | 9.2 m/80 s | 9.2 m/77 s | 8.6 m/77 s | 9.0 m/78 s

TABLE III. THE EXAMPLE OF REINFORCEMENT LEARNING WITH THE FUZZY LOGIC ALGORITHM (LENGTH/TIME)

Examples | Test 1 | Test 2 | Test 3 | Mean
First Round | 8.8 m/65 s | 8.6 m/53 s | 8.7 m/56 s | 8.7 m/58 s
Second Round | 8.8 m/63 s | 8.9 m/69 s | 8.7 m/60 s | 8.8 m/64 s
Third Round | 8.7 m/66 s | 8.7 m/70 s | 8.5 m/68 s | 8.6 m/68 s
Fourth Round | 8.7 m/73 s | 8.8 m/65 s | 8.6 m/66 s | 8.7 m/68 s
Fifth Round | 8.4 m/71 s | 8.7 m/73 s | 8.4 m/69 s | 8.5 m/69 s
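The per-round means reported in Tables II and III are simple averages of the three trials; the short sketch below reproduces the first-round figures (values transcribed from the tables above, rounded as in the tables).

```python
# Reproduce the first-round means of Tables II and III from the individual trials.
def round_mean(trials):
    """trials: list of (length_m, time_s) tuples for one testing round."""
    n = float(len(trials))
    mean_len = round(sum(l for l, _ in trials) / n, 1)
    mean_time = round(sum(t for _, t in trials) / n)
    return mean_len, mean_time


rl_round1 = [(8.8, 77), (9.0, 78), (8.6, 73)]   # Table II, first round (pure RL)
fl_round1 = [(8.8, 65), (8.6, 53), (8.7, 56)]   # Table III, first round (RL with fuzzy logic)

print(round_mean(rl_round1))   # -> (8.8, 76), matching "8.8 m/76 s"
print(round_mean(fl_round1))   # -> (8.7, 58), matching "8.7 m/58 s"
```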
The overarching analysis of these comprehensive tests reveals that the fuzzy logic approach consistently outperforms the pure RL method in terms of both time consumption and path length, particularly in the scenario represented in Fig. 5(b). It consistently finds shorter paths in less time, highlighting its superior efficiency. Additionally, the fuzzy logic method demonstrates remarkable stability in locating multiple paths, underscoring its prowess in complex-environment path-finding.

However, it is important to acknowledge certain limitations associated with the fuzzy logic-based approach. While it excels in various aspects of path planning, it may face challenges when confronted with highly dynamic and rapidly changing environments. Fuzzy logic, being rule-based and reliant on predetermined membership functions, might struggle to adapt swiftly to unpredictable obstacles or situations. Additionally, its performance could be impacted by the complexity and size of the environment, as processing a vast amount of data can introduce computational overhead. Therefore, while the fuzzy logic approach proves highly effective in many scenarios, it may not be the optimal choice for applications demanding real-time adaptability in extremely dynamic settings. Exploring its boundaries and considering alternative approaches for such specific scenarios remains a valuable avenue for future research and development.
VI. CONCLUSION

This research introduced a novel navigation method based on Q-learning and fuzzy logic for efficient path planning of agents in diverse environments. The proposed approach combines the strengths of deep learning with symbolic reasoning, specifically fuzzy logic, to overcome the challenges faced by traditional DRL methods in mobile robot navigation, reducing the global path search time by 6-9% and shortening the average path search length by 4-10% compared to pure Q-learning. The incorporation of symbolic representations in the learning process leads to reduced training convergence time and more practical path planning results. The experimental results demonstrate its efficiency and effectiveness in complex environments, making it a promising solution for autonomous robotic navigation in urban megacities. As future work, the effectiveness of new RL algorithms will be explored in even more challenging environments, further advancing the field of autonomous robotic navigation.
Fifth Round 9.2 m/80 s 9.2 m/77 s 8.6 m/77 s 9.0 m/78 s


REFERENCES
[1] U. Rakhman, J. Ahn, and C. Nam, "Fully automatic data collection for neuro-symbolic task planning for mobile robot navigation," in Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, 2021, pp. 450–455. doi: 10.1109/SMC52423.2021.9658822.
[2] A. Zhu and S. X. Yang, "Neurofuzzy-based approach to mobile robot navigation in unknown environments," IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 37, no. 4, pp. 610–621, Jul. 2007. doi: 10.1109/TSMCC.2007.897499.
[3] P. Coraggio and M. De Gregorio, "A Neurosymbolic Hybrid Approach for Landmark Recognition and Robot Localization."
[4] O. Castillo, R. Martínez-Marroquín, P. Melin, F. Valdez, and J. Soria, "Comparative study of bio-inspired algorithms applied to the optimization of type-1 and type-2 fuzzy controllers for an autonomous mobile robot," Information Sciences, vol. 192, pp. 19–38, Jun. 2012. doi: 10.1016/j.ins.2010.02.022.
[5] Y. Li, "Deep Reinforcement Learning: An Overview," Jan. 2017. [Online]. Available: http://arxiv.org/abs/1701.07274.
[6] H. Van Hasselt, A. Guez, and D. Silver, "Deep Reinforcement Learning with Double Q-Learning." [Online]. Available: www.aaai.org.
[7] K. Zhang, F. Niroui, M. Ficocelli, and G. Nejat, "Robot Navigation of Environments with Unknown Rough Terrain Using Deep Reinforcement Learning," in 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR 2018), Sep. 2018. doi: 10.1109/SSRR.2018.8468643.
[8] N. Altuntas, E. Imal, N. Emanet, and C. N. Öztürk, "Reinforcement learning-based mobile robot navigation," Turkish Journal of Electrical Engineering and Computer Sciences, vol. 24, no. 3, pp. 1747–1767, 2016. doi: 10.3906/elk-1311-129.
[9] C. Pérez-D'Arpino, C. Liu, P. Goebel, R. Martín-Martín, and S. Savarese, "Robot Navigation in Constrained Pedestrian Environments using Reinforcement Learning," in Proceedings - IEEE International Conference on Robotics and Automation, 2021, pp. 1140–1146. doi: 10.1109/ICRA48506.2021.9560893.
[10] V. Zambaldi et al., "Relational Deep Reinforcement Learning," Jun. 2018. [Online]. Available: http://arxiv.org/abs/1806.01830.
[11] D. Dong, C. Chen, J. Chu, and T. J. Tarn, "Robust quantum-inspired reinforcement learning for robot navigation," IEEE/ASME Transactions on Mechatronics, vol. 17, no. 1, pp. 86–97, Feb. 2012. doi: 10.1109/TMECH.2010.2090896.
[12] Y. Zhu, Z. Wang, C. Chen, and D. Dong, "Rule-Based Reinforcement Learning for Efficient Robot Navigation With Space Reduction," IEEE/ASME Transactions on Mechatronics, vol. 27, no. 2, pp. 846–857, Apr. 2022. doi: 10.1109/TMECH.2021.3072675.
[13] J. Priya Inala et al., "Neurosymbolic Transformers for Multi-Agent Communication." [Online]. Available: https://github.com/jinala/.
[14] P. Coraggio, M. De Gregorio, and M. Forastiere, "Robot Navigation Based on Neurosymbolic Reasoning over Landmarks," 2008. [Online]. Available: www.worldscientific.com.
[15] M. Sokolov, R. Lavrenov, A. Gabdullin, I. Afanasyev, and E. Magid, "3D modelling and simulation of a crawler robot in ROS/Gazebo," in ACM International Conference Proceeding Series, Dec. 2016, pp. 61–65. doi: 10.1145/3029610.3029641.
[16] K. Takaya, T. Asai, V. Kroumov, and F. Smarandache, "Simulation Environment for Mobile Robots Testing Using ROS and Gazebo," 2016.
[17] K. Sukvichai, K. Wongsuwan, N. Kaewnark, and P. Wisanuvej, "Implementation of Visual Odometry Estimation for Underwater Robot on ROS by using RaspberryPi 2."
[18] N. Botteghi, B. Sirmacek, K. A. A. Mustafa, M. Poel, and S. Stramigioli, "On Reward Shaping for Mobile Robot Navigation: A Reinforcement Learning and SLAM Based Approach," Feb. 2020. [Online]. Available: http://arxiv.org/abs/2002.04109.
[19] A. V. Bernstein, E. V. Burnaev, and O. N. Kachan, "Reinforcement learning for computer vision and robot navigation," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 2018, pp. 258–272. doi: 10.1007/978-3-319-96133-0_20.
[20] A. Newman, G. Yang, B. Wang, D. Arnold, and J. Saniie, "Embedded Mobile ROS Platform for SLAM Application with RGB-D Cameras," in IEEE International Conference on Electro Information Technology, Jul. 2020, pp. 449–453. doi: 10.1109/EIT48999.2020.9208310.
[21] Q. Jiang, "Path Planning Method of Mobile Robot Based on Q-learning," in Journal of Physics: Conference Series, Feb. 2022. doi: 10.1088/1742-6596/2181/1/012030.
[22] K.-H. Park, Y.-J. Kim, and J.-H. Kim, "Modular Q-learning based multi-agent cooperation for robot soccer," 2001. [Online]. Available: www.fira.net.
[23] G. Antonelli, S. Chiaverini, and G. Fusco, "A fuzzy-logic-based approach for mobile robot path tracking," IEEE Transactions on Fuzzy Systems, vol. 15, no. 2, pp. 211–221, Apr. 2007. doi: 10.1109/TFUZZ.2006.879998.
[24] E. Ayari, S. Hadouaj, and K. Ghedira, "A fuzzy logic method for autonomous robot navigation in dynamic and uncertain environment composed with complex traps," in Proceedings - 5th International Multi-Conference on Computing in the Global Information Technology (ICCGI 2010), 2010, pp. 18–23. doi: 10.1109/ICCGI.2010.47.

