Abstract
In the field of autonomous surface vehicle (ASV) navigation, ensuring safety is paramount and necessitates an advanced collision avoidance system. This study focuses on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, which operates in a continuous action space. We introduce an epsilon-greedy exploration strategy for TD3 that improves performance by mitigating per-update errors and balancing the exploration and exploitation phases. The reward function is designed to reduce both cross-track and heading errors, demonstrated on the L3 model of the KVLCC2 tanker, with an emphasis on minimizing rudder deflections. Both errors are computed using the Line-of-Sight (LOS) guidance algorithm. The primary goal of this research is to enhance the safety and efficiency of ASV navigation by integrating static-obstacle collision avoidance with path-following capabilities through deep reinforcement learning. The developed model reliably avoids collisions, continuously refining its policy to adapt to diverse static-obstacle configurations while prioritizing safety and optimal navigation. The study's contribution lies in developing, training, and testing a neural network architecture tailored to evading static obstacles, demonstrating significant advances in ASV navigation technology.
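The abstract names an epsilon-greedy exploration strategy for TD3's continuous action space but gives no implementation details. The Python sketch below shows one plausible form of such a rule: with probability epsilon a uniformly random action is drawn from the bounded action space, and otherwise the actor's deterministic output is perturbed by small Gaussian noise, as is conventional in TD3. All identifiers here (`select_action`, `policy`, the action bounds, `epsilon`, `noise_std`) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def select_action(policy, state, action_low, action_high,
                  epsilon=0.1, noise_std=0.1, rng=None):
    """Epsilon-greedy action selection for a continuous-action policy.

    With probability `epsilon`, sample a uniformly random action over the
    bounded action space (exploration); otherwise, take the deterministic
    policy output perturbed by Gaussian noise (TD3's usual exploration
    noise) and clip it back into bounds (exploitation).
    """
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        # Explore: uniform random action within the action bounds
        return rng.uniform(action_low, action_high)
    # Exploit: deterministic actor output plus small Gaussian noise
    action = policy(state) + rng.normal(0.0, noise_std, size=np.shape(action_low))
    return np.clip(action, action_low, action_high)

# Hypothetical usage: a single rudder command; the +/-35 degree limit is an
# assumption for illustration, not a value taken from the paper.
policy = lambda s: np.array([10.0 * s[0]])  # stand-in for a trained TD3 actor
a = select_action(policy, np.array([0.2]),
                  action_low=np.array([-35.0]), action_high=np.array([35.0]))
```

One possible rationale for this design, consistent with the abstract's claim of balancing exploration and exploitation: the uniform branch provides occasional large, undirected exploration that pure Gaussian noise around the actor's output cannot, while the noisy-greedy branch preserves TD3's standard behavior the rest of the time.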