Abstract:
Applications of multiple unmanned aerial vehicles (UAVs) involve complex control dynamics for accomplishing any task. This paper employs a multi-UAV system for continuous tracking and end-to-end coverage of a moving convoy of vehicles to provide security and surveillance cover. Coverage is achieved by keeping the moving convoy within the overlapping Fields-of-View (FoVs) of the UAVs. To learn the controls of the autonomous multi-UAV system, we propose a deep reinforcement learning based multi-agent actor-critic method called GPR-MADDPG. The proposed method uses Gaussian Process Regression (GPR) to estimate an unbiased and stable target value for the critic. Further, the kernel function of the GPR model is adapted to keep the high variance in the convoy trajectory in check. The rewards for training the multi-UAV system are formulated to maximize end-to-end convoy coverage by optimizing the overlaps between the FoVs while minimizing the tracking error. Experiments were performed on real-world road trajectories of varying complexity, with varying convoy speeds and numbers of UAVs. Further tests were performed in a simulator with a real-world physics engine. The experiments show that the proposed GPR-MADDPG model yields the lowest overlap error and accumulates the highest reward compared to other prevalent approaches in the literature.
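The abstract's core idea of using GPR to produce a stable critic target can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature matrix, the synthetic noisy targets, and the kernel hyperparameters are all assumed for demonstration, using scikit-learn's standard GPR interface.

```python
# Hedged sketch: smoothing noisy bootstrapped critic targets with Gaussian
# Process Regression (GPR), in the spirit of GPR-MADDPG. All variable names,
# shapes, and kernel settings below are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Assume a replay minibatch: state-action features and noisy target Q-values.
X = rng.uniform(-1.0, 1.0, size=(64, 4))                        # hypothetical features
q_targets = np.sin(X.sum(axis=1)) + 0.1 * rng.normal(size=64)   # noisy bootstrap targets

# RBF kernel with a tunable length scale; the WhiteKernel term absorbs target
# noise, which is one way to keep high variance (e.g. from an erratic convoy
# trajectory) in check, as the abstract describes for the adapted kernel.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, q_targets)

# The GP posterior mean serves as the smoothed, lower-variance critic target.
smoothed_targets, std = gpr.predict(X, return_std=True)
print(smoothed_targets.shape)
```

In an actual actor-critic loop, the smoothed targets would replace the raw bootstrapped values in the critic's regression loss; the posterior standard deviation also gives a built-in uncertainty estimate at each state-action pair.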