Learning multi-UAV policies using deep reinforcement learning for flood area coverage and object tracking

Garg, A.

URI: http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/4829

Date: 2024-04-01

Abstract:

During disasters such as floods, it is crucial to obtain real-time ground information for planning rescue and response operations. With the advent of Unmanned Aerial Vehicles (UAVs), flood monitoring capabilities have improved significantly, yet the dependency on expert human pilots limits their operational scalability in unknown flood-like environments. To tackle such issues, autonomous multi-UAV systems can be deployed to perform cooperative flood area coverage without human intervention. Recent advances in robot control algorithms have attempted to deploy autonomous UAV systems for various tasks, particularly by leveraging deep reinforcement learning (Deep RL) algorithms. However, training Deep RL algorithms poses various challenges, such as sparse early rewards, target approximation errors, and overestimation bias. These limitations often lead to sub-optimal policies, especially when learning complex value functions in dynamic and stochastic environments such as floods. This thesis focuses on learning effective autonomous multi-UAV policies for flood area coverage, path planning, and object tracking by introducing novel exploration strategies and target function approximators. The proposed solutions aim to mitigate the limitations associated with training Deep RL policies for multi-UAV systems in complex environments, such as floods. Domain-knowledge-based directed exploration is introduced using water-flow algorithms, viz., D8 and D-infinity, to expedite the training of Deep RL policies and to accumulate high rewards, especially in the initial episodes. The proposed algorithms, D8QL (for discrete state spaces) and D8DQN (for continuous state spaces), use an ϵ1-ϵ2 exploration strategy, which distinguishes them from the purely random exploration of the ϵ-greedy strategy. Further, the D3S algorithm is presented to handle continuous action spaces for smoother UAV motion.
Additionally, a decentralized training paradigm is introduced to learn multi-UAV policies, as opposed to centrally trained policies, to perform flood area coverage and to identify critical regions. The decentralized approach enables flexible response capabilities in scenarios where communication with the ground control station might be restricted or limited. Further, to mitigate poor training caused by the random initialization of target networks in Deep RL based actor-critic algorithms, a Gaussian process regression (GPR) based value function approximation technique is proposed. GPR is used as the target critic to improve the multi-UAV policy for tracking a convoy of moving vehicles. This thesis also presents a multi-UAV path planning strategy to navigate waterborne evacuation vehicles to critical location(s) during floods. A minimum-node expansion strategy is proposed to tackle the exponential complexity associated with the A* algorithm in large state-space environments. All the proposed algorithms are benchmarked against established Deep RL baselines and state-of-the-art algorithms from the recent literature. The results show that the proposed algorithms outperform other techniques across multiple performance measures. The proposed algorithms provide improved autonomous solutions for multi-UAV operations in flood relief tasks, offering critical area coverage, efficient path planning, and continuous object tracking.
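
The directed exploration mentioned in the abstract can be illustrated with a minimal sketch. The D8 rule itself is standard: each grid cell drains to whichever of its eight neighbours has the steepest downhill slope. The abstract does not specify how ϵ1 and ϵ2 are used, so the three-way split in `select_action` (random with probability `eps1`, D8-directed with probability `eps2`, greedy otherwise) is an assumption made for illustration only and may differ from the thesis. The function names, the elevation grid, and the eight-move action set are likewise hypothetical.

```python
import math
import random

# Offsets (row, col) to the 8 neighbours of a grid cell, as used by D8.
# Assumption: the UAV action set is these same 8 moves.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_direction(elevation, row, col):
    """Return the index into NEIGHBOURS of the steepest downhill neighbour,
    or None when no neighbour is lower (a pit or a flat cell)."""
    best_index, best_slope = None, 0.0
    for index, (dr, dc) in enumerate(NEIGHBOURS):
        r, c = row + dr, col + dc
        if 0 <= r < len(elevation) and 0 <= c < len(elevation[0]):
            distance = math.hypot(dr, dc)  # 1 orthogonal, sqrt(2) diagonal
            slope = (elevation[row][col] - elevation[r][c]) / distance
            if slope > best_slope:
                best_index, best_slope = index, slope
    return best_index


def select_action(q_values, elevation, row, col, eps1, eps2, rng=random):
    """Hypothetical eps1-eps2 rule (an assumption, not the thesis's definition):
    random move with probability eps1, D8-directed move with probability eps2,
    greedy move with respect to q_values otherwise."""
    u = rng.random()
    if u < eps1:
        return rng.randrange(len(NEIGHBOURS))
    if u < eps1 + eps2:
        directed = d8_direction(elevation, row, col)
        if directed is not None:
            return directed  # follow the water-flow direction
    # Greedy choice; also the fallback when D8 finds no downhill neighbour.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

On a small grid such as `[[9, 8, 7], [8, 5, 4], [7, 4, 1]]`, `d8_direction(grid, 1, 1)` picks the south-east neighbour (index 7), because the drop of 4 over a diagonal distance of sqrt(2) is steeper than the drop of 1 to the east or south. Under the assumed rule, a larger `eps2` biases early episodes toward cells where water is likely to accumulate.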
