Abstract:
During disasters, such as floods, it is crucial to get real-time ground information for
planning rescue and response operations. With the advent of Unmanned Aerial Vehicles
(UAVs), flood monitoring capabilities have improved significantly, yet the dependency on
expert human pilots limits their operational scalability in unknown flood-like environments.
To tackle such issues, autonomous multi-UAV systems can be deployed to perform the
task of cooperative flood area coverage without human intervention. Recent advances
in robot control algorithms have attempted to deploy autonomous UAV systems for
various tasks, particularly by leveraging deep reinforcement learning (Deep RL) algorithms.
However, training Deep RL algorithms poses various challenges, such as sparse early
rewards, target approximation errors, and overestimation bias. These limitations often
lead to sub-optimal policies, especially when learning complex value functions in dynamic
and stochastic environments, like floods.
In this thesis, the focus is on learning effective autonomous multi-UAV policies for
flood area coverage, path planning, and object tracking by introducing novel exploration
strategies and target function approximators. The proposed solutions aim to mitigate
the limitations associated with training Deep RL policies for multi-UAV systems in
complex environments, such as floods. Domain-knowledge-based directed exploration is
introduced using water-flow algorithms, viz., D8 and D-infinity, to expedite the training of
Deep RL policies and to accumulate high rewards, especially in the initial episodes. The
proposed algorithms, D8QL (for discrete state-space) and D8DQN (for continuous state-space),
use an ϵ1-ϵ2 exploration strategy, distinguishing them from the purely random exploration
of the ϵ-greedy strategy. Further, the D3S algorithm is presented to deal with continuous
action-spaces for smoother UAV motion. Additionally, a decentralized training paradigm
is introduced to learn multi-UAV policies, as opposed to centrally trained policies, to
perform flood area coverage and to identify critical regions. The decentralized approach
enables flexible response capabilities in scenarios where communication with the ground
control station might be restricted or limited. Further, to mitigate poor training due
to random initialization of target networks in Deep RL-based actor-critic algorithms,
a Gaussian process regression (GPR) based value function approximation technique is
proposed. GPR is used as the target critic to improve the multi-UAV policy to track a
convoy of moving vehicles. This thesis also presents a multi-UAV path planning strategy
to guide waterborne evacuation vehicles to critical location(s) during floods. A
minimum-node expansion strategy is proposed to tackle the issue of exponential complexity
associated with the A* algorithm in large state-space environments.
All the proposed algorithms are benchmarked against established Deep RL baselines and
state-of-the-art algorithms from the recent literature. The results show that the proposed
algorithms outperform other techniques across multiple performance measures. The
proposed algorithms provide improved autonomous solutions for multi-UAV operations
in flood relief tasks, offering critical area coverage, efficient path planning, and continuous
object tracking.
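The D8-directed ϵ1-ϵ2 exploration described above can be illustrated with a minimal sketch. The D8 rule itself (move to the neighbour, among the eight surrounding grid cells, with the steepest downhill slope) is standard. Everything else here is an assumption made for illustration, since the abstract does not specify it: the way ϵ1 and ϵ2 partition the action choice (ϵ1 for a D8-directed move, ϵ2 for a random move, greedy otherwise), the random fallback at a pit cell, and the names `d8_action` and `select_action`.

```python
import random

# Eight neighbour offsets (row, col) used by the D8 flow-direction scheme.
D8_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_action(elevation, row, col):
    """Return the index of the steepest-descent neighbour, or None at a pit."""
    best_action, best_slope = None, 0.0
    for action, (dr, dc) in enumerate(D8_OFFSETS):
        r, c = row + dr, col + dc
        if not (0 <= r < len(elevation) and 0 <= c < len(elevation[0])):
            continue  # neighbour lies outside the grid
        distance = (dr * dr + dc * dc) ** 0.5  # 1 for cardinal, sqrt(2) for diagonal
        slope = (elevation[row][col] - elevation[r][c]) / distance
        if slope > best_slope:
            best_action, best_slope = action, slope
    return best_action


def select_action(q_values, elevation, row, col, eps1, eps2, rng=random):
    """Assumed split: eps1 -> D8-directed move, eps2 -> random move, else greedy."""
    u = rng.random()
    if u < eps1:
        action = d8_action(elevation, row, col)
        if action is not None:
            return action
        return rng.randrange(len(D8_OFFSETS))  # pit cell: fall back to a random move
    if u < eps1 + eps2:
        return rng.randrange(len(D8_OFFSETS))
    # Greedy choice over the current Q-value estimates.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Under this assumed scheme, setting ϵ1 = 0 recovers ordinary ϵ-greedy exploration with ϵ = ϵ2, which makes the contrast with purely random exploration explicit.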