Abstract:
Depth estimation is a low-level computer vision task that provides information important for applications such as 3D reconstruction, augmented reality, image de-hazing, semantic segmentation, object detection, human action recognition, and autonomous driving platforms. The major challenges in depth estimation are the inherent ambiguity of the task, the unavailability of prior information, the effect of the medium through which depth is sensed, and the limitations of active depth sensors. A conventional stereo vision sensor estimates an accurate dense depth map of the scene, but its performance deteriorates due to computational complexity and noisy measurements. A light detection and ranging (LiDAR) sensor provides long-range but sparse depth maps of indoor and outdoor scenes; however, it fails on highly reflective surfaces and in regions with low-hanging clouds, and it is costly. Meanwhile, low-cost time-of-flight (ToF) and Kinect depth sensors provide depth information at a high frame rate, but they suffer from illumination-intensity problems. Moreover, most depth sensors fail to predict depth maps of glossy, transparent, and delicate surfaces. This work focuses on analyzing and designing depth estimation approaches across different modalities to address the above-mentioned challenges.
The significant contributions of this work are: 1) a novel adversarial-learning-based single image depth estimation method, 2) a depth estimation approach that uses a single image along with sparse depth samples, 3) a novel approach that predicts the depth map from a single image and semantic prior information, 4) an underwater depth estimation and enhancement approach, and 5) an occlusion boundary prediction and depth-map refinement framework.
Scene understanding is an active area of research in computer vision that encompasses a variety of problems. We first propose a two-stream deep adversarial network for single image depth estimation from RGB images. For the stream I network, we propose a novel encoder-decoder architecture that uses residual concepts to extract coarse-level depth features. The stream II network processes the information purely through a residual architecture for fine-level depth estimation. In addition, a feature-map sharing architecture is designed to share the learned feature maps of the decoder module of stream I with stream II, as sketched below. Sharing feature maps strengthens the residual learning used to estimate scene depth and increases the robustness of the proposed network.
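A minimal PyTorch-style sketch of the feature-map sharing idea follows; the module name, channel width, and fusion-by-concatenation design are illustrative assumptions, not the exact thesis architecture.

```python
import torch
import torch.nn as nn

class FeatureSharingFusion(nn.Module):
    """Illustrative sketch: stream II refines its residual features by
    fusing decoder feature maps shared from stream I (shapes assumed)."""
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, stream2_feat, stream1_decoder_feat):
        # Concatenate the shared stream-I decoder features with the
        # stream-II features, project back to the original width, and
        # add as a residual, strengthening the residual learning.
        fused = self.fuse(torch.cat([stream2_feat, stream1_decoder_feat], dim=1))
        return stream2_feat + self.relu(fused)
```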
Along with resolving inherent ambiguity, depth completion is an equally challenging problem in depth estimation. Thus, we propose an end-to-end sparse-to-dense network (S2DNet) for single image depth estimation (SIDE). The proposed network processes a single image along with additional sparse depth samples, which are acquired either with a low-resolution depth sensor or computed by visual simultaneous localization and mapping (SLAM) algorithms. In the first stage, the proposed S2DNet estimates a coarse-level depth map using the sparse-to-dense coarse network (S2DCNet). In the second stage, the estimated coarse-level depth map is concatenated with the input image and used as input to the sparse-to-dense fine network (S2DFNet) for fine-level depth map estimation, as sketched below. The proposed S2DFNet comprises an attention-map architecture that helps to estimate prominent depth information. Further, the proposed S2DNet is extended to image de-hazing.
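The two-stage flow can be summarized in a short PyTorch-style sketch; the class and argument names and the concatenation layout are assumptions for illustration, with the internals of S2DCNet and S2DFNet left abstract.

```python
import torch
import torch.nn as nn

class S2DPipeline(nn.Module):
    """Sketch of the coarse-to-fine flow described above. `coarse_net`
    and `fine_net` stand in for S2DCNet/S2DFNet, whose internal designs
    are not reproduced here."""
    def __init__(self, coarse_net: nn.Module, fine_net: nn.Module):
        super().__init__()
        self.coarse_net = coarse_net
        self.fine_net = fine_net

    def forward(self, rgb, sparse_depth):
        # Stage 1: RGB image + sparse depth samples -> coarse depth map.
        coarse = self.coarse_net(torch.cat([rgb, sparse_depth], dim=1))
        # Stage 2: coarse depth concatenated with the input image -> fine depth.
        fine = self.fine_net(torch.cat([rgb, coarse], dim=1))
        return coarse, fine
```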
Multi-modality sensor fusion is an active research area in scene understanding. Thus, we explore methods for fusing an RGB image with its semantic map for depth estimation, since active depth sensors are unable to predict depth maps on strongly illuminated and monotonously patterned surfaces. In this work, we propose a semantic-to-depth generative adversarial network (S2D-GAN) for depth estimation from an RGB image and its semantic map. In the first stage, the proposed S2D-GAN estimates the coarse-level depth map using a semantic-to-coarse-depth generative adversarial network (S2CD-GAN), while the second stage estimates the fine-level depth map using a cascaded multi-scale spatial pooling network; a sketch of such a pooling block follows.
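As a rough illustration of a multi-scale spatial pooling block, the sketch below pools at several grid sizes, projects, upsamples, and concatenates; the pool sizes, channel split, and bilinear upsampling are assumptions, and the cascaded arrangement used in S2D-GAN is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSpatialPooling(nn.Module):
    """Illustrative multi-scale spatial pooling block (parameters assumed)."""
    def __init__(self, in_ch=64, pool_sizes=(1, 2, 4, 8)):
        super().__init__()
        # One 1x1 projection per pooling scale.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, in_ch // len(pool_sizes), kernel_size=1)
            for _ in pool_sizes
        )
        self.pool_sizes = pool_sizes

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x]
        for size, proj in zip(self.pool_sizes, self.branches):
            y = F.adaptive_avg_pool2d(x, size)        # pool to a size x size grid
            y = F.interpolate(proj(y), size=(h, w),   # project, then upsample back
                              mode='bilinear', align_corners=False)
            outs.append(y)
        return torch.cat(outs, dim=1)  # features with multi-scale spatial context
```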
Existing air-medium-based depth estimation techniques do not work in the underwater environment. Thus, we propose an end-to-end underwater generative adversarial network (UW-GAN) for depth estimation from a single underwater image. Initially, a coarse-level depth map is estimated using the underwater coarse-level generative network (UWC-Net). Then, a fine-level depth map is computed using the underwater fine-level network (UWF-Net), which takes as input the concatenation of the estimated coarse-level depth map and the input image. The proposed UWF-Net comprises spatial and channel-wise squeeze-and-excitation blocks for fine-level depth estimation, as sketched below. We also propose a synthetic underwater image generation approach for building a large-scale database. The presented UW-GAN framework is also investigated for underwater single-image enhancement.
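A sketch of a concurrent spatial and channel-wise squeeze-and-excitation block is given below; the reduction ratio and the additive combination of the two excitations are assumptions, and the exact UWF-Net configuration may differ.

```python
import torch
import torch.nn as nn

class SCSEBlock(nn.Module):
    """Sketch of concurrent spatial and channel squeeze-and-excitation;
    combining the two paths by addition is one common choice."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel SE: squeeze spatially, excite per channel.
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial SE: squeeze channels, excite per pixel.
        self.sse = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        # Recalibrate features along the channel and spatial axes, then combine.
        return x * self.cse(x) + x * self.sse(x)
```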
In general, depth estimation methods do not produce refined depth maps. To resolve this problem, we propose a novel two-stream occlusion boundary prediction network (OBP-GAN) for boundary-map and ORI-map estimation. The boundary and ORI maps can be further utilized as important cues for refining depth maps estimated from single images. Accordingly, a depth-map refinement network (DMR-GAN) that refines the depth estimated from monocular images using the boundary and ORI maps is also proposed.
The proposed depth estimation approaches are evaluated on current state-of-the-art databases such as NYU RGB-D v2, KITTI Odometry, SUN RGB-D, real-world air and underwater images, and a synthetic underwater image dataset. Standard quantitative evaluation metrics, namely RMSE, Rel, log10, δ1, δ2, and δ3, are used to evaluate the proposed approaches.
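These metrics have standard definitions in the depth estimation literature; the NumPy sketch below assumes positive predicted and ground-truth depths, with zero ground truth marking invalid pixels.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth evaluation metrics over valid pixels:
    RMSE, mean absolute relative error, mean log10 error, and
    threshold accuracies δk (fraction of pixels with ratio < 1.25**k)."""
    pred, gt = pred.ravel(), gt.ravel()
    valid = gt > 0                       # ignore pixels without ground truth
    pred, gt = pred[valid], gt[valid]

    rmse = np.sqrt(np.mean((pred - gt) ** 2))            # root mean squared error
    rel = np.mean(np.abs(pred - gt) / gt)                # mean absolute relative error
    log10 = np.mean(np.abs(np.log10(pred) - np.log10(gt)))

    ratio = np.maximum(pred / gt, gt / pred)             # per-pixel max ratio
    deltas = [np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]  # δ1, δ2, δ3
    return rmse, rel, log10, deltas
```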