INSTITUTIONAL DIGITAL REPOSITORY

Deep learning based approaches for single image depth estimation


dc.contributor.author Hambarde, P.
dc.date.accessioned 2022-10-26T05:49:55Z
dc.date.available 2022-10-26T05:49:55Z
dc.date.issued 2022-10-26
dc.identifier.uri http://localhost:8080/xmlui/handle/123456789/4101
dc.description.abstract Depth estimation is a low-level computer vision task that provides information important for 3D reconstruction, augmented reality, image de-hazing, semantic segmentation, object detection, human action recognition, and autonomous driving. The major challenges in depth estimation are the inherent ambiguity of the problem, the unavailability of prior information, the effect of the medium through which depth is observed, and the limitations of active depth sensors. Conventional stereo vision sensors estimate accurate dense depth from a scene, but their performance deteriorates due to computational complexity and noisy information. Light detection and ranging (LiDAR) sensors give long-range but sparse depth maps of indoor and outdoor scenes, yet they fail in highly reflective regions and under low-hanging clouds, and they are costly. Low-cost time-of-flight (ToF) and Kinect depth sensors provide depth information at a high frame rate, but they suffer from illumination intensity problems. Moreover, most depth sensors fail to predict depth for glossy, transparent, and delicate surfaces. This work focuses on analyzing and designing different modalities for depth estimation that address the above challenges. The significant contributions of this work are: 1) a novel adversarial-learning-based single image depth estimation method; 2) a depth estimation approach from a single image and sparse depth samples; 3) a novel depth estimation approach that predicts the depth map from a single image and semantic prior information; 4) an underwater depth estimation and enhancement approach; and 5) an occlusion boundary prediction and depth-map refinement framework.

Scene understanding is an active area of research in computer vision that encompasses a variety of problems. We first propose a two-stream deep adversarial network for single image depth estimation from RGB images. For the stream I network, we propose a novel encoder-decoder architecture using residual concepts to extract coarse-level depth features. The stream II network processes the information purely through a residual architecture for fine-level depth estimation. A feature-map-sharing architecture is also designed to share the learned feature maps of the decoder module of stream I; sharing feature maps strengthens the residual learning, improves the estimated scene depth, and increases the robustness of the proposed network.

Along with the inherent ambiguity, depth completion is an equally challenging problem in depth estimation. We therefore propose an end-to-end sparse-to-dense network (S2DNet) for single image depth estimation (SIDE). The proposed network processes a single image along with additional sparse depth samples, acquired either with a low-resolution depth sensor or computed by visual simultaneous localization and mapping (SLAM) algorithms. In the first stage, S2DNet estimates a coarse-level depth map using the sparse-to-dense coarse network (S2DCNet). In the second stage, the estimated coarse-level depth map is concatenated with the input image and fed to the sparse-to-dense fine network (S2DFNet) for fine-level depth estimation, as sketched below.
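A minimal PyTorch-style sketch of this two-stage coarse-to-fine pipeline follows. The module bodies (layer counts, channel widths) and class names other than S2DNet/S2DCNet/S2DFNet are illustrative assumptions, not the thesis implementation; only the overall flow, a coarse network over the image plus sparse samples followed by a fine network over the image concatenated with the coarse estimate, is taken from the abstract (the attention architecture of S2DFNet is omitted here for brevity).

```python
import torch
import torch.nn as nn

class CoarseNet(nn.Module):
    """Stand-in for S2DCNet: maps RGB + sparse depth samples to a coarse depth map."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, rgb, sparse_depth):
        # Stage-1 input: 3 RGB channels + 1 sparse-depth channel.
        return self.body(torch.cat([rgb, sparse_depth], dim=1))

class FineNet(nn.Module):
    """Stand-in for S2DFNet: refines the coarse depth given the RGB image."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, rgb, coarse_depth):
        # Stage-2 input: image concatenated with the coarse estimate.
        return self.body(torch.cat([rgb, coarse_depth], dim=1))

class S2DNet(nn.Module):
    """Two-stage sparse-to-dense pipeline: coarse estimate, then refinement."""
    def __init__(self):
        super().__init__()
        self.coarse, self.fine = CoarseNet(), FineNet()
    def forward(self, rgb, sparse_depth):
        coarse = self.coarse(rgb, sparse_depth)
        return self.fine(rgb, coarse)

# Example shapes: a 240x320 image with mostly-empty sparse depth samples.
net = S2DNet()
depth = net(torch.randn(1, 3, 240, 320), torch.zeros(1, 1, 240, 320))
```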
S2DFNet comprises an attention-map architecture that helps estimate prominent depth information. Further, the proposed S2DNet is extended to image de-hazing.

Multi-modality sensor fusion is an active research area in scene understanding, so we also explore fusing RGB images with semantic maps for depth estimation. Active depth sensors are unable to predict depth on strongly illuminated and monotonous-pattern surfaces. We propose a semantic-to-depth generative adversarial network (S2D-GAN) for depth estimation from an RGB image and its semantic map. In the first stage, S2D-GAN estimates a coarse-level depth map using a semantic-to-coarse-depth generative adversarial network (S2CD-GAN); the second stage estimates the fine-level depth map using a cascaded multi-scale spatial pooling network.

Existing air-medium depth estimation techniques do not work in underwater environments. We therefore propose an end-to-end underwater generative adversarial network (UW-GAN) for depth estimation from a single underwater image. First, a coarse-level depth map is estimated using the underwater coarse-level generative network (UWC-Net). Then, a fine-level depth map is computed using the underwater fine-level network (UWF-Net), which takes as input the concatenation of the estimated coarse-level depth map and the input image. UWF-Net comprises spatial and channel-wise squeeze-and-excitation blocks for fine-level depth estimation (see the first sketch below). We also propose a synthetic underwater image generation approach for building a large-scale database, and we investigate the UW-GAN framework for underwater single-image enhancement.

In general, depth estimation methods do not produce refined depth-map outputs. To resolve this problem, we propose a novel two-stream occlusion boundary prediction network (OBP-GAN) for boundary-map and ORI-map estimation; the boundary and ORI maps can be further utilized as important cues for refining depth maps estimated from single images. We also propose a depth-map refinement network (DMR-GAN) that refines monocular depth estimates using the boundary and ORI maps.

The proposed depth estimation approaches are evaluated on state-of-the-art databases such as NYU-Depth V2, KITTI Odometry, SUN RGB-D, real-world air and underwater images, and a synthetic underwater image dataset. Standard quantitative evaluation metrics, namely RMSE, Rel, log10, δ1, δ2, and δ3 (defined in the second sketch below), are used to evaluate the proposed approaches. en_US
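The abstract does not spell out the exact squeeze-and-excitation formulation used in UWF-Net; the sketch below follows the standard concurrent spatial and channel-wise SE design. The class names, reduction ratio, and element-wise-max combination are assumptions for illustration, not the thesis code.

```python
import torch
import torch.nn as nn

class ChannelSE(nn.Module):
    """Channel-wise squeeze-and-excitation: reweights feature channels."""
    def __init__(self, channels, reduction=8):  # reduction ratio is an assumption
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global average pool
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
    def forward(self, x):
        return x * self.fc(self.pool(x))         # excite: per-channel gates

class SpatialSE(nn.Module):
    """Spatial squeeze-and-excitation: reweights spatial locations."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.gate(x)                  # excite: per-pixel gates

class SCSE(nn.Module):
    """Concurrent spatial and channel SE, combined by element-wise max."""
    def __init__(self, channels):
        super().__init__()
        self.cse, self.sse = ChannelSE(channels), SpatialSE(channels)
    def forward(self, x):
        return torch.max(self.cse(x), self.sse(x))
```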
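The evaluation metrics named above are standard in the depth estimation literature. A small NumPy sketch of their usual definitions follows; the function name, the array-input interface, and the assumption of strictly positive depths are mine.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth metrics; pred/gt are positive arrays of equal shape."""
    rmse = np.sqrt(np.mean((pred - gt) ** 2))                 # root mean squared error
    rel = np.mean(np.abs(pred - gt) / gt)                     # mean absolute relative error
    log10 = np.mean(np.abs(np.log10(pred) - np.log10(gt)))    # mean log10 error
    ratio = np.maximum(pred / gt, gt / pred)                  # per-pixel max ratio
    d1 = np.mean(ratio < 1.25)                                # delta_1 accuracy
    d2 = np.mean(ratio < 1.25 ** 2)                           # delta_2 accuracy
    d3 = np.mean(ratio < 1.25 ** 3)                           # delta_3 accuracy
    return dict(RMSE=rmse, Rel=rel, log10=log10, d1=d1, d2=d2, d3=d3)
```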
dc.language.iso en_US en_US
dc.subject Depth Estimation en_US
dc.subject Adversarial Training en_US
dc.subject Underwater Imaging en_US
dc.subject Occlusion Boundary en_US
dc.subject Refinement en_US
dc.title Deep learning based approaches for single image depth estimation en_US
dc.type Thesis en_US

