Abstract:
Image inpainting is a reconstruction method in which the corrupted (hole) regions of an image are filled with the most relevant content from the valid regions of the same image.
With the advancements in image editing applications, image inpainting is gaining more
attention due to its ability to recover corrupted images efficiently. It also has a
wide variety of applications, such as reconstruction of corrupted images, occlusion
removal, reflection removal, etc. Existing approaches have achieved superior performance
with coarse-to-fine, single-stage, progressive, and recurrent architectures, at the cost
of either the perceptual quality of the results (blur, spatial inconsistencies) or
computational complexity.
Moreover, the performance of existing methods
degrades when images with large missing regions are considered. To mitigate these
limitations, in this work we propose deep generative architectures for image inpainting.
Firstly, we propose coarse-to-fine architectures for inpainting images with varying
corrupted regions, with improved performance compared to state-of-the-art methods.
The three proposed coarse-to-fine solutions consist of: (a) a spatial projection layer to
focus on spatial consistency in the inpainted image, (b) encoder-level feature aggregation
followed by a multi-scale and multi-receptive-field feature-sharing decoder, and (c) a
nested deformable multi-head attention layer to effectively merge the encoder-decoder
features.
Further, to reduce the computational complexity, we propose single-stage architectures
with three solutions: (a) correlated multi-resolution feature fusion, (b)
diverse-receptive-field-based feature learning, and (c) pseudo-decoder-guided
reconstruction for image inpainting. These architectures have lower computational
complexity than the earlier coarse-to-fine architectures and state-of-the-art methods
for image inpainting.
The performance of these proposed architectures is validated in terms of qualitative
and quantitative results as well as computational complexity, in comparison with each
other and with existing methods for image inpainting.
Furthermore, to reduce the mask dependency of the proposed and existing approaches,
we propose two novel blind image inpainting approaches consisting of: (a) a wavelet-query
multi-head attention transformer with omni-dimensional gated attention, and (b)
high-receptive-field (multi-kernel) multi-head attention with a novel high-frequency
offset deformable feature merging module. The proposed approaches are compared
qualitatively and quantitatively with existing state-of-the-art methods for blind image
inpainting. To validate the performance of the proposed architectures, experimental
analysis is conducted on different datasets: CelebA-HQ, FFHQ, Paris Street View,
Places2, and ImageNet.