Abstract:
Modern digital cameras generally rely on image signal processing (ISP) pipelines to produce naturalistic RGB images. Nevertheless, compared to DSLR cameras, portable mobile devices generally output low-quality images due to their physical limitations. The synthesized low-quality images usually exhibit multiple degradations: low resolution owing to small camera sensors, mosaic patterns on account of the camera filter array, and subpixel shifts due to camera motion. Such degradations usually limit the performance of single-image super-resolution methods in recovering a high-resolution (HR) image from a single low-resolution (LR) image. Burst image super-resolution aims to restore a photo-realistic HR image by capturing the abundant information in multiple LR images. Lately, the soaring popularity of burst photography has made multi-frame processing an attractive solution for overcoming the limitations of single-image processing. In this work, we therefore propose a generic architecture, the adaptive feature consolidation network (AFCNet),
for multi-frame processing. To alleviate the challenge of effectively modelling long-range dependencies, which multi-frame approaches struggle to capture, we utilize an encoder-decoder transformer backbone that learns multi-scale local-global representations. We propose a feature alignment module to align the LR burst frame features. Further, the aligned features are fused and reconstructed by an abridged pseudo-burst fusion module and adaptive group upsampling modules, respectively. Our proposed approach
clearly outperforms existing state-of-the-art techniques on benchmark datasets. The experimental results demonstrate the effectiveness and generality of our proposed framework in improving the visual quality of HR images.
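As a rough illustration of the processing order described above (feature extraction, alignment, fusion, then upsampling), the following Python/PyTorch sketch wires placeholder modules together. The module names and interfaces here (FeatureAlignment, PseudoBurstFusion, AdaptiveGroupUpsampling, and the simplified backbone) are assumptions for illustration only and do not reproduce the actual AFCNet design.

```python
# Minimal sketch of the burst-processing order described in the abstract:
# extract features -> align frames -> fuse -> upsample. All modules below
# are hypothetical stand-ins, not the actual AFCNet implementation.
import torch
import torch.nn as nn


class FeatureAlignment(nn.Module):
    """Stand-in: aligns each burst frame's features to the base (first) frame."""
    def __init__(self, channels):
        super().__init__()
        self.align_conv = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, feats):                  # feats: (B, T, C, H, W)
        base = feats[:, :1].expand_as(feats)   # reference-frame features
        x = torch.cat([feats, base], dim=2).flatten(0, 1)
        return self.align_conv(x).view_as(feats)


class PseudoBurstFusion(nn.Module):
    """Stand-in: merges aligned burst features into a single feature map."""
    def __init__(self, channels, burst_size):
        super().__init__()
        self.fuse = nn.Conv2d(burst_size * channels, channels, 1)

    def forward(self, aligned):                # (B, T, C, H, W) -> (B, C, H, W)
        b, t, c, h, w = aligned.shape
        return self.fuse(aligned.reshape(b, t * c, h, w))


class AdaptiveGroupUpsampling(nn.Module):
    """Stand-in: reconstructs the HR RGB image from the fused features."""
    def __init__(self, channels, scale=4):
        super().__init__()
        self.to_rgb = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, fused):
        return self.to_rgb(fused)


class AFCNetSketch(nn.Module):
    def __init__(self, in_ch=4, channels=32, burst_size=8, scale=4):
        super().__init__()
        # Placeholder for the encoder-decoder transformer backbone.
        self.backbone = nn.Conv2d(in_ch, channels, 3, padding=1)
        self.align = FeatureAlignment(channels)
        self.fusion = PseudoBurstFusion(channels, burst_size)
        self.upsample = AdaptiveGroupUpsampling(channels, scale)

    def forward(self, burst):                  # burst: (B, T, 4, H, W) LR frames
        b, t, c, h, w = burst.shape
        feats = self.backbone(burst.flatten(0, 1)).view(b, t, -1, h, w)
        fused = self.fusion(self.align(feats))
        return self.upsample(fused)            # (B, 3, scale*H, scale*W)


if __name__ == "__main__":
    lr_burst = torch.randn(1, 8, 4, 48, 48)    # toy 8-frame, 4-channel LR burst
    print(AFCNetSketch()(lr_burst).shape)      # torch.Size([1, 3, 192, 192])
```

The toy example assumes each LR frame is packed into four channels (a common convention for mosaicked RAW data) and an upscaling factor of 4; both are illustrative choices rather than details taken from the paper.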