Abstract:
Moving object detection (MOD) in videos is a
challenging task. Accurate background estimation is the key
to extracting the foreground from video frames. In this paper,
we propose a novel, compact, end-to-end convolutional
neural network architecture, motion saliency foreground network
(MSFgNet), to estimate the background and to extract
the foreground from video frames. First, the long streaming
video is divided into a number of small video streams (SVS).
The proposed network takes each SVS as input and estimates
a background frame for it. Second, a saliency
map is extracted from the current video frame and the estimated
background. Finally, a compact encoder–decoder network
is proposed to extract the foreground from the estimated saliency
maps. The performance of the proposed MSFgNet is tested on
three benchmark datasets (CDnet-2014, LASIESTA, and PTIS)
for MOD. The computational complexity (number of parameters
and execution time) of the proposed MSFgNet and its detection
performance (precision, recall, and F-measure) are compared
with those of existing state-of-the-art MOD methods.
The analysis shows that the proposed network is
compact and outperforms existing state-of-the-art methods
for MOD in videos.
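
The data flow described above (SVS splitting, per-SVS background estimation, saliency extraction, foreground extraction, and scoring by precision, recall, and F-measure) can be illustrated with a minimal sketch. This is not the MSFgNet architecture: the learned background estimator is replaced here by a per-pixel temporal median, the saliency map is assumed to be an absolute difference between the frame and the background, and the encoder–decoder is replaced by a fixed threshold. All function names and parameters are hypothetical and chosen only for illustration.

```python
from statistics import median


def split_into_svs(frames, svs_len):
    """Split a list of frames into consecutive chunks (SVS) of at most svs_len frames."""
    return [frames[i:i + svs_len] for i in range(0, len(frames), svs_len)]


def estimate_background(svs):
    """Stand-in for the learned estimator: per-pixel temporal median over one SVS."""
    height, width = len(svs[0]), len(svs[0][0])
    return [[median(frame[y][x] for frame in svs) for x in range(width)]
            for y in range(height)]


def saliency_map(frame, background):
    """Assumed saliency: absolute difference between the frame and the background."""
    return [[abs(p - b) for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def extract_foreground(saliency, threshold=0.5):
    """Stand-in for the encoder-decoder: mark pixels whose saliency exceeds a threshold."""
    return [[s > threshold for s in row] for row in saliency]


def detect_moving_objects(frames, svs_len=4, threshold=0.5):
    """Run the three stages on every SVS and return one boolean mask per frame."""
    masks = []
    for svs in split_into_svs(frames, svs_len):
        background = estimate_background(svs)
        for frame in svs:
            masks.append(extract_foreground(saliency_map(frame, background), threshold))
    return masks


def precision_recall_f(pred_masks, true_masks):
    """Pixel-level precision, recall, and F-measure over a sequence of masks."""
    tp = fp = fn = 0
    for pred, truth in zip(pred_masks, true_masks):
        for pred_row, true_row in zip(pred, truth):
            for p, t in zip(pred_row, true_row):
                if p and t:
                    tp += 1
                elif p and not t:
                    fp += 1
                elif t and not p:
                    fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

Each frame is represented as a list of rows of grayscale intensities. The median stand-in only recovers a clean background when every pixel is unoccluded in more than half of the frames of its SVS, which is one reason a learned estimator may be preferable in practice.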