dc.description.abstract |
Recognition of human actions from videos can be improved if depth information is available. Depth information
helps separate foreground motion from the background. Single image depth estimation (SIDE) is commonly
used for the analysis of weather-degraded images. In this study, the idea of SIDE is extended to human action
recognition (HAR) on datasets where depth information is not available. Several depth-based HAR algorithms exist, but
all of them rely on the depth information supplied with the dataset. Some other methods use depth motion maps, which
capture the depth of motion along the temporal direction. Here, a new depth-based end-to-end deep network is proposed for HAR in
which frame-wise depth is estimated and the estimated depth is processed instead of the RGB frames. As colour
information is not required for estimating motion, a single-channel depth map is used to estimate motion in the video, which makes
the system computationally efficient. The proposed method is tested and verified on three benchmark datasets, namely JHMDB,
HMDB51 and UCF101. The proposed method outperforms the existing state-of-the-art methods for HAR on all three
datasets. |
en_US |