dc.description.abstract |
Human action recognition (HAR) is an important step in many person-oriented
computer vision applications. As the number of cameras around us increases
day by day, the volume of video data grows with it. Analysis of this enormous
amount of data can play a vital role in applications such as video understanding
and surveillance, video retrieval, human-computer interaction, and autonomous
driving systems. Several works have been proposed for the task of human
action recognition [1]–[7], in which researchers have used different modalities for
HAR. The research is challenging due to the variety of problems in HAR.
The major problems in the field of HAR include anthropometric variations, occlusion,
intra- and inter-class variations, weather conditions, variation in viewpoint,
and privacy preservation. This work mainly focuses on analyzing and designing different
modalities for HAR, with the aim of addressing some of the
above-mentioned challenges within a deep learning framework.
The significant contributions of this work are a novel motion estimation
method for HAR, a deep learning network for HAR that works on depth
information, pose-based video summarization using the dynamic image, action
recognition in weather-degraded videos, and action recognition in low-resolution
videos.
Motion estimation for HAR in videos has long remained a major challenge for
researchers. In this work, an efficient motion estimation method for HAR is
proposed using Weber’s law. The approach is computationally more efficient than
conventional optical flow-based methods such as [3]. Another contribution
is a depth-based HAR method for datasets in
which depth information is not available. The depth is estimated using the
proposed network and is then used for the task of HAR.
A new dimension is added to the research area of HAR by introducing a deep
network for HAR in weather-degraded video, a problem that has remained largely
untouched to date. Another significant contribution is in the direction of privacy
preservation, an emerging topic in the field of HAR. Users are increasingly
concerned about their privacy while using gesture- or action-based devices
such as Microsoft Kinect [8]. An attempt is made here to preserve the privacy
of the end user in HAR by proposing action recognition in extremely low-resolution
video. HAR from extremely low-resolution video also helps reduce
the network requirement, as the amount of data transferred from the camera to the server
is drastically reduced in the proposed approach. Further, a pose-based video
summarization and action localization approach is proposed to tackle the problem
of privacy and to improve the accuracy of HAR.
The proposed work is evaluated on current state-of-the-art benchmark datasets
such as HMDB51, JHMDB, sub-JHMDB, MPII, UCF-sports, and UCF-101. In addition to
these datasets, two new datasets derived from the above-mentioned
datasets are proposed in this work for HAR in hazy videos (hazy-HMDB and hazy-UCF-101) and
low-resolution videos. All proposed methods are tested and evaluated following
standard evaluation practices. The evaluation metric is the average recognition
rate (ARR) over all the splits provided with the datasets. |
en_US |