Deep learning approaches to tackle the challenges of human action recognition in videos

Chaudhary, S.

Please use this identifier to cite or link to this item: http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/1374

Title:	Deep learning approaches to tackle the challenges of human action recognition in videos
Authors:	Chaudhary, S.
Issue Date:	11-Nov-2019
Abstract:	Human action recognition (HAR) is an important step in many person-oriented computer vision applications. As the number of cameras around us increases day by day, the volume of video data is also increasing. The analysis of this enormous amount of data can play a vital role in applications such as video understanding and surveillance, video retrieval, human-computer interaction, and autonomous driving systems. Several works have been proposed for the task of human action recognition [1]–[7], in which researchers have used different modalities for HAR. The research is challenging because of the variety of problems in HAR. The major problems in the field include anthropometric variations, occlusion, intra- and inter-class variations, weather conditions, variation in viewpoint, and privacy preservation. This work mainly focuses on analyzing and designing different modalities for HAR in order to address some of the above-mentioned challenges within a deep learning framework. The significant contributions of this work are a novel motion estimation method for HAR, a deep learning network for HAR that works on depth information, pose-based video summarization using the dynamic image, action recognition in weather-degraded videos, and action recognition in low-resolution videos. Motion estimation for HAR in videos has always remained a big challenge for researchers. In this work, an efficient motion estimation method is proposed for HAR using Weber’s law. The approach is computationally efficient compared to conventional optical flow-based methods such as [3]. Another contribution is a depth-based HAR method for datasets in which depth information is not available. The depth is estimated using the proposed network and is then utilized for the task of HAR.
A new dimension is added to the research area of HAR by introducing a deep network for HAR in weather-degraded video, which is a mostly untouched problem to date. Another significant contribution is in the direction of privacy preservation, which is an emerging topic in the field of HAR. It has been noted that users are concerned about their privacy while using gesture- or action-based devices such as Microsoft Kinect [8]. An attempt is made here to preserve the privacy of the end user in HAR by proposing action recognition in extremely low-resolution video. HAR from extremely low-resolution video also helps in reducing the network requirement, as the size of the data to transfer from the camera to the server is drastically reduced in the proposed approach. Further, a pose-based video summarization and action localization approach is proposed to tackle the problem of privacy and to improve the accuracy of HAR. The proposed work is evaluated on current state-of-the-art benchmark datasets such as HMDB51, JHMDB, sub-JHMDB, MPII, UCF-sports, and UCF-101. In addition to these datasets, two new datasets are proposed in this work, built from the above-mentioned datasets, for HAR in hazy videos (hazy-HMDB and hazy-UCF-101) and low-resolution videos. All proposed methods are tested and evaluated following standard evaluation practices. The evaluation metric is the average rate of recognition (ARR) over all the splits provided with the datasets.
Appears in Collections:	Year-2019

Files in This Item:

File	Description	Size	Format
Full Text.pdf		23.84 MB	Adobe PDF