Please use this identifier to cite or link to this item: http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/3937
Title: Speak2Label: Using domain knowledge for creating a large scale driver gaze zone estimation dataset
Authors: Ghosh, S.
Dhall, A.
Sharma, G.
Gupta, S.
Sebe, N.
Issue Date: 26-Aug-2022
Abstract: Labelling of human behaviour analysis data is a complex and time-consuming task. In this paper, a fully automatic technique for labelling an image-based gaze behaviour dataset for driver gaze zone estimation is proposed. Domain knowledge is added to the data recording paradigm, and labels are then generated automatically using Speech-To-Text (STT) conversion. To remove noise in the STT process arising from the different illumination conditions and ethnicities of the subjects in our data, the speech frequency and energy are analysed. The resultant Driver Gaze in the Wild (DGW) dataset contains 586 recordings, captured during different times of the day including evenings. The large-scale dataset contains 338 subjects with an age range of 18-63 years. As the data is recorded in different lighting conditions, an illumination-robust layer is proposed in the Convolutional Neural Network (CNN). Extensive experiments show that the variance in the dataset resembles real-world conditions and demonstrate the effectiveness of the proposed CNN pipeline. The proposed network is also fine-tuned for the eye gaze prediction task, which shows the discriminativeness of the representation learnt by our network on the proposed DGW dataset. Project Page: https://sites.google.com/view/drivergazeprediction/home
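Note: the abstract describes automatic label generation from recorded speech (STT plus a speech frequency/energy check). The sketch below is only a minimal illustration of that general idea, not the authors' pipeline; the spoken zone words, the energy threshold, and the use of the SpeechRecognition and NumPy libraries are assumptions for illustration.

# Minimal sketch (assumed details, not the published Speak2Label implementation):
# transcribe a short clip, reject clips whose speech energy is too low to trust,
# and map a recognised zone word to a gaze-zone label.
import numpy as np
import speech_recognition as sr

ZONE_WORDS = {"one": 1, "two": 2, "three": 3}  # hypothetical spoken zone labels

def short_time_energy(samples, frame_len=1024):
    # Mean energy per frame of the waveform; low values suggest an unreliable clip.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len, frame_len)]
    return np.array([np.mean(f ** 2) for f in frames])

def label_from_audio(wav_path, energy_thresh=1e-4):
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    # Convert raw 16-bit samples to floats in [-1, 1] for the energy check.
    samples = np.frombuffer(audio.get_raw_data(), dtype=np.int16) / 32768.0
    if len(samples) < 1024 or short_time_energy(samples).max() < energy_thresh:
        return None  # too quiet or too short to trust the transcription
    try:
        text = recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return None  # speech could not be transcribed
    for word, zone in ZONE_WORDS.items():
        if word in text:
            return zone
    return None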
URI: http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/3937
Appears in Collections: Year-2021

Files in This Item:
File           Size     Format
Full Text.pdf  6.41 MB  Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.