Abstract:
Traffic accidents cause over a million deaths every year, a large
fraction of which is attributed to drunk driving. An automated in-vehicle system for detecting intoxicated drivers would help reduce
accidents and their associated financial costs. Existing solutions require
special equipment such as electrocardiograms, infrared cameras,
or breathalyzers. In this work, we propose a new dataset called
DIF (Dataset of perceived Intoxicated Faces), which contains audiovisual data of intoxicated and sober people obtained from online
sources. To the best of our knowledge, this is the first work on
automatic, bimodal, non-invasive intoxication detection. Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs)
are trained to compute the video and audio baselines, respectively. A 3D CNN is used to exploit the spatio-temporal changes in the
video. We also propose a simple variation of the traditional 3D convolution block
that induces non-linearity between the spatial
and temporal channels. Extensive experiments are performed to
validate the proposed approach and the baselines.
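The abstract does not specify how the non-linearity is placed between the spatial and temporal channels. One well-known way to do this is the R(2+1)D-style factorization of Tran et al.: a k x k x k 3D convolution is split into a 1 x k x k spatial convolution followed by a k x 1 x 1 temporal convolution, with a ReLU between the two. The sketch below assumes that form. It is an illustration, not necessarily the authors' block. It handles a single channel only, uses valid padding and stride 1, and is written in pure Python for clarity. The function names (conv3d, relu, factorized_block) are hypothetical.

```python
from typing import List

Video = List[List[List[float]]]     # single-channel clip indexed [t][y][x]
Kernel3D = List[List[List[float]]]  # kernel indexed [dt][dy][dx]


def conv3d(video: Video, kernel: Kernel3D) -> Video:
    """Naive single-channel 3D convolution (cross-correlation), valid padding, stride 1."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    T, H, W = len(video), len(video[0]), len(video[0][0])
    out: Video = []
    for t in range(T - kt + 1):
        frame = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += kernel[dt][dy][dx] * video[t + dt][y + dy][x + dx]
                row.append(s)
            frame.append(row)
        out.append(frame)
    return out


def relu(video: Video) -> Video:
    """Element-wise ReLU over a clip."""
    return [[[max(0.0, v) for v in row] for row in frame] for frame in video]


def factorized_block(video: Video,
                     spatial_kernel: List[List[float]],
                     temporal_kernel: List[float]) -> Video:
    """Hypothetical spatial conv -> ReLU -> temporal conv -> ReLU block (assumed R(2+1)D-style)."""
    spatial = conv3d(video, [spatial_kernel])                    # 1 x k x k kernel: per-frame spatial filter
    hidden = relu(spatial)                                       # non-linearity between spatial and temporal steps
    temporal = conv3d(hidden, [[[w]] for w in temporal_kernel])  # k x 1 x 1 kernel: filter across frames
    return relu(temporal)


if __name__ == "__main__":
    clip = [[[2.0]], [[-3.0]], [[4.0]]]  # three frames of a 1x1 image
    print(factorized_block(clip, [[1.0]], [1.0, 1.0, 1.0]))  # inner ReLU zeroes the -3.0 frame
    print(conv3d(clip, [[[1.0]], [[1.0]], [[1.0]]]))         # same kernels composed linearly, no inner ReLU
```

With identical weights, the factorized block returns 6.0 and the single linear 3D kernel returns 3.0. Because of the inner ReLU, the factorized block is not equivalent to any single 3D convolution, and this added non-linearity is the kind of extra capacity such a variation aims for.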