Abstract:
Eye gaze estimation is an important problem in
automatic human behavior understanding. This paper proposes a
deep learning based method for inferring the eye gaze direction.
The method is based on the use of ensemble of networks, which
capture both the geometric and texture information. Firstly, a
Deep Neural Network (DNN) is trained using the geometric
features that are extracted from the facial landmark locations.
Secondly, for the texture based features, three Convolutional
Neural Networks (CNN) are trained i.e. for the patch around the
left eye, right eye, and the combined eyes, respectively. Finally, the
information from the four channels is fused with concatenation
and dense layers are trained to predict the final eye gaze. The
experiments are performed on the two publicly available datasets:
Columbia eye gaze and TabletGaze. The extensive evaluation
shows the superior performance of the proposed framework. We
also evaluate the performance of the recently proposed swish
activation function as compared to Rectified Linear Unit (ReLU)
for eye gaze estimation.