Abstract:
Eye gaze estimation is an important problem in automatic human behavior understanding. This paper proposes a deep learning based method for inferring the eye gaze direction. The method is based on the use of ensemble of networks, which capture both the geometric and texture information. Firstly, a Deep Neural Network (DNN) is trained using the geometric features that are extracted from the facial landmark locations. Secondly, for the texture based features, three Convolutional Neural Networks (CNN) are trained i.e. For the patch around the left eye, right eye, and the combined eyes, respectively. Finally, the information from the four channels is fused with concatenation and dense layers are trained to predict the final eye gaze. The experiments are performed on the two publicly available datasets: Columbia eye gaze and TabletGaze. The extensive evaluation shows the superior performance of the proposed framework. We also evaluate the performance of the recently proposed swish activation function as compared to Rectified Linear Unit (ReLU) for eye gaze estimation.