Abstract:
In this paper, we present an end-to-end system for
enhancing the effectiveness of non-verbal gestures in human-robot interaction. We identify prominently used gestures in
performances by TED talk speakers and map them to their
corresponding speech context, with speech modulated based
upon the attention of the listener. Gestures are localised
using a convolutional neural network based approach. The dominant
gestures of TED speakers are used to learn the gesture-to-speech mapping. We evaluated the engagement of the robot
with people by conducting a social survey. The effectiveness
of the performance was monitored by the robot, and it self-improvised its speech pattern on the basis of the attention level
of the audience, which was calculated using visual feedback
from the camera. The effectiveness of the interaction, as well as
the decisions made during improvisation, was further evaluated
based on head-pose detection and an interaction survey.