dc.description.abstract |
With the advances in multimedia and the World Wide
Web, users upload millions of images and videos every day to
social networking platforms on the Internet. From the perspective
of automatic human behavior understanding, it is of interest to
analyze and model the affect exhibited by groups of
people participating in social events in these images.
However, analyzing the affect expressed by multiple
people is challenging due to varied indoor and outdoor
settings. Recently, a few works have investigated face-based
group-level emotion recognition (GER). In this paper, we
propose a multimodal framework for enhancing the affective
analysis capability of GER in challenging environments. Specifically,
for encoding a person’s information in a group-level image, we
first propose an information aggregation method for generating
feature descriptions of the face, upper body, and scene. We then
revisit localized multiple kernel learning to fuse face, upper body,
and scene information for GER in challenging environments.
Intensive experiments are performed on two challenging group-level
emotion databases (HAPPEI and GAFF) to investigate
the roles of face, upper body, and scene information, as well as
the multimodal framework. Experimental results demonstrate that the
proposed multimodal framework achieves promising performance for GER. |
en_US |