Explainable human behaviour prediction in social contexts

Madan, S.

URI: http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/4991

Date: 2025-07

Abstract:

Understanding human behaviour in social contexts is key to building intelligent systems that interact with people naturally and efficiently. While AI has made substantial advances, especially in vision and language, there remains a gap in AI systems that can effectively understand and explain human traits, emotions, and social interactions. Current state-of-the-art models struggle to accurately predict human behaviours in the intricate and subjective contexts of real-world environments. To address this gap, the thesis proposes novel methods that not only predict human-centered behaviours but also provide interpretable insights into the reasons behind these predictions. To enhance AI’s ability to understand and interact with humans more intuitively, we leverage multimodal data sources such as head motion, facial expressions, speech, and gaze behaviour. Ultimately, this work aims to develop AI systems that can effectively interpret and respond to human behaviours in complex real-world social settings, with applications spanning social robotics and personalized human-computer interaction. This thesis presents a comprehensive exploration of explainable human behaviour prediction in social contexts, addressing key challenges in understanding personality and behavioral traits, group behaviours, and social interactions. The first contribution demonstrates the utility of elementary head-motion units, termed kinemes, for behavioral analytics. By transforming head-motion patterns into sequences of kinemes, we uncover latent temporal signatures that enable efficient and explainable predictions of personality and interview traits. Building on individual traits, our second contribution investigates the significance of body-language cues in social interactions, focusing particularly on gestures and body movements.
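The first contribution’s idea of turning continuous head motion into a sequence of discrete, readable units can be illustrated with a minimal Python sketch. It assumes head pose is available per frame as (pitch, yaw, roll) angles, cuts the sequence into fixed-length windows, and labels each window with the nearest of a few hand-picked motion prototypes. The prototype names and values, the window size, and the helpers `to_kinemes` and `bigram_counts` are all hypothetical; kineme vocabularies would typically be learned from data, and this sketch need not match how the thesis defines kinemes.

```python
from collections import Counter
from math import dist
from typing import Dict, List, Sequence, Tuple

Pose = Tuple[float, float, float]  # (pitch, yaw, roll) in degrees

# Hypothetical kineme prototypes: net change in (pitch, yaw, roll) over one window.
# Real prototypes would be learned from data; these values are illustrative only.
KINEME_PROTOTYPES: Dict[str, Pose] = {
    "still": (0.0, 0.0, 0.0),
    "nod_down": (-8.0, 0.0, 0.0),
    "nod_up": (8.0, 0.0, 0.0),
    "turn_left": (0.0, -8.0, 0.0),
    "turn_right": (0.0, 8.0, 0.0),
}


def window_motion(poses: Sequence[Pose], start: int, size: int) -> Pose:
    """Net (pitch, yaw, roll) change between the first and last frame of a window."""
    first, last = poses[start], poses[start + size - 1]
    return (last[0] - first[0], last[1] - first[1], last[2] - first[2])


def to_kinemes(poses: Sequence[Pose], size: int = 5) -> List[str]:
    """Label each non-overlapping window with its nearest prototype (Euclidean distance)."""
    labels = []
    for start in range(0, len(poses) - size + 1, size):
        motion = window_motion(poses, start, size)
        nearest = min(
            KINEME_PROTOTYPES,
            key=lambda name: dist(motion, KINEME_PROTOTYPES[name]),
        )
        labels.append(nearest)
    return labels


def bigram_counts(labels: Sequence[str]) -> Counter:
    """Readable temporal features: how often each kineme is followed by another."""
    return Counter(zip(labels, labels[1:]))


if __name__ == "__main__":
    # A downward nod over frames 0-4, then a return upward over frames 5-9.
    poses = [(-2.0 * i, 0.0, 0.0) for i in range(5)] + [(-8.0 + 2.0 * i, 0.0, 0.0) for i in range(5)]
    labels = to_kinemes(poses)
    print(labels)                 # ['nod_down', 'nod_up']
    print(bigram_counts(labels))  # Counter({('nod_down', 'nod_up'): 1})
```

Counts such as "a downward nod followed by an upward one" are features a person can read directly, which is one way a kineme sequence can support explainable trait prediction.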
We propose a multiview attention fusion method, MAGIC-TBR, which combines features from videos and their discrete cosine transform (DCT) coefficients to capture fine-grained behaviours such as gesturing, grooming, and fumbling. In analyzing the bodily behaviour of participants in group settings, we observe that in a multiparty activity, one (or more) dominant personality typically takes the lead; such a person is referred to as the “Most Important Person” (MIP). However, current datasets lack sufficient resources to train models to identify the MIP accurately: existing “in-the-wild” datasets are either too small or do not cover a wide range of social situations. To address this, the third contribution of this thesis is a large-scale, “in-the-wild” dataset designed to capture human perceptions of importance in social images, along with MIP-CLIP, a novel approach for estimating the MIP in group settings. Through extensive benchmarking against state-of-the-art MIP localization methods, we highlight the need for more robust algorithms capable of handling real-world scenarios. The dataset and approach aim to significantly advance research on understanding social situations. Beyond identifying the MIP in group settings, we find that social gaze behaviours, such as mutual gaze and shared attention, are also critical to understanding social interactions. These gaze cues provide valuable insights into the dynamics of communication and further inform the prediction of social behaviours within group contexts. Finally, we extend our analysis to gaze behaviours during dyadic communication (a conversation between two people), proposing a network designed to recognize these gaze patterns in images and providing deeper insights into the dynamics of social interaction.
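The MAGIC-TBR idea of pairing video features with their discrete cosine transform coefficients can be sketched in plain Python. This is an illustration under stated assumptions, not the MAGIC-TBR architecture: it assumes per-frame feature vectors are already extracted, applies a DCT-II along the time axis of each feature dimension to form a second, frequency-based view, and fuses the two views with softmax attention against a fixed query vector that stands in for learned attention. The axis of the DCT, the function names, the `keep` parameter, and the query are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dct_ii(signal: Sequence[float]) -> Vector:
    """Unnormalised DCT-II: X_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    n_len = len(signal)
    return [
        sum(x * math.cos(math.pi / n_len * (n + 0.5) * k) for n, x in enumerate(signal))
        for k in range(n_len)
    ]


def mean_view(frames: Sequence[Sequence[float]]) -> Vector:
    """First view: temporal mean of each feature dimension."""
    count = len(frames)
    return [sum(frame[d] for frame in frames) / count for d in range(len(frames[0]))]


def dct_view(frames: Sequence[Sequence[float]], keep: int = 3) -> Vector:
    """Second view: mean |X_k| over the `keep` lowest non-DC temporal DCT coefficients."""
    view = []
    for d in range(len(frames[0])):
        coeffs = dct_ii([frame[d] for frame in frames])
        low = coeffs[1:keep + 1]  # skip X_0, the DC (average) term
        view.append(sum(abs(c) for c in low) / len(low) if low else 0.0)
    return view


def softmax(scores: Sequence[float]) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(views: Sequence[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Weight each view by softmax(view . query) and return (fused vector, weights)."""
    weights = softmax([sum(v * q for v, q in zip(view, query)) for view in views])
    fused = [sum(w * view[d] for w, view in zip(weights, views)) for d in range(len(query))]
    return fused, weights


if __name__ == "__main__":
    # Toy clip: 6 frames of 2-D features; dimension 0 oscillates, dimension 1 is static.
    clip = [[0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
    views = [mean_view(clip), dct_view(clip)]
    fused, weights = attention_fuse(views, query=[0.5, 0.5])  # hypothetical query vector
    print("weights:", weights)
    print("fused:", fused)
```

A static feature dimension contributes almost nothing to the DCT view, while a repetitive movement does, which is why a frequency view can help separate fine-grained behaviours. With a zero query both views get equal weight and the fused vector is their average; a learned query would instead shift weight toward the more informative view.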
This work advances the development of explainable models for predicting human behaviour and lays the foundation for future progress in understanding social interactions in both individual and group contexts. The findings of this thesis enhance the understanding of human-centered social interaction dynamics, offering insights into the success of both individuals and groups in human-to-human and human-computer interactions.