
Theory of Mind in Object-context Scenarios


Description: Theory of Mind refers to the rich human capacity to infer the underlying mental states of others by observing their actions. This ability is especially vital in settings where verbal communication is limited but a collaborative task must still be accomplished. The BOSS dataset is a multimodal video dataset for assessing how well AI systems can predict human belief states in an object-context scenario where verbal communication is prohibited. In addition to the videos, the dataset provides precise ground-truth labels of human belief states and multimodal inputs that replicate the nonverbal communication cues captured by human perception, such as gaze tracking, hand gestures, object detection and pose estimation.

Goal: The goal of this thesis is to design and evaluate, for the first time, a deep learning model that exploits some or all of the nonverbal inputs, such as gaze data and gestures, to enrich its Theory of Mind capabilities. If successful, the work is very likely to lead to a publication at a top-tier conference.

Supervisor: Matteo Bortoletto

Distribution: 10% literature review, 70% implementation, 20% analysis

Requirements: Good knowledge of deep learning, strong programming skills in Python and PyTorch, keen interest in multimodal learning, self-management skills.

Literature:

Duan, Jiafei, et al. 2022. BOSS: A Benchmark for Human Belief Prediction in Object-context Scenarios. arXiv:2206.10665.

Rabinowitz, Neil, et al. 2018. Machine Theory of Mind. International Conference on Machine Learning (ICML). PMLR.

Gandhi, Kanishk, et al. 2021. Baby Intuitions Benchmark (BIB): Discerning the goals, preferences, and actions of others. Advances in Neural Information Processing Systems (NeurIPS) 34, pp. 9963-9976.