
Explaining Disagreement in Visual Question Answering Using Eye Tracking

Susanne Hindennach, Lei Shi, Andreas Bulling

Proc. International Workshop on Pervasive Eye Tracking and Mobile Gaze-Based Interaction (PETMEI), pp. 1–7, 2024.




Abstract

When presented with the same question about an image, human annotators often give valid but disagreeing answers, indicating that their reasoning differed. Such differences are lost in the single ground-truth label used to train and evaluate visual question answering (VQA) methods. In this work, we explore whether visual attention maps, created using stationary eye tracking, provide insight into the reasoning underlying disagreement in VQA. We first manually inspect attention maps in the recent VQA-MHUG dataset and find cases in which attention differs consistently for disagreeing answers. We further evaluate the suitability of four different similarity metrics for detecting attention differences that match the disagreement. We show that attention maps plausibly surface differences in the reasoning underlying one type of disagreement, and that the metrics are complementary in detecting them. Taken together, our results represent an important first step towards leveraging eye tracking to explain disagreement in VQA.
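
The abstract does not name the four similarity metrics, so the following is only a minimal sketch of how two annotators' attention maps might be compared, using common saliency-map comparison measures (Pearson's correlation coefficient, KL divergence, histogram intersection, and cosine similarity) as illustrative stand-ins; the paper's actual metrics and implementation may differ.

# Illustrative sketch only: the four metrics below are common
# saliency-comparison measures, assumed here as stand-ins; the paper's
# actual metrics are not named in the abstract.
import numpy as np

def normalize(att: np.ndarray) -> np.ndarray:
    """Shift an attention map to be non-negative and scale it to sum to 1."""
    att = att.astype(np.float64)
    att -= att.min()
    total = att.sum()
    return att / total if total > 0 else np.full_like(att, 1.0 / att.size)

def pearson_cc(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two maps (higher = more similar)."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def kl_divergence(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """KL divergence D(a || b) of the normalized maps (lower = more similar)."""
    p, q = normalize(a).ravel(), normalize(b).ravel()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def histogram_intersection(a: np.ndarray, b: np.ndarray) -> float:
    """Element-wise minimum of normalized maps, summed (1 = identical)."""
    return float(np.minimum(normalize(a), normalize(b)).sum())

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between the flattened maps (1 = identical direction)."""
    av, bv = a.ravel(), b.ravel()
    return float(av @ bv / (np.linalg.norm(av) * np.linalg.norm(bv) + 1e-12))

# Usage: compare the attention maps of two annotators who gave disagreeing
# answers; random arrays stand in for real eye-tracking attention maps.
rng = np.random.default_rng(0)
map_a = rng.random((32, 32))
map_b = rng.random((32, 32))
print(pearson_cc(map_a, map_b), kl_divergence(map_a, map_b),
      histogram_intersection(map_a, map_b), cosine_similarity(map_a, map_b))

Because the metrics capture different notions of similarity (linear correlation, distributional divergence, overlap, and angular closeness), low agreement on one metric need not imply low agreement on another, which is consistent with the abstract's finding that the metrics detect attention differences in a complementary way.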

Links

doi: https://doi.org/10.1145/3649902.3656356

BibTeX

@inproceedings{hindennach24_petmei,
  title     = {Explaining Disagreement in Visual Question Answering Using Eye Tracking},
  author    = {Hindennach, Susanne and Shi, Lei and Bulling, Andreas},
  year      = {2024},
  pages     = {1--7},
  doi       = {10.1145/3649902.3656356},
  booktitle = {Proc. International Workshop on Pervasive Eye Tracking and Mobile Gaze-Based Interaction (PETMEI)}
}