Possible Worlds Visual Question Answering
A novel causal framework that simultaneously reduces language and vision biases in VQA systems. Our method achieves a 2x accuracy improvement on numerical questions while maintaining robust multimodal reasoning.
2x Numerical Accuracy
SOTA on VQA-CP v2
5+ Backbones
Authors: Ali Vosoughi*, Shijian Deng*, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo
*Equal contribution
IEEE Transactions on Multimedia, 2024