PW-VQA
Possible Worlds Visual Question Answering
A novel causal framework that simultaneously reduces language and vision biases in VQA systems. Our method achieves a 2× accuracy improvement on numerical questions while maintaining robust multimodal reasoning.
2× Numerical Accuracy
SOTA on VQA-CP v2
5+ Backbones
Authors:
Ali Vosoughi*, Shijian Deng*, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo
*Equal contribution
IEEE Transactions on Multimedia, 2024