FS-MEVQA

The Few-Shot Multimodal Explanation for Visual Question Answering (FS-MEVQA) task aims to learn multimodal explanation for visual question answering (MEVQA) from only a few training samples.

Papers

Showing 1-7 of 7 papers

Title | Status | Hype
GPT-4 Technical Report | Code | 6
CogVLM: Visual Expert for Pretrained Language Models | Code | 5
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Code | 3
Variational Causal Inference Network for Explanatory Visual Question Answering | Code | 1
REX: Reasoning-aware and Grounded Explanation | Code | 1
Few-Shot Multimodal Explanation for Visual Question Answering | Code | 0

No leaderboard results yet.