SOTAVerified

Visual Commonsense Reasoning

Papers

Showing 11-20 of 65 papers

Title | Status | Hype
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics | Code | 1
MERLOT: Multimodal Neural Script Knowledge Models | Code | 1
Unifying Vision-and-Language Tasks via Text Generation | Code | 1
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
VL-BERT: Pre-training of Generic Visual-Linguistic Representations | Code | 1
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | Code | 1
Compositional Image-Text Matching and Retrieval by Grounding Entities | Code | 0
Generative Visual Commonsense Answering and Explaining with Generative Scene Graph Constructing |  | 0

No leaderboard results yet.