| Vision Language Models are In-Context Value Learners | Nov 7, 2024 | In-Context Learning, World Knowledge | —Unverified | 0 | 0 |
| Vision-Language Models Provide Promptable Representations for Reinforcement Learning | Feb 5, 2024 | Common Sense Reasoning, Instruction Following | —Unverified | 0 | 0 |
| Visual Commonsense in Pretrained Unimodal and Multimodal Models | Jan 16, 2022 | Attribute, World Knowledge | —Unverified | 0 | 0 |
| Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark | Sep 13, 2024 | Sequential Decision Making, World Knowledge | —Unverified | 0 | 0 |
| Visual Programming for Text-to-Image Generation and Evaluation | May 24, 2023 | Image Generation, Layout Generation | —Unverified | 0 | 0 |
| Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models | Jul 28, 2024 | World Knowledge | —Unverified | 0 | 0 |
| VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks | Dec 24, 2024 | Common Sense Reasoning, Transfer Learning | —Unverified | 0 | 0 |
| VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving | Jul 9, 2024 | Autonomous Driving, Image to 3D | —Unverified | 0 | 0 |
| We Usually Don't Like Going to the Dentist: Using Common Sense to Detect Irony on Twitter | Dec 1, 2018 | Common Sense Reasoning, General Classification | —Unverified | 0 | 0 |
| What does the Failure to Reason with "Respectively" in Zero/Few-Shot Settings Tell Us about Language Models? | May 31, 2023 | Common Sense Reasoning, Few-Shot Learning | —Unverified | 0 | 0 |