| Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models | Nov 27, 2024 | Visual Reasoning | Unverified | 0 |
| Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset | Nov 21, 2024 | Question Answering, Visual Grounding | Code Available | 0 |
| Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios | Nov 20, 2024 | Question Answering, Visual Question Answering (VQA) | Unverified | 0 |
| Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking | Nov 20, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting | Nov 19, 2024 | 3D Generation, GPU | Unverified | 0 |
| On Erroneous Agreements of CLIP Image Embeddings | Nov 7, 2024 | Visual Reasoning | Code Available | 0 |
| Bootstrapping Top-down Information for Self-modulating Slot Attention | Nov 4, 2024 | Object, Object Discovery | Unverified | 0 |
| Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems | Nov 2, 2024 | Specificity, Visual Reasoning | Unverified | 0 |
| Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models | Nov 1, 2024 | Adversarial Attack, Contrastive Learning | Unverified | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Improving Generalization in Visual Reasoning via Self-Ensemble | Oct 28, 2024 | Visual Question Answering (VQA), Visual Reasoning | Unverified | 0 |
| Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Oct 25, 2024 | Visual Reasoning | Code Available | 0 |
| ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom | Oct 18, 2024 | Visual Reasoning | Unverified | 0 |
| MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark | Oct 15, 2024 | Fairness, Scene Text Recognition | Code Available | 0 |
| ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | Oct 14, 2024 | Explanation Generation, Image Forgery Detection | Unverified | 0 |
| TVBench: Redesigning Video-Language Evaluation | Oct 10, 2024 | Multiple-choice, Open-Ended Question Answering | Unverified | 0 |
| Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | Oct 5, 2024 | Benchmarking, Chart Understanding | Unverified | 0 |
| Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning | Sep 30, 2024 | Visual Reasoning | Code Available | 0 |
| GSON: A Group-based Social Navigation Framework with Large Multimodal Model | Sep 26, 2024 | Autonomous Vehicles, Motion Planning | Unverified | 0 |
| Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing | Sep 26, 2024 | Event Detection, Object | Unverified | 0 |
| Enhancing Advanced Visual Reasoning Ability of Large Language Models | Sep 21, 2024 | In-Context Learning, Visual Reasoning | Unverified | 0 |
| Impact of ML Optimization Tactics on Greener Pre-Trained ML Models | Sep 19, 2024 | GPU, image-classification | Unverified | 0 |
| JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images | Sep 19, 2024 | Hallucination, Image Captioning | Code Available | 0 |
| What Makes a Maze Look Like a Maze? | Sep 12, 2024 | Visual Reasoning | Unverified | 0 |
| Critical Features Tracking on Triangulated Irregular Networks by a Scale-Space Method | Sep 10, 2024 | Visual Reasoning | Unverified | 0 |
| MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | Sep 9, 2024 | Diversity, Visual Reasoning | Unverified | 0 |
| How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? | Sep 3, 2024 | In-Context Learning, Language Modeling | Code Available | 0 |
| Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking, Large Language Model | Unverified | 0 |
| Multi-Modal Dialogue State Tracking for Playing GuessWhich Game | Aug 15, 2024 | Dialogue State Tracking, Visual Reasoning | Code Available | 0 |
| ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling | Aug 7, 2024 | Attribute, Language Modeling | Unverified | 0 |
| Compromising Embodied Agents with Contextual Backdoor Attacks | Aug 6, 2024 | Autonomous Driving, Robot Manipulation | Unverified | 0 |
| ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning | Aug 5, 2024 | Visual Reasoning | Unverified | 0 |
| A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Jul 31, 2024 | Human-Object Interaction Detection, Image Reconstruction | Code Available | 0 |
| Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM | Jul 31, 2024 | In-Context Learning, Layout Design | Unverified | 0 |
| Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering | Jul 30, 2024 | Code Generation, Question Answering | Unverified | 0 |
| Take A Step Back: Rethinking the Two Stages in Visual Reasoning | Jul 29, 2024 | Logical Reasoning, Question Answering | Unverified | 0 |
| Untrained neural networks can demonstrate memorization-independent abstract reasoning | Jul 25, 2024 | Memorization, Visual Reasoning | Code Available | 0 |
| Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators | Jul 20, 2024 | Action Recognition, CoLA | Unverified | 0 |
| I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction | Jul 19, 2024 | 3D Reconstruction, Spatial Reasoning | Unverified | 0 |
| Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols | Jul 18, 2024 | Visual Reasoning | Unverified | 0 |
| X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs | Jul 18, 2024 | Contrastive Learning, Representation Learning | Unverified | 0 |
| SwitchCIT: Switching for Continual Instruction Tuning | Jul 16, 2024 | Text Generation, Visual Reasoning | Unverified | 0 |
| NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | Jul 15, 2024 | Common Sense Reasoning, Multiple-choice | Unverified | 0 |
| Affordance-Guided Reinforcement Learning via Visual Prompting | Jul 14, 2024 | reinforcement-learning, Reinforcement Learning | Unverified | 0 |
| NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning | Jul 11, 2024 | Domain Generalization, Human-Object Interaction Detection | Unverified | 0 |
| MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics? | Jun 28, 2024 | Task Planning, Visual Reasoning | Unverified | 0 |
| Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA | Jun 27, 2024 | General Knowledge, Question Answering | Unverified | 0 |
| Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges | Jun 26, 2024 | In-Context Learning, Traveling Salesman Problem | Code Available | 0 |
| Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration | Jun 24, 2024 | Diversity, Multiple-choice | Unverified | 0 |
| Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects | Jun 22, 2024 | Relational Reasoning, Visual Reasoning | Code Available | 0 |