| Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning | Nov 27, 2024 | Autonomous Driving, Multimodal Reasoning | Unverified | 0 |
| Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving | Nov 20, 2024 | Autonomous Driving, Multimodal Reasoning | Unverified | 0 |
| Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Nov 15, 2024 | Hallucination, Multimodal Reasoning | Code Available | 1 |
| LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | Nov 15, 2024 | Logical Reasoning, Multimodal Reasoning | Code Available | 7 |
| Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | Nov 15, 2024 | Multimodal Reasoning | Code Available | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Nov 15, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| Towards Low-Resource Harmful Meme Detection with LMM Agents | Nov 8, 2024 | Multimodal Reasoning | Code Available | 0 |
| Distill Visual Chart Reasoning Ability from LLMs to MLLMs | Oct 24, 2024 | Multimodal Reasoning, Visual Reasoning | Code Available | 2 |
| Understanding the Role of LLMs in Multimodal Evaluation Benchmarks | Oct 16, 2024 | Benchmarking, Large Language Model | Code Available | 0 |
| Learning to Ground VLMs without Forgetting | Oct 14, 2024 | Decoder, Language Modelling | Unverified | 0 |
| An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation | Oct 4, 2024 | Language Modelling, Multimodal Reasoning | Unverified | 0 |
| NL-Eye: Abductive NLI for Images | Oct 3, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Unveiling AI's Potential Through Tools, Techniques, and Applications | Oct 2, 2024 | AutoML, Edge-computing | Unverified | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | Sep 25, 2024 | Benchmarking, Formal Logic | Unverified | 0 |
| JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images | Sep 19, 2024 | Hallucination, Image Captioning | Code Available | 0 |
| NVLM: Open Frontier-Class Multimodal LLMs | Sep 17, 2024 | Math, Multimodal Reasoning | Unverified | 0 |
| Knowledge-Aware Reasoning over Multimodal Semi-structured Tables | Aug 25, 2024 | Multimodal Reasoning, Question Answering | Unverified | 0 |
| Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning | Aug 16, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| Towards Holistic Disease Risk Prediction using Small Language Models | Aug 13, 2024 | Multimodal Reasoning | Unverified | 0 |
| DC3DO: Diffusion Classifier for 3D Objects | Aug 13, 2024 | 3D Object Classification, Classification | Code Available | 1 |
| User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | Aug 4, 2024 | Action Anticipation, Benchmarking | Unverified | 0 |
| MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Aug 2, 2024 | Multimodal Reasoning, Multiple-choice | Code Available | 3 |
| LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models | Jul 23, 2024 | Multimodal Reasoning, Prompt Engineering | Code Available | 1 |
| HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Jul 22, 2024 | Benchmarking, Hallucination | Code Available | 1 |
| Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights | Jul 16, 2024 | Image Captioning, Multimodal Reasoning | Code Available | 0 |
| On scalable oversight with weak LLMs judging strong LLMs | Jul 5, 2024 | Multimodal Reasoning, Question Answering | Unverified | 0 |
| Improving Multi-Agent Debate with Sparse Communication Topology | Jun 17, 2024 | Multimodal Reasoning | Unverified | 0 |
| MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | Jun 17, 2024 | Benchmarking, Fact Checking | Code Available | 1 |
| POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models | Jun 6, 2024 | Multimodal Reasoning, Prompt Engineering | Unverified | 0 |
| Multimodal Reasoning with Multimodal Knowledge Graph | Jun 4, 2024 | cross-modal alignment, Graph Attention | Unverified | 0 |
| Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning | May 31, 2024 | Answer Generation, Multimodal Reasoning | Unverified | 0 |
| Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models | May 31, 2024 | Multimodal Reasoning, Retrieval | Code Available | 0 |
| M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models | May 24, 2024 | Multimodal Reasoning | Code Available | 0 |
| Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models | May 22, 2024 | Multimodal Reasoning, Visual Question Answering | Unverified | 0 |
| Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning | May 19, 2024 | Multimodal Reasoning, Question Answering | Unverified | 0 |
| CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models | May 1, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| AccidentBlip: Agent of Accident Warning based on MA-former | Apr 18, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Apr 17, 2024 | Hallucination, Multimodal Reasoning | Code Available | 1 |
| Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V | Apr 16, 2024 | Instruction Following, Multimodal Reasoning | Unverified | 0 |
| MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Apr 7, 2024 | Image Comprehension, Math | Code Available | 0 |
| Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval | Mar 26, 2024 | Multimodal Reasoning, Retrieval | Unverified | 0 |
| A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning | Mar 22, 2024 | Multimodal Reasoning | Code Available | 1 |
| PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns | Mar 20, 2024 | Multimodal Reasoning | Code Available | 2 |
| Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning | Mar 6, 2024 | Multimodal Reasoning, Question Answering | Code Available | 2 |
| VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing | Mar 5, 2024 | Multimodal Reasoning, Sentence | Code Available | 0 |
| All in an Aggregated Image for In-Image Learning | Feb 28, 2024 | All, Hallucination | Code Available | 1 |
| Measuring Vision-Language STEM Skills of Neural Models | Feb 27, 2024 | Multimodal Reasoning | Code Available | 0 |
| RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis | Feb 25, 2024 | Code Generation, Multimodal Reasoning | Unverified | 0 |
| Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics | Feb 24, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image | Feb 22, 2024 | Adversarial Robustness, Multimodal Reasoning | Code Available | 1 |