Knowledge-Aware Iterative Retrieval for Multi-Agent Systems | Mar 17, 2025 | Evidence Selection, Large Language Model | Unverified | 0
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference | Mar 17, 2025 | Feature Compression, Image Compression | Unverified | 0
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | Mar 17, 2025 | Form, GPU | Unverified | 0
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Mar 17, 2025 | Articles, Benchmarking | Code Available | 1
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Mar 17, 2025 | Attribute, MME | Unverified | 0
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration | Mar 17, 2025 | Denoising, Question Answering | Unverified | 0
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos | Mar 17, 2025 | Benchmarking, Question Answering | Code Available | 1
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions | Mar 17, 2025 | Question Answering | Unverified | 0
MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways | Mar 17, 2025 | Decision Making, Medical Question Answering | Unverified | 0
General Table Question Answering via Answer-Formula Joint Generation | Mar 16, 2025 | Question Answering | Unverified | 0
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Mar 16, 2025 | Change Detection, Image Captioning | Unverified | 0
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models | Mar 16, 2025 | Machine Unlearning, Privacy Preserving | Unverified | 0
MUSS: Multilevel Subset Selection for Relevance and Diversity | Mar 14, 2025 | Diversity, Question Answering | Unverified | 0
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering | Mar 14, 2025 | Embodied Question Answering, Question Answering | Unverified | 0
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models | Mar 14, 2025 | Autonomous Driving, Computational Efficiency | Unverified | 0
UMB@PerAnsSumm 2025: Enhancing Perspective-Aware Summarization with Prompt Optimization and Supervised Fine-Tuning | Mar 14, 2025 | Community Question Answering, Ensemble Learning | Unverified | 0
T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation | Mar 14, 2025 | Attribute, Question Answering | Code Available | 0
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering | Mar 14, 2025 | Audio Question Answering, Question Answering | Code Available | 3
Learning to Inference Adaptively for Multimodal Large Language Models | Mar 13, 2025 | Hallucination, Question Answering | Unverified | 0
KV-Distill: Nearly Lossless Learnable Context Compression for LLMs | Mar 13, 2025 | GPU, Question Answering | Unverified | 0
Unlock the Power of Unlabeled Data in Language Driving Model | Mar 13, 2025 | Autonomous Driving, Question Answering | Unverified | 0
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs | Mar 13, 2025 | Benchmarking, Question Answering | Unverified | 0
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game | Mar 13, 2025 | Multimodal Reasoning, Question Answering | Code Available | 1
Retrieval-Augmented Generation with Hierarchical Knowledge | Mar 13, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 4
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding | Mar 13, 2025 | 4k, Autonomous Driving | Code Available | 2
On the Limitations of Vision-Language Models in Understanding Image Transforms | Mar 12, 2025 | Question Answering, Video Generation | Unverified | 0
Teaching LMMs for Image Quality Scoring and Interpreting | Mar 12, 2025 | Descriptive, Image Quality Assessment | Code Available | 2
Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | Mar 12, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models | Mar 12, 2025 | Mixture-of-Experts, Question Answering | Unverified | 0
SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery | Mar 12, 2025 | Activity Recognition, Anatomy | Unverified | 0
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Mar 12, 2025 | Question Answering, RAG | Code Available | 7
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment | Mar 12, 2025 | Autonomous Driving, Bench2Drive | Code Available | 3
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering | Mar 11, 2025 | Form, Instruction Following | Unverified | 0
Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method | Mar 11, 2025 | Language Modeling, Language Modelling | Unverified | 0
A Survey on Knowledge-Oriented Retrieval-Augmented Generation | Mar 11, 2025 | Information Retrieval, Natural Language Understanding | Unverified | 0
PlainQAFact: Automatic Factuality Evaluation Metric for Biomedical Plain Language Summaries Generation | Mar 11, 2025 | Question Answering | Code Available | 0
Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation | Mar 11, 2025 | Computational Efficiency, Hallucination | Unverified | 0
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Mar 11, 2025 | Conformal Prediction, Multimodal Reasoning | Unverified | 0
UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models | Mar 11, 2025 | Attribute, Mixture-of-Experts | Unverified | 0
FASIONAD++: Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback | Mar 11, 2025 | Autonomous Driving, Question Answering | Unverified | 0
MapQA: Open-domain Geospatial Question Answering on Map Data | Mar 10, 2025 | Diversity, Language Modeling | Unverified | 0
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning | Mar 10, 2025 | Question Answering | Unverified | 0
From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Mar 10, 2025 | Math, Question Answering | Unverified | 0
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Mar 10, 2025 | Benchmarking, Medical Question Answering | Code Available | 2
Towards Fine-Grained Video Question Answering | Mar 10, 2025 | Language Modeling, Language Modelling | Unverified | 0
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis | Mar 10, 2025 | Question Answering | Code Available | 2
ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA | Mar 10, 2025 | Multi-hop Question Answering, Question Answering | Unverified | 0
Talking to GDELT Through Knowledge Graphs | Mar 10, 2025 | Articles, Knowledge Graphs | Unverified | 0
KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus | Mar 10, 2025 | In-Context Learning, Question Answering | Code Available | 0
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Mar 10, 2025 | Autonomous Driving, Question Answering | Unverified | 0