VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | May 30, 2025 | Question Answering, Spatial Reasoning | Code Available | 1
Exploring the Impact of Occupational Personas on Domain-Specific QA | May 30, 2025 | Question Answering | Unverified | 0
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering | May 30, 2025 | Language Modeling, Language Modelling | Unverified | 0
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models | May 30, 2025 | Image Captioning, Question Answering | Unverified | 0
A Simple Linear Patch Revives Layer-Pruned Large Language Models | May 30, 2025 | Knowledge Distillation, Question Answering | Unverified | 0
Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck | May 30, 2025 | Question Answering, Visual Question Answering | Unverified | 0
Drop Dropout on Single-Epoch Language Model Pretraining | May 30, 2025 | Language Modeling, Language Modelling | Code Available | 0
LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews | May 30, 2025 | Binary Classification, Question Answering | Code Available | 0
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs | May 30, 2025 | Fact Checking, Hallucination | Unverified | 0
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty? | May 30, 2025 | Question Answering | Code Available | 0
Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation | May 29, 2025 | Form, Hallucination | Unverified | 0
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation | May 29, 2025 | Question Answering, RAG | Unverified | 0
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine | May 29, 2025 | Diagnostic, Multiple-choice | Unverified | 0
MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering | May 29, 2025 | Medical Question Answering, Question Answering | Unverified | 0
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | May 29, 2025 | Question Answering, Video Generation | Code Available | 0
Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models | May 29, 2025 | Question Answering, Reinforcement Learning (RL) | Unverified | 0
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | May 29, 2025 | Autonomous Driving, Diagnostic | Code Available | 3
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking | May 29, 2025 | Benchmarking, Graph Question Answering | Unverified | 0
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | May 29, 2025 | Anomaly Detection, Descriptive | Code Available | 2
Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs | May 29, 2025 | Dimensionality Reduction, Hallucination | Unverified | 0
From Chat Logs to Collective Insights: Aggregative Question Answering | May 29, 2025 | Chatbot, Question Answering | Unverified | 0
ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering | May 29, 2025 | Chart Question Answering, Chart Understanding | Unverified | 0
Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint | May 29, 2025 | Image Captioning, Question Answering | Code Available | 1
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | May 29, 2025 | Math, Mathematical Reasoning | Unverified | 0
QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining | May 29, 2025 | Question Answering, Representation Learning | Code Available | 0
Differential Information: An Information-Theoretic Perspective on Preference Optimization | May 29, 2025 | Inductive Bias, Instruction Following | Unverified | 0
Spoken question answering for visual queries | May 29, 2025 | Question Answering, Visual Question Answering (VQA) | Unverified | 0
Multi-Sourced Compositional Generalization in Visual Question Answering | May 29, 2025 | Question Answering, Visual Question Answering | Code Available | 0
Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning | May 29, 2025 | Diagnostic, Question Answering | Code Available | 1
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | May 29, 2025 | Question Answering | Code Available | 3
Synthetic Document Question Answering in Hungarian | May 29, 2025 | Optical Character Recognition (OCR), Question Answering | Code Available | 0
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model | May 28, 2025 | Language Modeling, Language Modelling | Unverified | 0
EvolveSearch: An Iterative Self-Evolving Search Agent | May 28, 2025 | Multi-hop Question Answering, Question Answering | Unverified | 0
Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data | May 28, 2025 | Machine Translation, Paraphrase Generation | Code Available | 0
Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs | May 28, 2025 | Question Answering | Unverified | 0
Improving QA Efficiency with DistilBERT: Fine-Tuning and Inference on mobile Intel CPUs | May 28, 2025 | Computational Efficiency, CPU | Unverified | 0
Climate Finance Bench | May 28, 2025 | Logical Reasoning, Quantization | Code Available | 0
ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room | May 28, 2025 | Medical Question Answering, Question Answering | Unverified | 0
Structured Memory Mechanisms for Stable Context Representation in Large Language Models | May 28, 2025 | Question Answering, Text Generation | Unverified | 0
NegVQA: Can Vision Language Models Understand Negation? | May 28, 2025 | Negation, Question Answering | Unverified | 0
StressTest: Can YOUR Speech LM Handle the Stress? | May 28, 2025 | Question Answering, Sentence | Unverified | 0
VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models | May 28, 2025 | Decision Making, Question Answering | Code Available | 0
Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems | May 28, 2025 | Large Language Model, Question Answering | Unverified | 0
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving | May 27, 2025 | Autonomous Driving, Decision Making | Unverified | 0
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | May 27, 2025 | Benchmarking, Question Answering | Code Available | 0
Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective | May 27, 2025 | Language Modeling, Language Modelling | Unverified | 0
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding | May 27, 2025 | Benchmarking, Change Detection | Unverified | 0
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | May 27, 2025 | Decision Making, Diagnostic | Unverified | 0
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models | May 27, 2025 | Question Answering, Visual Reasoning | Unverified | 0
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | May 27, 2025 | Benchmarking, Multiple-choice | Unverified | 0