Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering | Jul 17, 2025 | Embodied Question Answering, Question Answering | Unverified | 0
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning | Jul 17, 2025 | Question Answering, Scene Understanding | Unverified | 0
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It | Jul 17, 2025 | Question Answering | Unverified | 0
From Roots to Rewards: Dynamic Tree Reasoning with RL | Jul 17, 2025 | Computational Efficiency, Question Answering | Code Available | 0
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility | Jul 16, 2025 | Language Modeling, Language Modelling | Unverified | 0
Describe Anything Model for Visual Question Answering on Text-rich Images | Jul 16, 2025 | Descriptive, Language Modeling | Code Available | 1
Warehouse Spatial Question Answering with LLM Agent | Jul 14, 2025 | Question Answering, Spatial Reasoning | Code Available | 1
MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | Jul 9, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0
Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights | Jul 9, 2025 | Diagnostic, Medical Visual Question Answering | Unverified | 0
LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation | Jul 9, 2025 | Question Answering, Visual Question Answering | Unverified | 0
Evaluating Attribute Confusion in Fashion Text-to-Image Generation | Jul 9, 2025 | Attribute, cross-modal alignment | Unverified | 0
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Jul 8, 2025 | Articles, Multimodal Reasoning | Unverified | 0
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | Jul 7, 2025 | Hallucination, Question Answering | Unverified | 0
Agent-Based Detection and Resolution of Incompleteness and Ambiguity in Interactions with Large Language Models | Jul 4, 2025 | Question Answering | Unverified | 0
AI-VaxGuide: An Agentic RAG-Based LLM for Vaccination Decisions | Jul 4, 2025 | Question Answering, RAG | Unverified | 0
OpenTable-R1: A Reinforcement Learning Augmented Tool Agent for Open-Domain Table Question Answering | Jul 2, 2025 | Language Modeling, Language Modelling | Code Available | 0
RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism | Jun 30, 2025 | Question Answering, RAG | Code Available | 5
L0: Reinforcement Learning to Become General Agents | Jun 30, 2025 | Question Answering, reinforcement-learning | Code Available | 3
Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder | Jun 28, 2025 | Image Segmentation, Large Language Model | Code Available | 1
Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models | Jun 28, 2025 | image-classification, Image Classification | Code Available | 0
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs | Jun 27, 2025 | Question Answering, Video Question Answering | Code Available | 2
Large Language Model Agent for Modular Task Execution in Drug Discovery | Jun 26, 2025 | Drug Discovery, Language Modeling | Unverified | 0
ComRAG: Retrieval-Augmented Generation with Dynamic Vector Stores for Real-time Community Question Answering in Industry | Jun 26, 2025 | Community Question Answering, Question Answering | Unverified | 0
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning | Jun 26, 2025 | In-Context Learning, Medical Visual Question Answering | Unverified | 0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 0
IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes | Jun 26, 2025 | Attribute, Question Answering | Unverified | 0
Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality | Jun 26, 2025 | Conformal Prediction, Question Answering | Code Available | 0
FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering | Jun 25, 2025 | Question Answering, Visual Question Answering | Unverified | 0
Towards Probabilistic Question Answering Over Tabular Data | Jun 25, 2025 | Natural Language Queries, Question Answering | Unverified | 0
MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering | Jun 25, 2025 | Multimodal Reasoning, Question Answering | Unverified | 0
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Jun 25, 2025 | Benchmarking, Person Identification | Code Available | 0
Semantic-enhanced Modality-asymmetric Retrieval for Online E-commerce Search | Jun 25, 2025 | Question Answering, Retrieval | Unverified | 0
Knowledge-Aware Diverse Reranking for Cross-Source Question Answering | Jun 25, 2025 | Question Answering, RAG | Unverified | 0
Memento: Note-Taking for Your Future Self | Jun 25, 2025 | Multi-hop Question Answering, Question Answering | Unverified | 0
SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models | Jun 25, 2025 | Code Generation, In-Context Learning | Unverified | 0
ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset | Jun 25, 2025 | Computational Efficiency, Question Answering | Unverified | 0
COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees | Jun 25, 2025 | Conformal Prediction, Question Answering | Unverified | 0
KunLunBaizeRAG: Reinforcement Learning Driven Inference Performance Leap for Large Language Models | Jun 24, 2025 | Multi-hop Question Answering, Question Answering | Unverified | 0
ToSA: Token Merging with Spatial Awareness | Jun 24, 2025 | Embodied Question Answering, Question Answering | Code Available | 0
Inference Scaled GraphRAG: Improving Multi Hop Question Answering on Knowledge Graphs | Jun 24, 2025 | Information Retrieval, Knowledge Graphs | Unverified | 0
Enhancing Biosecurity in Tamper-Resistant Large Language Models With Quantum Gradient Descent | Jun 23, 2025 | Question Answering, Sensitivity | Unverified | 0
Semantic similarity estimation for domain specific data using BERT and other techniques | Jun 23, 2025 | Information Retrieval, Machine Translation | Unverified | 0
GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning | Jun 22, 2025 | Answer Generation, Decision Making | Unverified | 0
Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives | Jun 22, 2025 | Multi-hop Question Answering, Question Answering | Unverified | 0
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Jun 22, 2025 | In-Context Learning, Large Language Model | Code Available | 1
Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations | Jun 21, 2025 | Question Answering, Scene Understanding | Unverified | 0
UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making | Jun 20, 2025 | Decision Making, Question Answering | Code Available | 0
General-Purpose Robotic Navigation via LVLM-Orchestrated Perception, Reasoning, and Acting | Jun 20, 2025 | Embodied Question Answering, Question Answering | Unverified | 0
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Jun 19, 2025 | Multiple-choice, Question Answering | Unverified | 0
Can Common VLMs Rival Medical VLMs? Evaluation and Strategic Insights | Jun 19, 2025 | Question Answering, Visual Question Answering | Unverified | 0