Can We Infer Confidential Properties of Training Data from LLMs? | Jun 12, 2025 | image-classification Image Classification | Unverified
SlotPi: Physics-informed Object-centric Reasoning Models | Jun 12, 2025 | Object Question Answering | Code Available
MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | Jun 12, 2025 | Image Segmentation Medical Diagnosis | Unverified
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors | Jun 12, 2025 | Question Answering Safety Alignment | Code Available
Think before You Simulate: Symbolic Reasoning to Orchestrate Neural Computation for Counterfactual Question Answering | Jun 12, 2025 | counterfactual Counterfactual Reasoning | Code Available
HalLoc: Token-level Localization of Hallucinations for Vision Language Models | Jun 12, 2025 | Hallucination Image Captioning | Code Available
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos | Jun 12, 2025 | Question Answering | Unverified
Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications | Jun 12, 2025 | Code Generation Question Answering | Code Available
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | Jun 12, 2025 | AudioCaps Audio captioning | Unverified
EQA-RM: A Generative Embodied Reward Model with Test-time Scaling | Jun 12, 2025 | Embodied Question Answering Question Answering | Code Available
CogStream: Context-guided Streaming Video Question Answering | Jun 12, 2025 | Question Answering Video Question Answering | Unverified
Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering | Jun 11, 2025 | Graph Question Answering Knowledge Graphs | Code Available
CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation | Jun 11, 2025 | Common Sense Reasoning Question Answering | Unverified
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering Image to text | Unverified
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment | Jun 11, 2025 | cross-modal alignment Question Answering | Code Available
Bridging the Gap Between Open-Source and Proprietary LLMs in Table QA | Jun 11, 2025 | Code Generation Language Modeling | Code Available
Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning | Jun 11, 2025 | In-Context Learning Question Answering | Unverified
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Jun 11, 2025 | Medical Visual Question Answering Question Answering | Code Available
Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos | Jun 11, 2025 | Question Answering Visual Question Answering | Code Available
Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection | Jun 11, 2025 | Medical Question Answering MedQA | Code Available
Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-k | Jun 10, 2025 | Open-Domain Question Answering Question Answering | Unverified
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models | Jun 10, 2025 | Action Generation Image Captioning | Unverified
PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | Jun 10, 2025 | Question Answering Scene Understanding | Unverified
WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis | Jun 10, 2025 | Language Modeling Language Modelling | Unverified
mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks | Jun 10, 2025 | Language Identification Question Answering | Unverified
Improved LLM Agents for Financial Document Question Answering | Jun 10, 2025 | Question Answering | Unverified
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | Jun 10, 2025 | Multiple-choice Open-Ended Question Answering | Unverified
Aligning Text, Images, and 3D Structure Token-by-Token | Jun 9, 2025 | 3D Object Recognition Instruction Following | Unverified
HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains | Jun 9, 2025 | Diagnostic Question Answering | Code Available
Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning | Jun 9, 2025 | Future prediction Question Answering | Code Available
LEANN: A Low-Storage Vector Index | Jun 9, 2025 | Question Answering RAG | Unverified
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality | Jun 9, 2025 | In-Context Learning Question Answering | Unverified
MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs | Jun 9, 2025 | Hallucination Model Editing | Unverified
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving | Jun 9, 2025 | Autonomous Driving Imitation Learning | Unverified
Cognitive Weave: Synthesizing Abstracted Knowledge with a Spatio-Temporal Resonance Graph | Jun 9, 2025 | Large Language Model Question Answering | Code Available
Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | Jun 8, 2025 | Offline RL Question Answering | Unverified
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning | Jun 8, 2025 | Medical Report Generation Question Answering | Unverified
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Jun 8, 2025 | Attribute Hallucination | Unverified
The State-of-the-Art in Lifelog Retrieval: A Review of Progress at the ACM Lifelog Search Challenge Workshop 2022-24 | Jun 7, 2025 | Question Answering Retrieval | Unverified
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering | Jun 7, 2025 | In-Context Learning Meta-Learning | Unverified
Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques | Jun 6, 2025 | Benchmarking Model Selection | Unverified
BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions | Jun 6, 2025 | Information Retrieval Question Answering | Unverified
DynamicMind: A Tri-Mode Thinking System for Large Language Models | Jun 6, 2025 | Computational Efficiency Prompt Engineering | Unverified
MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning | Jun 6, 2025 | Question Answering Table-based Question Answering | Unverified
Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning | Jun 5, 2025 | Question Answering RAG | Code Available
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | Jun 5, 2025 | cross-modal alignment Dense Captioning | Unverified
Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems | Jun 5, 2025 | Diagnostic Multimodal Deep Learning | Unverified
Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights | Jun 5, 2025 | Multiple-choice Question Answering | Unverified
TextVidBench: A Benchmark for Long Video Scene Text Understanding | Jun 5, 2025 | Prompt Engineering Question Answering | Unverified
Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance | Jun 4, 2025 | Question Answering Semantic Similarity | Unverified