DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering Mar 5, 2025 3D Question Answering (3D-QA) Question Answering
ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models Feb 27, 2025 Question Answering RAG
FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users Feb 26, 2025 In-Context Learning Meta-Learning
UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering Feb 26, 2025 Question Answering
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks Feb 25, 2025 Misinformation Question Answering
HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization Feb 24, 2025 Diversity Fact Verification
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse Feb 21, 2025 Question Answering
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model Feb 20, 2025 Mixture-of-Experts Question Answering
How to Get Your LLM to Generate Challenging Problems for Evaluation Feb 20, 2025 Code Completion Math
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information Feb 20, 2025 Question Answering
Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps Feb 20, 2025 Question Answering
PeerQA: A Scientific Question Answering Dataset from Peer Reviews Feb 19, 2025 answerability prediction Answer Generation
CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space Feb 18, 2025 Embodied Question Answering Question Answering
MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression Feb 17, 2025 Diagnostic Question Answering
The Mirage of Model Editing: Revisiting Evaluation in the Wild Feb 16, 2025 Model Editing Question Answering
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering Feb 11, 2025 Question Answering Video Question Answering
LM2: Large Memory Models Feb 9, 2025 Decoder MMLU
Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs Feb 7, 2025 Federated Learning Medical Question Answering
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? Feb 6, 2025 Question Answering Referring Expression
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes Feb 4, 2025 Autonomous Driving Multiple-choice
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models Feb 3, 2025 Adversarial Robustness Image Captioning
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation Jan 31, 2025 Question Answering Video Question Answering
KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search Jan 31, 2025 Heuristic Search Knowledge Base Question Answering
o3-mini vs DeepSeek-R1: Which One is Safer? Jan 30, 2025 Code Generation Program Repair
DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing Jan 24, 2025 Language Modeling Language Modelling
InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models Jan 19, 2025 Benchmarking Question Answering
MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning Jan 13, 2025 Causal Discovery Causal Inference
SensorQA: A Question Answering Benchmark for Daily-Life Monitoring Jan 9, 2025 Question Answering
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark Jan 9, 2025 Fairness Hallucination
VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models Jan 9, 2025 Benchmarking Mathematical Problem-Solving
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Jan 6, 2025 Language Model Evaluation Language Modeling
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? Jan 5, 2025 Image Captioning Image to text
Predicting the Performance of Black-box LLMs through Self-Queries Jan 2, 2025 Question Answering
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering Jan 1, 2025 Large Language Model Multimodal Large Language Model
Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner Dec 30, 2024 Question Answering Table Recognition
Long Context vs. RAG for LLMs: An Evaluation and Revisits Dec 27, 2024 Question Answering RAG
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions Dec 27, 2024 Human-Object Interaction Detection Object
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation Dec 24, 2024 Graph Question Answering Hallucination
CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era Dec 24, 2024 Knowledge Base Question Answering Knowledge Graphs
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating Dec 24, 2024 document understanding Question Answering
Property Enhanced Instruction Tuning for Multi-task Molecule Generation with Large Language Models Dec 24, 2024 Machine Translation Molecular Property Prediction
Resource-Aware Arabic LLM Creation: Model Adaptation, Integration, and Multi-Domain Testing Dec 23, 2024 ArabicMMLU Dialect Identification
Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding Dec 21, 2024 Attribute Question Answering
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization Dec 19, 2024 Contrastive Learning Decision Making
Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering Dec 18, 2024 graph construction knowledge editing
MedCoT: Medical Chain of Thought via Hierarchical Expert Dec 18, 2024 Diagnostic Medical Visual Question Answering
EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation Dec 17, 2024 Question Answering RAG
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants Dec 17, 2024 Image Captioning Question Answering
SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types Dec 16, 2024 Question Answering
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models Dec 16, 2024 Question Answering