GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback | Mar 19, 2025 | Language Modeling, Language Modelling | Unverified | 0
KoGNER: A Novel Framework for Knowledge Graph Distillation on Biomedical Named Entity Recognition | Mar 19, 2025 | Knowledge Distillation, Knowledge Graphs | Unverified | 0
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0
Synthetic Data Generation Using Large Language Models: Advances in Text and Code | Mar 18, 2025 | Code Translation, Prompt Engineering | Unverified | 0
EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models | Mar 18, 2025 | Anomaly Detection, Defect Detection | Unverified | 0
CARE: A QLoRA-Fine Tuned Multi-Domain Chatbot With Fast Learning On Minimal Hardware | Mar 18, 2025 | Chatbot, Question Answering | Unverified | 0
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence | Mar 18, 2025 | Question Answering, Uncertainty Quantification | Unverified | 0
Synthetic Clarification and Correction Dialogues about Data-Centric Tasks -- A Teacher-Student Approach | Mar 18, 2025 | Question Answering, Table-based Question Answering | Unverified | 0
How much do LLMs learn from negative examples? | Mar 18, 2025 | Multiple-choice, Question Answering | Code Available | 0
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models | Mar 18, 2025 | Position, Question Answering | Unverified | 0
HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models | Mar 17, 2025 | Hallucination, Question Answering | Code Available | 0
RAG-RL: Advancing Retrieval-Augmented Generation via RL and Curriculum Learning | Mar 17, 2025 | Answer Generation, Multi-hop Question Answering | Unverified | 0
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding | Mar 17, 2025 | Question Answering, Scene Understanding | Unverified | 0
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens | Mar 17, 2025 | Image Captioning, Image Generation | Unverified | 0
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | Mar 17, 2025 | Form, GPU | Unverified | 0
VITED: Video Temporal Evidence Distillation | Mar 17, 2025 | Question Answering, Video Question Answering | Unverified | 0
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference | Mar 17, 2025 | Feature Compression, Image Compression | Unverified | 0
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions | Mar 17, 2025 | Question Answering | Unverified | 0
MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways | Mar 17, 2025 | Decision Making, Medical Question Answering | Unverified | 0
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration | Mar 17, 2025 | Denoising, Question Answering | Unverified | 0
Knowledge-Aware Iterative Retrieval for Multi-Agent Systems | Mar 17, 2025 | Evidence Selection, Large Language Model | Unverified | 0
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Mar 17, 2025 | Attribute, MME | Unverified | 0
MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG | Mar 17, 2025 | Information Retrieval, Question Answering | Code Available | 0
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Mar 16, 2025 | Change Detection, Image Captioning | Unverified | 0
General Table Question Answering via Answer-Formula Joint Generation | Mar 16, 2025 | Question Answering | Unverified | 0
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models | Mar 16, 2025 | Machine Unlearning, Privacy Preserving | Unverified | 0
T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation | Mar 14, 2025 | Attribute, Question Answering | Code Available | 0
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering | Mar 14, 2025 | Embodied Question Answering, Question Answering | Unverified | 0
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models | Mar 14, 2025 | Autonomous Driving, Computational Efficiency | Unverified | 0
MUSS: Multilevel Subset Selection for Relevance and Diversity | Mar 14, 2025 | Diversity, Question Answering | Unverified | 0
UMB@PerAnsSumm 2025: Enhancing Perspective-Aware Summarization with Prompt Optimization and Supervised Fine-Tuning | Mar 14, 2025 | Community Question Answering, Ensemble Learning | Unverified | 0
Unlock the Power of Unlabeled Data in Language Driving Model | Mar 13, 2025 | Autonomous Driving, Question Answering | Unverified | 0
Learning to Inference Adaptively for Multimodal Large Language Models | Mar 13, 2025 | Hallucination, Question Answering | Unverified | 0
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs | Mar 13, 2025 | Benchmarking, Question Answering | Unverified | 0
KV-Distill: Nearly Lossless Learnable Context Compression for LLMs | Mar 13, 2025 | GPU, Question Answering | Unverified | 0
SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery | Mar 12, 2025 | Activity Recognition, Anatomy | Unverified | 0
On the Limitations of Vision-Language Models in Understanding Image Transforms | Mar 12, 2025 | Question Answering, Video Generation | Unverified | 0
FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models | Mar 12, 2025 | Mixture-of-Experts, Question Answering | Unverified | 0
Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | Mar 12, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models | Mar 11, 2025 | Attribute, Mixture-of-Experts | Unverified | 0
Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method | Mar 11, 2025 | Language Modeling, Language Modelling | Unverified | 0
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Mar 11, 2025 | Conformal Prediction, Multimodal Reasoning | Unverified | 0
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering | Mar 11, 2025 | Form, Instruction Following | Unverified | 0
PlainQAFact: Automatic Factuality Evaluation Metric for Biomedical Plain Language Summaries Generation | Mar 11, 2025 | Question Answering | Code Available | 0
FASIONAD++ : Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback | Mar 11, 2025 | Autonomous Driving, Question Answering | Unverified | 0
A Survey on Knowledge-Oriented Retrieval-Augmented Generation | Mar 11, 2025 | Information Retrieval, Natural Language Understanding | Unverified | 0
Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation | Mar 11, 2025 | Computational Efficiency, Hallucination | Unverified | 0
KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus | Mar 10, 2025 | In-Context Learning, Question Answering | Code Available | 0
Towards Fine-Grained Video Question Answering | Mar 10, 2025 | Language Modeling, Language Modelling | Unverified | 0
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Mar 10, 2025 | Autonomous Driving, Question Answering | Unverified | 0