CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs | Dec 3, 2024 | Image Captioning, Quantization | Unverified | 0
QA-TOOLBOX: Conversational Question-Answering for process task guidance in manufacturing | Dec 3, 2024 | Conversational Question Answering, Data Augmentation | Unverified | 0
Semantic Tokens in Retrieval Augmented Generation | Dec 3, 2024 | Decision Making, Question Answering | Unverified | 0
Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining | Dec 3, 2024 | backdoor defense, Computational Efficiency | Code Available | 1
Mastering Board Games by External and Internal Planning with Language Models | Dec 2, 2024 | Board Games, Language Modeling | Unverified | 0
AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM | Dec 2, 2024 | Instruction Following, Question Answering | Unverified | 0
GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering | Dec 2, 2024 | Question Answering | Code Available | 1
Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | Dec 2, 2024 | Benchmarking, Decision Making | Unverified | 0
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks | Dec 2, 2024 | Multi-Object Tracking, Object Tracking | Code Available | 0
SEAL: Semantic Attention Learning for Long Video Representation | Dec 2, 2024 | Diversity, Question Answering | Unverified | 0
Understanding the World's Museums through Vision-Language Reasoning | Dec 2, 2024 | Benchmarking, Question Answering | Code Available | 0
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences | Dec 2, 2024 | Embodied Question Answering, Question Answering | Code Available | 2
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos | Dec 2, 2024 | Question Answering, Video Understanding | Code Available | 1
Unlocking Video-LLM via Agent-of-Thoughts Distillation | Dec 2, 2024 | Language Modeling, Language Modelling | Unverified | 0
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages | Dec 1, 2024 | ARC, Multiple-choice | Unverified | 0
Learn to Unlearn: Meta-Learning-Based Knowledge Graph Embedding Unlearning | Dec 1, 2024 | Graph Embedding, Knowledge Graph Embedding | Unverified | 0
Generative Language Models Potential for Requirement Engineering Applications: Insights into Current Strengths and Limitations | Dec 1, 2024 | NER, Prompt Engineering | Unverified | 0
KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Dec 1, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Code Available | 0
Improving Vietnamese Legal Document Retrieval using Synthetic Data | Dec 1, 2024 | Information Retrieval, Question Answering | Unverified | 0
DynRank: Improving Passage Retrieval with Dynamic Zero-Shot Prompting Based on Question Classification | Nov 30, 2024 | Open-Domain Question Answering, Passage Retrieval | Unverified | 0
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness | Nov 29, 2024 | Optical Character Recognition (OCR), Question Answering | Code Available | 0
STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training | Nov 29, 2024 | Question Answering, Video Understanding | Unverified | 0
Actions and Objects Pathways for Domain Adaptation in Video Question Answering | Nov 29, 2024 | Domain Adaptation, Domain Generalization | Unverified | 0
PerLA: Perceptive 3D Language Assistant | Nov 29, 2024 | Dense Captioning, Graph Neural Network | Code Available | 1
SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks | Nov 29, 2024 | Question Answering, Visual Question Answering | Code Available | 0
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Nov 29, 2024 | Benchmarking, Grounded Video Question Answering | Unverified | 0
COLD: Causal reasOning in cLosed Daily activities | Nov 29, 2024 | Causal Inference, Commonsense Causal Reasoning | Code Available | 0
TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension | Nov 29, 2024 | 8k, Question Answering | Code Available | 0
Unimib Assistant: designing a student-friendly RAG-based chatbot for all their needs | Nov 29, 2024 | All, Chatbot | Unverified | 0
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | Nov 28, 2024 | Image Captioning, image-classification | Unverified | 0
Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | Nov 28, 2024 | Attribute, Hallucination | Unverified | 0
DIESEL -- Dynamic Inference-Guidance via Evasion of Semantic Embeddings in LLMs | Nov 28, 2024 | Question Answering, Reranking | Unverified | 0
ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering? | Nov 27, 2024 | Question Answering, Visual Question Answering | Unverified | 0
Active Data Curation Effectively Distills Large-Scale Multimodal Models | Nov 27, 2024 | Decoder, Image Captioning | Unverified | 0
3D Scene Graph Guided Vision-Language Pre-training | Nov 27, 2024 | 3D dense captioning, 3D visual grounding | Unverified | 0
GeneQuery: A General QA-based Framework for Spatial Gene Expression Predictions from Histology Images | Nov 27, 2024 | Question Answering, whole slide images | Code Available | 0
Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track | Nov 27, 2024 | Medical Question Answering, Question Answering | Unverified | 0
Can bidirectional encoder become the ultimate winner for downstream applications of foundation models? | Nov 27, 2024 | Language Modeling, Language Modelling | Unverified | 0
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | Nov 27, 2024 | Graph Generation, Question Answering | Unverified | 0
DRS: Deep Question Reformulation With Structured Output | Nov 27, 2024 | Question Answering | Code Available | 0
Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation | Nov 27, 2024 | Information Retrieval, Part-Of-Speech Tagging | Unverified | 0
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format | Nov 27, 2024 | Dense Video Captioning, Grounded Video Question Answering | Code Available | 1
Cross-modal Information Flow in Multimodal Large Language Models | Nov 27, 2024 | Question Answering, Visual Question Answering | Code Available | 1
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation | Nov 27, 2024 | Question Answering, Speech Enhancement | Unverified | 0
Efficient Multi-modal Large Language Models via Visual Token Grouping | Nov 26, 2024 | Image Captioning, Question Answering | Unverified | 0
Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering | Nov 26, 2024 | Prognosis, Question Answering | Code Available | 2
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey | Nov 26, 2024 | Natural Language Understanding, Question Answering | Unverified | 0
Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 7
Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment | Nov 26, 2024 | Image Quality Assessment, Question Answering | Code Available | 2
Task Progressive Curriculum Learning for Robust Visual Question Answering | Nov 26, 2024 | Data Augmentation, Ensemble Learning | Unverified | 0