| Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation | Jan 10, 2025 | Knowledge DistillationQuestion Answering | —Unverified | 0 |
| LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | Jan 9, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Feedback-Driven Vision-Language Alignment with Minimal Human Supervision | Jan 8, 2025 | HallucinationQuestion Answering | —Unverified | 0 |
| KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration | Jan 7, 2025 | Anomaly DetectionAnomaly Segmentation | —Unverified | 0 |
| Visual question answering: from early developments to recent advances -- a survey | Jan 7, 2025 | DescriptiveNatural Language Understanding | —Unverified | 0 |
| ReDiT: Re‑evaluating large visual question answering model confidence by defining input scenario Difficulty and applying Temperature mapping | Jan 6, 2025 | Question AnsweringVisual Question Answering | CodeCode Available | 0 |
| Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Jan 6, 2025 | Language Model EvaluationLanguage Modeling | CodeCode Available | 1 |
| Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Jan 5, 2025 | Image CaptioningImage to text | CodeCode Available | 1 |
| Accounting for Focus Ambiguity in Visual Questions | Jan 4, 2025 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Jan 3, 2025 | Binary ClassificationFace Anti-Spoofing | —Unverified | 0 |