| VividMed: Vision Language Model with Versatile Visual Grounding for Medicine | Oct 16, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Towards Foundation Models for 3D Vision: How Close Are We? | Oct 14, 2024 | Question AnsweringVisual Question Answering | CodeCode Available | 1 |
| Skipping Computations in Multimodal LLMs | Oct 12, 2024 | Question AnsweringVisual Question Answering | CodeCode Available | 1 |
| Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping | Oct 11, 2024 | MMEQuestion Answering | CodeCode Available | 1 |
| ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models | Oct 7, 2024 | Question AnsweringVisual Question Answering | CodeCode Available | 1 |
| MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration | Oct 6, 2024 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 1 |
| A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning | Oct 1, 2024 | Common Sense ReasoningDeepFake Detection | CodeCode Available | 1 |
| T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition | Sep 29, 2024 | In-Context LearningQuestion Answering | CodeCode Available | 1 |
| Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Sep 26, 2024 | image-classificationImage Classification | CodeCode Available | 1 |
| MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models | Sep 23, 2024 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 1 |
| Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering | Sep 19, 2024 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Sep 17, 2024 | Question AnsweringToken Reduction | CodeCode Available | 1 |
| LIME: Less Is More for MLLM Evaluation | Sep 10, 2024 | Image CaptioningQuestion Answering | CodeCode Available | 1 |
| M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework | Sep 9, 2024 | Computational EfficiencyCross-Modal Retrieval | CodeCode Available | 1 |
| V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard? | Aug 20, 2024 | Few-Shot LearningIn-Context Learning | CodeCode Available | 1 |
| Visual Agents as Fast and Slow Thinkers | Aug 16, 2024 | Question AnsweringReasoning Segmentation | CodeCode Available | 1 |
| Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery | Aug 9, 2024 | Contrastive LearningMedical Visual Question Answering | CodeCode Available | 1 |
| Boosting Audio Visual Question Answering via Key Semantic-Aware Cues | Jul 30, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 1 |
| INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model | Jul 23, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Learning Trimodal Relation for AVQA with Missing Modality | Jul 23, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 1 |
| HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Jul 22, 2024 | BenchmarkingHallucination | CodeCode Available | 1 |
| Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark | Jul 18, 2024 | GPUImage Retrieval | CodeCode Available | 1 |
| CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Jul 1, 2024 | Image-text RetrievalQuestion Answering | CodeCode Available | 1 |
| MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment | Jun 28, 2024 | Answer GenerationImage Captioning | CodeCode Available | 1 |
| STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering | Jun 28, 2024 | Medical DiagnosisMedical Question Answering | CodeCode Available | 1 |