| MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | Aug 21, 2024 | Contrastive Learning, Language Modeling | Unverified | 0 |
| Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework | Aug 21, 2024 | geo-localization, Language Modeling | Unverified | 0 |
| CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering | Aug 21, 2024 | Continual Learning, Question Answering | Code Available | 0 |
| V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard? | Aug 20, 2024 | Few-Shot Learning, In-Context Learning | Code Available | 1 |
| TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition | Aug 19, 2024 | GPU, Multi-Task Learning | Code Available | 0 |
| PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding | Aug 18, 2024 | Language Modelling, Question Answering | Code Available | 2 |
| FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection | Aug 17, 2024 | Federated Learning, Medical Visual Question Answering | Code Available | 0 |
| Beyond the Hype: A dispassionate look at vision-language models in medical scenario | Aug 16, 2024 | Question Answering, Spatial Reasoning | Unverified | 0 |
| A Survey on Benchmarks of Multimodal Large Language Models | Aug 16, 2024 | Question Answering, Survey | Code Available | 2 |
| Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Aug 16, 2024 | Decision Making, Medical Visual Question Answering | Code Available | 0 |
| Visual Agents as Fast and Slow Thinkers | Aug 16, 2024 | Question Answering, Reasoning Segmentation | Code Available | 1 |
| IIU: Independent Inference Units for Knowledge-based Visual Question Answering | Aug 15, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Aug 14, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| CROME: Cross-Modal Adapters for Efficient Multimodal LLM | Aug 13, 2024 | Instruction Following, Language Modeling | Unverified | 0 |
| SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning | Aug 10, 2024 | Hallucination, Optical Character Recognition | Code Available | 11 |
| mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | Aug 9, 2024 | Language Modeling, Language Modelling | Code Available | 7 |
| Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery | Aug 9, 2024 | Contrastive Learning, Medical Visual Question Answering | Code Available | 1 |
| Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding, Optical Character Recognition | Unverified | 0 |
| Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models | Aug 8, 2024 | Contrastive Learning, Fine-Grained Image Recognition | Unverified | 0 |
| Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation | Aug 7, 2024 | GPU, Question Answering | Unverified | 0 |
| GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI | Aug 6, 2024 | Question Answering, Visual Question Answering | Code Available | 2 |
| Targeted Visual Prompting for Medical Visual Question Answering | Aug 6, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 0 |
| LLaVA-OneVision: Easy Visual Task Transfer | Aug 6, 2024 | 3D Question Answering (3D-QA) | Code Available | 0 |
| Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | Aug 5, 2024 | Decoder, Depth Estimation | Code Available | 7 |