| VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning | Sep 10, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis | Sep 10, 2024 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 2 |
| Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding | Sep 10, 2024 | Hallucination, Image Captioning | Unverified | 0 |
| LIME: Less Is More for MLLM Evaluation | Sep 10, 2024 | Image Captioning, Question Answering | Code Available | 1 |
| M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework | Sep 9, 2024 | Computational Efficiency, Cross-Modal Retrieval | Code Available | 1 |
| Breaking Neural Network Scaling Laws with Modularity | Sep 9, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| POINTS: Improving Your Vision-language Model with Affordable Strategies | Sep 7, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Sep 6, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | Sep 5, 2024 | Autonomous Driving, Motion Planning | Unverified | 0 |
| MOSMOS: Multi-organ segmentation facilitated by medical report supervision | Sep 4, 2024 | Contrastive Learning, Organ Segmentation | Unverified | 0 |
| How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? | Sep 3, 2024 | In-Context Learning, Language Modeling | Code Available | 0 |
| Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models | Sep 3, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| Kvasir-VQA: A Text-Image Pair GI Tract Dataset | Sep 2, 2024 | Image Captioning, Image Generation | Code Available | 0 |
| Look, Learn and Leverage (L^3): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment | Aug 30, 2024 | Question Answering, Representation Learning | Unverified | 0 |
| Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering | Aug 30, 2024 | Decoder, Language Modeling | Unverified | 0 |
| M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation | Aug 29, 2024 | Instruction Following, Medical Report Generation | Unverified | 0 |
| CogVLM2: Visual Language Models for Image and Video Understanding | Aug 29, 2024 | MM-Vet, MVBench | Code Available | 9 |
| Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | Aug 28, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Can SAR improve RSVQA performance? | Aug 28, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis | Aug 27, 2024 | Instruction Following, Question Answering | Unverified | 0 |
| Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking, Large Language Model | Unverified | 0 |
| Evaluating Attribute Comprehension in Large Vision-Language Models | Aug 25, 2024 | Attribute, Image-text matching | Code Available | 0 |
| Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering | Aug 24, 2024 | knowledge editing, Open-Domain Question Answering | Unverified | 0 |
| Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Aug 23, 2024 | Instruction Following, Knowledge Distillation | Unverified | 0 |
| MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |