| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Oct 21, 2024 | Instruction Following, object-detection | Code Available | 0 |
| Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Sep 26, 2024 | image-classification, Image Classification | Code Available | 1 |
| SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | Sep 26, 2024 | Descriptive, Generalized Referring Expression Comprehension | Code Available | 2 |
| FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Sep 23, 2024 | Image Comprehension, Referring Expression | Code Available | 1 |
| MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Sep 20, 2024 | cross-modal alignment, Referring Expression | Code Available | 1 |
| LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Sep 18, 2024 | Referring Expression, Referring Expression Comprehension | Code Available | 1 |
| Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression | Sep 5, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Aug 20, 2024 | Autonomous Vehicles, Computational Efficiency | Code Available | 0 |
| Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding, Optical Character Recognition | Unverified | 0 |
| MaskInversion: Localized Embeddings via Optimization of Explainability Maps | Jul 29, 2024 | Image Generation, Referring Expression | Unverified | 0 |