| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | Fairness, Hallucination | Code Available | 4 | 5 |
| FG-CLIP: Fine-Grained Visual and Textual Alignment | May 8, 2025 | Image-text Retrieval, object-detection | Code Available | 4 | 5 |
| Multi-label Cluster Discrimination for Visual Representation Learning | Jul 24, 2024 | Contrastive Learning, Image-text Retrieval | Code Available | 4 | 5 |
| Multimodal Whole Slide Foundation Model for Pathology | Nov 29, 2024 | Cross-Modal Retrieval, model | Code Available | 4 | 5 |
| Long-CLIP: Unlocking the Long-Text Capability of CLIP | Mar 22, 2024 | Image Generation, Image Retrieval | Code Available | 4 | 5 |
| LLM-Pruner: On the Structural Pruning of Large Language Models | May 19, 2023 | Text Generation, zero-shot-classification | Code Available | 3 | 5 |
| DiffCLIP: Differential Attention Meets CLIP | Mar 9, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification | Sep 1, 2024 | Scene Classification, Transductive Zero-Shot Classification | Code Available | 2 | 5 |
| CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification | Feb 27, 2024 | Classification, Diagnostic | Code Available | 2 | 5 |
| CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation | Apr 30, 2024 | Mamba, State Space Models | Code Available | 2 | 5 |