| DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation | Jul 14, 2025 | Decoder, GPU | Code Available | 0 |
| Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography | Jun 16, 2025 | breast density classification, Classification | Unverified | 0 |
| Harmonizing and Merging Source Models for CLIP-based Domain Generalization | Jun 11, 2025 | Domain Generalization, zero-shot-classification | Unverified | 0 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available | 1 |
| GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | May 30, 2025 | Classification, Disaster Response | Code Available | 2 |
| Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation | May 25, 2025 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| AmorLIP: Efficient Language-Image Pretraining via Amortization | May 25, 2025 | Contrastive Learning, Representation Learning | Code Available | 0 |
| Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment | May 20, 2025 | Representation Learning, Retrieval | Unverified | 0 |
| From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection | May 19, 2025 | feature selection, Out-of-Distribution Generalization | Code Available | 1 |
| Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption | May 19, 2025 | Knowledge Distillation, Test-time Adaptation | Code Available | 0 |
| StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment | May 19, 2025 | zero-shot-classification, Zero-Shot Learning | Code Available | 0 |
| Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | May 16, 2025 | Cross-Modal Retrieval, Diagnostic | Code Available | 2 |
| Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors | May 15, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining | May 12, 2025 | Audio captioning, Audio Generation | Unverified | 0 |
| Image Classification Using a Diffusion Model as a Pre-Training Model | May 11, 2025 | Contrastive Learning, image-classification | Unverified | 0 |
| MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks | May 9, 2025 | Diagnostic, Instruction Following | Code Available | 1 |
| FG-CLIP: Fine-Grained Visual and Textual Alignment | May 8, 2025 | Image-text Retrieval, object-detection | Code Available | 4 |
| Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning | May 6, 2025 | Representation Learning, zero-shot-classification | Unverified | 0 |
| On the effectiveness of Large Language Models in the mechanical design domain | May 2, 2025 | Classification, Sentence | Code Available | 0 |
| Helping Large Language Models Protect Themselves: An Enhanced Filtering and Summarization System | May 2, 2025 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection | Apr 21, 2025 | zero-shot-classification, Zero-Shot Learning | Code Available | 0 |
| RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Radiology with Zero-Shot Multi-Task Capability | Apr 10, 2025 | Contrastive Learning, Open Vocabulary Semantic Segmentation | Unverified | 0 |
| Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction | Apr 4, 2025 | Attribute, Language Modeling | Code Available | 1 |
| Refining CLIP's Spatial Awareness: A Visual-Centric Perspective | Apr 3, 2025 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization | Mar 31, 2025 | Contrastive Learning, image-classification | Unverified | 0 |