| UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | Feb 25, 2025 | 3DGS, cross-modal alignment | Unverified | 0 |
| SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Feb 20, 2025 | Fairness, Image-text Retrieval | Unverified | 0 |
| Using tournaments to calculate AUROC for zero-shot classification with LLMs | Feb 20, 2025 | Binary Classification, Classification | Unverified | 0 |
| Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | Feb 19, 2025 | Caption Generation, Classification | Unverified | 0 |
| Text Classification in the LLM Era - Where do we stand? | Feb 17, 2025 | Classification, Sentiment Analysis | Unverified | 0 |
| Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering | Feb 13, 2025 | Classification, Prompt Engineering | Unverified | 0 |
| From Haystack to Needle: Label Space Reduction for Zero-shot Classification | Feb 12, 2025 | Classification, zero-shot-classification | Unverified | 0 |
| Captured by Captions: On Memorization and its Mitigation in CLIP Models | Feb 11, 2025 | Image Retrieval, Memorization | Unverified | 0 |
| DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | Feb 7, 2025 | Anomaly Detection, Image-text Retrieval | Unverified | 0 |
| Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Jan 23, 2025 | Image Retrieval, Retrieval | Code Available | 0 |