| Starbucks: Improved Training for 2D Matryoshka Embeddings | Oct 17, 2024 | Language Modelling, text similarity | Code Available | 1 |
| SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Oct 12, 2024 | AudioCaps, Audio captioning | Code Available | 0 |
| Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation | Oct 11, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| On the Evaluation of Generative Robotic Simulations | Oct 10, 2024 | Diversity, text similarity | Unverified | 0 |
| Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments | Oct 9, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Unverified | 0 |
| VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | Oct 1, 2024 | Hallucination, text similarity | Unverified | 0 |
| Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings | Sep 12, 2024 | FAD, Image Captioning | Unverified | 0 |
| Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models? | Sep 4, 2024 | Information Retrieval, Retrieval | Code Available | 1 |
| What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations | Sep 4, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design | Aug 22, 2024 | Information Retrieval, Reranking | Unverified | 0 |