| Paper | Date | Tasks | Code | |
|---|---|---|---|---|
| NeoBERT: A Next-Generation BERT | Feb 26, 2025 | In-Context Learning, MTEB Benchmark | Code available | 2 |
| KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model | Jan 2, 2025 | MTEB Benchmark, Retrieval-augmented Generation | Code available | 2 |
| Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents | Oct 30, 2023 | Information Retrieval, MTEB Benchmark | Code available | 1 |
| C-Pack: Packed Resources For General Chinese Embeddings | Sep 14, 2023 | MTEB Benchmark | Code available | 1 |
| GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training | May 30, 2025 | MTEB Benchmark, Natural Language Inference | Unverified | 0 |
| Optimization of embeddings storage for RAG systems using quantization and dimensionality reduction techniques | Apr 30, 2025 | Dimensionality Reduction, MTEB Benchmark | Unverified | 0 |
| FaMTEB: Massive Text Embedding Benchmark in Persian Language | Feb 17, 2025 | Chatbot, MTEB Benchmark | Unverified | 0 |
| GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | Oct 18, 2024 | Contrastive Learning, MTEB Benchmark | Code available | 0 |
| Contextual Document Embeddings | Oct 3, 2024 | Contrastive Learning, Document Embedding | Unverified | 0 |
| jina-embeddings-v3: Multilingual Embeddings With Task LoRA | Sep 16, 2024 | MTEB Benchmark, Representation Learning | Unverified | 0 |
| A Bi-metric Framework for Fast Similarity Search | Jun 5, 2024 | MTEB Benchmark, Re-Ranking | Code available | 0 |
| Recent advances in text embedding: A Comprehensive Review of Top-Performing Methods on the MTEB Benchmark | May 27, 2024 | Diversity, MTEB Benchmark | Unverified | 0 |
| Text Embeddings by Weakly-Supervised Contrastive Pre-training | Dec 7, 2022 | MTEB Benchmark, Only Connect Walls Dataset Task 1 (Grouping) | Code available | 0 |