| Deduplicating Training Data Makes Language Models Better | Jul 14, 2021 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing | Feb 21, 2022 | Few-Shot Learning, Sentence | Code Available | 2 | 5 |
| DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings | Apr 21, 2022 | Contrastive Learning, Language Modeling | Code Available | 2 | 5 |
| MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval | Jul 2, 2023 | Biomedical Information Retrieval, Contrastive Learning | Code Available | 2 | 5 |
| DreamLIP: Language-Image Pre-training with Long Captions | Mar 25, 2024 | Contrastive Learning, Image-text Retrieval | Code Available | 2 | 5 |
| Enhancing Retrieval-Augmented Generation: A Study of Best Practices | Jan 13, 2025 | In-Context Learning, RAG | Code Available | 2 | 5 |
| BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model | Aug 20, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Dec 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| CCTC: A Cross-Sentence Chinese Text Correction Dataset for Native Speakers | Oct 1, 2022 | Grammatical Error Correction, Sentence | Code Available | 2 | 5 |
| Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | Mar 13, 2024 | Constituency Grammar Induction, Language Modeling | Code Available | 2 | 5 |
| Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | Oct 4, 2024 | Dense Video Captioning, Sentence | Code Available | 2 | 5 |
| AutoRE: Document-Level Relation Extraction with Large Language Models | Mar 21, 2024 | Document-level Relation Extraction, Relation | Code Available | 2 | 5 |
| "I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset | May 18, 2022 | Sentence | Code Available | 2 | 5 |
| A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models | Oct 5, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations | Feb 20, 2024 | Sentence | Code Available | 2 | 5 |
| Abstractive Summarization of Spoken and Written Instructions with BERT | Aug 21, 2020 | Abstractive Text Summarization, Articles | Code Available | 2 | 5 |
| Learning representations of learning representations | Apr 12, 2024 | Sentence | Code Available | 2 | 5 |
| ANAH: Analytical Annotation of Hallucinations in Large Language Models | May 30, 2024 | Generative Question Answering, Hallucination | Code Available | 2 | 5 |
| ARAGOG: Advanced RAG Output Grading | Apr 1, 2024 | Document Embedding, Language Modeling | Code Available | 2 | 5 |
| beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems | Sep 16, 2024 | Collaborative Filtering, Recommendation Systems | Code Available | 2 | 5 |
| MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction | Apr 23, 2022 | Grammatical Error Correction, Sentence | Code Available | 2 | 5 |
| One Thousand and One Pairs: A "novel" challenge for long-context language models | Jun 24, 2024 | Retrieval, Sentence | Code Available | 2 | 5 |
| PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification | Aug 30, 2019 | Paraphrase Identification, Sentence | Code Available | 2 | 5 |
| CLUE: A Chinese Language Understanding Evaluation Benchmark | Apr 13, 2020 | General Classification, Machine Reading Comprehension | Code Available | 2 | 5 |
| DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models | Mar 15, 2024 | RAG, Retrieval | Code Available | 2 | 5 |