| Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models | Jun 10, 2021 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems | Nov 1, 2021 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Hierarchical Transformers Are More Efficient Language Models | Oct 26, 2021 | Image Generation, Language Modeling | Code Available | 1 | 5 |
| ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing | Mar 4, 2023 | Diversity, Image Captioning | Code Available | 1 | 5 |
| Automatic Controllable Product Copywriting for E-Commerce | Jun 21, 2022 | Aspect Extraction, Language Modeling | Code Available | 1 | 5 |
| Dealing with Typos for BERT-based Passage Retrieval and Ranking | Aug 27, 2021 | Information Retrieval, Language Modeling | Code Available | 1 | 5 |
| AdaSplash: Adaptive Sparse Flash Attention | Feb 17, 2025 | GPU, Language Modeling | Code Available | 1 | 5 |
| CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation | Jul 9, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Data Augmentation using Pre-trained Transformer Models | Mar 4, 2020 | Data Augmentation, Diversity | Code Available | 1 | 5 |
| Democratizing Reasoning Ability: Tailored Learning from Large Language Model | Oct 20, 2023 | Instruction Following, Language Modeling | Code Available | 1 | 5 |
| Scalable-Softmax Is Superior for Attention | Jan 31, 2025 | Information Retrieval, Language Modeling | Code Available | 1 | 5 |
| CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | Feb 20, 2025 | Blocking, Language Modeling | Code Available | 1 | 5 |
| Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices | Oct 2, 2024 | GPU, Language Modeling | Code Available | 1 | 5 |
| LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement | Apr 22, 2025 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| Making Language Models Better Tool Learners with Execution Feedback | May 22, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| hmBERT: Historical Multilingual Language Models for Named Entity Recognition | May 31, 2022 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? | Feb 17, 2025 | Knowledge Distillation, Language Modeling | Code Available | 1 | 5 |
| Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language | Jul 16, 2021 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Scaling Large Language Model-based Multi-Agent Collaboration | Jun 11, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Housekeep: Tidying Virtual Households using Commonsense Reasoning | May 22, 2022 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| DART-Eval: A Comprehensive DNA Language Model Evaluation Benchmark on Regulatory DNA | Dec 6, 2024 | counterfactual, Language Model Evaluation | Code Available | 1 | 5 |
| LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts | Dec 31, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| LLMs Can Simulate Standardized Patients via Agent Coevolution | Dec 16, 2024 | Diagnostic, Language Modeling | Code Available | 1 | 5 |
| How Language Model Hallucinations Can Snowball | May 22, 2023 | Hallucination, Language Modeling | Code Available | 1 | 5 |
| DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration | Jun 6, 2025 | Computational Efficiency, Language Modeling | Code Available | 1 | 5 |