| COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training | Oct 25, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| LaViDa: A Large Diffusion Language Model for Multimodal Understanding | May 22, 2025 | Instruction Following, Language Modeling | Code Available | 3 | 5 |
| A Survey on Large Language Model Acceleration based on KV Cache Management | Dec 27, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| 8-bit Optimizers via Block-wise Quantization | Oct 6, 2021 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Large Language Model based Long-tail Query Rewriting in Taobao Search | Nov 7, 2023 | Contrastive Learning, Language Modeling | Code Available | 3 | 5 |
| Language Model Inversion | Nov 22, 2023 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Jun 12, 2024 | Benchmarking, Chatbot | Code Available | 3 | 5 |
| Language Models are Few-Shot Learners | May 28, 2020 | answerability prediction, Articles | Code Available | 3 | 5 |
| Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey | Feb 8, 2024 | Articles, Entity Alignment | Code Available | 3 | 5 |
| 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Aug 7, 2024 | 16k, 2k | Code Available | 3 | 5 |