| Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Mar 7, 2023 | In-Context LearningLanguage Modeling | CodeCode Available | 5 |
| Fast Inference from Transformers via Speculative Decoding | Nov 30, 2022 | Language ModelingLanguage Modelling | CodeCode Available | 5 |
| InstructPix2Pix: Learning to Follow Image Editing Instructions | Nov 17, 2022 | Image Editing | CodeCode Available | 5 |
| GigaAM: Efficient Self-Supervised Learner for Speech Recognition | Jun 1, 2025 | Automatic Speech RecognitionLanguage Modeling | CodeCode Available | 4 |
| ImgEdit: A Unified Image Editing Dataset and Benchmark | May 26, 2025 | Image Editing | CodeCode Available | 4 |
| Partition Generative Modeling: Masked Modeling Without Masks | May 24, 2025 | Computational EfficiencyLanguage Modeling | CodeCode Available | 4 |
| Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning | May 23, 2025 | DecoderImage Captioning | CodeCode Available | 4 |
| lmgame-Bench: How Good are LLMs at Playing Games? | May 21, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 4 |
| VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 4 |
| Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Mar 20, 2025 | Decision MakingLanguage Modeling | CodeCode Available | 4 |