| MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark | Jun 5, 2025 | Rhythm, Spoken Language Understanding | Code Available | 7 | 5 |
| OpenVoice: Versatile Instant Voice Cloning | Dec 3, 2023 | Rhythm, Voice Cloning | Code Available | 7 | 5 |
| EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling | Dec 31, 2023 | 3D Face Animation, Diversity | Code Available | 3 | 5 |
| FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code Available | 3 | 5 |
| Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition | Code Available | 3 | 5 |
| SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition | Feb 27, 2024 | Instruction Following, Language Modeling | Code Available | 3 | 5 |
| TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control | Sep 24, 2024 | Clustering, Language Modelling | Code Available | 3 | 5 |
| Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis | May 16, 2024 | Language Modelling, Large Language Model | Code Available | 3 | 5 |
| MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models | Mar 14, 2024 | 3D Face Animation, Diversity | Code Available | 2 | 5 |
| MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation | Jul 21, 2024 | Diversity, Music Generation | Code Available | 2 | 5 |