| MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark | Jun 5, 2025 | Rhythm, Spoken Language Understanding | Code Available | 7 |
| OpenVoice: Versatile Instant Voice Cloning | Dec 3, 2023 | Rhythm, Voice Cloning | Code Available | 7 |
| Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition | Code Available | 3 |
| TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control | Sep 24, 2024 | Clustering, Language Modelling | Code Available | 3 |
| Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis | May 16, 2024 | Language Modelling, Large Language Model | Code Available | 3 |
| FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code Available | 3 |
| SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition | Feb 27, 2024 | Instruction Following, Language Modeling | Code Available | 3 |
| EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling | Dec 31, 2023 | 3D Face Animation, Diversity | Code Available | 3 |
| An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains | Oct 5, 2024 | Diagnostic, Event Detection | Code Available | 2 |
| Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation | Aug 5, 2024 | Rhythm, Self-Supervised Learning | Code Available | 2 |
| MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation | Jul 21, 2024 | Diversity, Music Generation | Code Available | 2 |
| AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion | Jun 1, 2024 | Gesture Generation, Rhythm | Code Available | 2 |
| Diff-BGM: A Diffusion Model for Video Background Music Generation | May 20, 2024 | Diversity, Music Generation | Code Available | 2 |
| MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models | Mar 14, 2024 | 3D Face Animation, Diversity | Code Available | 2 |
| SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | Jan 8, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings | Oct 4, 2022 | Gesture Generation, Rhythm | Code Available | 2 |
| Unsupervised Speech Decomposition via Triple Information Bottleneck | Apr 23, 2020 | Rhythm, Style Transfer | Code Available | 2 |
| ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning | Apr 11, 2025 | Contrastive Learning, Deep Learning | Code Available | 1 |
| ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis | Feb 16, 2025 | Diagnostic, Rhythm | Code Available | 1 |
| Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model | Feb 15, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| ImprovNet -- Generating Controllable Musical Improvisations with Iterative Corruption Refinement | Feb 6, 2025 | Music Generation, Rhythm | Code Available | 1 |
| A Multi-Resolution Mutual Learning Network for Multi-Label ECG Classification | Jun 12, 2024 | ECG Classification, Rhythm | Code Available | 1 |
| Singing Voice Graph Modeling for SingFake Detection | Jun 5, 2024 | DeepFake Detection, Face Swapping | Code Available | 1 |
| Perception-Inspired Graph Convolution for Music Understanding Tasks | May 15, 2024 | Graph Classification, Graph Learning | Code Available | 1 |
| SDEMG: Score-based Diffusion Model for Surface Electromyographic Signal Denoising | Feb 6, 2024 | Denoising, Rhythm | Code Available | 1 |
| TSRNet: Simple Framework for Real-time ECG Anomaly Detection with Multimodal Time and Spectrogram Restoration Network | Dec 15, 2023 | Anomaly Detection, Rhythm | Code Available | 1 |
| Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion | Dec 7, 2023 | Gesture Generation, Rhythm | Code Available | 1 |
| Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark | Nov 23, 2023 | Automatic Lyrics Transcription, Rhythm | Code Available | 1 |
| Music ControlNet: A model similar to SD ControlNetD that can accurately control music generation | Nov 7, 2023 | Music Generation, Rhythm | Code Available | 1 |
| Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model | Nov 2, 2023 | Music Generation, Rhythm | Code Available | 1 |
| MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation | Sep 19, 2023 | Rhythm | Code Available | 1 |
| LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation | Sep 17, 2023 | Gesture Generation, Rhythm | Code Available | 1 |
| Multi-scale Cross-restoration Framework for Electrocardiogram Anomaly Detection | Aug 3, 2023 | Anomaly Detection, Diagnostic | Code Available | 1 |
| AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks | Jul 19, 2023 | Rhythm, Semantic correspondence | Code Available | 1 |
| Rhythm Modeling for Voice Conversion | Jul 12, 2023 | Rhythm, Voice Conversion | Code Available | 1 |
| Unsupervised Melody-to-Lyric Generation | May 30, 2023 | Disentanglement, Rhythm | Code Available | 1 |
| EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation | May 30, 2023 | Gesture Generation, Rhythm | Code Available | 1 |
| QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation | May 18, 2023 | Gesture Generation, Quantization | Code Available | 1 |
| scPrisma infers, filters and enhances topological signals in single-cell data using spectral template matching | Feb 27, 2023 | Rhythm, Template Matching | Code Available | 1 |
| Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units | Dec 19, 2022 | Rhythm, Voice Conversion | Code Available | 1 |
| Self-Supervised PPG Representation Learning Shows High Inter-Subject Variability | Dec 7, 2022 | Activity Recognition, Representation Learning | Code Available | 1 |
| A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units | Nov 12, 2022 | Rhythm, Voice Conversion | Code Available | 1 |
| Multimodality Multi-Lead ECG Arrhythmia Classification using Self-Supervised Learning | Sep 30, 2022 | ECG Classification, Knowledge Distillation | Code Available | 1 |
| The ReprGesture entry to the GENEA Challenge 2022 | Aug 25, 2022 | Decoder, Gesture Generation | Code Available | 1 |
| Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion | Aug 18, 2022 | Disentanglement, Rhythm | Code Available | 1 |
| Detecting beats in the photoplethysmogram: benchmarking open-source algorithms | Jul 19, 2022 | Benchmarking, Photoplethysmography (PPG) beat detection | Code Available | 1 |
| TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation | May 25, 2022 | Representation Learning, Rhythm | Code Available | 1 |
| Development of Interpretable Machine Learning Models to Detect Arrhythmia based on ECG Data | May 5, 2022 | BIG-bench Machine Learning, Feature Importance | Code Available | 1 |
| ECG Biometric Recognition: Review, System Proposal, and Benchmark Evaluation | Apr 8, 2022 | Rhythm | Code Available | 1 |
| IMLE-Net: An Interpretable Multi-level Multi-channel Model for ECG Classification | Apr 6, 2022 | ECG Classification, Rhythm | Code Available | 1 |