SOTAVerified

Text-to-Music Generation

Papers

Showing 1-37 of 37 papers

| Title | Status | Hype |
| --- | --- | --- |
| FLUX that Plays Music | Code | 13 |
| Fast Timing-Conditioned Latent Audio Diffusion | Code | 7 |
| Stable Audio Open | Code | 7 |
| Simple and Controllable Music Generation | Code | 6 |
| MusicLM: Generating Music From Text | Code | 6 |
| Improving Text-To-Audio Models with Synthetic Captions | Code | 5 |
| Quality-aware Masked Diffusion Transformer for Enhanced Music Generation | Code | 4 |
| AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Code | 4 |
| Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | Code | 4 |
| TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument | Code | 2 |
| ETTA: Elucidating the Design Space of Text-to-Audio Models | Code | 2 |
| MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models | Code | 2 |
| Melody-Guided Music Generation | Code | 2 |
| MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation | Code | 2 |
| Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Code | 2 |
| Mustango: Toward Controllable Text-to-Music Generation | Code | 2 |
| PAM: Prompting Audio-Language Models for Audio Quality Assessment | Code | 2 |
| MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models | Code | 1 |
| MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies | Code | 1 |
| Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task | Code | 1 |
| Music ControlNet: A model similar to SD ControlNetD that can accurately control music generation | Code | 1 |
| Investigating Personalization Methods in Text to Music Generation | Code | 1 |
| The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation | Code | 1 |
| JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models | Code | 1 |
| Long-Form Text-to-Music Generation with Adaptive Prompts: A Case Study in Tabletop Role-Playing Games Soundtracks | Code | 0 |
| Noise2Music: Text-conditioned Music Generation with Diffusion Models | | 0 |
| Combining audio control and style transfer using latent diffusion | | 0 |
| ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models | | 0 |
| JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning | | 0 |
| The Interpretation Gap in Text-to-Music Generation Models | | 0 |
| Diff-TONE: Timestep Optimization for iNstrument Editing in Text-to-Music Diffusion Models | | 0 |
| Efficient Neural Music Generation | | 0 |
| MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners | | 0 |
| MusicFlow: Cascaded Flow Matching for Text Guided Music Generation | | 0 |
| Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer | | 0 |
| Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation | | 0 |
| Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning | | 0 |

Benchmark Results

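| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AudioLDM2-music | FD_openl3 | 354.05 | | Unverified |
| 2 | Stable Audio | FD_openl3 | 108.69 | | Unverified |
| 3 | Riffusion | FAD | 13.4 | | Unverified |
| 4 | Mubert | FAD | 9.6 | | Unverified |
| 5 | MeLoDy | FAD | 5.41 | | Unverified |
| 6 | MusicGen w/ random melody (1.5B) | FAD | 5 | | Unverified |
| 7 | MusicLM | FAD | 4 | | Unverified |
| 8 | Noise2Music spectrogram | FAD | 3.84 | | Unverified |
| 9 | MusicGen w/o melody (3.3B) | FAD | 3.8 | | Unverified |
| 10 | UniAudio | FAD | 3.65 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Mustango (non-pretrained) | FAD | 2.09 | | Unverified |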