SOTAVerified

AudioCaps

Papers

Showing 1-25 of 64 papers

Title | Status | Hype
Improving Text-To-Audio Models with Synthetic Captions | Code | 5
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | Code | 4
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | Code | 3
ETTA: Elucidating the Design Space of Text-to-Audio Models | Code | 2
GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | Code | 2
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | Code | 2
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation | Code | 2
Prefix tuning for automated audio captioning | Code | 1
Audio Retrieval with Natural Language Queries | Code | 1
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
Audio Retrieval with WavText5K and CLAP Training | Code | 1
On Metric Learning for Audio-Text Cross-Modal Retrieval | Code | 1
Target Sound Extraction with Variable Cross-modality Clues | Code | 1
Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates | Code | 1
Bridging Language Gaps in Audio-Text Retrieval | Code | 1
ADIFF: Explaining audio difference using natural language | Code | 1
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Code | 1
Can Audio Captions Be Evaluated with Image Caption Metrics? | Code | 1
Audio Captioning Transformer | Code | 1
Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation | Code | 1
RECAP: Retrieval-Augmented Audio Captioning | Code | 1
Separate What You Describe: Language-Queried Audio Source Separation | Code | 1

No leaderboard results yet.