SOTAVerified

AudioCaps

Papers

Showing 1-50 of 64 papers

Title | Status | Hype
Improving Text-To-Audio Models with Synthetic Captions | Code | 5
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | Code | 4
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | Code | 2
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | Code | 2
GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2
ETTA: Elucidating the Design Space of Text-to-Audio Models | Code | 2
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation | Code | 2
Audio Captioning Transformer | Code | 1
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | Code | 1
ADIFF: Explaining audio difference using natural language | Code | 1
Audio Retrieval with Natural Language Queries | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
Audio Retrieval with WavText5K and CLAP Training | Code | 1
Bridging Language Gaps in Audio-Text Retrieval | Code | 1
Can Audio Captions Be Evaluated with Image Caption Metrics? | Code | 1
Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates | Code | 1
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Code | 1
On Metric Learning for Audio-Text Cross-Modal Retrieval | Code | 1
Prefix tuning for automated audio captioning | Code | 1
RECAP: Retrieval-Augmented Audio Captioning | Code | 1
Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation | Code | 1
Separate What You Describe: Language-Queried Audio Source Separation | Code | 1
Target Sound Extraction with Variable Cross-modality Clues | Code | 1
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | Code | 1
Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Code | 0
Weakly-supervised Automated Audio Captioning via text only training | Code | 0
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Code | 0
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | Code | 0
MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation | Code | 0
Accommodating Audio Modality in CLIP for Multimodal Processing | Code | 0
AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS | Code | 0
Audiobox: Unified Audio Generation with Natural Language Prompts | | 0
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling | | 0
Joint Speech Recognition and Audio Captioning | | 0
Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval? | | 0
Language-based Audio Retrieval with Co-Attention Networks | | 0
TAIL: Text-Audio Incremental Learning | | 0
Leveraging Pre-trained BERT for Audio Captioning | | 0
Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning | | 0
Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval | | 0
VoiceLDM: Text-to-Speech with Environmental Context | | 0
Text-to-Audio Generation Synchronized with Videos | | 0
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model | | 0
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | | 0
Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer | | 0
Retrieval-Augmented Text-to-Audio Generation | | 0
Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning | | 0
Audio-text Retrieval in Context | | 0

No leaderboard results yet.