SOTAVerified

AudioCaps

Papers

Showing 26-50 of 64 papers

| Title | Status | Hype |
| --- | --- | --- |
| Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation | Code | 1 |
| Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval | | 0 |
| Text-to-Audio Generation Synchronized with Videos | | 0 |
| EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | Code | 2 |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | | 0 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | | 0 |
| Audio-Visual LLM for Video Understanding | | 0 |
| FLAP: Fast Language-Audio Pre-training | | 0 |
| Generation or Replication: Auscultating Audio Latent Diffusion Models | | 0 |
| VoiceLDM: Text-to-Speech with Environmental Context | | 0 |
| Weakly-supervised Automated Audio Captioning via text only training | Code | 0 |
| ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | Code | 1 |
| RECAP: Retrieval-Augmented Audio Captioning | Code | 1 |
| Retrieval-Augmented Text-to-Audio Generation | | 0 |
| Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval? | | 0 |
| Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer | | 0 |
| DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment | | 0 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3 |
| Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | Code | 3 |
| Prefix tuning for automated audio captioning | Code | 1 |
| Target Sound Extraction with Variable Cross-modality Clues | Code | 1 |
| Accommodating Audio Modality in CLIP for Multimodal Processing | Code | 0 |
| AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | Code | 4 |
| Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates | Code | 1 |
| Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | Code | 1 |

No leaderboard results yet.