SOTAVerified

AudioCaps

Papers

Showing 51–64 of 64 papers

| Title | Status | Hype |
| --- | --- | --- |
| AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion | | 0 |
| Audio-Visual LLM for Video Understanding | | 0 |
| Automated Audio Captioning via Fusion of Low- and High- Dimensional Features | | 0 |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | | 0 |
| DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval | | 0 |
| DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment | | 0 |
| Dissecting Temporal Understanding in Text-to-Audio Retrieval | | 0 |
| DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap | | 0 |
| Audio Captioning with Composition of Acoustic and Semantic Information | | 0 |
| Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning | | 0 |
| AudioCaps: Generating Captions for Audios in The Wild | | 0 |
| FLAP: Fast Language-Audio Pre-training | | 0 |
| Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval | | 0 |
| Generation or Replication: Auscultating Audio Latent Diffusion Models | | 0 |

No leaderboard results yet.