SOTAVerified

AudioCaps

Papers

Showing 26-50 of 64 papers

Title | Status | Hype
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | Code | 1
Dissecting Temporal Understanding in Text-to-Audio Retrieval | | 0
Audio Captioning with Composition of Acoustic and Semantic Information | | 0
Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning | | 0
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion | | 0
Audio-Visual LLM for Video Understanding | | 0
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features | | 0
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | | 0
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval | | 0
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment | | 0
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap | | 0
AudioCaps: Generating Captions for Audios in The Wild | | 0
FLAP: Fast Language-Audio Pre-training | | 0
Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval | | 0
Generation or Replication: Auscultating Audio Latent Diffusion Models | | 0
Audiobox: Unified Audio Generation with Natural Language Prompts | | 0
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling | | 0
Joint Speech Recognition and Audio Captioning | | 0
Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval? | | 0
Language-based Audio Retrieval with Co-Attention Networks | | 0
TAIL: Text-Audio Incremental Learning | | 0
Leveraging Pre-trained BERT for Audio Captioning | | 0
Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning | | 0
Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval | | 0
VoiceLDM: Text-to-Speech with Environmental Context | | 0
Page 2 of 3

No leaderboard results yet.