SOTAVerified

AudioCaps

Papers

Showing 41–50 of 64 papers

| Title | Status | Hype |
| --- | --- | --- |
| Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Code | 0 |
| Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval | | 0 |
| MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation | Code | 0 |
| Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval | | 0 |
| Text-to-Audio Generation Synchronized with Videos | | 0 |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | | 0 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | | 0 |
| Audio-Visual LLM for Video Understanding | | 0 |
| FLAP: Fast Language-Audio Pre-training | | 0 |
| Generation or Replication: Auscultating Audio Latent Diffusion Models | | 0 |

No leaderboard results yet.