SOTAVerified

AudioCaps

Papers

Showing 21-30 of 64 papers

Title | Status | Hype
Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval | | 0
Improving Text-To-Audio Models with Synthetic Captions | Code | 5
MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation | Code | 0
Bridging Language Gaps in Audio-Text Retrieval | Code | 1
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation | Code | 2
Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation | Code | 1
Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval | | 0
Text-to-Audio Generation Synchronized with Videos | | 0
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning | Code | 2
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | | 0

No leaderboard results yet.