SOTAVerified

Text to Audio Retrieval

Papers

Showing 1–10 of 20 papers

Title | Status | Hype
M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Code | 0
Do Audio-Language Models Understand Linguistic Variations? | - | 0
The language of sound search: Examining User Queries in Audio Search Engines | - | 0
Evaluation of pretrained language models on music understanding | Code | 0
Dissecting Temporal Understanding in Text-to-Audio Retrieval | - | 0
Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Code | 0
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
WikiMuTe: A web-sourced dataset of semantic descriptions for music audio | - | 0
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation | Code | 1
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets | Code | 0

Benchmark Results

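# | Model | Metric | Claimed | Verified | Status
1 | InternVideo2-6B | R@1 | 55.2 | - | Unverified
2 | VAST | R@1 | 52 | - | Unverified
3 | ONE-PEACE | R@1 | 42.5 | - | Unverified
4 | VALOR | R@1 | 40.1 | - | Unverified
5 | AL-MixGen + Multi-TTA | R@1 | 34.7 | - | Unverified
6 | QB-Norm+CE | R@1 | 23.9 | - | Unverified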

# | Model | Metric | Claimed | Verified | Status
1 | PaSST-RoBERTa & Estimated Audio–Caption Correspondences | R@1 | 27.69 | - | Unverified
2 | InternVideo2-6B | R@1 | 27.2 | - | Unverified
3 | VAST | R@1 | 26.9 | - | Unverified
4 | PaSST–RoBERTa & GPT-augment | R@1 | 26.07 | - | Unverified
5 | ONE-PEACE | R@1 | 22.4 | - | Unverified
6 | VALOR | R@1 | 17.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | OPT | Text-to-audio R@1 | 0.78 | - | Unverified