SOTAVerified

Text to Audio Retrieval

Papers

Showing 1-10 of 20 papers

Title | Status | Hype
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | Code | 2
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Code | 2
Audio Retrieval with Natural Language Queries | Code | 1
Cross Modal Retrieval with Querybank Normalisation | Code | 1
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets | Code | 0
Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | InternVideo2-6B | R@1 | 55.2 | | Unverified
2 | VAST | R@1 | 52 | | Unverified
3 | ONE-PEACE | R@1 | 42.5 | | Unverified
4 | VALOR | R@1 | 40.1 | | Unverified
5 | AL-MixGen + Multi-TTA | R@1 | 34.7 | | Unverified
6 | QB-Norm+CE | R@1 | 23.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaSST-RoBERTa & Estimated Audio–Caption Correspondences | R@1 | 27.69 | | Unverified
2 | InternVideo2-6B | R@1 | 27.2 | | Unverified
3 | VAST | R@1 | 26.9 | | Unverified
4 | PaSST–RoBERTa & GPT-augment | R@1 | 26.07 | | Unverified
5 | ONE-PEACE | R@1 | 22.4 | | Unverified
6 | VALOR | R@1 | 17.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | OPT | Text-to-audio R@1 | 0.78 | | Unverified