SOTAVerified

Text to Audio Retrieval

Papers

Showing 1-10 of 20 papers

| Title | Status | Hype |
| --- | --- | --- |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | Code | 2 |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Code | 2 |
| The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation | Code | 1 |
| Cross Modal Retrieval with Querybank Normalisation | Code | 1 |
| Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1 |
| Audio Retrieval with Natural Language Queries | Code | 1 |
| M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Code | 0 |
| Do Audio-Language Models Understand Linguistic Variations? | | 0 |

Benchmark Results

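| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | InternVideo2-6B | R@1 | 55.2 | | Unverified |
| 2 | VAST | R@1 | 52 | | Unverified |
| 3 | ONE-PEACE | R@1 | 42.5 | | Unverified |
| 4 | VALOR | R@1 | 40.1 | | Unverified |
| 5 | AL-MixGen + Multi-TTA | R@1 | 34.7 | | Unverified |
| 6 | QB-Norm+CE | R@1 | 23.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PaSST-RoBERTa & Estimated Audio–Caption Correspondences | R@1 | 27.69 | | Unverified |
| 2 | InternVideo2-6B | R@1 | 27.2 | | Unverified |
| 3 | VAST | R@1 | 26.9 | | Unverified |
| 4 | PaSST–RoBERTa & GPT-augment | R@1 | 26.07 | | Unverified |
| 5 | ONE-PEACE | R@1 | 22.4 | | Unverified |
| 6 | VALOR | R@1 | 17.5 | | Unverified |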

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | OPT | Text-to-audio R@1 | 0.78 | | Unverified |