SOTAVerified

Text to Audio Retrieval

Papers

Showing 11-20 of 20 papers

Title | Status | Hype
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | Code | 2
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Code | 2
Data leakage in cross-modal retrieval training: A case study | - | 0
Exploring Train and Test-Time Augmentations for Audio-Language Learning | - | 0
Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval | - | 0
Cross Modal Retrieval with Querybank Normalisation | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation | Code | 0
Audio Retrieval with Natural Language Queries | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | InternVideo2-6B | R@1 | 55.2 | - | Unverified
2 | VAST | R@1 | 52 | - | Unverified
3 | ONE-PEACE | R@1 | 42.5 | - | Unverified
4 | VALOR | R@1 | 40.1 | - | Unverified
5 | AL-MixGen + Multi-TTA | R@1 | 34.7 | - | Unverified
6 | QB-Norm+CE | R@1 | 23.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaSST-RoBERTa & Estimated Audio–Caption Correspondences | R@1 | 27.69 | - | Unverified
2 | InternVideo2-6B | R@1 | 27.2 | - | Unverified
3 | VAST | R@1 | 26.9 | - | Unverified
4 | PaSST–RoBERTa & GPT-augment | R@1 | 26.07 | - | Unverified
5 | ONE-PEACE | R@1 | 22.4 | - | Unverified
6 | VALOR | R@1 | 17.5 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | OPT | Text-to-audio R@1 | 0.78 | - | Unverified