SOTAVerified

Text to Audio Retrieval

Papers

Showing 11–20 of 20 papers

Title | Status | Hype
Evaluation of pretrained language models on music understanding | Code | 0
M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Code | 0
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation | Code | 0
Exploring Train and Test-Time Augmentations for Audio-Language Learning | - | 0
WikiMuTe: A web-sourced dataset of semantic descriptions for music audio | - | 0
The language of sound search: Examining User Queries in Audio Search Engines | - | 0
Do Audio-Language Models Understand Linguistic Variations? | - | 0
Dissecting Temporal Understanding in Text-to-Audio Retrieval | - | 0
Data leakage in cross-modal retrieval training: A case study | - | 0
Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | InternVideo2-6B | R@1 | 55.2 | - | Unverified
2 | VAST | R@1 | 52 | - | Unverified
3 | ONE-PEACE | R@1 | 42.5 | - | Unverified
4 | VALOR | R@1 | 40.1 | - | Unverified
5 | AL-MixGen + Multi-TTA | R@1 | 34.7 | - | Unverified
6 | QB-Norm+CE | R@1 | 23.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaSST-RoBERTa & Estimated Audio–Caption Correspondences | R@1 | 27.69 | - | Unverified
2 | InternVideo2-6B | R@1 | 27.2 | - | Unverified
3 | VAST | R@1 | 26.9 | - | Unverified
4 | PaSST–RoBERTa & GPT-augment | R@1 | 26.07 | - | Unverified
5 | ONE-PEACE | R@1 | 22.4 | - | Unverified
6 | VALOR | R@1 | 17.5 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | OPT | Text-to-audio R@1 | 0.78 | - | Unverified