SOTAVerified

Audio & Speech

239 tasks

Papers in this area

Showing 1–10 of 10 papers

| Title | Status | Hype |
| Hear Your Code Fail, Voice-Assisted Debugging for Python | | 0 |
| NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech | | 0 |
| MUPAX: Multidimensional Problem Agnostic eXplainable AI | | 0 |
| Autoregressive Speech Enhancement via Acoustic Tokens | | 0 |
| SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks | | 0 |
| Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine | | 0 |
| P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge | | 0 |
| Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison | | 0 |
| Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models | Code | 0 |
| An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | | 0 |
| Task | Description | Papers | Results |
| Cross-Lingual ASR | | 5 | 0 |
| English Conversational Speech Recognition | | 5 | 0 |
| Environment Sound Classification | | 5 | 0 |
| Retrieval-augmented Few-shot In-context Audio Captioning | Retrieval-augmented few-shot in-context audio captioning is … | 5 | 0 |
| Sequential skip prediction | | 5 | 0 |
| Visual Keyword Spotting | Spot a given query keyword in a silent talking face video | 5 | 0 |
| Word-level pronunciation scoring | Total score of a word pronunciation. | 5 | 0 |
| Audio Effects Modeling | Modeling of audio effects such as reverberation, compression… | 4 | 0 |
| Audio Fingerprint | | 4 | 0 |
| Audio-Visual Captioning | | 4 | 0 |
| Music Compression | | 4 | 0 |
| Music Genre Transfer | | 4 | 0 |
| Sound Prompted Semantic Segmentation | Sound prompted semantic segmentation aims to predict a segme… | 4 | 0 |
| Soundscape evaluation | Evaluation of soundscape in accordance with ISO/TS 12913-2 | 4 | 0 |
| Speaker-Specific Lip to Speech Synthesis | How accurately can we infer an individual’s speech style and… | 4 | 0 |
| Speech Prompted Semantic Segmentation | Speech prompted semantic segmentation aims to predict semant… | 4 | 0 |
| Utterance-level pronunciation scoring | Total pronunciation score of an utterance. | 4 | 0 |
| Vietnamese Speech Recognition | | 4 | 0 |
| Vocal technique classification | | 4 | 0 |
| Acoustic Question Answering | | 3 | 0 |
| Audio Dequantization | Audio Dequantization is a process of estimating the original… | 3 | 0 |
| Audio Signal Recognition | | 3 | 0 |
| Keyword Spotting on Google Speech Commands | | 3 | 0 |
| Multi-instrument Music Transcription | | 3 | 0 |
| Multimodal Music Generation | | 3 | 0 |
| Multi-task Audio Source Separation | | 3 | 0 |
| Underwater Acoustic Classification | Classification of underwater acoustic data | 3 | 0 |
| Zero-Shot Audio Retrieval | | 3 | 0 |
| audio moment retrieval | | 2 | 0 |
| Audio-Visual Video Captioning | | 2 | 0 |
| Cross-environment ASR | | 2 | 0 |
| Drum Transcription in Music (DTM) | | 2 | 0 |
| Figure Of Speech Detection | | 2 | 0 |
| Music Quality Assessment | Evaluating the quality of music given noise and filtering co… | 2 | 0 |
| Referring Audio-Visual Segmentation | | 2 | 0 |
| Speech Synthesis - Gujarati | | 2 | 0 |
| Text to Audio/Video Retrieval | | 2 | 0 |
| text-to-audiovisual retrieval | | 2 | 0 |
| Timbre Interpolation | | 2 | 0 |
| Vocal ensemble separation | | 2 | 0 |
| ArzEn Speech Recognition | | 1 | 0 |
| Audio-Driven Body Animation | | 1 | 0 |
| Audio Multiple Target Classification | | 1 | 0 |
| Audio Scene Understanding | | 1 | 0 |
| Audio-Video Question Answering (AVQA) | | 1 | 0 |
| Audio/Video to Text Retrieval | | 1 | 0 |
| Bird Species Classification With Audio-Visual Data | | 1 | 0 |
| Cadenza 1 - Task 1 - Headphone | A person with a hearing loss is listening to music via headp… | 1 | 0 |
| Cadenza 1 - Task 2 - In Car | A person with hearing loss is wearing their hearing aids and… | 1 | 0 |
| Cross-device ASR | | 1 | 0 |