SOTAVerified

Audio & Speech

239 tasks

Papers in this area

Showing 1-10 of 10 papers

Title · Status · Hype
Hear Your Code Fail, Voice-Assisted Debugging for Python · - · 0
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech · - · 0
MUPAX: Multidimensional Problem Agnostic eXplainable AI · - · 0
Autoregressive Speech Enhancement via Acoustic Tokens · - · 0
SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks · - · 0
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine · - · 0
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge · - · 0
Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison · - · 0
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models · Code · 0
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments · - · 0
Task · Papers · Results
Small-Footprint Keyword Spotting · 25 · 0
Text-Dependent Speaker Verification · 25 · 0
automatic-speech-translation · 23 · 0
Sequence-To-Sequence Speech Recognition · 22 · 0
Chord Recognition · 21 · 0
Speech Tokenization · 21 · 0
  Speech tokenization is the task of representing speech signa…
Text-Independent Speaker Recognition · 21 · 0
Audio-Visual Question Answering (AVQA) · 20 · 0
Audio inpainting · 19 · 0
  Filling in holes in audio data
Packet Loss Concealment · 19 · 0
  Predicting audio packets lost during transmission.
Audio Question Answering · 18 · 0
Zero-shot Audio Classification · 18 · 0
Unsupervised Speech Recognition · 17 · 0
Melody Extraction · 16 · 0
Prosody Prediction · 15 · 0
  Predicting prosodic prominence from text. This is a 2-way cl…
Radar waveform design · 15 · 0
Arabic Speech Recognition · 14 · 0
Music Captioning · 14 · 0
Voice Similarity · 14 · 0
Music Style Transfer · 13 · 0
Noisy Speech Recognition · 13 · 0
Automatic Lyrics Transcription · 12 · 0
  Automatic Lyrics Transcription is the task of transcribing s…
Simultaneous Speech-to-Speech Translation · 12 · 0
Simultaneous Speech-to-Text Translation · 12 · 0
  Simultaneous Speech-to-Text translation aims to translate co…
Drum Transcription · 11 · 0
Audio to Text Retrieval · 10 · 0
Phone-level pronunciation scoring · 10 · 0
Silent Speech Recognition · 10 · 0
  Interpreting speech without acoustic signals
Singer Identification · 10 · 0
Self-Supervised Audio Classification · 9 · 0
Video-to-Sound Generation · 9 · 0
Few-Shot Audio Classification · 8 · 0
  Few-shot classification for audio signals. Presents a unique…
Multi-Speaker Source Separation · 8 · 0
Speaker Profiling · 8 · 0
  Estimation of physical parameters from speech data
Zero-Shot Multi-Speaker TTS · 8 · 0
Music Performance Rendering · 7 · 0
  Music performance rendering is the task of generating human-…
Speech Interruption Detection · 7 · 0
  Overlapping speech is a natural and frequently occurring ph…
Vowel Classification · 7 · 0
Audio declipping · 6 · 0
  Audio declipping is the task of estimating the original audi…
Audio Emotion Recognition · 6 · 0
Bird Audio Detection · 6 · 0
Speech Intent Classification · 6 · 0
Speech Language Identification · 6 · 0
text-to-speech translation · 6 · 0
Visually Guided Sound Source Separation · 6 · 0
  The task of visually guided sound source separation (also re…
Zero-shot Audio Captioning · 6 · 0
  Zero-shot audio captioning aims at automatically generating …
Zero-Shot Environment Sound Classification · 6 · 0
Zero-shot Text to Audio Retrieval · 6 · 0
AUDIO-VISUAL QUESTION ANSWERING (MUSIC-AVQA-v2.0) · 5 · 0
  A more reliable and balanced version of original MUSIC-AVQA …
Automatic Phoneme Recognition · 5 · 0
  Automatic Phoneme Recognition (APR) involves converting spok…