SOTAVerified

Audio & Speech

239 tasks

Papers in this area

Showing 1-10 of 10 papers

Title · Status · Hype
Hear Your Code Fail, Voice-Assisted Debugging for Python · Hype: 0
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech · Hype: 0
MUPAX: Multidimensional Problem Agnostic eXplainable AI · Hype: 0
Autoregressive Speech Enhancement via Acoustic Tokens · Hype: 0
SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks · Hype: 0
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine · Hype: 0
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge · Hype: 0
Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison · Hype: 0
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models · Status: Code · Hype: 0
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments · Hype: 0
Task · Papers · Results
Speech Recognition

Speech Recognition is the task of converting spoken language…

Papers: 6,433 · Results: 398
Audio Classification

Audio Classification is a machine learning task that involve…

Papers: 361 · Results: 202
Speech Separation

The task of extracting all overlapping speech sources in a g…

Papers: 359 · Results: 129
Keyword Spotting

In speech processing, keyword spotting deals with the identi…

Papers: 407 · Results: 127
Speech Enhancement

Speech Enhancement is a signal processing task that involves…

Papers: 982 · Results: 122
Speaker Verification

Speaker verification is the task of verifying the identity of a pers…

Papers: 746 · Results: 49
Speech Emotion Recognition

Speech Emotion Recognition is a task of speech processing an…

Papers: 431 · Results: 48
Music Source Separation

Music source separation is the task of decomposing music int…

Papers: 107 · Results: 43
Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexin…

Papers: 328 · Results: 40
Speech Synthesis

Speech synthesis is the task of generating speech from some …

Papers: 1,249 · Results: 31
Speech-to-Text Translation

Translate audio signals of speech in one language into text …

Papers: 146 · Results: 31
Music Transcription

Music transcription is the task of converting an acoustic mu…

Papers: 96 · Results: 31
Audio Captioning

Audio Captioning is the task of describing audio using text.…

Papers: 119 · Results: 29
Text to Audio Retrieval

Papers: 20 · Results: 28
Audio Generation

Audio generation (synthesis) is the task of generating raw a…

Papers: 270 · Results: 26
Sound Event Detection

Sound Event Detection (SED) is the task of recognizing the s…

Papers: 194 · Results: 25
Text-to-Music Generation

Papers: 37 · Results: 22
Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that inv…

Papers: 332 · Results: 21
Cover Song Identification

Cover Song Identification is the task of identifying an alte…

Papers: 18 · Results: 21
Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) involves converting spoke…

Papers: 3,012 · Results: 20
Music Modeling

Papers: 34 · Results: 18
Speech-to-Speech Translation

Speech-to-speech translation (S2ST) consists of translating …

Papers: 117 · Results: 17
Speaker Identification

Papers: 248 · Results: 15
Beat Tracking

Determine the positions of all beats in a music recording.

Papers: 19 · Results: 15
Audio Super-Resolution

Audio super-resolution, especially speech, refers to the pro…

Papers: 22 · Results: 14
Downbeat Tracking

Determine the positions of all downbeats in a music recordin…

Papers: 11 · Results: 13
Online Beat Tracking

Papers: 4 · Results: 13
Audio Tagging

Audio tagging is a task to predict the tags of audio clips. …

Papers: 81 · Results: 11
Voice Anti-spoofing

Discriminate between genuine speech and spoofing attacks.

Papers: 23 · Results: 10
Music Auto-Tagging

Papers: 22 · Results: 9
Audio Quality Assessment

Computational audio quality assessment aims to predict the q…

Papers: 15 · Results: 9
Accented Speech Recognition

Papers: 20 · Results: 8
Sound Event Localization and Detection

Given multichannel audio input, a sound event detection and …

Papers: 65 · Results: 7
Acoustic Scene Classification

The goal of acoustic scene classification is to classify a t…

Papers: 132 · Results: 6
Environmental Sound Classification

Classification of Environmental Sounds. Most often sounds fo…

Papers: 46 · Results: 5
Voice Conversion

Voice conversion is the task of transforming a source speaker's voice so that it sounds like a target speaker while preserving the linguistic content.

Papers: 520 · Results: 4
Audio Source Separation

Audio Source Separation is the process of separating a mixtu…

Papers: 112 · Results: 3
Audio Denoising

Papers: 20 · Results: 3
Target Sound Extraction

Target Sound Extraction is the task of extracting a sound co…

Papers: 16 · Results: 3
Music Question Answering

Papers: 4 · Results: 3
Acoustic Novelty Detection

Detect novel events given acoustic signals, either in domest…

Papers: 3 · Results: 3
Speaker Recognition

Speaker Recognition is the process of identifying or confirm…

Papers: 435 · Results: 2
Speech Denoising

Obtain the clean speech of the target speaker by suppressing…

Papers: 65 · Results: 2
Music Genre Recognition

Recognizing the genre (e.g. rock, pop, jazz, etc.) of a piec…

Papers: 10 · Results: 2
Music Generation

Music generation is the task of generating music, either as symbolic notation or as audio.

Papers: 386 · Results: 1
Sound Source Localization

Papers: 104 · Results: 1
Direction of Arrival Estimation

Estimating the direction-of-arrival (DOA) of a sound source …

Papers: 94 · Results: 1
Active Speaker Detection

Papers: 63 · Results: 1
Lip to Speech Synthesis

Given a silent video of a speaker, generate the correspondin…

Papers: 13 · Results: 1
Active Speaker Localization

Active Speaker Localization (ASL) is the process of spatiall…

Papers: 5 · Results: 1