SOTAVerified

Audio & Speech

239 tasks

Papers in this area

Showing 1-10 of 10 papers

Title | Status | Hype
Hear Your Code Fail, Voice-Assisted Debugging for Python | | 0
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech | | 0
MUPAX: Multidimensional Problem Agnostic eXplainable AI | | 0
Autoregressive Speech Enhancement via Acoustic Tokens | | 0
SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks | | 0
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine | | 0
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge | | 0
Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison | | 0
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models | Code | 0
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | | 0
Task | Papers | Results
Voice Query Recognition | 3 | 1
fake voice detection | 2 | 1
Speaker Attribution in German Parliamentary Debates (GermEval 2023, subtask 1) | 1 | 1
    Subtask 1 (full task) consists of predicting the cue words t…
Speaker Attribution in German Parliamentary Debates (GermEval 2023, subtask 2) | 1 | 1
    Subtask 2 (role labelling): Given the gold cue words, the ta…
Ultrasound | 1 | 1
Automatic Speech Recognition | 3,174 | 0
Vocal Bursts Intensity Prediction | 2,614 | 0
    Predict the intensity of 10 categorical emotions
Text to Speech | 1,419 | 0
    import gTTS import os def texttospeechkurdish(text, outputfi…
Vocal Bursts Valence Prediction | 1,164 | 0
    Predict the degrees of valence and arousal for the given voc…
Vocal Bursts Type Prediction | 675 | 0
    Predict the type of given vocal bursts
Rhythm | 515 | 0
Speech-to-Text | 403 | 0
Music Information Retrieval | 255 | 0
blind source separation | 211 | 0
    Blind source separation (BSS) is a signal processing techniq…
Sound Classification | 148 | 0
Speech Representation Learning | 131 | 0
Audio Synthesis | 127 | 0
Music Recommendation | 118 | 0
Voice Cloning | 112 | 0
    Voice cloning is a highly desired feature for personalized s…
Phoneme Recognition | 104 | 0
Robust Speech Recognition | 97 | 0
Singing Voice Synthesis | 81 | 0
    (Verse 1) In every step, on every road, you have a dream …
Audio Deepfake Detection | 74 | 0
    Nowadays, deepfake is generically used by the media or p…
Audio Signal Processing | 70 | 0
    This is a general task that covers transforming audio inputs…
Text-Independent Speaker Verification | 70 | 0
Acoustic echo cancellation | 69 | 0
AudioCaps | 64 | 0
Grapheme-to-Phoneme Conversion | 62 | 0
Speaker Separation | 58 | 0
Music Genre Classification | 56 | 0
Target Speaker Extraction | 55 | 0
    Extract the dialogue content of the specified target in a mu…
Music Classification | 52 | 0
Room Impulse Response (RIR) | 52 | 0
    Room Impulse Response (RIR) is an audio signal processing ta…
SSVEP | 51 | 0
    Classification of examples recorded under the Steady-State V…
Bandwidth Extension | 50 | 0
    Bandwidth extension is the task of expanding the bandwidth o…
Speech Dereverberation | 50 | 0
    Removing reverberation from audio signals
Acoustic Modelling | 49 | 0
Speech Extraction | 48 | 0
Expressive Speech Synthesis | 47 | 0
Audio Compression | 42 | 0
Speaker anonymization | 40 | 0
audio-visual learning | 38 | 0
Music Emotion Recognition | 35 | 0
Music Tagging | 35 | 0
Audio-Visual Synchronization | 32 | 0
Distant Speech Recognition | 30 | 0
Synthetic Speech Detection | 29 | 0
    Detect fake synthetic speech generated using machine learnin…
Emotional Speech Synthesis | 26 | 0
Acoustic Unit Discovery | 25 | 0
Audio-Visual Active Speaker Detection | 25 | 0
    Determine if and when each visible person in the video is sp…