Audio & Speech

Papers in this area

Showing 1–10 of 10 papers

Title	Date	Tasks	Status
Hear Your Code Fail, Voice-Assisted Debugging for Python	Jul 20, 2025	CPU, Medical Diagnosis	Unverified
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech	Jul 17, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
MUPAX: Multidimensional Problem Agnostic eXplainable AI	Jul 17, 2025	Anatomical Landmark Detection, Audio Classification	Unverified
Autoregressive Speech Enhancement via Acoustic Tokens	Jul 17, 2025	Speech Enhancement	Unverified
SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks	Jul 17, 2025	DeepFake Detection, Face Swapping	Unverified
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine	Jul 17, 2025	Audio Classification, Automatic Speech Recognition	Unverified
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge	Jul 15, 2025	Speech Enhancement, text-to-speech	Unverified
Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison	Jul 15, 2025	Voice Cloning	Unverified
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models	Jul 15, 2025	Audio Source Separation, blind source separation	Code Available
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments	Jul 14, 2025	Speech-to-Text, text-to-speech	Unverified

Task	Description	Papers	Results
Speech Recognition	Speech Recognition is the task of converting spoken language…	6,433	398
Audio Classification	Audio Classification is a machine learning task that involve…	361	202
Speech Separation	The task of extracting all overlapping speech sources in a g…	359	129
Keyword Spotting	In speech processing, keyword spotting deals with the identi…	407	127
Speech Enhancement	Speech Enhancement is a signal processing task that involves…	982	122
Speaker Verification	Speaker verification is verifying the identity of a pers…	746	49
Speech Emotion Recognition	Speech Emotion Recognition is a task of speech processing an…	431	48
Music Source Separation	Music source separation is the task of decomposing music int…	107	43
Speaker Diarization	Speaker Diarization is the task of segmenting and co-indexin…	328	40
Speech Synthesis	Speech synthesis is the task of generating speech from some …	1,249	31
Speech-to-Text Translation	Translate audio signals of speech in one language into text …	146	31
Music Transcription	Music transcription is the task of converting an acoustic mu…	96	31
Audio captioning	Audio Captioning is the task of describing audio using text.…	119	29
Text to Audio Retrieval		20	28
Audio Generation	Audio generation (synthesis) is the task of generating raw a…	270	26
Sound Event Detection	Sound Event Detection (SED) is the task of recognizing the s…	194	25
Text-to-Music Generation		37	22
Text-To-Speech Synthesis	Text-To-Speech Synthesis is a machine learning task that inv…	332	21
Cover song identification	Cover Song Identification is the task of identifying an alte…	18	21
Automatic Speech Recognition (ASR)	Automatic Speech Recognition (ASR) involves converting spoke…	3,012	20
Music Modeling		34	18
Speech-to-Speech Translation	Speech-to-speech translation (S2ST) consists of translating …	117	17
Speaker Identification		248	15
Beat Tracking	Determine the positions of all beats in a music recording.	19	15
Audio Super-Resolution	Audio super-resolution, especially speech, refers to the pro…	22	14
Downbeat Tracking	Determine the positions of all downbeats in a music recordin…	11	13
Online Beat Tracking		4	13
Audio Tagging	Audio tagging is a task to predict the tags of audio clips. …	81	11
Voice Anti-spoofing	Discriminate between genuine speech and spoofing attacks	23	10
Music Auto-Tagging		22	9
Audio Quality Assessment	Computational audio quality assessment aims to predict the q…	15	9
Accented Speech Recognition		20	8
Sound Event Localization and Detection	Given multichannel audio input, a sound event detection and …	65	7
Acoustic Scene Classification	The goal of acoustic scene classification is to classify a t…	132	6
Environmental Sound Classification	Classification of Environmental Sounds. Most often sounds fo…	46	5
Voice Conversion		520	4
Audio Source Separation	Audio Source Separation is the process of separating a mixtu…	112	3
Audio Denoising		20	3
Target Sound Extraction	Target Sound Extraction is the task of extracting a sound co…	16	3
Music Question Answering		4	3
Acoustic Novelty Detection	Detect novel events given acoustic signals, either in domest…	3	3
Speaker Recognition	Speaker Recognition is the process of identifying or confirm…	435	2
Speech Denoising	Obtain the clean speech of the target speaker by suppressing…	65	2
Music Genre Recognition	Recognizing the genre (e.g. rock, pop, jazz, etc.) of a piec…	10	2
Music Generation		386	1
Sound Source Localization		104	1
Direction of Arrival Estimation	Estimating the direction-of-arrival (DOA) of a sound source …	94	1
Active Speaker Detection		63	1
Lip to Speech Synthesis	Given a silent video of a speaker, generate the correspondin…	13	1
Active Speaker Localization	Active Speaker Localization (ASL) is the process of spatiall…	5	1