Audio & Speech

Papers in this area

Showing 1–10 of 10 papers

Title	Date	Tasks	Status
Hear Your Code Fail, Voice-Assisted Debugging for Python	Jul 20, 2025	CPU, Medical Diagnosis	Unverified
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech	Jul 17, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
MUPAX: Multidimensional Problem Agnostic eXplainable AI	Jul 17, 2025	Anatomical Landmark Detection, Audio Classification	Unverified
Autoregressive Speech Enhancement via Acoustic Tokens	Jul 17, 2025	Speech Enhancement	Unverified
SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks	Jul 17, 2025	DeepFake Detection, Face Swapping	Unverified
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine	Jul 17, 2025	Audio Classification, Automatic Speech Recognition	Unverified
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge	Jul 15, 2025	Speech Enhancement, text-to-speech	Unverified
Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison	Jul 15, 2025	Voice Cloning	Unverified
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models	Jul 15, 2025	Audio Source Separation, blind source separation	Code Available
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments	Jul 14, 2025	Speech-to-Text, text-to-speech	Unverified

Task	Papers	Results
Voice Query Recognition	3	1
fake voice detection	2	1
Speaker Attribution in German Parliamentary Debates (GermEval 2023, subtask 1): Subtask 1 (full task) consists of predicting the cue words t…	1	1
Speaker Attribution in German Parliamentary Debates (GermEval 2023, subtask 2): Subtask 2 (role labelling): Given the gold cue words, the ta…	1	1
Ultrasound	1	1
Automatic Speech Recognition	3,174	0
Vocal Bursts Intensity Prediction: Predict the intensity of 10 categorical emotions	2,614	0
Text to Speech: import gTTS import os def texttospeechkurdish(text, outputfi…	1,419	0
Vocal Bursts Valence Prediction: Predict the degrees of valence and arousal for the given voc…	1,164	0
Vocal Bursts Type Prediction: Predict the type of given vocal bursts	675	0
Rhythm	515	0
Speech-to-Text	403	0
Music Information Retrieval	255	0
blind source separation: Blind source separation (BSS) is a signal processing techniq…	211	0
Sound Classification	148	0
Speech Representation Learning	131	0
Audio Synthesis	127	0
Music Recommendation	118	0
Voice Cloning: Voice cloning is a highly desired feature for personalized s…	112	0
Phoneme Recognition	104	0
Robust Speech Recognition	97	0
Singing Voice Synthesis: (Verse 1) With every step, on every road, you have a dream …	81	0
Audio Deepfake Detection: Nowadays, deepfake is generically used by the media or p…	74	0
Audio Signal Processing: This is a general task that covers transforming audio inputs…	70	0
Text-Independent Speaker Verification	70	0
Acoustic echo cancellation	69	0
AudioCaps	64	0
Grapheme-to-Phoneme Conversion	62	0
Speaker Separation	58	0
Music Genre Classification	56	0
Target Speaker Extraction: Extract the dialogue content of the specified target in a mu…	55	0
Music Classification	52	0
Room Impulse Response (RIR): Room Impulse Response (RIR) is an audio signal processing ta…	52	0
SSVEP: Classification of examples recorded under the Steady-State V…	51	0
Bandwidth Extension: Bandwidth extension is the task of expanding the bandwidth o…	50	0
Speech Dereverberation: Removing reverberation from audio signals	50	0
Acoustic Modelling	49	0
Speech Extraction	48	0
Expressive Speech Synthesis	47	0
Audio Compression	42	0
Speaker anonymization	40	0
audio-visual learning	38	0
Music Emotion Recognition	35	0
Music Tagging	35	0
Audio-Visual Synchronization	32	0
Distant Speech Recognition	30	0
Synthetic Speech Detection: Detect fake synthetic speech generated using machine learnin…	29	0
Emotional Speech Synthesis	26	0
Acoustic Unit Discovery	25	0
Audio-Visual Active Speaker Detection: Determine if and when each visible person in the video is sp…	25	0