Papers in this area
| Task | Papers | Results |
|---|---|---|
| Speech Recognition Speech Recognition is the task of converting spoken language… | 6,433 | 398 |
| Audio Classification Audio Classification is a machine learning task that involve… | 361 | 202 |
| Speech Separation The task of extracting all overlapping speech sources in a g… | 359 | 129 |
| Keyword Spotting In speech processing, keyword spotting deals with the identi… | 407 | 127 |
| Speech Enhancement Speech Enhancement is a signal processing task that involves… | 982 | 122 |
| Speaker Verification Speaker verification is the task of verifying the identity of a pers… | 746 | 49 |
| Speech Emotion Recognition Speech Emotion Recognition is a task of speech processing an… | 431 | 48 |
| Music Source Separation Music source separation is the task of decomposing music int… | 107 | 43 |
| Speaker Diarization Speaker Diarization is the task of segmenting and co-indexin… | 328 | 40 |
| Speech Synthesis Speech synthesis is the task of generating speech from some … | 1,249 | 31 |
| Speech-to-Text Translation Translate audio signals of speech in one language into text … | 146 | 31 |
| Music Transcription Music transcription is the task of converting an acoustic mu… | 96 | 31 |
| Audio Captioning Audio Captioning is the task of describing audio using text.… | 119 | 29 |
| Text to Audio Retrieval | 20 | 28 |
| Audio Generation Audio generation (synthesis) is the task of generating raw a… | 270 | 26 |
| Sound Event Detection Sound Event Detection (SED) is the task of recognizing the s… | 194 | 25 |
| Text-to-Music Generation | 37 | 22 |
| Text-To-Speech Synthesis Text-To-Speech Synthesis is a machine learning task that inv… | 332 | 21 |
| Cover Song Identification Cover Song Identification is the task of identifying an alte… | 18 | 21 |
| Automatic Speech Recognition (ASR) Automatic Speech Recognition (ASR) involves converting spoke… | 3,012 | 20 |
| Music Modeling | 34 | 18 |
| Speech-to-Speech Translation Speech-to-speech translation (S2ST) consists of translating … | 117 | 17 |
| Speaker Identification | 248 | 15 |
| Beat Tracking Determine the positions of all beats in a music recording. | 19 | 15 |
| Audio Super-Resolution Audio super-resolution, especially speech, refers to the pro… | 22 | 14 |
| Downbeat Tracking Determine the positions of all downbeats in a music recordin… | 11 | 13 |
| Online Beat Tracking | 4 | 13 |
| Audio Tagging Audio tagging is a task to predict the tags of audio clips. … | 81 | 11 |
| Voice Anti-spoofing Discriminate between genuine speech and spoofing attacks. | 23 | 10 |
| Music Auto-Tagging | 22 | 9 |
| Audio Quality Assessment Computational audio quality assessment aims to predict the q… | 15 | 9 |
| Accented Speech Recognition | 20 | 8 |
| Sound Event Localization and Detection Given multichannel audio input, a sound event detection and … | 65 | 7 |
| Acoustic Scene Classification The goal of acoustic scene classification is to classify a t… | 132 | 6 |
| Environmental Sound Classification Classification of Environmental Sounds. Most often sounds fo… | 46 | 5 |
| Voice Conversion | 520 | 4 |
| Audio Source Separation Audio Source Separation is the process of separating a mixtu… | 112 | 3 |
| Audio Denoising | 20 | 3 |
| Target Sound Extraction Target Sound Extraction is the task of extracting a sound co… | 16 | 3 |
| Music Question Answering | 4 | 3 |
| Acoustic Novelty Detection Detect novel events given acoustic signals, either in domest… | 3 | 3 |
| Speaker Recognition Speaker Recognition is the process of identifying or confirm… | 435 | 2 |
| Speech Denoising Obtain the clean speech of the target speaker by suppressing… | 65 | 2 |
| Music Genre Recognition Recognizing the genre (e.g. rock, pop, jazz, etc.) of a piec… | 10 | 2 |
| Music Generation | 386 | 1 |
| Sound Source Localization | 104 | 1 |
| Direction of Arrival Estimation Estimating the direction-of-arrival (DOA) of a sound source … | 94 | 1 |
| Active Speaker Detection | 63 | 1 |
| Lip to Speech Synthesis Given a silent video of a speaker, generate the correspondin… | 13 | 1 |
| Active Speaker Localization Active Speaker Localization (ASL) is the process of spatiall… | 5 | 1 |