| Small-Footprint Keyword Spotting | 25 | 0 |
| Text-Dependent Speaker Verification | 25 | 0 |
| automatic-speech-translation | 23 | 0 |
| Sequence-To-Sequence Speech Recognition | 22 | 0 |
| Chord Recognition | 21 | 0 |
| Speech Tokenization: Speech tokenization is the task of representing speech signa… | 21 | 0 |
| Text-Independent Speaker Recognition | 21 | 0 |
| Audio-Visual Question Answering (AVQA) | 20 | 0 |
| Audio inpainting: Filling in holes in audio data | 19 | 0 |
| Packet Loss Concealment: Predicting audio packets lost during transmission. | 19 | 0 |
| Audio Question Answering | 18 | 0 |
| Zero-shot Audio Classification | 18 | 0 |
| Unsupervised Speech Recognition | 17 | 0 |
| Melody Extraction | 16 | 0 |
| Prosody Prediction: Predicting prosodic prominence from text. This is a 2-way cl… | 15 | 0 |
| Radar waveform design | 15 | 0 |
| Arabic Speech Recognition | 14 | 0 |
| Music Captioning | 14 | 0 |
| Voice Similarity | 14 | 0 |
| Music Style Transfer | 13 | 0 |
| Noisy Speech Recognition | 13 | 0 |
| Automatic Lyrics Transcription: Automatic Lyrics Transcription is the task of transcribing s… | 12 | 0 |
| Simultaneous Speech-to-Speech Translation | 12 | 0 |
| Simultaneous Speech-to-Text Translation: Simultaneous Speech-to-Text translation aims to translate co… | 12 | 0 |
| Drum Transcription | 11 | 0 |
| Audio to Text Retrieval | 10 | 0 |
| Phone-level pronunciation scoring | 10 | 0 |
| Silent Speech Recognition: Interpreting speech without acoustic signals | 10 | 0 |
| Singer Identification | 10 | 0 |
| Self-Supervised Audio Classification | 9 | 0 |
| Video-to-Sound Generation | 9 | 0 |
| Few-Shot Audio Classification: Few-shot classification for audio signals. Presents a unique… | 8 | 0 |
| Multi-Speaker Source Separation | 8 | 0 |
| Speaker Profiling: Estimation of physical parameters from speech data | 8 | 0 |
| Zero-Shot Multi-Speaker TTS | 8 | 0 |
| Music Performance Rendering: Music performance rendering is the task of generating human-… | 7 | 0 |
| Speech Interruption Detection: "Overlapping speech is a natural and frequently occurring ph… | 7 | 0 |
| Vowel Classification | 7 | 0 |
| Audio declipping: Audio declipping is the task of estimating the original audi… | 6 | 0 |
| Audio Emotion Recognition | 6 | 0 |
| Bird Audio Detection | 6 | 0 |
| Speech Intent Classification | 6 | 0 |
| Speech Language Identification | 6 | 0 |
| text-to-speech translation | 6 | 0 |
| Visually Guided Sound Source Separation: The task of visually guided sound source separation (also re… | 6 | 0 |
| Zero-shot Audio Captioning: Zero-shot audio captioning aims at automatically generating … | 6 | 0 |
| Zero-Shot Environment Sound Classification | 6 | 0 |
| Zero-shot Text to Audio Retrieval | 6 | 0 |
| AUDIO-VISUAL QUESTION ANSWERING (MUSIC-AVQA-v2.0): A more reliable and balanced version of original MUSIC-AVQA … | 5 | 0 |
| Automatic Phoneme Recognition: Automatic Phoneme Recognition (APR) involves converting spok… | 5 | 0 |