| Cross-Lingual ASR | 5 | 0 |
| English Conversational Speech Recognition | 5 | 0 |
| Environment Sound Classification | 5 | 0 |
| Retrieval-augmented Few-shot In-context Audio Captioning: Retrieval-augmented few-shot in-context audio captioning is … | 5 | 0 |
| Sequential skip prediction | 5 | 0 |
| Visual Keyword Spotting: Spot a given query keyword in a silent talking face video | 5 | 0 |
| Word-level pronunciation scoring: Total score of a word pronunciation. | 5 | 0 |
| Audio Effects Modeling: Modeling of audio effects such as reverberation, compression… | 4 | 0 |
| Audio Fingerprint | 4 | 0 |
| Audio-Visual Captioning | 4 | 0 |
| Music Compression | 4 | 0 |
| Music Genre Transfer | 4 | 0 |
| Sound Prompted Semantic Segmentation: Sound prompted semantic segmentation aims to predict a segme… | 4 | 0 |
| Soundscape evaluation: Evaluation of soundscape in accordance with ISO/TS 12913-2 | 4 | 0 |
| Speaker-Specific Lip to Speech Synthesis: How accurately can we infer an individual’s speech style and… | 4 | 0 |
| Speech Prompted Semantic Segmentation: Speech prompted semantic segmentation aims to predict semant… | 4 | 0 |
| Utterance-level pronunciation scoring: Total pronunciation score of an utterance. | 4 | 0 |
| Vietnamese Speech Recognition | 4 | 0 |
| Vocal technique classification | 4 | 0 |
| Acoustic Question Answering | 3 | 0 |
| Audio Dequantization: Audio Dequantization is a process of estimating the original… | 3 | 0 |
| Audio Signal Recognition | 3 | 0 |
| Keyword Spotting on Google Speech Commands | 3 | 0 |
| Multi-instrument Music Transcription | 3 | 0 |
| Multimodal Music Generation | 3 | 0 |
| Multi-task Audio Source Separation | 3 | 0 |
| Underwater Acoustic Classification: Classification of underwater acoustic data | 3 | 0 |
| Zero-Shot Audio Retrieval | 3 | 0 |
| audio moment retrieval | 2 | 0 |
| Audio-Visual Video Captioning | 2 | 0 |
| Cross-environment ASR | 2 | 0 |
| Drum Transcription in Music (DTM) | 2 | 0 |
| Figure Of Speech Detection | 2 | 0 |
| Music Quality Assessment: Evaluating the quality of music given noise and filtering co… | 2 | 0 |
| Referring Audio-Visual Segmentation | 2 | 0 |
| Speech Synthesis - Gujarati | 2 | 0 |
| Text to Audio/Video Retrieval | 2 | 0 |
| text-to-audiovisual retrieval | 2 | 0 |
| Timbre Interpolation | 2 | 0 |
| Vocal ensemble separation | 2 | 0 |
| ArzEn Speech Recognition | 1 | 0 |
| Audio-Driven Body Animation | 1 | 0 |
| Audio Multiple Target Classification | 1 | 0 |
| Audio Scene Understanding | 1 | 0 |
| Audio-Video Question Answering (AVQA) | 1 | 0 |
| Audio/Video to Text Retrieval | 1 | 0 |
| Bird Species Classification With Audio-Visual Data | 1 | 0 |
| Cadenza 1 - Task 1 - Headphone: A person with a hearing loss is listening to music via headp… | 1 | 0 |
| Cadenza 1 - Task 2 - In Car: A person with hearing loss is wearing their hearing aids and… | 1 | 0 |
| Cross-device ASR | 1 | 0 |