| Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization | Jan 16, 2024 | Action Detection, Activity Detection | Unverified | 0 |
| Towards Emotion Analysis in Short-form Videos: A Large-Scale Dataset and Baseline | Nov 29, 2023 | audio-visual learning, Form | Code Available | 1 |
| Boosting Audio-visual Zero-shot Learning with Large Language Models | Nov 21, 2023 | audio-visual learning, Descriptive | Code Available | 0 |
| Can CLIP Help Sound Source Localization? | Nov 7, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 |
| Deep Video Inpainting Guided by Audio-Visual Self-Supervision | Oct 11, 2023 | audio-visual learning, Video Inpainting | Code Available | 0 |
| AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models | Sep 19, 2023 | audio-visual learning, Representation Learning | Code Available | 1 |
| Class-Incremental Grouping Network for Continual Audio-Visual Learning | Sep 11, 2023 | audio-visual learning, class-incremental learning | Code Available | 1 |
| Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning | Sep 8, 2023 | audio-visual learning, Quantization | Unverified | 0 |
| RealImpact: A Dataset of Impact Sound Fields for Real Objects | Jun 16, 2023 | audio-visual learning | Unverified | 0 |
| A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition | May 30, 2023 | audio-visual learning | Code Available | 1 |