| Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Oct 2, 2024 | Audio Classification, Caption Generation | Code Available | 1 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Jun 22, 2024 | Caption Generation | Code Available | 1 |
| SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | May 12, 2024 | Action Spotting, Automatic Speech Recognition | Code Available | 1 |
| BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes | Apr 3, 2024 | Caption Generation, Hierarchical Multi-label Classification | Code Available | 1 |
| NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment | Nov 5, 2023 | Caption Generation, Common Sense Reasoning | Code Available | 1 |
| VLIS: Unimodal Language Models Guide Multimodal Language Generation | Oct 15, 2023 | Caption Generation, Explanation Generation | Code Available | 1 |
| Self-supervised Cross-view Representation Reconstruction for Change Captioning | Sep 28, 2023 | Caption Generation, Hallucination | Code Available | 1 |
| RECAP: Retrieval-Augmented Audio Captioning | Sep 18, 2023 | AudioCaps, Audio captioning | Code Available | 1 |
| MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response | Sep 15, 2023 | Caption Generation, Language Modelling | Code Available | 1 |