| Dialogue-based generation of self-driving simulation scenarios using Large Language Models | Oct 26, 2023 | multimodal interaction, Self-Driving Cars | Code Available | 1 |
| MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks | Oct 13, 2023 | multimodal interaction, Multimodal Reasoning | Code Available | 1 |
| CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition | Jul 28, 2023 | Emotion Recognition, Emotion Recognition in Conversation | Code Available | 1 |
| Multi-Grained Multimodal Interaction Network for Entity Linking | Jul 19, 2023 | Contrastive Learning, Descriptive | Code Available | 1 |
| A Facial Expression-Aware Multimodal Multi-task Learning Framework for Emotion Recognition in Multi-party Conversations | Jul 1, 2023 | Emotion Recognition, Emotion Recognition in Conversation | Code Available | 1 |
| Generative Multimodal Entity Linking | Jun 22, 2023 | Entity Linking, In-Context Learning | Code Available | 1 |
| Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering | Sep 10, 2021 | multimodal interaction, Natural Language Understanding | Code Available | 1 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Jul 11, 2021 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 1 |
| ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | Feb 5, 2021 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer | Jul 1, 2020 | multimodal interaction, Multi-modal Named Entity Recognition | Code Available | 1 |