| Generative Multimodal Entity Linking | Jun 22, 2023 | Entity Linking, In-Context Learning | Code Available | 1 |
| Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering | Sep 10, 2021 | multimodal interaction, Natural Language Understanding | Code Available | 1 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Jul 11, 2021 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 1 |
| ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | Feb 5, 2021 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer | Jul 1, 2020 | multimodal interaction, Multi-modal Named Entity Recognition | Code Available | 1 |
| A multi-stage augmented multimodal interaction network for fish feeding intensity quantification | Jun 17, 2025 | Decision Making, multimodal interaction | Unverified | 0 |
| InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback | May 29, 2025 | multimodal interaction | Unverified | 0 |
| ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding | May 25, 2025 | Chart Understanding, Logical Reasoning | Code Available | 0 |
| DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems | May 11, 2025 | Gesture Recognition, multimodal interaction | Unverified | 0 |
| A Survey of Interactive Generative Video | Apr 30, 2025 | Autonomous Driving, multimodal interaction | Unverified | 0 |
| Immersive Multimedia Communication: State-of-the-Art on eXtended Reality Streaming | Mar 27, 2025 | multimodal interaction | Unverified | 0 |
| ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting | Feb 20, 2025 | Image Captioning, multimodal interaction | Unverified | 0 |
| Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | Feb 12, 2025 | Math, multimodal interaction | Unverified | 0 |
| Towards Explainable Multimodal Depression Recognition for Clinical Interviews | Jan 27, 2025 | Decision Making, Depression Detection | Code Available | 0 |
| FGU3R: Fine-Grained Fusion via Unified 3D Representation for Multimodal 3D Object Detection | Jan 8, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining | Jan 1, 2025 | Domain Adaptation, Model Selection | Unverified | 0 |
| Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer | Dec 24, 2024 | Gesture Recognition, multimodal interaction | Unverified | 0 |
| CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation | Nov 15, 2024 | Emotion Recognition, Emotion Recognition in Conversation | Unverified | 0 |
| Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability | Nov 15, 2024 | multimodal interaction | Unverified | 0 |
| MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval | Nov 13, 2024 | Image Comprehension, Information Retrieval | Code Available | 0 |
| Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction | Nov 1, 2024 | multimodal interaction | Unverified | 0 |
| Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes | Oct 29, 2024 | 3D scene Editing, multimodal interaction | Unverified | 0 |
| Retrospective Learning from Interactions | Oct 17, 2024 | multimodal interaction | Unverified | 0 |
| Robi Butler: Multimodal Remote Interaction with a Household Robot Assistant | Sep 30, 2024 | multimodal interaction | Unverified | 0 |
| LLM-Assisted Visual Analytics: Opportunities and Challenges | Sep 4, 2024 | Management, multimodal interaction | Unverified | 0 |