| Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model | Jun 16, 2025 | Large Language Model, multimodal interaction | Code Available | 5 |
| Segment and Track Anything | May 11, 2023 | Autonomous Driving, multimodal interaction | Code Available | 4 |
| Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | May 5, 2025 | Image Generation, multimodal interaction | Code Available | 4 |
| I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | May 25, 2025 | Mixture-of-Experts, multimodal interaction | Code Available | 2 |
| TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data | Jul 10, 2024 | Contrastive Learning, multimodal interaction | Code Available | 2 |
| Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval | Mar 22, 2023 | Image-text matching, Language Modeling | Code Available | 2 |
| OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Mar 13, 2025 | Decoder, multimodal interaction | Code Available | 2 |
| LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference | Jun 26, 2024 | multimodal interaction | Code Available | 2 |
| Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Mar 29, 2024 | Instruction Following, Language Modelling | Code Available | 2 |
| Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Nov 4, 2024 | multimodal interaction, Survey | Code Available | 2 |
| Agent AI: Surveying the Horizons of Multimodal Interaction | Jan 7, 2024 | multimodal interaction | Code Available | 2 |
| LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Oct 26, 2024 | Continual Learning, multimodal interaction | Code Available | 1 |
| Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Apr 22, 2024 | Action Quality Assessment, multimodal interaction | Code Available | 1 |
| Dialogue-based generation of self-driving simulation scenarios using Large Language Models | Oct 26, 2023 | multimodal interaction, Self-Driving Cars | Code Available | 1 |
| Cooperative Sentiment Agents for Multimodal Sentiment Analysis | Apr 19, 2024 | Disentanglement, Emotion Recognition | Code Available | 1 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Jul 11, 2021 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 1 |
| ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | Feb 5, 2021 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition | Jul 28, 2023 | Emotion Recognition, Emotion Recognition in Conversation | Code Available | 1 |
| Spider: Any-to-Many Multimodal LLM | Nov 14, 2024 | multimodal interaction | Code Available | 1 |
| UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models | Jul 23, 2024 | Entity Linking, multimodal interaction | Code Available | 1 |
| A Facial Expression-Aware Multimodal Multi-task Learning Framework for Emotion Recognition in Multi-party Conversations | Jul 1, 2023 | Emotion Recognition, Emotion Recognition in Conversation | Code Available | 1 |
| Generative Multimodal Entity Linking | Jun 22, 2023 | Entity Linking, In-Context Learning | Code Available | 1 |
| Multi-Grained Multimodal Interaction Network for Entity Linking | Jul 19, 2023 | Contrastive Learning, Descriptive | Code Available | 1 |
| MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks | Oct 13, 2023 | multimodal interaction, Multimodal Reasoning | Code Available | 1 |
| MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Nov 16, 2023 | Binary Classification, Descriptive | Code Available | 1 |
| Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Sep 8, 2024 | Emotion Recognition, Mamba | Code Available | 1 |
| Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer | Jul 1, 2020 | multimodal interaction, Multi-modal Named Entity Recognition | Code Available | 1 |
| Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models | Jun 30, 2024 | Hallucination, multimodal interaction | Code Available | 1 |
| Spatio-Temporal 3D Point Clouds from WiFi-CSI Data via Transformer Networks | Oct 7, 2024 | multimodal interaction | Code Available | 1 |
| Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering | Sep 10, 2021 | multimodal interaction, Natural Language Understanding | Code Available | 1 |
| DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems | May 11, 2025 | Gesture Recognition, multimodal interaction | Unverified | 0 |
| A Review of Temporal Aspects of Hand Gesture Analysis Applied to Discourse Analysis and Natural Conversation | Dec 17, 2013 | multimodal interaction, Systematic Literature Review | Unverified | 0 |
| Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Jul 25, 2024 | Image to text, Language Modeling | Unverified | 0 |
| A POMDP-based Multimodal Interaction System Using a Humanoid Robot | Oct 1, 2016 | Face Recognition, multimodal interaction | Unverified | 0 |
| Corpus of Multimodal Interaction for Collaborative Planning | Jun 1, 2019 | multimodal interaction | Unverified | 0 |
| A novel multimodal dynamic fusion network for disfluency detection in spoken utterances | Nov 27, 2022 | multimodal interaction | Unverified | 0 |
| Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer | Dec 24, 2024 | Gesture Recognition, multimodal interaction | Unverified | 0 |
| Graph-based Fine-grained Multimodal Attention Mechanism for Sentiment Analysis | Nov 16, 2021 | multimodal interaction, Multimodal Sentiment Analysis | Unverified | 0 |
| CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation | Nov 15, 2024 | Emotion Recognition, Emotion Recognition in Conversation | Unverified | 0 |
| An Evaluation Framework for Multimodal Interaction | May 1, 2018 | Gesture Recognition, multimodal interaction | Unverified | 0 |
| Integration of Multimodal Interaction as Assistance in Virtual Environments | Jul 1, 2012 | multimodal interaction | Unverified | 0 |
| Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability | Nov 15, 2024 | multimodal interaction | Unverified | 0 |
| Chat-to-Design: AI Assisted Personalized Fashion Design | Jul 3, 2022 | multimodal interaction, Natural Language Understanding | Unverified | 0 |
| From Modal to Multimodal Ambiguities: a Classification Approach | Apr 4, 2017 | Classification, General Classification | Unverified | 0 |
| Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes | Oct 29, 2024 | 3D scene Editing, multimodal interaction | Unverified | 0 |
| Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems | Sep 11, 2023 | Incremental Learning, multimodal interaction | Unverified | 0 |
| FGU3R: Fine-Grained Fusion via Unified 3D Representation for Multimodal 3D Object Detection | Jan 8, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Guidelines for creating man-machine multimodal interfaces | Jan 29, 2019 | multimodal interaction | Unverified | 0 |
| HGNET: A Hierarchical Feature Guided Network for Occupancy Flow Field Prediction | Jul 1, 2024 | Autonomous Driving, multimodal interaction | Unverified | 0 |
| Expanding the Role of Affective Phenomena in Multimodal Interaction Research | May 18, 2023 | multimodal interaction | Unverified | 0 |