| A multi-stage augmented multimodal interaction network for fish feeding intensity quantification | Jun 17, 2025 | Decision Making, multimodal interaction | Unverified | 0 |
| Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model | Jun 16, 2025 | Large Language Model, multimodal interaction | Code Available | 5 |
| InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback | May 29, 2025 | multimodal interaction | Unverified | 0 |
| I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | May 25, 2025 | Mixture-of-Experts, multimodal interaction | Code Available | 2 |
| ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding | May 25, 2025 | Chart Understanding, Logical Reasoning | Code Available | 0 |
| DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems | May 11, 2025 | Gesture Recognition, multimodal interaction | Unverified | 0 |
| Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | May 5, 2025 | Image Generation, multimodal interaction | Code Available | 4 |
| A Survey of Interactive Generative Video | Apr 30, 2025 | Autonomous Driving, multimodal interaction | Unverified | 0 |
| Immersive Multimedia Communication: State-of-the-Art on eXtended Reality Streaming | Mar 27, 2025 | multimodal interaction | Unverified | 0 |
| OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Mar 13, 2025 | Decoder, multimodal interaction | Code Available | 2 |
| ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting | Feb 20, 2025 | Image Captioning, multimodal interaction | Unverified | 0 |
| Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | Feb 12, 2025 | Math, multimodal interaction | Unverified | 0 |
| Towards Explainable Multimodal Depression Recognition for Clinical Interviews | Jan 27, 2025 | Decision Making, Depression Detection | Code Available | 0 |
| FGU3R: Fine-Grained Fusion via Unified 3D Representation for Multimodal 3D Object Detection | Jan 8, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining | Jan 1, 2025 | Domain Adaptation, Model Selection | Unverified | 0 |
| Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer | Dec 24, 2024 | Gesture Recognition, multimodal interaction | Unverified | 0 |
| Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability | Nov 15, 2024 | multimodal interaction | Unverified | 0 |
| CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation | Nov 15, 2024 | Emotion Recognition, Emotion Recognition in Conversation | Unverified | 0 |
| Spider: Any-to-Many Multimodal LLM | Nov 14, 2024 | multimodal interaction | Code Available | 1 |
| MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval | Nov 13, 2024 | Image Comprehension, Information Retrieval | Code Available | 0 |
| Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Nov 4, 2024 | multimodal interaction, Survey | Code Available | 2 |
| Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction | Nov 1, 2024 | multimodal interaction | Unverified | 0 |
| Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes | Oct 29, 2024 | 3D scene Editing, multimodal interaction | Unverified | 0 |
| LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Oct 26, 2024 | Continual Learning, multimodal interaction | Code Available | 1 |
| Retrospective Learning from Interactions | Oct 17, 2024 | multimodal interaction | Unverified | 0 |
| Spatio-Temporal 3D Point Clouds from WiFi-CSI Data via Transformer Networks | Oct 7, 2024 | multimodal interaction | Code Available | 1 |
| Robi Butler: Multimodal Remote Interaction with a Household Robot Assistant | Sep 30, 2024 | multimodal interaction | Unverified | 0 |
| Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Sep 8, 2024 | Emotion Recognition, Mamba | Code Available | 1 |
| LLM-Assisted Visual Analytics: Opportunities and Challenges | Sep 4, 2024 | Management, multimodal interaction | Unverified | 0 |
| RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba | Aug 16, 2024 | All, Mamba | Unverified | 0 |
| Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Aug 2, 2024 | Benchmarking, multimodal interaction | Code Available | 0 |
| Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Jul 25, 2024 | Image to text, Language Modeling | Unverified | 0 |
| A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models | Jul 25, 2024 | Data Augmentation, multimodal interaction | Unverified | 0 |
| UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models | Jul 23, 2024 | Entity Linking, multimodal interaction | Code Available | 1 |
| TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data | Jul 10, 2024 | Contrastive Learning, multimodal interaction | Code Available | 2 |
| Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents | Jul 1, 2024 | Emotional Intelligence, Emotion Classification | Code Available | 0 |
| HGNET: A Hierarchical Feature Guided Network for Occupancy Flow Field Prediction | Jul 1, 2024 | Autonomous Driving, multimodal interaction | Unverified | 0 |
| Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models | Jun 30, 2024 | Hallucination, multimodal interaction | Code Available | 1 |
| OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | Jun 27, 2024 | Decoder, Imitation Learning | Unverified | 0 |
| A look under the hood of the Interactive Deep Learning Enterprise (No-IDLE) | Jun 27, 2024 | Anatomy, Deep Learning | Unverified | 0 |
| LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference | Jun 26, 2024 | multimodal interaction | Code Available | 2 |
| EMMI -- Empathic Multimodal Motivational Interviews Dataset: Analyses and Annotations | Jun 24, 2024 | multimodal interaction | Unverified | 0 |
| Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum | Apr 27, 2024 | Contrastive Learning, Emotion Recognition | Unverified | 0 |
| Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Apr 22, 2024 | Action Quality Assessment, multimodal interaction | Code Available | 1 |
| Cooperative Sentiment Agents for Multimodal Sentiment Analysis | Apr 19, 2024 | Disentanglement, Emotion Recognition | Code Available | 1 |
| Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Mar 29, 2024 | Instruction Following, Language Modelling | Code Available | 2 |
| BlendScape: Enabling End-User Customization of Video-Conferencing Environments through Generative AI | Mar 20, 2024 | Image Generation, multimodal interaction | Unverified | 0 |
| Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction | Mar 16, 2024 | Adversarial Robustness, Image-text Retrieval | Unverified | 0 |
| On the Arrow of Inference | Feb 22, 2024 | counterfactual, Counterfactual Reasoning | Unverified | 0 |
| Memory-Inspired Temporal Prompt Interaction for Text-Image Classification | Jan 26, 2024 | Classification, image-classification | Unverified | 0 |