SOTAVerified

multimodal interaction

Papers

Showing 1-25 of 106 papers

Title | Status | Hype
A multi-stage augmented multimodal interaction network for fish feeding intensity quantification | - | 0
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model | Code | 5
InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback | - | 0
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding | Code | 0
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Code | 2
DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems | - | 0
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | Code | 4
A Survey of Interactive Generative Video | - | 0
Immersive Multimedia Communication: State-of-the-Art on eXtended Reality Streaming | - | 0
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Code | 2
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting | - | 0
Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | - | 0
Towards Explainable Multimodal Depression Recognition for Clinical Interviews | Code | 0
FGU3R: Fine-Grained Fusion via Unified 3D Representation for Multimodal 3D Object Detection | - | 0
MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining | - | 0
Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer | - | 0
CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation | - | 0
Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability | - | 0
Spider: Any-to-Many Multimodal LLM | Code | 1
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval | Code | 0
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Code | 2
Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction | - | 0
Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes | - | 0
LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Code | 1
Retrospective Learning from Interactions | - | 0

No leaderboard results yet.