SOTAVerified

multimodal interaction

Papers

Showing 1-25 of 106 papers

Title | Status | Hype
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model | Code | 5
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | Code | 4
Segment and Track Anything | Code | 4
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference | Code | 2
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Code | 2
Agent AI: Surveying the Horizons of Multimodal Interaction | Code | 2
TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data | Code | 2
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval | Code | 2
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Code | 2
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Code | 2
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Code | 2
Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Code | 1
A Facial Expression-Aware Multimodal Multi-task Learning Framework for Emotion Recognition in Multi-party Conversations | Code | 1
LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Code | 1
Spatio-Temporal 3D Point Clouds from WiFi-CSI Data via Transformer Networks | Code | 1
Multi-Grained Multimodal Interaction Network for Entity Linking | Code | 1
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Code | 1
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Code | 1
Dialogue-based generation of self-driving simulation scenarios using Large Language Models | Code | 1
CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition | Code | 1
Generative Multimodal Entity Linking | Code | 1
Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer | Code | 1
Cooperative Sentiment Agents for Multimodal Sentiment Analysis | Code | 1
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models | Code | 1
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1

No leaderboard results yet.