SOTAVerified

multimodal interaction

Papers

Showing 1–25 of 106 papers

Title | Status | Hype
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model | Code | 5
Segment and Track Anything | Code | 4
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | Code | 4
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Code | 2
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Code | 2
Agent AI: Surveying the Horizons of Multimodal Interaction | Code | 2
TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data | Code | 2
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval | Code | 2
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Code | 2
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Code | 2
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference | Code | 2
Multi-Grained Multimodal Interaction Network for Entity Linking | Code | 1
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Code | 1
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Code | 1
Spatio-Temporal 3D Point Clouds from WiFi-CSI Data via Transformer Networks | Code | 1
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models | Code | 1
Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer | Code | 1
Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Code | 1
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1
CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition | Code | 1
Cooperative Sentiment Agents for Multimodal Sentiment Analysis | Code | 1
LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Code | 1
A Facial Expression-Aware Multimodal Multi-task Learning Framework for Emotion Recognition in Multi-party Conversations | Code | 1
MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks | Code | 1
Dialogue-based generation of self-driving simulation scenarios using Large Language Models | Code | 1

No leaderboard results yet.