SOTAVerified

multimodal interaction

Papers

Showing 1-50 of 106 papers

Title | Status | Hype
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model | Code | 5
Segment and Track Anything | Code | 4
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | Code | 4
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Code | 2
TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data | Code | 2
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval | Code | 2
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Code | 2
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference | Code | 2
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Code | 2
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Code | 2
Agent AI: Surveying the Horizons of Multimodal Interaction | Code | 2
LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Code | 1
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Code | 1
Dialogue-based generation of self-driving simulation scenarios using Large Language Models | Code | 1
Cooperative Sentiment Agents for Multimodal Sentiment Analysis | Code | 1
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | Code | 1
CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition | Code | 1
Spider: Any-to-Many Multimodal LLM | Code | 1
UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models | Code | 1
A Facial Expression-Aware Multimodal Multi-task Learning Framework for Emotion Recognition in Multi-party Conversations | Code | 1
Generative Multimodal Entity Linking | Code | 1
Multi-Grained Multimodal Interaction Network for Entity Linking | Code | 1
MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks | Code | 1
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Code | 1
Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Code | 1
Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer | Code | 1
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models | Code | 1
Spatio-Temporal 3D Point Clouds from WiFi-CSI Data via Transformer Networks | Code | 1
Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering | Code | 1
DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems | | 0
A Review of Temporal Aspects of Hand Gesture Analysis Applied to Discourse Analysis and Natural Conversation | | 0
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | | 0
A POMDP-based Multimodal Interaction System Using a Humanoid Robot | | 0
Corpus of Multimodal Interaction for Collaborative Planning | | 0
A novel multimodal dynamic fusion network for disfluency detection in spoken utterances | | 0
Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer | | 0
Graph-based Fine-grained Multimodal Attention Mechanism for Sentiment Analysis | | 0
CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation | | 0
An Evaluation Framework for Multimodal Interaction | | 0
Integration of Multimodal Interaction as Assistance in Virtual Environments | | 0
Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability | | 0
Chat-to-Design: AI Assisted Personalized Fashion Design | | 0
From Modal to Multimodal Ambiguities: a Classification Approach | | 0
Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes | | 0
Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems | | 0
FGU3R: Fine-Grained Fusion via Unified 3D Representation for Multimodal 3D Object Detection | | 0
Guidelines for creating man-machine multimodal interfaces | | 0
HGNET: A Hierarchical Feature Guided Network for Occupancy Flow Field Prediction | | 0
Expanding the Role of Affective Phenomena in Multimodal Interaction Research | | 0

No leaderboard results yet.