SOTAVerified

Multimodal Large Language Model

Papers

Showing 301–347 of 347 papers

Title | Status | Hype
Multimodal Large Language Model for Visual Navigation | | 0
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | | 0
Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model | | 0
Multimodal Transformer for Comics Text-Cloze | | 0
ObjectFinder: An Open-Vocabulary Assistive System for Interactive Object Search by Blind People | | 0
OCC-MLLM: Empowering Multimodal Large Language Model For the Understanding of Occluded Objects | | 0
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning | | 0
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions | | 0
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | | 0
On Fairness of Unified Multimodal Large Language Model for Image Generation | | 0
On Path to Multimodal Generalist: General-Level and General-Bench | | 0
OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model | | 0
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources | | 0
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy | | 0
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training | | 0
ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling | | 0
OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography | | 0
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis | | 0
Parking, Perception, and Retail: Street-Level Determinants of Community Vitality in Harbin | | 0
PHRASED: Phrase Dictionary Biasing for Speech Translation | | 0
Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model | | 0
RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models | | 0
Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model | | 0
RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction | | 0
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation | | 0
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation | | 0
Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | | 0
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization | | 0
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model | | 0
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | | 0
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | | 0
ST^3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | | 0
StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model | | 0
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization | | 0
SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | | 0
TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model | | 0
The NTNU System at the S&I Challenge 2025 SLA Open Track | | 0
The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge | | 0
Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation | | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | | 0
MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond | | 0
Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | | 0
Towards Visual Text Grounding of Multimodal Large Language Model | | 0
Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security | | 0
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | | 0
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | | 0
Universal Item Tokenization for Transferable Generative Recommendation | | 0

No leaderboard results yet.