SOTAVerified

Multimodal Large Language Model

Papers

Showing 1-10 of 347 papers

Title | Status | Hype
MagicQuill: An Intelligent Interactive Image Editing System | Code | 7
VITA: Towards Open-Source Interactive Omni Multimodal LLM | Code | 7
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Code | 7
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Code | 5
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Code | 5
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Code | 5
Ovis: Structural Embedding Alignment for Multimodal Large Language Model | Code | 5
StarVector: Generating Scalable Vector Graphics Code from Images and Text | Code | 5
Ferret: Refer and Ground Anything Anywhere at Any Granularity | Code | 5
R1-Onevision: An Open-Source Multimodal Large Language Model Capable of Deep Reasoning | Code | 4

No leaderboard results yet.