SOTAVerified

Multimodal Large Language Model

Papers

Showing 26-50 of 347 papers

Title | Status | Hype
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Code | 3
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Code | 3
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | Code | 2
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | Code | 2
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Code | 2
Referring to Any Person | Code | 2
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Code | 2
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Code | 2
Introducing Visual Perception Token into Multimodal Large Language Model | Code | 2
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Code | 2
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding | Code | 2
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Code | 2
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Code | 2
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Code | 2
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection | Code | 2
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Code | 2
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Code | 2
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench | Code | 2
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation | Code | 2
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Code | 2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Code | 2
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Code | 2
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area | Code | 2
MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis | Code | 2
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Code | 2

No leaderboard results yet.