SOTAVerified

Multimodal Large Language Model

Papers

Showing 176–200 of 347 papers

| Title | Status | Hype |
| --- | --- | --- |
| InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | | 0 |
| CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | | 0 |
| Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | | 0 |
| Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment | | 0 |
| Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0 |
| Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model | | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | | 0 |
| Universal Item Tokenization for Transferable Generative Recommendation | | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Code | 0 |
| Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources | | 0 |
| Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training | | 0 |
| Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Code | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | | 0 |
| LEGION: Learning to Ground and Explain for Synthetic Image Detection | | 0 |
| SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | | 0 |
| HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model | | 0 |
| When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research | | 0 |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | 0 |
| OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning | | 0 |
| CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | | 0 |
| Hybrid Agents for Image Restoration | | 0 |
| Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition | | 0 |
| PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | | 0 |

No leaderboard results yet.