SOTAVerified

Multimodal Large Language Model

Papers

Showing 76–100 of 347 papers

| Title | Status | Hype |
| --- | --- | --- |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Code | 0 |
| Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources | | 0 |
| Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training | | 0 |
| Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Code | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | | 0 |
| Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions | Code | 1 |
| LEGION: Learning to Ground and Explain for Synthetic Image Detection | | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | | 0 |
| SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | | 0 |
| HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model | | 0 |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | 0 |
| When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neural-behavioral research | | 0 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1 |
| OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning | | 0 |
| GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing | Code | 3 |
| CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | | 0 |
| Hybrid Agents for Image Restoration | | 0 |
| Referring to Any Person | Code | 2 |
| Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition | | 0 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Code | 2 |
| R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Code | 5 |
| Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Code | 2 |
| PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Code | 0 |
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | | 0 |
| Towards General Visual-Linguistic Face Forgery Detection (V2) | Code | 1 |
Page 4 of 14

No leaderboard results yet.