SOTAVerified

Multimodal Large Language Model

Papers

Showing 51–75 of 347 papers

Title | Status | Hype
ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling | - | 0
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | Code | 0
Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | - | 0
Unifying Segment Anything in Microscopy with Multimodal Large Language Model | Code | 1
Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | Code | 0
Is your multimodal large language model a good science tutor? | - | 0
MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills | - | 0
On Path to Multimodal Generalist: General-Level and General-Bench | - | 0
Consistency-aware Fake Videos Detection on Short Video Platforms | Code | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | - | 0
FaceInsight: A Multimodal Large Language Model for Face Perception | - | 0
ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images | - | 0
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Code | 1
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection | Code | 1
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | - | 0
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Code | 2
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Code | 0
CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | - | 0
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment | - | 0
Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs | Code | 1
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0
Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model | - | 0
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | - | 0
Towards Visual Text Grounding of Multimodal Large Language Model | - | 0
Universal Item Tokenization for Transferable Generative Recommendation | - | 0

No leaderboard results yet.