SOTAVerified

Multimodal Large Language Model

Papers

Showing 226-250 of 347 papers

Title | Status | Hype
J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | - | 0
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Code | 0
Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | - | 0
MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond | - | 0
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | - | 0
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | - | 0
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework | - | 0
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | - | 0
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | - | 0
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | - | 0
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | - | 0
EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM | - | 0
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | - | 0
ObjectFinder: An Open-Vocabulary Assistive System for Interactive Object Search by Blind People | - | 0
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | - | 0
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | - | 0
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model | - | 0
Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model | - | 0
Multimodal large language model for wheat breeding: a new exploration of smart breeding | - | 0
StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model | - | 0
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model | - | 0
Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | - | 0
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Code | 0
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Code | 0
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | - | 0
Page 10 of 14

No leaderboard results yet.