SOTAVerified

Multimodal Large Language Model

Papers

Showing 151–200 of 347 papers

| Title | Status | Hype |
| --- | --- | --- |
| Layout Generation Agents with Large Language Models | Code | 0 |
| Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Code | 0 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Code | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0 |
| MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Code | 0 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Code | 0 |
| SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model | | 0 |
| SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | | 0 |
| SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | | 0 |
| ST^3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | | 0 |
| StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model | | 0 |
| Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization | | 0 |
| SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | | 0 |
| TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model | | 0 |
| The NTNU System at the S&I Challenge 2025 SLA Open Track | | 0 |
| The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge | | 0 |
| Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation | | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | | 0 |
| MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond | | 0 |
| Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | | 0 |
| Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security | | 0 |
| UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | | 0 |
| UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | | 0 |
| Universal Item Tokenization for Transferable Generative Recommendation | | 0 |
| UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | | 0 |
| VGR: Visual Grounded Reasoning | | 0 |
| Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | | 0 |
| Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | | 0 |
| Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | | 0 |
| Visual Text Generation in the Wild | | 0 |
| ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation | | 0 |
| VL-Mamba: Exploring State Space Models for Multimodal Learning | | 0 |
| VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | | 0 |
| VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | | 0 |
| Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach | | 0 |
| What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models | | 0 |
| When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research | | 0 |
| WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | | 0 |
| Multimodal large language model for wheat breeding: a new exploration of smart breeding | | 0 |
| A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | | 0 |
| AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | | 0 |
| A Medical Multimodal Large Language Model for Pediatric Pneumonia | | 0 |
| A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model | | 0 |
| A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | | 0 |
| ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM | | 0 |
| A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | | 0 |
| A Survey on Multimodal Large Language Models | | 0 |
Page 4 of 7

No leaderboard results yet.