SOTAVerified

Multimodal Large Language Model

Papers

Showing 301–347 of 347 papers

| Title | Status | Hype |
| --- | --- | --- |
| UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | | 0 |
| Universal Item Tokenization for Transferable Generative Recommendation | | 0 |
| UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | | 0 |
| VGR: Visual Grounded Reasoning | | 0 |
| Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | | 0 |
| Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | | 0 |
| Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | | 0 |
| Visual Text Generation in the Wild | | 0 |
| ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation | | 0 |
| VL-Mamba: Exploring State Space Models for Multimodal Learning | | 0 |
| VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | | 0 |
| VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | | 0 |
| Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach | | 0 |
| What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models | | 0 |
| When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research | | 0 |
| WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | | 0 |
| LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Code | 0 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Code | 0 |
| Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Code | 0 |
| Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | Code | 0 |
| Consistency-aware Fake Videos Detection on Short Video Platforms | Code | 0 |
| SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion | Code | 0 |
| Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Code | 0 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Code | 0 |
| OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography | Code | 0 |
| TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering | Code | 0 |
| Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Code | 0 |
| VideoQA in the Era of LLMs: An Empirical Study | Code | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0 |
| Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation | Code | 0 |
| Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations | Code | 0 |
| MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Code | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Code | 0 |
| Layout Generation Agents with Large Language Models | Code | 0 |
| TRINS: Towards Multimodal Language Models that Can Read | Code | 0 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Code | 0 |
| MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | Code | 0 |
| Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Code | 0 |
| MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection | Code | 0 |
| VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation | Code | 0 |
| Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model | Code | 0 |
| Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | Code | 0 |
| V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | Code | 0 |
| AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Code | 0 |
| MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Code | 0 |

No leaderboard results yet.