SOTAVerified

Multimodal Large Language Model

Papers

Showing 151-200 of 347 papers

| Title | Status | Hype |
| --- | --- | --- |
| Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model | Code | 0 |
| SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion | Code | 0 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Code | 0 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Code | 0 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Code | 0 |
| Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Code | 0 |
| TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering | Code | 0 |
| PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Code | 0 |
| Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation | Code | 0 |
| MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | Code | 0 |
| LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Code | 0 |
| Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations | Code | 0 |
| Visual Text Generation in the Wild | Code | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Code | 0 |
| Layout Generation Agents with Large Language Models | Code | 0 |
| TRINS: Towards Multimodal Language Models that Can Read | Code | 0 |
| UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning |  | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation |  | 0 |
| VGR: Visual Grounded Reasoning |  | 0 |
| Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model |  | 0 |
| Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition |  | 0 |
| Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese |  | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks |  | 0 |
| ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation |  | 0 |
| VL-Mamba: Exploring State Space Models for Multimodal Learning |  | 0 |
| VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection |  | 0 |
| VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks |  | 0 |
| Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach |  | 0 |
| What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models |  | 0 |
| When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research |  | 0 |
| WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image |  | 0 |
| Multimodal large language model for wheat breeding: a new exploration of smart breeding |  | 0 |
| A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization |  | 0 |
| AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability |  | 0 |
| A Medical Multimodal Large Language Model for Pediatric Pneumonia |  | 0 |
| A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model |  | 0 |
| A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions |  | 0 |
| ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM |  | 0 |
| A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges |  | 0 |
| Audio-Visual LLM for Video Understanding |  | 0 |
| Automated radiotherapy treatment planning guided by GPT-4Vision |  | 0 |
| Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction |  | 0 |
| Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering |  | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform |  | 0 |
| BlueLM-2.5-3B Technical Report |  | 0 |
| CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches |  | 0 |
| CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring |  | 0 |
| Can Multimodal Large Language Model Think Analogically? |  | 0 |
| CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models |  | 0 |
| CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion |  | 0 |
Page 4 of 7

No leaderboard results yet.