SOTAVerified

Multimodal Large Language Model

Papers

Showing 151–200 of 347 papers

| Title | Status | Hype |
|---|---|---|
| AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | | 0 |
| A Medical Multimodal Large Language Model for Pediatric Pneumonia | | 0 |
| A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model | | 0 |
| A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | | 0 |
| ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM | | 0 |
| A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | | 0 |
| Audio-Visual LLM for Video Understanding | | 0 |
| Automated radiotherapy treatment planning guided by GPT-4Vision | | 0 |
| Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction | | 0 |
| Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | | 0 |
| BlueLM-2.5-3B Technical Report | | 0 |
| CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches | | 0 |
| CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | | 0 |
| Can Multimodal Large Language Model Think Analogically? | | 0 |
| CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | | 0 |
| CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion | | 0 |
| CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model | | 0 |
| ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images | | 0 |
| ChatGPT Meets Iris Biometrics | | 0 |
| ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning | | 0 |
| ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | | 0 |
| Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | | 0 |
| CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | | 0 |
| CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | | 0 |
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | | 0 |
| CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation | | 0 |
| CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | | 0 |
| CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation | | 0 |
| COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework | | 0 |
| Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | | 0 |
| CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step | | 0 |
| CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model | | 0 |
| Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | | 0 |
| Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | | 0 |
| DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | | 0 |
| Distraction is All You Need for Multimodal Large Language Model Jailbreaking | | 0 |
| DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing | | 0 |
| DreamJourney: Perpetual View Generation with Video Diffusion Models | | 0 |
| DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | | 0 |
| EAGLE: Egocentric AGgregated Language-video Engine | | 0 |
| EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | | 0 |
| EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM | | 0 |
| EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | | 0 |
| Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak | | 0 |
| EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | | 0 |
| EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model | | 0 |
| EventVL: Understand Event Streams via Multimodal Large Language Model | | 0 |
| FaceInsight: A Multimodal Large Language Model for Face Perception | | 0 |
| Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | | 0 |

No leaderboard results yet.