SOTAVerified

Multimodal Large Language Model

Papers

Showing 201-225 of 347 papers

Title | Status | Hype
VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | | 0
MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Code | 0
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Code | 2
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Code | 2
EAGLE: Egocentric AGgregated Language-video Engine | | 0
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches | | 0
CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation | | 0
Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | | 0
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Code | 0
Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | | 0
MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning | | 0
TextToucher: Fine-Grained Text-to-Touch Generation | Code | 1
A Medical Multimodal Large Language Model for Pediatric Pneumonia | | 0
DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing | | 0
Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction | | 0
Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model | | 0
OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography | | 0
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Code | 1
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Code | 0
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | | 0
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | | 0
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion | | 0
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | Code | 1
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | | 0
Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | | 0

No leaderboard results yet.