SOTAVerified

Multimodal Large Language Model

Papers

Showing 51–100 of 347 papers

| Title | Status | Hype |
|---|---|---|
| UrbanWorld: An Urban World Model for 3D City Generation | Code | 2 |
| Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Code | 2 |
| Explore the Limits of Omni-modal Pretraining at Scale | Code | 2 |
| A Survey of Multimodal Large Language Model from A Data-centric Perspective | Code | 2 |
| WorldGPT: Empowering LLM as Multimodal World Model | Code | 2 |
| Paint by Inpaint: Learning to Add Image Objects by Removing Them First | Code | 2 |
| LaVy: Vietnamese Multimodal Large Language Model | Code | 2 |
| UMBRAE: Unified Multimodal Brain Decoding | Code | 2 |
| Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Code | 2 |
| CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 2 |
| Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Code | 2 |
| Jailbreaking Attack against Multimodal Large Language Model | Code | 2 |
| GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering | Code | 2 |
| MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | Code | 2 |
| LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Code | 2 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2 |
| LLMGA: Multimodal Large Language Model based Generation Assistant | Code | 2 |
| MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Code | 2 |
| Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | Code | 2 |
| MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis | Code | 1 |
| The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units | Code | 1 |
| un^2CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP | Code | 1 |
| Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model | Code | 1 |
| GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution | Code | 1 |
| Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | Code | 1 |
| Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion | Code | 1 |
| ChemMLLM: Chemical Multimodal Large Language Model | Code | 1 |
| BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | Code | 1 |
| Unifying Segment Anything in Microscopy with Multimodal Large Language Model | Code | 1 |
| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Code | 1 |
| AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection | Code | 1 |
| Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs | Code | 1 |
| Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions | Code | 1 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1 |
| Towards General Visual-Linguistic Face Forgery Detection(V2) | Code | 1 |
| Towards Text-Image Interleaved Retrieval | Code | 1 |
| PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures | Code | 1 |
| EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Code | 1 |
| When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis | Code | 1 |
| 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Code | 1 |
| Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering | Code | 1 |
| MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Code | 1 |
| IDEA-Bench: How Far are Generative Models from Professional Designing? | Code | 1 |
| LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Code | 1 |
| Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1 |
| Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning | Code | 1 |
| Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model | Code | 1 |
| Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation | Code | 1 |
| MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Code | 1 |
| Hespi: A pipeline for automatically detecting information from herbarium specimen sheets | Code | 1 |

No leaderboard results yet.