SOTAVerified

Multimodal Large Language Model

Papers

Showing 51-100 of 347 papers

Title | Status | Hype
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Code | 2
Paint by Inpaint: Learning to Add Image Objects by Removing Them First | Code | 2
UMBRAE: Unified Multimodal Brain Decoding | Code | 2
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Code | 2
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Code | 2
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Code | 2
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Code | 2
Jailbreaking Attack against Multimodal Large Language Model | Code | 2
LaVy: Vietnamese Multimodal Large Language Model | Code | 2
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | Code | 2
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Code | 2
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Code | 2
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
Introducing Visual Perception Token into Multimodal Large Language Model | Code | 2
Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Code | 2
A Survey of Multimodal Large Language Model from A Data-centric Perspective | Code | 2
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | Code | 2
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 2
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Code | 2
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution | Code | 1
CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images | Code | 1
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | Code | 1
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation | Code | 1
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Code | 1
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures | Code | 1
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model | Code | 1
A Refer-and-Ground Multimodal Large Language Model for Biomedicine | Code | 1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering | Code | 1
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model | Code | 1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Code | 1
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model | Code | 1
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | Code | 1
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution | Code | 1
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Code | 1
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Code | 1
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Code | 1
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis | Code | 1
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Code | 1
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Code | 1
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | Code | 1
From Text to Pixel: Advancing Long-Context Understanding in MLLMs | Code | 1
ChemMLLM: Chemical Multimodal Large Language Model | Code | 1
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection | Code | 1
MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis | Code | 1
Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation | Code | 1
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences | Code | 1
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Code | 1
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Code | 1
LMEye: An Interactive Perception Network for Large Language Models | Code | 1

No leaderboard results yet.