SOTAVerified

Multimodal Large Language Model

Papers

Showing 76-100 of 347 papers

Title | Status | Hype
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | Code | 1
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Code | 1
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Code | 1
A Refer-and-Ground Multimodal Large Language Model for Biomedicine | Code | 1
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Code | 1
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model | Code | 1
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences | Code | 1
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Code | 1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Code | 1
MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis | Code | 1
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | Code | 1
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution | Code | 1
From Text to Pixel: Advancing Long-Context Understanding in MLLMs | Code | 1
LMEye: An Interactive Perception Network for Large Language Models | Code | 1
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Code | 1
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Code | 1
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Code | 1
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis | Code | 1
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection | Code | 1
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Code | 1
ChemMLLM: Chemical Multimodal Large Language Model | Code | 1
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Code | 1
Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation | Code | 1
Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning | Code | 1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1
Page 4 of 14

No leaderboard results yet.