SOTAVerified

Multimodal Large Language Model

Papers

Showing 126–150 of 347 papers

| Title | Status | Hype |
| --- | --- | --- |
| LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Code | 1 |
| Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | Code | 1 |
| VideoQA in the Era of LLMs: An Empirical Study | Code | 0 |
| Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | Code | 0 |
| AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Code | 0 |
| Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations | Code | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Code | 0 |
| TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering | Code | 0 |
| TRINS: Towards Multimodal Language Models that Can Read | Code | 0 |
| VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation | Code | 0 |
| Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation | Code | 0 |
| Consistency-aware Fake Videos Detection on Short Video Platforms | Code | 0 |
| Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | Code | 0 |
| SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion | Code | 0 |
| Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model | Code | 0 |
| OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography | Code | 0 |
| Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Code | 0 |
| Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Code | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0 |
| MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Code | 0 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Code | 0 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Code | 0 |
| Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Code | 0 |
| MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Code | 0 |
| Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Code | 0 |
