SOTAVerified

Multimodal Large Language Model

Papers

Showing 151–200 of 347 papers

Title | Status | Hype
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions | | 0
Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes | | 0
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models | | 0
MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | | 0
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | Code | 0
OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model | | 0
HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning | | 0
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning | | 0
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification | | 0
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval | | 0
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | | 0
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | | 0
CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | | 0
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | | 0
ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling | | 0
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | Code | 0
Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | | 0
Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | Code | 0
MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills | | 0
Is your multimodal large language model a good science tutor? | | 0
On Path to Multimodal Generalist: General-Level and General-Bench | | 0
Consistency-aware Fake Videos Detection on Short Video Platforms | Code | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | | 0
FaceInsight: A Multimodal Large Language Model for Face Perception | | 0
ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images | | 0
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Code | 0
CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | | 0
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | | 0
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment | | 0
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | | 0
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0
Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model | | 0
Towards Visual Text Grounding of Multimodal Large Language Model | | 0
Universal Item Tokenization for Transferable Generative Recommendation | | 0
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Code | 0
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources | | 0
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training | | 0
Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Code | 0
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | | 0
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | | 0
LEGION: Learning to Ground and Explain for Synthetic Image Detection | | 0
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | | 0
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model | | 0
When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research | | 0
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | 0
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning | | 0
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | | 0
Hybrid Agents for Image Restoration | | 0
Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition | | 0
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Code | 0

No leaderboard results yet.