Multimodal Large Language Model

Papers

Showing 151–175 of 347 papers

Title	Date	Tasks	Status
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions	May 27, 2025	Audio-Visual Synchronization, Conversational Response Generation	Unverified
Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes	May 26, 2025	DeepFake Detection, Face Generation	Unverified
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models	May 26, 2025	Language Modeling, Language Modelling	Unverified
MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval	May 26, 2025	Image Retrieval, Large Language Model	Unverified
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models	May 26, 2025	image-classification, Image Classification	Code Available
OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model	May 25, 2025	Language Modeling, Language Modelling	Unverified
HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning	May 23, 2025	Large Language Model, Multimodal Large Language Model	Unverified
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning	May 22, 2025	Language Modeling, Language Modelling	Unverified
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification	May 21, 2025	Data Augmentation, Large Language Model	Unverified
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval	May 21, 2025	Attribute, Image Retrieval	Unverified
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling	May 21, 2025	Emotion Recognition, Face Detection	Unverified
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation	May 20, 2025	Image Generation, Language Modeling	Unverified
CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring	May 20, 2025	Automated Essay Scoring, Diversity	Unverified
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning	May 20, 2025	Large Language Model, Multimodal Large Language Model	Unverified
ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling	May 19, 2025	Graph Generation, Knowledge Distillation	Unverified
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO	May 19, 2025	Decoder, Image Generation	Code Available
Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering	May 17, 2025	Document Ranking, Large Language Model	Unverified
Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning	May 10, 2025	Image Augmentation, Large Language Model	Code Available
MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills	May 9, 2025	Image Retouching, Large Language Model	Unverified
Is your multimodal large language model a good science tutor?	May 9, 2025	Language Modeling, Language Modelling	Unverified
On Path to Multimodal Generalist: General-Level and General-Bench	May 7, 2025	Large Language Model, Multimodal Large Language Model	Unverified
Consistency-aware Fake Videos Detection on Short Video Platforms	Apr 30, 2025	Large Language Model, Multimodal Large Language Model	Code Available
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation	Apr 24, 2025	Caption Generation, Dense Video Captioning	Unverified
FaceInsight: A Multimodal Large Language Model for Face Perception	Apr 22, 2025	Language Modeling, Language Modelling	Unverified
ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images	Apr 17, 2025	Language Modeling, Language Modelling	Unverified
