Multimodal Large Language Model

Papers

Showing 326–347 of 347 papers

Title	Date	Tasks	Status
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions	May 27, 2025	Audio-Visual Synchronization, Conversational Response Generation	Unverified
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks	Jan 14, 2025	Language Modeling, Language Modelling	Unverified
On Fairness of Unified Multimodal Large Language Model for Image Generation	Feb 5, 2025	Fairness, Image Generation	Unverified
On Path to Multimodal Generalist: General-Level and General-Bench	May 7, 2025	Large Language Model, Multimodal Large Language Model	Unverified
OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model	May 25, 2025	Language Modeling, Language Modelling	Unverified
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources	Apr 1, 2025	GPU, Large Language Model	Unverified
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy	Feb 27, 2025	Large Language Model, Minecraft	Unverified
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training	Mar 31, 2025	GPU, Language Modeling	Unverified
ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling	May 19, 2025	Graph Generation, Knowledge Distillation	Unverified
OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography	Aug 30, 2024	Computed Tomography (CT), Diagnostic	Unverified
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis	Aug 18, 2024	Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA)	Unverified
Parking, Perception, and Retail: Street-Level Determinants of Community Vitality in Harbin	Jun 5, 2025	Large Language Model, Morphological Analysis	Unverified
PHRASED: Phrase Dictionary Biasing for Speech Translation	Jun 10, 2025	Language Modeling, Language Modelling	Unverified
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks	Mar 6, 2025	document understanding, Language Modeling	Unverified
Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model	Apr 9, 2025	Image Quality Assessment, Image Restoration	Unverified
RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models	Apr 18, 2024	Fact Checking, Language Modeling	Unverified
Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model	Nov 29, 2024	Autonomous Vehicles, Language Modeling	Unverified
RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction	Oct 7, 2024	Language Modeling, Language Modelling	Unverified
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation	May 30, 2025	Autonomous Driving, Autonomous Vehicles	Unverified
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation	Jan 1, 2025	Autonomous Driving, Autonomous Vehicles	Unverified
Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation	May 27, 2024	Instruction Following, Language Modeling	Unverified
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization	Apr 4, 2024	Grasp Generation, Language Modeling	Unverified

No leaderboard results yet.