SOTAVerified

Multimodal Large Language Model

Papers

Showing 301–347 of 347 papers

Title | Status | Hype
Automated radiotherapy treatment planning guided by GPT-4Vision | - | 0
The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge | - | 0
TRINS: Towards Multimodal Language Models that Can Read | Code | 0
Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak | - | 0
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model | Code | 0
Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | - | 0
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | Code | 0
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | - | 0
Layout Generation Agents with Large Language Models | Code | 0
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | - | 0
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Code | 0
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | - | 0
RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models | - | 0
GUIDE: Graphical User Interface Data for Execution | - | 0
Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security | - | 0
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization | - | 0
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition | - | 0
VL-Mamba: Exploring State Space Models for Multimodal Learning | - | 0
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization | - | 0
Multimodal Transformer for Comics Text-Cloze | - | 0
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | - | 0
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery | - | 0
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Code | 0
MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal | - | 0
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | - | 0
Lumos : Empowering Multimodal LLMs with Scene Text Recognition | - | 0
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | - | 0
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs | - | 0
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | - | 0
MLLMReID: Multimodal Large Language Model-based Person Re-identification | - | 0
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation | - | 0
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation | - | 0
Audio-Visual LLM for Video Understanding | - | 0
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model | - | 0
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | - | 0
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | - | 0
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model | Code | 0
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation | - | 0
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model | - | 0
Multimodal Large Language Model for Visual Navigation | - | 0
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | - | 0
Investigating the Catastrophic Forgetting in Multimodal Large Language Models | - | 0
Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems | - | 0
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning | - | 0
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Code | 0
A Survey on Multimodal Large Language Models | Code | 0
Language Is Not All You Need: Aligning Perception with Language Models | Code | 0
Page 7 of 7

No leaderboard results yet.