Multimodal Large Language Model

Papers

Showing 251–275 of 347 papers

Title	Date	Tasks	Status
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models	Nov 11, 2024	2D Pose Estimation, Category-Agnostic Pose Estimation	Unverified
TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering	Nov 9, 2024	Information Retrieval, Language Modeling	Code Available
ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model	Nov 4, 2024	Language Modeling, Language Modelling	Unverified
Can Multimodal Large Language Model Think Analogically?	Nov 2, 2024	Language Modeling, Language Modelling	Unverified
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach	Oct 31, 2024	Language Modeling, Language Modelling	Unverified
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Oct 24, 2024	Diversity, Language Modeling	Unverified
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks	Oct 24, 2024	image-classification, Image Classification	Unverified
Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations	Oct 22, 2024	Camouflaged Object Segmentation, Large Language Model	Code Available
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound	Oct 19, 2024	Instruction Following, Knowledge Distillation	Unverified
Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models	Oct 15, 2024	Hallucination, Large Language Model	Code Available
MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description	Oct 15, 2024	Language Modeling, Language Modelling	Unverified
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization	Oct 14, 2024	Explanation Generation, Image Forgery Detection	Unverified
ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation	Oct 11, 2024	Diagnostic, Language Modeling	Unverified
RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction	Oct 7, 2024	Language Modeling, Language Modelling	Unverified
SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion	Oct 3, 2024	Adversarial Attack, Denoising	Code Available
OCC-MLLM:Empowering Multimodal Large Language Model For the Understanding of Occluded Objects	Oct 2, 2024	Language Modeling, Language Modelling	Unverified
VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection	Sep 30, 2024	Anomaly Detection, Language Modeling	Unverified
MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation	Sep 29, 2024	Language Modeling, Language Modelling	Code Available
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches	Sep 26, 2024	Language Modeling, Language Modelling	Unverified
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action Recognition, Activity Recognition	Unverified
CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation	Sep 24, 2024	Contrastive Learning, Language Modeling	Unverified
Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference	Sep 18, 2024	Image Captioning, Large Language Model	Unverified
Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles	Sep 10, 2024	Autonomous Vehicles, Language Modeling	Unverified
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding	Sep 10, 2024	Benchmarking, Language Modeling	Code Available
MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning	Sep 9, 2024	Federated Learning, Image Captioning	Unverified

