| CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | Nov 11, 2024 | 2D Pose EstimationCategory-Agnostic Pose Estimation | —Unverified | 0 |
| TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering | Nov 9, 2024 | Information RetrievalLanguage Modeling | CodeCode Available | 0 |
| ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | Nov 4, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Can Multimodal Large Language Model Think Analogically? | Nov 2, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach | Oct 31, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Oct 24, 2024 | DiversityLanguage Modeling | —Unverified | 0 |
| Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | Oct 24, 2024 | image-classificationImage Classification | —Unverified | 0 |
| Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations | Oct 22, 2024 | Camouflaged Object SegmentationLarge Language Model | CodeCode Available | 0 |
| LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | Oct 19, 2024 | Instruction FollowingKnowledge Distillation | —Unverified | 0 |
| Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Oct 15, 2024 | HallucinationLarge Language Model | CodeCode Available | 0 |
| MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description | Oct 15, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | Oct 14, 2024 | Explanation GenerationImage Forgery Detection | —Unverified | 0 |
| ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation | Oct 11, 2024 | DiagnosticLanguage Modeling | —Unverified | 0 |
| RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction | Oct 7, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion | Oct 3, 2024 | Adversarial AttackDenoising | CodeCode Available | 0 |
| OCC-MLLM:Empowering Multimodal Large Language Model For the Understanding of Occluded Objects | Oct 2, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Sep 30, 2024 | Anomaly DetectionLanguage Modeling | —Unverified | 0 |
| MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Sep 29, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches | Sep 26, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| EAGLE: Egocentric AGgregated Language-video Engine | Sep 26, 2024 | Action RecognitionActivity Recognition | —Unverified | 0 |
| CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation | Sep 24, 2024 | Contrastive LearningLanguage Modeling | —Unverified | 0 |
| Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | Sep 18, 2024 | Image CaptioningLarge Language Model | —Unverified | 0 |
| Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | Sep 10, 2024 | Autonomous VehiclesLanguage Modeling | —Unverified | 0 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Sep 10, 2024 | BenchmarkingLanguage Modeling | CodeCode Available | 0 |
| MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning | Sep 9, 2024 | Federated LearningImage Captioning | —Unverified | 0 |
| A Medical Multimodal Large Language Model for Pediatric Pneumonia | Sep 4, 2024 | DiagnosticLanguage Modeling | —Unverified | 0 |
| DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing | Sep 2, 2024 | Image GenerationLanguage Modelling | —Unverified | 0 |
| Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction | Sep 2, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model | Sep 1, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography | Aug 30, 2024 | Computed Tomography (CT)Diagnostic | —Unverified | 0 |
| AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Aug 30, 2024 | Language ModellingLarge Language Model | CodeCode Available | 0 |
| Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Aug 22, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | Aug 22, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | Aug 21, 2024 | Computational EfficiencyLanguage Modeling | —Unverified | 0 |
| Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | Aug 21, 2024 | Emotion RecognitionLanguage Modeling | —Unverified | 0 |
| CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion | Aug 21, 2024 | Language ModellingLarge Language Model | —Unverified | 0 |
| PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis | Aug 18, 2024 | Aspect-Based Sentiment AnalysisAspect-Based Sentiment Analysis (ABSA) | —Unverified | 0 |
| ChatGPT Meets Iris Biometrics | Aug 9, 2024 | Face RecognitionIris Recognition | —Unverified | 0 |
| VideoQA in the Era of LLMs: An Empirical Study | Aug 8, 2024 | Multimodal Large Language ModelVideo Question Answering | CodeCode Available | 0 |
| VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | Jul 29, 2024 | Deep LearningDomain Generalization | —Unverified | 0 |
| LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models | Jul 27, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models | Jul 26, 2024 | DisentanglementLanguage Modeling | —Unverified | 0 |
| Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Jul 25, 2024 | Image to textLanguage Modeling | —Unverified | 0 |
| Visual Text Generation in the Wild | Jul 19, 2024 | Language ModellingLarge Language Model | CodeCode Available | 0 |
| A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model | Jul 12, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing | Jul 8, 2024 | Image GenerationLanguage Modeling | —Unverified | 0 |
| MobileFlow: A Multimodal LLM For Mobile GUI Agent | Jul 5, 2024 | Action AnalysisLanguage Modelling | —Unverified | 0 |
| MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration | Jul 4, 2024 | DenoisingImage Restoration | —Unverified | 0 |
| Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models | Jun 24, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | Jun 22, 2024 | Common Sense ReasoningLanguage Modelling | —Unverified | 0 |