| VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Sep 30, 2024 | Anomaly Detection, Language Modeling | Unverified | 0 |
| MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Sep 29, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | All, Image Segmentation | Code Available | 2 |
| CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Sep 28, 2024 | image-classification, Image Classification | Code Available | 2 |
| EAGLE: Egocentric AGgregated Language-video Engine | Sep 26, 2024 | Action Recognition, Activity Recognition | Unverified | 0 |
| CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches | Sep 26, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation | Sep 24, 2024 | Contrastive Learning, Language Modeling | Unverified | 0 |
| Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | Sep 18, 2024 | Image Captioning, Large Language Model | Unverified | 0 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Sep 10, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | Sep 10, 2024 | Autonomous Vehicles, Language Modeling | Unverified | 0 |
| MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning | Sep 9, 2024 | Federated Learning, Image Captioning | Unverified | 0 |
| TextToucher: Fine-Grained Text-to-Touch Generation | Sep 9, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| A Medical Multimodal Large Language Model for Pediatric Pneumonia | Sep 4, 2024 | Diagnostic, Language Modeling | Unverified | 0 |
| DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing | Sep 2, 2024 | Image Generation, Language Modelling | Unverified | 0 |
| Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction | Sep 2, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model | Sep 1, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography | Aug 30, 2024 | Computed Tomography (CT), Diagnostic | Unverified | 0 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image Captioning, Language Modeling | Code Available | 1 |
| AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Aug 30, 2024 | Language Modelling, Large Language Model | Code Available | 0 |
| MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion | Aug 21, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | Aug 21, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | Aug 21, 2024 | Computational Efficiency, Language Modeling | Unverified | 0 |
| Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | Aug 21, 2024 | Emotion Recognition, Language Modeling | Unverified | 0 |