| Layout Generation Agents with Large Language Models | May 13, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Oct 15, 2024 | Hallucination, Large Language Model | Code Available | 0 | 5 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Nov 18, 2024 | Benchmarking, Multimodal Large Language Model | Code Available | 0 | 5 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0 | 5 |
| MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Sep 29, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Jan 29, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model | Dec 2, 2024 | Language Modeling, Language Modelling | — (Unverified) | 0 | 0 |
| SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | Mar 5, 2024 | Concept Alignment, Explanation Generation | — (Unverified) | 0 | 0 |
| SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | Mar 18, 2025 | Language Modeling, Language Modelling | — (Unverified) | 0 | 0 |
| ST^3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | Dec 28, 2024 | Language Modeling, Language Modelling | — (Unverified) | 0 | 0 |
| StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model | Nov 19, 2024 | Decision Making, Language Modeling | — (Unverified) | 0 | 0 |
| Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization | Mar 13, 2024 | Language Modeling, Language Modelling | — (Unverified) | 0 | 0 |
| SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | Dec 22, 2024 | Data Augmentation, Fault Diagnosis | — (Unverified) | 0 | 0 |
| TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model | Jul 8, 2025 | Language Modeling, Language Modelling | — (Unverified) | 0 | 0 |
| The NTNU System at the S&I Challenge 2025 SLA Open Track | Jun 5, 2025 | Language Modeling, Language Modelling | — (Unverified) | 0 | 0 |
| The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge | Jun 18, 2024 | Few-Shot Object Detection, Language Modeling | — (Unverified) | 0 | 0 |
| Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation | May 27, 2025 | Large Language Model, Multimodal Large Language Model | — (Unverified) | 0 | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | Apr 24, 2025 | Caption Generation, Dense Video Captioning | — (Unverified) | 0 | 0 |
| MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond | Dec 16, 2024 | Language Modeling, Language Modelling | — (Unverified) | 0 | 0 |
| Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | Jun 5, 2025 | cross-modal alignment, Large Language Model | — (Unverified) | 0 | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | Apr 7, 2025 | Benchmarking, Language Modeling | — (Unverified) | 0 | 0 |
| Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security | Apr 8, 2024 | Language Modeling, Language Modelling | — (Unverified) | 0 | 0 |
| UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | May 20, 2025 | Image Generation, Language Modeling | — (Unverified) | 0 | 0 |
| UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | Jan 24, 2024 | Conditional Image Generation, Denoising | — (Unverified) | 0 | 0 |
| Universal Item Tokenization for Transferable Generative Recommendation | Apr 6, 2025 | General Knowledge, Large Language Model | — (Unverified) | 0 | 0 |