| OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography | Jun 26, 2025 | DeciphermentLarge Language Model | CodeCode Available | 0 | 5 |
| Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Oct 15, 2024 | HallucinationLarge Language Model | CodeCode Available | 0 | 5 |
| Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Dec 19, 2024 | Contrastive LearningLanguage Modeling | CodeCode Available | 0 | 5 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous DrivingLanguage Modeling | CodeCode Available | 0 | 5 |
| MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Dec 27, 2024 | Autonomous DrivingLanguage Modeling | CodeCode Available | 0 | 5 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Jan 29, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Sep 10, 2024 | BenchmarkingLanguage Modeling | CodeCode Available | 0 | 5 |
| Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Mar 26, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Sep 29, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Nov 17, 2024 | Image CaptioningLanguage Modeling | CodeCode Available | 0 | 5 |