| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation, Descriptive | Unverified | 0 |
| Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models | May 20, 2025 | Anomaly Detection, Descriptive | Unverified | 0 |
| Descriptive Image-Text Matching with Graded Contextual Similarity | May 15, 2025 | Descriptive, Image-text matching | Unverified | 0 |
| The Human-Data-Model Interaction Canvas for Visual Analytics | May 12, 2025 | Descriptive | Unverified | 0 |
| Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models | May 11, 2025 | Descriptive, Diagnostic | Code Available | 1 |
| Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration | May 11, 2025 | Benchmarking, Descriptive | Unverified | 0 |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | Descriptive, Emotion Recognition | Code Available | 1 |
| KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery | May 9, 2025 | Clustering, Descriptive | Code Available | 0 |
| SweRank: Software Issue Localization with Code Ranking | May 7, 2025 | Descriptive | Unverified | 0 |
| Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model | May 7, 2025 | Data Augmentation, Descriptive | Unverified | 0 |