| DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification | Jul 4, 2024 | Descriptive, Diversity | Code Available | 2 | 5 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Jan 1, 2025 | Descriptive | Code Available | 2 | 5 |
| GRiT: A Generative Region-to-text Transformer for Object Understanding | Dec 1, 2022 | Decoder, Dense Captioning | Code Available | 2 | 5 |
| Customization Assistant for Text-to-image Generation | Dec 5, 2023 | Descriptive, Image Generation | Code Available | 2 | 5 |
| PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning | Nov 21, 2022 | 3D Classification, 3D Object Detection | Code Available | 2 | 5 |
| Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Mar 28, 2025 | Descriptive, Image Quality Assessment | Code Available | 2 | 5 |
| Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models | Dec 14, 2023 | Descriptive, Image Quality Assessment | Code Available | 2 | 5 |
| CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Jun 11, 2025 | counterfactual, Descriptive | Code Available | 2 | 5 |
| RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Jun 11, 2024 | AI Agent, Descriptive | Code Available | 2 | 5 |
| An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control | Mar 7, 2024 | Descriptive | Code Available | 2 | 5 |