| GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment | Oct 17, 2023 | AttributeObject | CodeCode Available | 2 |
| GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | May 22, 2025 | AttributeImage Generation | CodeCode Available | 2 |
| GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | Jul 7, 2023 | AttributeCommon Sense Reasoning | CodeCode Available | 2 |
| Hierarchical Fine-Grained Image Forgery Detection and Localization | Mar 30, 2023 | AttributeClassification | CodeCode Available | 2 |
| EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | Jan 21, 2025 | AttributeQuestion Answering | CodeCode Available | 2 |
| DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution | Jan 1, 2025 | Attribute | CodeCode Available | 2 |
| Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation | Mar 26, 2025 | AttributeSemantic Segmentation | CodeCode Available | 2 |
| Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Jan 25, 2025 | AttributeContrastive Learning | CodeCode Available | 2 |
| DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution | May 25, 2024 | Attribute | CodeCode Available | 2 |
| DigiFace-1M: 1 Million Digital Face Images for Face Recognition | Oct 5, 2022 | AttributeFace Recognition | CodeCode Available | 2 |