| GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | May 22, 2025 | Attribute, Image Generation | Code Available | 2 |
| DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling | May 16, 2025 | Attribute | Code Available | 2 |
| Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation | May 7, 2025 | 3D Generation, Attribute | Code Available | 2 |
| Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Apr 21, 2025 | Attribute, Camera Pose Estimation | Code Available | 2 |
| DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning | Apr 20, 2025 | Attribute, Face Swapping | Code Available | 2 |
| Objaverse++: Curated 3D Object Dataset with Quality Annotations | Apr 9, 2025 | 3D Generation, Attribute | Code Available | 2 |
| OpenFACADES: An Open Framework for Architectural Caption and Attribute Data Enrichment via Street View Imagery | Apr 1, 2025 | Attribute | Code Available | 2 |
| Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation | Mar 26, 2025 | Attribute, Semantic Segmentation | Code Available | 2 |
| Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models | Mar 18, 2025 | Anatomy, Attribute | Code Available | 2 |
| Is CLIP ideal? No. Can we fix it? Yes! | Mar 10, 2025 | Attribute, Negation | Code Available | 2 |
| Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving | Feb 11, 2025 | Attribute, Autonomous Driving | Code Available | 2 |
| Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Jan 25, 2025 | Attribute, Contrastive Learning | Code Available | 2 |
| EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | Jan 21, 2025 | Attribute, Question Answering | Code Available | 2 |
| MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control | Jan 4, 2025 | Attribute, Denoising | Code Available | 2 |
| DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution | Jan 1, 2025 | Attribute | Code Available | 2 |
| LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | Dec 11, 2024 | Attribute, Image Generation | Code Available | 2 |
| QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos | Dec 5, 2024 | Attribute, Quantization | Code Available | 2 |
| DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting | Nov 26, 2024 | Attribute, Diversity | Code Available | 2 |
| ResCLIP: Residual Attention for Training-free Dense Vision-language Inference | Nov 24, 2024 | Attribute, Semantic Segmentation | Code Available | 2 |
| Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings | Nov 12, 2024 | Attribute, Computational Efficiency | Code Available | 2 |
| Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis | Nov 11, 2024 | Attribute, Image Generation | Code Available | 2 |
| On the Role of Attention Heads in Large Language Model Safety | Oct 17, 2024 | Attribute, Language Modeling | Code Available | 2 |
| TRESTLE: A Model of Concept Formation in Structured Domains | Oct 14, 2024 | Attribute | Code Available | 2 |
| LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Sep 30, 2024 | Attribute, Collaborative Filtering | Code Available | 2 |
| PerCo (SD): Open Perceptual Compression | Sep 30, 2024 | Attribute, Image Compression | Code Available | 2 |