| GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | May 22, 2025 | Attribute, Image Generation | Code Available | 2 |
| DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling | May 16, 2025 | Attribute | Code Available | 2 |
| Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation | May 7, 2025 | 3D Generation, Attribute | Code Available | 2 |
| Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Apr 21, 2025 | Attribute, Camera Pose Estimation | Code Available | 2 |
| DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning | Apr 20, 2025 | Attribute, Face Swapping | Code Available | 2 |
| Objaverse++: Curated 3D Object Dataset with Quality Annotations | Apr 9, 2025 | 3D Generation, Attribute | Code Available | 2 |
| OpenFACADES: An Open Framework for Architectural Caption and Attribute Data Enrichment via Street View Imagery | Apr 1, 2025 | Attribute | Code Available | 2 |
| Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation | Mar 26, 2025 | Attribute, Semantic Segmentation | Code Available | 2 |
| Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models | Mar 18, 2025 | Anatomy, Attribute | Code Available | 2 |
| Is CLIP ideal? No. Can we fix it? Yes! | Mar 10, 2025 | Attribute, Negation | Code Available | 2 |
| Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving | Feb 11, 2025 | Attribute, Autonomous Driving | Code Available | 2 |
| Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Jan 25, 2025 | Attribute, Contrastive Learning | Code Available | 2 |
| EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | Jan 21, 2025 | Attribute, Question Answering | Code Available | 2 |
| MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control | Jan 4, 2025 | Attribute, Denoising | Code Available | 2 |
| DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution | Jan 1, 2025 | Attribute | Code Available | 2 |
| LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | Dec 11, 2024 | Attribute, Image Generation | Code Available | 2 |
| QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos | Dec 5, 2024 | Attribute, Quantization | Code Available | 2 |
| DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting | Nov 26, 2024 | Attribute, Diversity | Code Available | 2 |
| ResCLIP: Residual Attention for Training-free Dense Vision-language Inference | Nov 24, 2024 | Attribute, Semantic Segmentation | Code Available | 2 |
| Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings | Nov 12, 2024 | Attribute, Computational Efficiency | Code Available | 2 |
| Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis | Nov 11, 2024 | Attribute, Image Generation | Code Available | 2 |
| On the Role of Attention Heads in Large Language Model Safety | Oct 17, 2024 | Attribute, Language Modeling | Code Available | 2 |
| TRESTLE: A Model of Concept Formation in Structured Domains | Oct 14, 2024 | Attribute | Code Available | 2 |
| PerCo (SD): Open Perceptual Compression | Sep 30, 2024 | Attribute, Image Compression | Code Available | 2 |
| LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Sep 30, 2024 | Attribute, Collaborative Filtering | Code Available | 2 |
| Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration | Sep 28, 2024 | All, Attribute | Code Available | 2 |
| Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks | Aug 7, 2024 | Attribute, In-Context Learning | Code Available | 2 |
| T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Jul 19, 2024 | Attribute, Language Modeling | Code Available | 2 |
| ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement | Jul 9, 2024 | Attribute, Disentanglement | Code Available | 2 |
| UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Jun 27, 2024 | Attribute, Benchmarking | Code Available | 2 |
| RouteFinder: Towards Foundation Models for Vehicle Routing Problems | Jun 21, 2024 | Attribute, Multi-Task Learning | Code Available | 2 |
| Task Me Anything | Jun 17, 2024 | 2k, Attribute | Code Available | 2 |
| A Synthetic Dataset for Personal Attribute Inference | Jun 11, 2024 | Attribute, Author Profiling | Code Available | 2 |
| Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring | Jun 11, 2024 | Attribute, Domain Generalization | Code Available | 2 |
| MVGamba: Unify 3D Content Generation as State Space Sequence Modeling | Jun 10, 2024 | 3D Generation, Attribute | Code Available | 2 |
| Binarized Diffusion Model for Image Super-Resolution | Jun 9, 2024 | Attribute, Binarization | Code Available | 2 |
| Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning | Jun 1, 2024 | Attribute, Physics-informed machine learning | Code Available | 2 |
| DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution | May 25, 2024 | Attribute | Code Available | 2 |
| LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation | Apr 30, 2024 | Attribute, Semantic Segmentation | Code Available | 2 |
| CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding | Apr 22, 2024 | Attribute | Code Available | 2 |
| CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Apr 4, 2024 | Attribute, Image Captioning | Code Available | 2 |
| LLM Attributor: Interactive Visual Attribution for LLM Generation | Apr 1, 2024 | Articles, Attribute | Code Available | 2 |
| Measuring Style Similarity in Diffusion Models | Apr 1, 2024 | Attribute, Style Detection | Code Available | 2 |
| SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects | Mar 29, 2024 | 3D Object Detection, 3D Object Detection From Monocular Images | Code Available | 2 |
| Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | Mar 27, 2024 | Attribute, Decision Making | Code Available | 2 |
| Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions | Mar 25, 2024 | Attribute | Code Available | 2 |
| Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt | Mar 18, 2024 | Attribute, Decoder | Code Available | 2 |
| Faceptor: A Generalist Model for Face Perception | Mar 14, 2024 | Age Estimation, Attribute | Code Available | 2 |
| Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications | Mar 6, 2024 | Attribute, Data Augmentation | Code Available | 2 |
| RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations | Feb 27, 2024 | Attribute, Language Modeling | Code Available | 2 |