| Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving | Feb 11, 2025 | AttributeAutonomous Driving | CodeCode Available | 2 |
| Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Jan 25, 2025 | AttributeContrastive Learning | CodeCode Available | 2 |
| EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | Jan 21, 2025 | AttributeQuestion Answering | CodeCode Available | 2 |
| MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control | Jan 4, 2025 | AttributeDenoising | CodeCode Available | 2 |
| DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution | Jan 1, 2025 | Attribute | CodeCode Available | 2 |
| LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | Dec 11, 2024 | AttributeImage Generation | CodeCode Available | 2 |
| QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos | Dec 5, 2024 | AttributeQuantization | CodeCode Available | 2 |
| DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting | Nov 26, 2024 | AttributeDiversity | CodeCode Available | 2 |
| ResCLIP: Residual Attention for Training-free Dense Vision-language Inference | Nov 24, 2024 | AttributeSemantic Segmentation | CodeCode Available | 2 |
| Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings | Nov 12, 2024 | AttributeComputational Efficiency | CodeCode Available | 2 |