| Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer | Dec 1, 2024 | Image Animation, Portrait Animation | Code Available | 5 |
| GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation | Nov 27, 2024 | Depth Estimation, Diversity | Code Available | 5 |
| TS3-Codec: Transformer-Based Simple Streaming Single Codec | Nov 27, 2024 | Audio Compression | Code Available | 5 |
| ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Nov 26, 2024 | Instruction Following, Natural Language Visual Grounding | Code Available | 5 |
| StableAnimator: High-Quality Identity-Preserving Human Image Animation | Nov 26, 2024 | Denoising, Face Reenactment | Code Available | 5 |
| Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection | Nov 23, 2024 | Face Swapping, Synthetic Image Detection | Code Available | 5 |
| OminiControl: Minimal and Universal Control for Diffusion Transformer | Nov 22, 2024 | | Code Available | 5 |
| DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving | Nov 22, 2024 | Autonomous Driving, Denoising | Code Available | 5 |
| XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Nov 22, 2024 | GPU | Code Available | 5 |
| MambaIRv2: Attentive State Space Restoration | Nov 22, 2024 | Computational Efficiency, Image Restoration | Code Available | 5 |
| Multimodal Autoregressive Pre-training of Large Vision Encoders | Nov 21, 2024 | Decoder, Image Classification | Code Available | 5 |
| Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | Nov 21, 2024 | Reinforcement Learning (RL) | Code Available | 5 |
| DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Nov 21, 2024 | Long-tailed Object Detection, Object | Code Available | 5 |
| OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs | Nov 21, 2024 | Retrieval | Code Available | 5 |
| VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Nov 20, 2024 | Benchmarking, Image Generation | Code Available | 5 |
| The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use | Nov 15, 2024 | | Code Available | 5 |
| That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design | Nov 15, 2024 | Deep Reinforcement Learning | Code Available | 5 |
| Watermark Anything with Localized Messages | Nov 11, 2024 | | Code Available | 5 |
| Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | Nov 7, 2024 | Image Generation | Code Available | 5 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Nov 4, 2024 | Logical Reasoning, Mathematical Problem-Solving | Code Available | 5 |
| Randomized Autoregressive Visual Generation | Nov 1, 2024 | Image Generation, Language Modeling | Code Available | 5 |
| CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes | Nov 1, 2024 | 3DGS, Novel View Synthesis | Code Available | 5 |
| Neural Fields in Robotics: A Survey | Oct 26, 2024 | 3D Reconstruction, Autonomous Driving | Code Available | 5 |
| DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation | Oct 24, 2024 | Image Restoration, Prompt Learning | Code Available | 5 |
| ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | Oct 23, 2024 | | Code Available | 5 |
| R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models | Oct 23, 2024 | Diversity | Code Available | 5 |
| TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis | Oct 21, 2024 | Anomaly Detection, Imputation | Code Available | 5 |
| Allegro: Open the Black Box of Commercial-Level Video Generation Model | Oct 20, 2024 | Video Generation | Code Available | 5 |
| YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary | Oct 20, 2024 | object-detection, Object Detection | Code Available | 5 |
| DepthSplat: Connecting Gaussian Splatting and Depth | Oct 17, 2024 | Depth Estimation, Novel View Synthesis | Code Available | 5 |
| FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Oct 16, 2024 | Audio Generation, GPU | Code Available | 5 |
| Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities | Oct 15, 2024 | Language Modelling | Code Available | 5 |
| KBLaM: Knowledge Base augmented Language Model | Oct 14, 2024 | 8k, GPU | Code Available | 5 |
| FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification | Oct 14, 2024 | Image Generation | Code Available | 5 |
| Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Oct 14, 2024 | Mixture-of-Experts, Time Series | Code Available | 5 |
| OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Oct 12, 2024 | Math, reinforcement-learning | Code Available | 5 |
| Conditional Generative Models for Contrast-Enhanced Synthesis of T1w and T1 Maps in Brain MRI | Oct 11, 2024 | Uncertainty Quantification | Code Available | 5 |
| Low Bitrate High-Quality RVQGAN-based Discrete Speech Tokenizer | Oct 10, 2024 | | Code Available | 5 |
| RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | Oct 10, 2024 | Zero-shot Generalization | Code Available | 5 |
| Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations | Oct 10, 2024 | Time Series Forecasting, Video Recognition | Code Available | 5 |
| IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation | Oct 9, 2024 | Attribute, Image Generation | Code Available | 5 |
| Enabling Novel Mission Operations and Interactions with ROSA: The Robot Operating System Agent | Oct 9, 2024 | | Code Available | 5 |
| Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Oct 9, 2024 | Denoising, Image Generation | Code Available | 5 |
| MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | Oct 9, 2024 | | Code Available | 5 |
| Aria: An Open Multimodal Native Mixture-of-Experts Model | Oct 8, 2024 | Instruction Following, Mixture-of-Experts | Code Available | 5 |
| MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | Oct 4, 2024 | 4D reconstruction, Camera Pose Estimation | Code Available | 5 |
| LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | Oct 3, 2024 | Efficient Exploration, Mathematical Problem-Solving | Code Available | 5 |
| Loki: An Open-Source Tool for Fact Verification | Oct 2, 2024 | Claim Verification, Fact Checking | Code Available | 5 |
| Maia-2: A Unified Model for Human-AI Alignment in Chess | Sep 30, 2024 | Decision Making | Code Available | 5 |
| Showing Many Labels in Multi-label Classification Models: An Empirical Study of Adversarial Examples | Sep 26, 2024 | Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION | Code Available | 5 |