| SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Oct 21, 2024 | Heuristic Search, Object | Code Available | 4 | 5 |
| TotalSegmentator: robust segmentation of 104 anatomical structures in CT images | Aug 11, 2022 | Segmentation | Code Available | 4 | 5 |
| Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks | Jun 24, 2023 | Philosophy, Transfer Learning | Code Available | 4 | 5 |
| Benchmarking Neural Network Training Algorithms | Jun 12, 2023 | Benchmarking | Code Available | 4 | 5 |
| RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination | May 28, 2025 | Neural Rendering | Code Available | 4 | 5 |
| XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation | Jun 26, 2025 | Attribute, Image Generation | Code Available | 4 | 5 |
| Deepchecks: A Library for Testing and Validating Machine Learning Models and Data | Mar 16, 2022 | BIG-bench Machine Learning | Code Available | 4 | 5 |
| Effective Whole-body Pose Estimation with Two-stages Distillation | Jul 29, 2023 | 2D Human Pose Estimation, Knowledge Distillation | Code Available | 4 | 5 |
| CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets | May 30, 2024 | 2k, 3D geometry | Code Available | 4 | 5 |
| The Importance of Directional Feedback for LLM-based Optimizers | May 26, 2024 | | Code Available | 4 | 5 |
| Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation | Jun 13, 2023 | Patch Matching, Translation | Code Available | 4 | 5 |
| Theseus: A Library for Differentiable Nonlinear Optimization | Jul 19, 2022 | GPU | Code Available | 4 | 5 |
| SnAG: Scalable and Accurate Video Grounding | Apr 2, 2024 | Video Grounding, Video Understanding | Code Available | 4 | 5 |
| From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion | Aug 2, 2023 | | Code Available | 4 | 5 |
| DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models | Sep 25, 2023 | Language Modelling, Large Language Model | Code Available | 4 | 5 |
| FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance | Aug 15, 2024 | TAR, Video Generation | Code Available | 4 | 5 |
| Old Optimizer, New Norm: An Anthology | Sep 30, 2024 | | Code Available | 4 | 5 |
| Time-LLM: Time Series Forecasting by Reprogramming Large Language Models | Oct 3, 2023 | Time Series, Time Series Forecasting | Code Available | 4 | 5 |
| The Llama 3 Herd of Models | Jul 31, 2024 | answerability prediction, Language Modeling | Code Available | 4 | 5 |
| ControlVAE: Tuning, Analytical Properties, and Performance Analysis | Oct 31, 2020 | Disentanglement, Image Generation | Code Available | 4 | 5 |
| UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height | Sep 17, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 4 | 5 |
| Diffusion Policy Policy Optimization | Sep 1, 2024 | continuous-control, Continuous Control | Code Available | 4 | 5 |
| Scaling Granite Code Models to 128K Context | Jul 18, 2024 | 2k, 4k | Code Available | 4 | 5 |
| AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society | Feb 12, 2025 | | Code Available | 4 | 5 |
| Expressive Whole-Body 3D Gaussian Avatar | Jul 31, 2024 | 3DGS, Diversity | Code Available | 4 | 5 |
| An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition | Jul 21, 2015 | Optical Character Recognition (OCR), Scene Text Recognition | Code Available | 4 | 5 |
| SiamMask: A Framework for Fast Online Object Tracking and Segmentation | Jul 5, 2022 | Multiple Object Tracking, Object | Code Available | 4 | 5 |
| RewardBench 2: Advancing Reward Model Evaluation | Jun 2, 2025 | Instruction Following, model | Code Available | 4 | 5 |
| VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Jun 20, 2025 | Navigate, Vision-Language Navigation | Code Available | 4 | 5 |
| HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation | Mar 15, 2024 | | Code Available | 4 | 5 |
| The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Feb 27, 2024 | All | Code Available | 4 | 5 |
| SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Dec 10, 2024 | Action Recognition, Spatial Reasoning | Code Available | 4 | 5 |
| LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL | Mar 10, 2025 | Logical Reasoning, Multimodal Reasoning | Code Available | 4 | 5 |
| UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | Jun 3, 2025 | Image Editing | Code Available | 4 | 5 |
| Unified Reward Model for Multimodal Understanding and Generation | Mar 7, 2025 | Image Generation, model | Code Available | 4 | 5 |
| TorchRL: A data-driven decision-making library for PyTorch | Jun 1, 2023 | Computational Efficiency, Decision Making | Code Available | 4 | 5 |
| What Makes Good In-Context Examples for GPT-3? | Jan 17, 2021 | Few-Shot Learning, Natural Language Understanding | Code Available | 4 | 5 |
| LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models | Feb 16, 2024 | | Code Available | 4 | 5 |
| AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones | Nov 28, 2024 | 3D Reconstruction, Novel View Synthesis | Code Available | 4 | 5 |
| TOFU: A Task of Fictitious Unlearning for LLMs | Jan 11, 2024 | | Code Available | 4 | 5 |
| Sundial: A Family of Highly Capable Time Series Foundation Models | Feb 2, 2025 | Representation Learning, Time Series | Code Available | 4 | 5 |
| FP8 Formats for Deep Learning | Sep 12, 2022 | Deep Learning, Quantization | Code Available | 4 | 5 |
| Gaussian Splatting SLAM | Dec 11, 2023 | 3DGS, 3D Reconstruction | Code Available | 4 | 5 |
| Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Feb 29, 2024 | Retrieval, Text Retrieval | Code Available | 4 | 5 |
| Fairness Implications of Encoding Protected Categorical Attributes | Jan 27, 2022 | Fairness, Feature Engineering | Code Available | 4 | 5 |
| Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Jun 3, 2024 | Language Modeling, Language Modelling | Code Available | 4 | 5 |
| LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | Dec 28, 2023 | Instance Segmentation, Language Modeling | Code Available | 4 | 5 |
| Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Apr 10, 2024 | Book summarization, Language Modeling | Code Available | 4 | 5 |
| X^2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction | Mar 27, 2025 | CT Reconstruction, Decoder | Code Available | 4 | 5 |
| LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing | Nov 1, 2023 | All, Image Generation | Code Available | 4 | 5 |