| Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation | Jun 13, 2023 | Patch Matching, Translation | Code Available | 4 |
| Theseus: A Library for Differentiable Nonlinear Optimization | Jul 19, 2022 | GPU | Code Available | 4 |
| SnAG: Scalable and Accurate Video Grounding | Apr 2, 2024 | Video Grounding, Video Understanding | Code Available | 4 |
| From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion | Aug 2, 2023 | | Code Available | 4 |
| DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models | Sep 25, 2023 | Language Modelling, Large Language Model | Code Available | 4 |
| FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance | Aug 15, 2024 | TAR, Video Generation | Code Available | 4 |
| Old Optimizer, New Norm: An Anthology | Sep 30, 2024 | | Code Available | 4 |
| Time-LLM: Time Series Forecasting by Reprogramming Large Language Models | Oct 3, 2023 | Time Series, Time Series Forecasting | Code Available | 4 |
| The Llama 3 Herd of Models | Jul 31, 2024 | answerability prediction, Language Modeling | Code Available | 4 |
| ControlVAE: Tuning, Analytical Properties, and Performance Analysis | Oct 31, 2020 | Disentanglement, Image Generation | Code Available | 4 |
| UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height | Sep 17, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 4 |
| Diffusion Policy Policy Optimization | Sep 1, 2024 | continuous-control, Continuous Control | Code Available | 4 |
| Scaling Granite Code Models to 128K Context | Jul 18, 2024 | 2k, 4k | Code Available | 4 |
| Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement | Nov 10, 2024 | Attribute, Image Generation | Code Available | 4 |
| Recognize Anything: A Strong Image Tagging Model | Jun 6, 2023 | model, Semantic Parsing | Code Available | 4 |
| Replace Anyone in Videos | Sep 30, 2024 | Video Generation, Video Inpainting | Code Available | 4 |
| Phased Consistency Models | May 28, 2024 | Image Generation, Video Generation | Code Available | 4 |
| A Survey on Vision-Language-Action Models for Autonomous Driving | Jun 30, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 4 |
| EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | Dec 1, 2023 | Decoder, image-classification | Code Available | 4 |
| InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Feb 9, 2024 | Data Augmentation, GSM8K | Code Available | 4 |
| Training-free Regional Prompting for Diffusion Transformers | Nov 4, 2024 | Image Generation, Text to Image Generation | Code Available | 4 |
| Your ViT is Secretly an Image Segmentation Model | Mar 24, 2025 | Decoder, Image Segmentation | Code Available | 4 |
| SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation | Jan 24, 2024 | Image Segmentation, Mamba | Code Available | 4 |
| MedMamba: Vision Mamba for Medical Image Classification | Mar 6, 2024 | Classification, image-classification | Code Available | 4 |
| CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science | Jul 12, 2023 | | Code Available | 4 |
| SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator | Dec 16, 2024 | GSM8K, Language Modeling | Code Available | 4 |
| SVFR: A Unified Framework for Generalized Video Face Restoration | Jan 2, 2025 | Colorization, Representation Learning | Code Available | 4 |
| Hidden Biases of End-to-End Driving Datasets | Dec 12, 2024 | Bench2Drive, CARLA Leaderboard 2.0 | Code Available | 4 |
| MoH: Multi-Head Attention as Mixture-of-Head Attention | Oct 15, 2024 | Mixture-of-Experts | Code Available | 4 |
| Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference | Jul 16, 2024 | | Code Available | 4 |
| Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | May 10, 2025 | Attribute, Mixture-of-Experts | Code Available | 4 |
| Partition Generative Modeling: Masked Modeling Without Masks | May 24, 2025 | Computational Efficiency, Language Modeling | Code Available | 4 |
| You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement | Feb 8, 2024 | Image Enhancement, Low-light Image Deblurring and Enhancement | Code Available | 4 |
| InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language | May 9, 2023 | Language Modelling | Code Available | 4 |
| Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion | Oct 5, 2023 | Image Generation, Text to Image Generation | Code Available | 4 |
| Retrieval-Augmented Generation with Hierarchical Knowledge | Mar 13, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 4 |
| Light-A-Video: Training-free Video Relighting via Progressive Light Fusion | Feb 12, 2025 | Image Relighting | Code Available | 4 |
| Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Oct 17, 2023 | Interactive Segmentation, Referring Expression | Code Available | 4 |
| UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining | Mar 7, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning | Mar 18, 2025 | 3D Face Animation, Common Sense Reasoning | Code Available | 4 |
| Scaling Law for Quantization-Aware Training | May 20, 2025 | Quantization | Code Available | 4 |
| Cross-Domain Aspect Extraction using Transformers Augmented with Knowledge Graphs | Oct 18, 2022 | Aspect Extraction, Knowledge Graphs | Code Available | 4 |
| LIMA: Less Is More for Alignment | May 18, 2023 | Language Modelling, reinforcement-learning | Code Available | 4 |
| VToonify: Controllable High-Resolution Portrait Video Style Transfer | Sep 22, 2022 | Face Alignment, Style Transfer | Code Available | 4 |
| PP-YOLOE: An evolved version of YOLO | Mar 30, 2022 | 2D Object Detection, Dense Object Detection | Code Available | 4 |
| LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation | Nov 7, 2024 | Contrastive Learning, Image Captioning | Code Available | 4 |
| SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions | Mar 25, 2024 | Decoder, GPU | Code Available | 4 |
| BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text | Mar 27, 2024 | Articles, Language Modeling | Code Available | 4 |
| JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models | Mar 28, 2024 | | Code Available | 4 |
| Self-attention Does Not Need O(n^2) Memory | Dec 10, 2021 | | Code Available | 4 |