| DFlash: Block Diffusion for Flash Speculative Decoding | Feb 5, 2026 | | Unverified | 4 |
| Causal World Modeling for Robot Control | Jan 29, 2026 | | Unverified | 4 |
| A Pragmatic VLA Foundation Model | Feb 26, 2026 | | Unverified | 4 |
| Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement | Nov 10, 2024 | Attribute, Image Generation | Code Available | 4 |
| Recognize Anything: A Strong Image Tagging Model | Jun 6, 2023 | model, Semantic Parsing | Code Available | 4 |
| Replace Anyone in Videos | Sep 30, 2024 | Video Generation, Video Inpainting | Code Available | 4 |
| Phased Consistency Models | May 28, 2024 | Image Generation, Video Generation | Code Available | 4 |
| A Survey on Vision-Language-Action Models for Autonomous Driving | Jun 30, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 4 |
| EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | Dec 1, 2023 | Decoder, image-classification | Code Available | 4 |
| InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Feb 9, 2024 | Data Augmentation, GSM8K | Code Available | 4 |
| Training-free Regional Prompting for Diffusion Transformers | Nov 4, 2024 | Image Generation, Text to Image Generation | Code Available | 4 |
| Your ViT is Secretly an Image Segmentation Model | Mar 24, 2025 | Decoder, Image Segmentation | Code Available | 4 |
| SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation | Jan 24, 2024 | Image Segmentation, Mamba | Code Available | 4 |
| MedMamba: Vision Mamba for Medical Image Classification | Mar 6, 2024 | Classification, image-classification | Code Available | 4 |
| CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science | Jul 12, 2023 | | Code Available | 4 |
| SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator | Dec 16, 2024 | GSM8K, Language Modeling | Code Available | 4 |
| SVFR: A Unified Framework for Generalized Video Face Restoration | Jan 2, 2025 | Colorization, Representation Learning | Code Available | 4 |
| Hidden Biases of End-to-End Driving Datasets | Dec 12, 2024 | Bench2Drive, CARLA Leaderboard 2.0 | Code Available | 4 |
| MoH: Multi-Head Attention as Mixture-of-Head Attention | Oct 15, 2024 | Mixture-of-Experts | Code Available | 4 |
| Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference | Jul 16, 2024 | | Code Available | 4 |
| Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | May 10, 2025 | Attribute, Mixture-of-Experts | Code Available | 4 |
| Partition Generative Modeling: Masked Modeling Without Masks | May 24, 2025 | Computational Efficiency, Language Modeling | Code Available | 4 |
| You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement | Feb 8, 2024 | Image Enhancement, Low-light Image Deblurring and Enhancement | Code Available | 4 |
| InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language | May 9, 2023 | Language Modelling | Code Available | 4 |
| Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion | Oct 5, 2023 | Image Generation, Text to Image Generation | Code Available | 4 |
| Retrieval-Augmented Generation with Hierarchical Knowledge | Mar 13, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 4 |
| Light-A-Video: Training-free Video Relighting via Progressive Light Fusion | Feb 12, 2025 | Image Relighting | Code Available | 4 |
| Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Oct 17, 2023 | Interactive Segmentation, Referring Expression | Code Available | 4 |
| UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining | Mar 7, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning | Mar 18, 2025 | 3D Face Animation, Common Sense Reasoning | Code Available | 4 |
| Scaling Law for Quantization-Aware Training | May 20, 2025 | Quantization | Code Available | 4 |
| Cross-Domain Aspect Extraction using Transformers Augmented with Knowledge Graphs | Oct 18, 2022 | Aspect Extraction, Knowledge Graphs | Code Available | 4 |
| LIMA: Less Is More for Alignment | May 18, 2023 | Language Modelling, reinforcement-learning | Code Available | 4 |
| VToonify: Controllable High-Resolution Portrait Video Style Transfer | Sep 22, 2022 | Face Alignment, Style Transfer | Code Available | 4 |
| PP-YOLOE: An evolved version of YOLO | Mar 30, 2022 | 2D Object Detection, Dense Object Detection | Code Available | 4 |
| LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation | Nov 7, 2024 | Contrastive Learning, Image Captioning | Code Available | 4 |
| SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions | Mar 25, 2024 | Decoder, GPU | Code Available | 4 |
| BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text | Mar 27, 2024 | Articles, Language Modeling | Code Available | 4 |
| JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models | Mar 28, 2024 | | Code Available | 4 |
| Self-attention Does Not Need O(n^2) Memory | Dec 10, 2021 | | Code Available | 4 |
| Diffusion Models in Low-Level Vision: A Survey | Jun 17, 2024 | Denoising, Survey | Code Available | 4 |
| G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | Feb 12, 2024 | Common Sense Reasoning, Graph Classification | Code Available | 4 |
| VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4 |
| Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English | Feb 12, 2024 | | Code Available | 4 |
| SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference | Feb 25, 2025 | model, Video Generation | Code Available | 4 |
| Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation | Jan 16, 2024 | Decoder, Machine Translation | Code Available | 4 |
| DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search | Aug 15, 2024 | Automated Theorem Proving, Language Modeling | Code Available | 4 |
| Conditional Prompt Learning for Vision-Language Models | Mar 10, 2022 | Domain Generalization, Prompt Engineering | Code Available | 4 |
| DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing | Feb 4, 2024 | Image Generation | Code Available | 4 |
| Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 | Nov 17, 2023 | | Code Available | 4 |