| T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | May 1, 2025 | Image Generation, Reinforcement Learning (RL) | Code Available | 4 |
| Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Jan 14, 2025 | Embodied Question Answering, Hallucination | Code Available | 4 |
| Nomic Embed: Training a Reproducible Long Context Text Embedder | Feb 2, 2024 | | Code Available | 4 |
| AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora | May 29, 2025 | graph construction, Knowledge Graphs | Code Available | 4 |
| A Survey on Large Language Model based Autonomous Agents | Aug 22, 2023 | Language Modeling, Language Modelling | Code Available | 4 |
| Discovering faster matrix multiplication algorithms with reinforcement learning | Oct 5, 2022 | Deep Reinforcement Learning, reinforcement-learning | Code Available | 4 |
| SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Oct 21, 2024 | Heuristic Search, Object | Code Available | 4 |
| TotalSegmentator: robust segmentation of 104 anatomical structures in CT images | Aug 11, 2022 | Segmentation | Code Available | 4 |
| Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks | Jun 24, 2023 | Philosophy, Transfer Learning | Code Available | 4 |
| R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model | Mar 7, 2025 | Multimodal Reasoning, reinforcement-learning | Code Available | 4 |
| Benchmarking Neural Network Training Algorithms | Jun 12, 2023 | Benchmarking | Code Available | 4 |
| Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis | Feb 6, 2025 | Speech Synthesis | Code Available | 4 |
| Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications | Jan 11, 2024 | image-classification, Image Classification | Code Available | 4 |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | Jun 6, 2022 | Image Segmentation, Instance Segmentation | Code Available | 4 |
| Baichuan 2: Open Large-scale Language Models | Sep 19, 2023 | Feature Engineering, GSM8K | Code Available | 4 |
| SEED-Story: Multimodal Long Story Generation with Large Language Model | Jul 11, 2024 | Image Generation, Language Modeling | Code Available | 4 |
| VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents | Oct 14, 2024 | RAG, Retrieval | Code Available | 4 |
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | May 5, 2023 | GPU, In-Context Learning | Code Available | 4 |
| Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation | Feb 4, 2025 | Benchmarking, Information Retrieval | Code Available | 4 |
| Safurai 001: New Qualitative Approach for Code LLM Evaluation | Sep 20, 2023 | Language Modeling, Language Modelling | Code Available | 4 |
| The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models | Mar 14, 2022 | CPU, Quantization | Code Available | 4 |
| RePaint: Inpainting using Denoising Diffusion Probabilistic Models | Jan 24, 2022 | Denoising, Image Inpainting | Code Available | 4 |
| A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL | Nov 13, 2024 | Diversity, In-Context Learning | Code Available | 4 |
| MTEB: Massive Text Embedding Benchmark | Oct 13, 2022 | Benchmarking, Information Retrieval | Code Available | 4 |
| R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Mar 7, 2025 | RAG, Reinforcement Learning (RL) | Code Available | 4 |
| Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective | Feb 6, 2025 | | Code Available | 4 |
| Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data | Jul 22, 2021 | Blind Super-Resolution, Super-Resolution | Code Available | 4 |
| FinBen: A Holistic Financial Benchmark for Large Language Models | Feb 20, 2024 | Question Answering, RAG | Code Available | 4 |
| SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Nov 7, 2024 | GPU, Quantization | Code Available | 4 |
| Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering | Feb 25, 2026 | | Unverified | 3 |
| Self-Distillation Enables Continual Learning | Jan 27, 2026 | | Unverified | 3 |
| Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks | Feb 6, 2026 | | Unverified | 3 |
| LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans | Mar 19, 2026 | | Unverified | 3 |
| CL-bench: A Benchmark for Context Learning | Feb 3, 2026 | | Unverified | 3 |
| LLM-in-Sandbox Elicits General Agentic Intelligence | Feb 12, 2026 | | Unverified | 3 |
| tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction | Mar 2, 2026 | | Unverified | 3 |
| SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes | Feb 9, 2026 | | Unverified | 3 |
| GEM: A Gym for Agentic LLMs | Mar 1, 2026 | | Unverified | 3 |
| DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing | Feb 13, 2026 | | Unverified | 3 |
| A Survey of Token Compression for Efficient Multimodal Large Language Models | Feb 1, 2026 | | Unverified | 3 |
| LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory | Mar 3, 2026 | | Unverified | 3 |
| HY3D-Bench: Generation of 3D Assets | Feb 3, 2026 | | Unverified | 3 |
| Deep Delta Learning | Jan 29, 2026 | | Unverified | 3 |
| HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing | Mar 7, 2026 | | Unverified | 3 |
| AI Can Learn Scientific Taste | Mar 15, 2026 | | Unverified | 3 |
| VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction | Mar 12, 2026 | | Unverified | 3 |
| ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion | Jan 22, 2026 | | Unverified | 3 |
| JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion | Jan 29, 2026 | | Unverified | 3 |
| A Survey of Data Agents: Emerging Paradigm or Overstated Hype? | Feb 24, 2026 | | Unverified | 3 |
| Generative Refocusing: Flexible Defocus Control from a Single Image | Mar 18, 2026 | | Unverified | 3 |