| ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs | Jan 24, 2026 | | Unverified | 4 |
| Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs | Jan 22, 2026 | | Unverified | 4 |
| SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds | Jan 22, 2026 | | Unverified | 4 |
| SpatialTrackerV2: 3D Point Tracking Made Easy | Jul 16, 2025 | 3D Reconstruction, Camera Pose Estimation | Code Available | 4 |
| Streaming 4D Visual Geometry Transformer | Jul 15, 2025 | 4D reconstruction, Philosophy | Code Available | 4 |
| ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching | Jul 12, 2025 | Dialogue Generation, text-to-speech | Code Available | 4 |
| XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL | Jul 7, 2025 | Text to SQL, Text-To-SQL | Code Available | 4 |
| Energy-Based Transformers are Scalable Learners and Thinkers | Jul 2, 2025 | Denoising, Image Denoising | Code Available | 4 |
| Kwai Keye-VL Technical Report | Jul 2, 2025 | Instruction Following, Reinforcement Learning (RL) | Code Available | 4 |
| A Survey on Vision-Language-Action Models for Autonomous Driving | Jun 30, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 4 |
| WorldVLA: Towards Autoregressive Action World Model | Jun 26, 2025 | Action Generation, model | Code Available | 4 |
| XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation | Jun 26, 2025 | Attribute, Image Generation | Code Available | 4 |
| DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Jun 25, 2025 | Code Generation, Denoising | Code Available | 4 |
| From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents | Jun 23, 2025 | Information Retrieval, Retrieval | Code Available | 4 |
| VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Jun 20, 2025 | Navigate, Vision-Language Navigation | Code Available | 4 |
| YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework | Jun 17, 2025 | Multispectral Object Detection, object-detection | Code Available | 4 |
| ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | Jun 16, 2025 | Decoder, Speech Synthesis | Code Available | 4 |
| OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics | Jun 14, 2025 | Benchmarking | Code Available | 4 |
| DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents | Jun 13, 2025 | Information Retrieval, Retrieval | Code Available | 4 |
| Ming-Omni: A Unified Multimodal Model for Perception and Generation | Jun 11, 2025 | Image Generation, text-to-speech | Code Available | 4 |
| Efficient Part-level 3D Object Generation via Dual Volume Packing | Jun 11, 2025 | Diversity, Object | Code Available | 4 |
| SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement | Jun 9, 2025 | Music Generation | Code Available | 4 |
| MiMo-VL Technical Report | Jun 4, 2025 | Multimodal Reasoning | Code Available | 4 |
| Seed-Coder: Let the Code Model Curate Data for Itself | Jun 4, 2025 | Code Completion, Code Generation | Code Available | 4 |
| Pseudo-Simulation for Autonomous Driving | Jun 4, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 4 |
| UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | Jun 3, 2025 | Image Editing | Code Available | 4 |
| Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning | Jun 3, 2025 | Code Generation, reinforcement-learning | Code Available | 4 |
| ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding | Jun 2, 2025 | 3D Generation, Large Language Model | Code Available | 4 |
| RewardBench 2: Advancing Reward Model Evaluation | Jun 2, 2025 | Instruction Following, model | Code Available | 4 |
| GigaAM: Efficient Self-Supervised Learner for Speech Recognition | Jun 1, 2025 | Automatic Speech Recognition, Language Modeling | Code Available | 4 |
| AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora | May 29, 2025 | graph construction, Knowledge Graphs | Code Available | 4 |
| RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination | May 28, 2025 | Neural Rendering | Code Available | 4 |
| Skywork Open Reasoner 1 Technical Report | May 28, 2025 | Math, Reinforcement Learning (RL) | Code Available | 4 |
| ImgEdit: A Unified Image Editing Dataset and Benchmark | May 26, 2025 | Image Editing | Code Available | 4 |
| Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution | May 26, 2025 | | Code Available | 4 |
| DeepInverse: A Python package for solving imaging inverse problems with deep learning | May 26, 2025 | Image Reconstruction | Code Available | 4 |
| On Path to Multimodal Historical Reasoning: HistBench and HistAgent | May 26, 2025 | Optical Character Recognition (OCR) | Code Available | 4 |
| GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | May 26, 2025 | Question Answering, Synthetic Data Generation | Code Available | 4 |
| OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation | May 26, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 4 |
| LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders | May 24, 2025 | Adversarial Robustness, Out-of-Distribution Generalization | Code Available | 4 |
| Partition Generative Modeling: Masked Modeling Without Masks | May 24, 2025 | Computational Efficiency, Language Modeling | Code Available | 4 |
| A Survey of LLM DATA | May 24, 2025 | Large Language Model, Management | Code Available | 4 |
| Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning | May 23, 2025 | Decoder, Image Captioning | Code Available | 4 |
| Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models | May 23, 2025 | | Code Available | 4 |
| Qiskit Machine Learning: an open-source library for quantum machine learning tasks at scale on quantum hardware and classical simulators | May 23, 2025 | Quantum Machine Learning | Code Available | 4 |
| QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | May 23, 2025 | Question Answering, Reinforcement Learning (RL) | Code Available | 4 |
| R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | May 22, 2025 | Memorization, RAG | Code Available | 4 |
| Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | May 22, 2025 | Domain Generalization, Image Generation | Code Available | 4 |
| SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis | May 22, 2025 | Diversity, Information Retrieval | Code Available | 4 |
| lmgame-Bench: How Good are LLMs at Playing Games? | May 21, 2025 | Language Modeling, Language Modelling | Code Available | 4 |