| CoMotion: Concurrent Multi-person 3D Motion | Apr 16, 2025 | 3D Pose Estimation, Pose Estimation | Code Available | 3 |
| Elucidating the Design Space of Multimodal Protein Language Models | Apr 15, 2025 | Diversity, Representation Learning | Code Available | 3 |
| DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Apr 15, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 3 |
| SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | Apr 15, 2025 | Inference Optimization | Code Available | 3 |
| REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers | Apr 15, 2025 | Image Generation | Code Available | 3 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Apr 15, 2025 | ARC, HellaSwag | Code Available | 3 |
| Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning | Apr 15, 2025 | Automated Theorem Proving, Large Language Model | Code Available | 3 |
| REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites | Apr 15, 2025 | Autonomous Web Navigation, Benchmarking | Code Available | 3 |
| DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks | Apr 15, 2025 | | Code Available | 3 |
| Efficient Reasoning Models: A Survey | Apr 15, 2025 | Knowledge Distillation, Model Compression | Code Available | 3 |
| A Clean Slate for Offline Reinforcement Learning | Apr 15, 2025 | Offline RL, reinforcement-learning | Code Available | 3 |
| Evaluation Report on MCP Servers | Apr 15, 2025 | Large Language Model | Code Available | 3 |
| Ai2 Scholar QA: Organized Literature Synthesis with Attribution | Apr 15, 2025 | Question Answering, Retrieval | Code Available | 3 |
| A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Apr 15, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| RAKG:Document-level Retrieval Augmented Knowledge Graph Construction | Apr 14, 2025 | coreference-resolution, Coreference Resolution | Code Available | 3 |
| The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report | Apr 14, 2025 | Super-Resolution, valid | Code Available | 3 |
| REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers | Apr 14, 2025 | | Code Available | 3 |
| Deep Reasoning Translation via Reinforcement Learning | Apr 14, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 3 |
| GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Apr 14, 2025 | Vision-Language-Action | Code Available | 3 |
| Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Apr 13, 2025 | GSM8K, Math | Code Available | 3 |
| TensorNEAT: A GPU-accelerated Library for NeuroEvolution of Augmenting Topologies | Apr 11, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| DocAgent: A Multi-Agent System for Automated Code Documentation Generation | Apr 11, 2025 | Code Documentation Generation | Code Available | 3 |
| MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications | Apr 11, 2025 | GPU | Code Available | 3 |
| GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation | Apr 11, 2025 | Decoder, Image Generation | Code Available | 3 |
| PixelFlow: Pixel-Space Generative Models with Flow | Apr 10, 2025 | Conditional Image Generation, Image Generation | Code Available | 3 |
| Detect Anything 3D in the Wild | Apr 10, 2025 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| Perception-R1: Pioneering Perception Policy with Reinforcement Learning | Apr 10, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 3 |
| Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Apr 10, 2025 | Math, MMLU | Code Available | 3 |
| Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction | Apr 10, 2025 | 3D Reconstruction, 4D reconstruction | Code Available | 3 |
| VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Apr 9, 2025 | MVBench, Object Tracking | Code Available | 3 |
| FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Apr 9, 2025 | 2k, Decision Making | Code Available | 3 |
| SEA-LION: Southeast Asian Languages in One Network | Apr 8, 2025 | | Code Available | 3 |
| GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Apr 8, 2025 | Computational Efficiency, CPU | Code Available | 3 |
| DDT: Decoupled Diffusion Transformer | Apr 8, 2025 | Denoising, Image Generation | Code Available | 3 |
| PromptHMR: Promptable Human Mesh Recovery | Apr 8, 2025 | 3D Human Pose Estimation, Human Mesh Recovery | Code Available | 3 |
| Playing Non-Embedded Card-Based Games with Reinforcement Learning | Apr 7, 2025 | Board Games, Decision Making | Code Available | 3 |
| DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | Apr 7, 2025 | 3D geometry, RGBD Semantic Segmentation | Code Available | 3 |
| Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Apr 5, 2025 | 3D Generation, Video Alignment | Code Available | 3 |
| TrafficLLM: Enhancing Large Language Models for Network Traffic Analysis with Generic Traffic Representation | Apr 5, 2025 | | Code Available | 3 |
| Scaling Analysis of Interleaved Speech-Text Language Models | Apr 3, 2025 | Transfer Learning | Code Available | 3 |
| GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | Apr 3, 2025 | Image Generation, World Knowledge | Code Available | 3 |
| Affordable AI Assistants with Knowledge Graph of Thoughts | Apr 3, 2025 | Knowledge Graphs, LLM real-life tasks | Code Available | 3 |
| Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | Apr 3, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | Apr 3, 2025 | Image Generation, Instruction Following | Code Available | 3 |
| Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Apr 3, 2025 | Mamba, Talking Head Generation | Code Available | 3 |
| End-to-End Driving with Online Trajectory Evaluation via BEV World Model | Apr 2, 2025 | Autonomous Driving, Bench2Drive | Code Available | 3 |
| YourBench: Easy Custom Evaluation Sets for Everyone | Apr 2, 2025 | MMLU | Code Available | 3 |
| AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction | Apr 1, 2025 | Image Generation | Code Available | 3 |
| MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs | Apr 1, 2025 | Knowledge Graphs, Mathematical Reasoning | Code Available | 3 |
| Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Apr 1, 2025 | Decision Making, RAG | Code Available | 3 |