| AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents | Mar 31, 2025 | Image Generation, Text to Image Generation | Code Available | 3 |
| UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving | Mar 31, 2025 | Autonomous Driving | Code Available | 3 |
| VideoGen-Eval: Agent-based System for Video Generation Evaluation | Mar 30, 2025 | Diversity, Video Generation | Code Available | 3 |
| From Panels to Prose: Generating Literary Narratives from Comics | Mar 30, 2025 | Optical Character Recognition (OCR) | Code Available | 3 |
| ToRL: Scaling Tool-Integrated RL | Mar 30, 2025 | Math, reinforcement-learning | Code Available | 3 |
| AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos | Mar 30, 2025 | | Code Available | 3 |
| Efficient Inference for Large Reasoning Models: A Survey | Mar 29, 2025 | Survey | Code Available | 3 |
| LSNet: See Large, Focus Small | Mar 29, 2025 | | Code Available | 3 |
| WeatherMesh-3: Fast and accurate operational global weather forecasting | Mar 28, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video | Mar 27, 2025 | Camera Pose Estimation, Depth Estimation | Code Available | 3 |
| Vision-to-Music Generation: A Survey | Mar 27, 2025 | multimodal generation, Music Generation | Code Available | 3 |
| A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond | Mar 27, 2025 | Survey | Code Available | 3 |
| Optimal Stepsize for Diffusion Sampling | Mar 27, 2025 | Denoising, Image Generation | Code Available | 3 |
| HyperGraphRAG: Retrieval-Augmented Generation with Hypergraph-Structured Knowledge Representation | Mar 27, 2025 | RAG, Retrieval | Code Available | 3 |
| Exploring the Evolution of Physics Cognition in Video Generation: A Survey | Mar 27, 2025 | Video Generation | Code Available | 3 |
| Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning | Mar 26, 2025 | Few-Shot Learning, Visual Reasoning | Code Available | 3 |
| Vision as LoRA | Mar 26, 2025 | | Code Available | 3 |
| StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs | Mar 26, 2025 | Benchmarking | Code Available | 3 |
| Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency | Mar 26, 2025 | Denoising, Scene Generation | Code Available | 3 |
| Long-Context Autoregressive Video Modeling with Next-Frame Prediction | Mar 25, 2025 | Text Generation, Video Generation | Code Available | 3 |
| ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback | Mar 25, 2025 | Text to SQL, Text-To-SQL | Code Available | 3 |
| iNatAg: Multi-Class Classification Models Enabled by a Large-Scale Benchmark Dataset with 4.7M Images of 2,959 Crop and Weed Species | Mar 25, 2025 | Multi-class Classification | Code Available | 3 |
| Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective | Mar 24, 2025 | Decision Making | Code Available | 3 |
| Frequency Dynamic Convolution for Dense Image Prediction | Mar 24, 2025 | object-detection, Object Detection | Code Available | 3 |
| AdaWorld: Learning Adaptable World Models with Latent Actions | Mar 24, 2025 | Future prediction | Code Available | 3 |
| Defeating Prompt Injections by Design | Mar 24, 2025 | | Code Available | 3 |
| MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse | Mar 24, 2025 | Layout Generation, Reinforcement Learning (RL) | Code Available | 3 |
| Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models | Mar 24, 2025 | 4k, Image Generation | Code Available | 3 |
| PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos | Mar 23, 2025 | 4D reconstruction, Deformable Object Manipulation | Code Available | 3 |
| SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining | Mar 23, 2025 | 3DGS, Benchmarking | Code Available | 3 |
| Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook | Mar 23, 2025 | 3D Generation, Medical Report Generation | Code Available | 3 |
| Multi-Modality Representation Learning for Antibody-Antigen Interactions Prediction | Mar 22, 2025 | Graph Attention, Prediction | Code Available | 3 |
| Halton Scheduler For Masked Generative Image Transformer | Mar 21, 2025 | Image Generation, Text to Image Generation | Code Available | 3 |
| NdLinear Is All You Need for Representation Learning | Mar 21, 2025 | All, Representation Learning | Code Available | 3 |
| Unreal-MAP: Unreal-Engine-Based General Platform for Multi-Agent Reinforcement Learning | Mar 20, 2025 | Multi-agent Reinforcement Learning | Code Available | 3 |
| XAttention: Block Sparse Attention with Antidiagonal Scoring | Mar 20, 2025 | Video Generation, Video Understanding | Code Available | 3 |
| A Comprehensive Survey on Long Context Language Modeling | Mar 20, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| NeuralFoil: An Airfoil Aerodynamics Analysis Tool Using Physics-Informed Machine Learning | Mar 20, 2025 | Feature Engineering, Physics-informed machine learning | Code Available | 3 |
| Unleashing Vecset Diffusion Model for Fast Shape Generation | Mar 20, 2025 | 3D Generation, 3D Shape Generation | Code Available | 3 |
| Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't | Mar 20, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 3 |
| SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks | Mar 19, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| Vision-Speech Models: Teaching Speech Models to Converse about Images | Mar 19, 2025 | parameter-efficient fine-tuning | Code Available | 3 |
| TripNet: Learning Large-scale High-fidelity 3D Car Aerodynamics with Triplane Networks | Mar 19, 2025 | 3D geometry | Code Available | 3 |
| Measuring AI Ability to Complete Long Tasks | Mar 18, 2025 | Logical Reasoning | Code Available | 3 |
| MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 3 |
| MoonCast: High-Quality Zero-Shot Podcast Generation | Mar 18, 2025 | Speech Synthesis, text-to-speech | Code Available | 3 |
| A Survey on Human Interaction Motion Generation | Mar 17, 2025 | Human Dynamics, Motion Generation | Code Available | 3 |
| Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait | Mar 17, 2025 | Computational Efficiency, Diversity | Code Available | 3 |
| R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization | Mar 17, 2025 | | Code Available | 3 |
| VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Mar 17, 2025 | Grounded Video Question Answering, Question Answering | Code Available | 3 |