| Atom of Thoughts for Markov LLM Test-Time Scaling | Feb 17, 2025 | | Code Available | 4 |
| A-MEM: Agentic Memory for LLM Agents | Feb 17, 2025 | Large Language Model | Code Available | 4 |
| Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention | Feb 16, 2025 | | Code Available | 4 |
| SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers | Feb 15, 2025 | Image Animation, Portrait Animation | Code Available | 4 |
| KernelBench: Can LLMs Write Efficient GPU Kernels? | Feb 14, 2025 | GPU | Code Available | 4 |
| SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models | Feb 13, 2025 | Question Answering, RAG | Code Available | 4 |
| Light-A-Video: Training-free Video Relighting via Progressive Light Fusion | Feb 12, 2025 | Image Relighting | Code Available | 4 |
| AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society | Feb 12, 2025 | | Code Available | 4 |
| Enhance-A-Video: Better Generated Video for Free | Feb 11, 2025 | Video Generation | Code Available | 4 |
| Training Sparse Mixture Of Experts Text Embedding Models | Feb 11, 2025 | Mixture-of-Experts, RAG | Code Available | 4 |
| CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction | Feb 11, 2025 | Code Generation, Math | Code Available | 4 |
| ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Feb 10, 2025 | Hierarchical Reinforcement Learning, Language Modeling | Code Available | 4 |
| Accelerating Data Processing and Benchmarking of AI Models for Pathology | Feb 10, 2025 | Benchmarking | Code Available | 4 |
| Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM | Feb 10, 2025 | Language Modeling, Language Modelling | Code Available | 4 |
| Self-Supervised Prompt Optimization | Feb 7, 2025 | | Code Available | 4 |
| Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | Feb 7, 2025 | Language Modeling, Language Modelling | Code Available | 4 |
| Latent Swap Joint Diffusion for 2D Long-Form Latent Generation | Feb 7, 2025 | Audio Generation, Denoising | Code Available | 4 |
| Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound | Feb 7, 2025 | Benchmarking | Code Available | 4 |
| Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective | Feb 6, 2025 | | Code Available | 4 |
| Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis | Feb 6, 2025 | Speech Synthesis | Code Available | 4 |
| Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation | Feb 4, 2025 | Benchmarking, Information Retrieval | Code Available | 4 |
| Sundial: A Family of Highly Capable Time Series Foundation Models | Feb 2, 2025 | Representation Learning, Time Series | Code Available | 4 |
| Transcoders Beat Sparse Autoencoders for Interpretability | Jan 31, 2025 | | Code Available | 4 |
| LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models | Jan 31, 2025 | Caption Generation, Language Modeling | Code Available | 4 |
| Molecular-driven Foundation Model for Oncologic Pathology | Jan 28, 2025 | Benchmarking, Diagnostic | Code Available | 4 |
| A foundation model for human-AI collaboration in medical literature mining | Jan 27, 2025 | Literature Mining, Systematic Literature Review | Code Available | 4 |
| Diffusion-Based Planning for Autonomous Driving with Flexible Guidance | Jan 26, 2025 | Autonomous Driving, Imitation Learning | Code Available | 4 |
| Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step | Jan 23, 2025 | Image Generation, Text-to-Image Generation | Code Available | 4 |
| TabularARGN: A Flexible and Efficient Auto-Regressive Framework for Generating High-Fidelity Synthetic Data | Jan 21, 2025 | Fairness, Imputation | Code Available | 4 |
| Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models | Jan 20, 2025 | | Code Available | 4 |
| A New Formulation of Lipschitz Constrained With Functional Gradient Learning for GANs | Jan 20, 2025 | Diversity, Image Generation | Code Available | 4 |
| Generating Structured Outputs from Language Models: Benchmark and Studies | Jan 18, 2025 | | Code Available | 4 |
| DiffuEraser: A Diffusion Model for Video Inpainting | Jan 17, 2025 | model, Optical Flow Estimation | Code Available | 4 |
| Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment | Jan 16, 2025 | Causal Inference, counterfactual | Code Available | 4 |
| MonSter: Marry Monodepth to Stereo Unleashes Power | Jan 15, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 4 |
| Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models | Jan 14, 2025 | Benchmarking, Text-to-Video Generation | Code Available | 4 |
| ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding | Jan 14, 2025 | RAG, Retrieval | Code Available | 4 |
| Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Jan 14, 2025 | Embodied Question Answering, Hallucination | Code Available | 4 |
| 3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud or Mesh | Jan 13, 2025 | 3DGS, Surface Reconstruction | Code Available | 4 |
| EdgeTAM: On-Device Track Anything Model | Jan 13, 2025 | model, Video Segmentation | Code Available | 4 |
| Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset | Jan 9, 2025 | Human Mesh Recovery, Motion Generation | Code Available | 4 |
| The GAN is dead; long live the GAN! A Modern GAN Baseline | Jan 9, 2025 | Image Generation | Code Available | 4 |
| RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark | Jan 8, 2025 | object-detection, Object Detection | Code Available | 4 |
| Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control | Jan 7, 2025 | Video Generation | Code Available | 4 |
| Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection | Jan 7, 2025 | Object, object-detection | Code Available | 4 |
| LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Jan 7, 2025 | GPU, Visual Question Answering (VQA) | Code Available | 4 |
| TransPixeler: Advancing Text-to-Video Generation with Transparency | Jan 6, 2025 | Text-to-Video Generation, Video Generation | Code Available | 4 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | Fairness, Hallucination | Code Available | 4 |
| GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Jan 2, 2025 | Scene Understanding, text annotation | Code Available | 4 |
| SVFR: A Unified Framework for Generalized Video Face Restoration | Jan 2, 2025 | Colorization, Representation Learning | Code Available | 4 |