| Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization | Jun 6, 2024 | Denoising, Image Generation | Code Available | 3 | 5 |
| WHAC: World-grounded Humans and Cameras | Mar 19, 2024 | Camera Pose Estimation, Pose Estimation | Code Available | 3 | 5 |
| GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations | Feb 19, 2024 | Card Games, Logical Reasoning | Code Available | 3 | 5 |
| Generative AI Act II: Test Time Scaling Drives Cognition Engineering | Apr 18, 2025 | Prompt Engineering | Code Available | 3 | 5 |
| ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models | Oct 25, 2024 | | Code Available | 3 | 5 |
| Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning | Feb 12, 2025 | RAG, Text to SQL | Code Available | 3 | 5 |
| Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI | Jan 25, 2024 | | Code Available | 3 | 5 |
| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Oct 7, 2024 | Natural Language Visual Grounding, Navigate | Code Available | 3 | 5 |
| AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | May 22, 2025 | Benchmarking, Fairness | Code Available | 3 | 5 |
| From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models | Mar 18, 2024 | Chart Understanding, Data Visualization | Code Available | 3 | 5 |
| DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models | Jun 17, 2024 | Document Classification, Visual Grounding | Code Available | 3 | 5 |
| Chain of Draft: Thinking Faster by Writing Less | Feb 25, 2025 | | Code Available | 3 | 5 |
| Data Augmentation for Sequential Recommendation: A Survey | Sep 20, 2024 | Data Augmentation, Recommendation Systems | Code Available | 3 | 5 |
| Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | Sep 25, 2024 | Large Language Model | Code Available | 3 | 5 |
| MLVU: Benchmarking Multi-task Long Video Understanding | Jun 6, 2024 | Benchmarking, Video Understanding | Code Available | 3 | 5 |
| UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | Nov 27, 2023 | Image Classification, Object Detection | Code Available | 3 | 5 |
| ECON: Explicit Clothed humans Optimized via Normal integration | Dec 14, 2022 | 3D Human Reconstruction, Surface Reconstruction | Code Available | 3 | 5 |
| Partially Rewriting a Transformer in Natural Language | Jan 31, 2025 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| A Clean Slate for Offline Reinforcement Learning | Apr 15, 2025 | Offline RL, reinforcement-learning | Code Available | 3 | 5 |
| MarioGPT: Open-Ended Text2Level Generation through Large Language Models | Feb 12, 2023 | | Code Available | 3 | 5 |
| PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map | Feb 9, 2025 | | Code Available | 3 | 5 |
| VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Jun 19, 2024 | GPU, Language Modeling | Code Available | 3 | 5 |
| OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | Oct 30, 2024 | Natural Language Visual Grounding | Code Available | 3 | 5 |
| HadaCore: Tensor Core Accelerated Hadamard Transform Kernel | Dec 12, 2024 | GPU, MMLU | Code Available | 3 | 5 |
| Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling | Aug 16, 2024 | Retrieval | Code Available | 3 | 5 |
| Description Boosting for Zero-Shot Entity and Relation Classification | Jun 4, 2024 | Relation, Relation Classification | Code Available | 3 | 5 |
| LibCity: A Unified Library Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction | Apr 27, 2023 | Prediction | Code Available | 3 | 5 |
| Bird-Eye Transformers for Text Generation Models | Oct 8, 2022 | Attribute, Inductive Bias | Code Available | 3 | 5 |
| Lightplane: Highly-Scalable Components for Neural 3D Fields | Apr 30, 2024 | 3D Reconstruction | Code Available | 3 | 5 |
| Apollo: Band-sequence Modeling for High-Quality Audio Restoration | Sep 13, 2024 | Computational Efficiency, Speech Enhancement | Code Available | 3 | 5 |
| ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning | May 19, 2025 | Machine Translation, reinforcement-learning | Code Available | 3 | 5 |
| Image Quality Assessment for Magnetic Resonance Imaging | Mar 15, 2022 | Denoising, Image Enhancement | Code Available | 3 | 5 |
| RoadBEV: Road Surface Reconstruction in Bird's Eye View | Apr 9, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 3 | 5 |
| MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse | Mar 24, 2025 | Layout Generation, Reinforcement Learning (RL) | Code Available | 3 | 5 |
| Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Nov 22, 2022 | Math | Code Available | 3 | 5 |
| XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | Jul 14, 2022 | 2D Human Pose Estimation, 2D Object Detection | Code Available | 3 | 5 |
| UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface | Mar 3, 2025 | Instance Segmentation, Reasoning Segmentation | Code Available | 3 | 5 |
| PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Aug 14, 2024 | Speech Synthesis, text-to-speech | Code Available | 3 | 5 |
| RoMa: Robust Dense Feature Matching | May 24, 2023 | Camera Pose Estimation, Decoder | Code Available | 3 | 5 |
| Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models | Jul 22, 2024 | Image Animation | Code Available | 3 | 5 |
| ViTamin: Designing Scalable Vision Models in the Vision-Language Era | Apr 2, 2024 | | Code Available | 3 | 5 |
| Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization | Aug 15, 2024 | Speech Synthesis | Code Available | 3 | 5 |
| Deep Learning for Multivariate Time Series Imputation: A Survey | Feb 6, 2024 | Deep Learning, Imputation | Code Available | 3 | 5 |
| InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions | Feb 27, 2025 | Human-Object Interaction Detection, Object | Code Available | 3 | 5 |
| PathoTune: Adapting Visual Foundation Model to Pathological Specialists | Mar 25, 2024 | model | Code Available | 3 | 5 |
| SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | Apr 15, 2025 | Inference Optimization | Code Available | 3 | 5 |
| ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens | Feb 21, 2024 | | Code Available | 3 | 5 |
| CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving | Oct 11, 2023 | Autonomous Driving, Benchmarking | Code Available | 3 | 5 |
| MMSearch-R1: Incentivizing LMMs to Search | Jun 25, 2025 | RAG, Retrieval-augmented Generation | Code Available | 3 | 5 |
| Taming Stable Diffusion for Text to 360° Panorama Image Generation | Apr 11, 2024 | Denoising, Image Generation | Code Available | 3 | 5 |