| Deep Generative Models on 3D Representations: A Survey | Oct 27, 2022 | 3D-Aware Image Synthesis, 3D Shape Generation | Code Available | 3 | 5 |
| A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Apr 15, 2025 | Reinforcement Learning (RL) | Code Available | 3 | 5 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | Feb 17, 2022 | ARC, Common Sense Reasoning | Code Available | 3 | 5 |
| Landmark Attention: Random-Access Infinite Context Length for Transformers | May 25, 2023 | Retrieval | Code Available | 3 | 5 |
| Evaluating Hallucinations in Chinese Large Language Models | Oct 5, 2023 | Hallucination, Question Answering | Code Available | 3 | 5 |
| ViTPose++: Vision Transformer for Generic Body Pose Estimation | Dec 7, 2022 | 2D Human Pose Estimation, Animal Pose Estimation | Code Available | 3 | 5 |
| FAN: Fourier Analysis Networks | Oct 3, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| FilterNet: Harnessing Frequency Filters for Time Series Forecasting | Nov 3, 2024 | Time Series, Time Series Forecasting | Code Available | 3 | 5 |
| QuEst: Graph Transformer for Quantum Circuit Reliability Estimation | Oct 30, 2022 | | Code Available | 3 | 5 |
| WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Sep 24, 2024 | Management, speech-recognition | Code Available | 3 | 5 |
| KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | May 29, 2025 | Question Answering | Code Available | 3 | 5 |
| BERGEN: A Benchmarking Library for Retrieval-Augmented Generation | Jul 1, 2024 | Benchmarking, RAG | Code Available | 3 | 5 |
| MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications | Apr 11, 2025 | GPU | Code Available | 3 | 5 |
| Evaluating Text-to-Visual Generation with Image-to-Text Generation | Apr 1, 2024 | Image to text, Question Answering | Code Available | 3 | 5 |
| Attention Is All You Need | Jun 12, 2017 | Abstractive Text Summarization, All | Code Available | 3 | 5 |
| CodeTF: One-stop Transformer Library for State-of-the-art Code LLM | May 31, 2023 | | Code Available | 3 | 5 |
| StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization | Dec 10, 2024 | Story Visualization | Code Available | 3 | 5 |
| DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation | Dec 24, 2024 | Video Editing, Video Generation | Code Available | 3 | 5 |
| Residual Kolmogorov-Arnold Network for Enhanced Deep Learning | Oct 7, 2024 | Computational Efficiency, Deep Learning | Code Available | 3 | 5 |
| Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Apr 3, 2025 | Mamba, Talking Head Generation | Code Available | 3 | 5 |
| AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | Dec 10, 2023 | All, Benchmarking | Code Available | 3 | 5 |
| A Survey on LoRA of Large Language Models | Jul 8, 2024 | Federated Learning, parameter-efficient fine-tuning | Code Available | 3 | 5 |
| VisionZip: Longer is Better but Not Necessary in Vision Language Models | Dec 5, 2024 | Video Understanding, Visual Question Answering | Code Available | 3 | 5 |
| Humans in 4D: Reconstructing and Tracking Humans with Transformers | May 31, 2023 | 3D Human Pose Estimation, Action Recognition | Code Available | 3 | 5 |
| Sigmoid Loss for Language Image Pre-Training | Mar 27, 2023 | Contrastive Learning, Disentanglement | Code Available | 3 | 5 |
| Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Feb 9, 2025 | Image Captioning, Image-text Retrieval | Code Available | 3 | 5 |
| T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback | May 29, 2024 | Video Generation | Code Available | 3 | 5 |
| Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning | Jun 10, 2024 | Multi-hop Question Answering, Question Answering | Code Available | 3 | 5 |
| Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation | Mar 20, 2024 | | Code Available | 3 | 5 |
| Restoring Images in Adverse Weather Conditions via Histogram Transformer | Jul 14, 2024 | Image Restoration | Code Available | 3 | 5 |
| GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding | Dec 17, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 3 | 5 |
| NeuSpeech: Decode Neural signal as Speech | Mar 4, 2024 | Brain Computer Interface, EEG | Code Available | 3 | 5 |
| YOLOv4: Optimal Speed and Accuracy of Object Detection | Apr 23, 2020 | BIG-bench Machine Learning, Data Augmentation | Code Available | 3 | 5 |
| ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding | Oct 23, 2020 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation | Feb 4, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 3 | 5 |
| EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Jan 7, 2024 | Audio Classification, Self-Supervised Learning | Code Available | 3 | 5 |
| State Space Models for Event Cameras | Feb 23, 2024 | Event-based vision, Object Detection | Code Available | 3 | 5 |
| Inference Performance Optimization for Large Language Models on CPUs | Jul 10, 2024 | CPU, GPU | Code Available | 3 | 5 |
| OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models | Oct 2, 2024 | Benchmarking | Code Available | 3 | 5 |
| UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving | Mar 31, 2025 | Autonomous Driving | Code Available | 3 | 5 |
| Visual Prompt Tuning | Mar 23, 2022 | Image Classification, Long-tail Learning | Code Available | 3 | 5 |
| MoonCast: High-Quality Zero-Shot Podcast Generation | Mar 18, 2025 | Speech Synthesis, text-to-speech | Code Available | 3 | 5 |
| Finetuned Language Models Are Zero-Shot Learners | Sep 3, 2021 | ARC, Common Sense Reasoning | Code Available | 3 | 5 |
| IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design | Apr 17, 2025 | | Code Available | 3 | 5 |
| Revisiting Pre-Trained Models for Chinese Natural Language Processing | Apr 29, 2020 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity | Mar 21, 2024 | Question Answering, RAG | Code Available | 3 | 5 |
| vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jul 22, 2024 | CPU, GPU | Code Available | 3 | 5 |
| Frequency Dynamic Convolution for Dense Image Prediction | Mar 24, 2025 | object-detection, Object Detection | Code Available | 3 | 5 |
| Accelerating Goal-Conditioned RL Algorithms and Research | Aug 20, 2024 | GPU, reinforcement-learning | Code Available | 3 | 5 |
| Jukebox: A Generative Model for Music | Apr 30, 2020 | model | Code Available | 3 | 5 |