| PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies | Jun 9, 2022 | 3D Classification, 3D Part Segmentation | Code Available | 3 | 5 |
| Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining | Apr 2, 2024 | Image Reconstruction, Rain Removal | Code Available | 3 | 5 |
| Accelerating Transformer Inference for Translation via Parallel Decoding | May 17, 2023 | Machine Translation, Translation | Code Available | 3 | 5 |
| DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis | May 23, 2024 | Image Generation, Mamba | Code Available | 3 | 5 |
| GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing | Mar 13, 2025 | Image Generation, Language Modeling | Code Available | 3 | 5 |
| ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | Feb 6, 2025 | Image Segmentation, Segmentation | Code Available | 3 | 5 |
| A Distractor-Aware Memory for Visual Object Tracking with SAM2 | Nov 26, 2024 | Object Tracking, Semi-Supervised Video Object Segmentation | Code Available | 3 | 5 |
| TAPIP3D: Tracking Any Point in Persistent 3D Geometry | Apr 20, 2025 | 3D geometry, Depth And Camera Motion | Code Available | 3 | 5 |
| CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | Jan 2, 2024 | | Code Available | 3 | 5 |
| Data Generation for Hardware-Friendly Post-Training Quantization | Oct 29, 2024 | Data Augmentation, GPU | Code Available | 3 | 5 |
| LLMmap: Fingerprinting For Large Language Models | Jul 22, 2024 | RAG | Code Available | 3 | 5 |
| SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition | Feb 27, 2024 | Instruction Following, Language Modeling | Code Available | 3 | 5 |
| PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | Oct 16, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback | Mar 25, 2025 | Text to SQL, Text-To-SQL | Code Available | 3 | 5 |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPU, GPU | Code Available | 3 | 5 |
| MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | Nov 22, 2024 | image-classification, Image Classification | Code Available | 3 | 5 |
| Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task | Sep 6, 2024 | Video Generation | Code Available | 3 | 5 |
| What Language Model to Train if You Have One Million GPU Hours? | Oct 27, 2022 | GPU, Language Modeling | Code Available | 3 | 5 |
| FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model | Oct 17, 2024 | Computational Efficiency, Image Cropping | Code Available | 3 | 5 |
| An Evolved Universal Transformer Memory | Oct 17, 2024 | | Code Available | 3 | 5 |
| Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation | Jun 30, 2024 | All, Deblurring | Code Available | 3 | 5 |
| DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | Apr 7, 2025 | 3D geometry, RGBD Semantic Segmentation | Code Available | 3 | 5 |
| SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation | Apr 15, 2024 | Brain Tumor Segmentation, Decoder | Code Available | 3 | 5 |
| CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution | Mar 10, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 3 | 5 |
| Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels | Feb 21, 2023 | Classification | Code Available | 3 | 5 |
| From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | May 23, 2024 | GSM8K | Code Available | 3 | 5 |
| Diffusion Feedback Helps CLIP See Better | Jul 29, 2024 | image-classification, Image Classification | Code Available | 3 | 5 |
| HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems | Nov 5, 2024 | Hallucination, RAG | Code Available | 3 | 5 |
| CAX: Cellular Automata Accelerated in JAX | Oct 3, 2024 | ARC, Artificial Life | Code Available | 3 | 5 |
| Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? | Jun 19, 2024 | RAG, Retrieval | Code Available | 3 | 5 |
| Anything-3D: Towards Single-view Anything Reconstruction in the Wild | Apr 19, 2023 | 3D Reconstruction, Diversity | Code Available | 3 | 5 |
| Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait | Mar 17, 2025 | Computational Efficiency, Diversity | Code Available | 3 | 5 |
| Simplifying Deep Temporal Difference Learning | Jul 5, 2024 | Q-Learning, Reinforcement Learning (RL) | Code Available | 3 | 5 |
| GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation | Feb 3, 2025 | Graph Neural Network, Knowledge Graphs | Code Available | 3 | 5 |
| XAttention: Block Sparse Attention with Antidiagonal Scoring | Mar 20, 2025 | Video Generation, Video Understanding | Code Available | 3 | 5 |
| 4M: Massively Multimodal Masked Modeling | Dec 11, 2023 | Decoder | Code Available | 3 | 5 |
| Unifying Flow, Stereo and Depth Estimation | Nov 10, 2022 | Depth Estimation, Optical Flow Estimation | Code Available | 3 | 5 |
| EgoLife: Towards Egocentric Life Assistant | Mar 5, 2025 | Question Answering, Video Understanding | Code Available | 3 | 5 |
| AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback | May 22, 2023 | Instruction Following | Code Available | 3 | 5 |
| Planning with Diffusion for Flexible Behavior Synthesis | May 20, 2022 | Decision Making, Denoising | Code Available | 3 | 5 |
| TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs | Oct 25, 2023 | Autonomous Driving, GPU | Code Available | 3 | 5 |
| MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs | Feb 24, 2025 | Question Answering, Visual Question Answering | Code Available | 3 | 5 |
| C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models | May 15, 2023 | Multiple-choice | Code Available | 3 | 5 |
| BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment | Apr 27, 2021 | Analog Video Restoration, Snow Removal | Code Available | 3 | 5 |
| Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding | Feb 14, 2025 | 3D Object Detection, 3D visual grounding | Code Available | 3 | 5 |
| Data Engineering for Scaling Language Models to 128K Context | Feb 15, 2024 | 4k, Continual Pretraining | Code Available | 3 | 5 |
| A Multiscale Visualization of Attention in the Transformer Model | Jun 12, 2019 | | Code Available | 3 | 5 |
| Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | Feb 21, 2024 | Decision Making, Decoder | Code Available | 3 | 5 |
| RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection | Mar 25, 2024 | 3D Object Detection, 3D Object Detection (RoI) | Code Available | 3 | 5 |
| Streaming Deep Reinforcement Learning Finally Works | Oct 18, 2024 | Atari Games, Deep Reinforcement Learning | Code Available | 3 | 5 |