| Moonshine: Speech Recognition for Live Transcription and Voice Commands | Oct 21, 2024 | Decoder, Position | Code Available | 9 |
| Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | May 18, 2023 | Image Manipulation, Point Tracking | Code Available | 7 |
| Extending Context Window of Large Language Models via Positional Interpolation | Jun 27, 2023 | Document Summarization, Language Modeling | Code Available | 6 |
| YaRN: Efficient Context Window Extension of Large Language Models | Aug 31, 2023 | Position | Code Available | 6 |
| LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Oct 10, 2023 | Code Completion, Few-Shot Learning | Code Available | 5 |
| Cosmos World Foundation Model Platform for Physical AI | Jan 7, 2025 | model, Position | Code Available | 5 |
| Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models | May 24, 2025 | Position | Code Available | 5 |
| Desiderata for next generation of ML model serving | Oct 26, 2022 | model, Position | Code Available | 4 |
| KeyPoint Relative Position Encoding for Face Recognition | Mar 21, 2024 | Face Recognition, Gait Recognition | Code Available | 4 |
| Programming Is Hard -- Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation | Dec 2, 2022 | Code Generation, Position | Code Available | 4 |
| GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction | May 27, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 4 |
| MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis | Jul 2, 2024 | Attribute, Image Generation | Code Available | 4 |
| VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | Feb 27, 2024 | Contrastive Learning, Medical Image Analysis | Code Available | 3 |
| VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Feb 7, 2025 | Hallucination, Position | Code Available | 3 |
| When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Nov 20, 2024 | Computational Efficiency, Position | Code Available | 3 |
| Scaling Diffusion Transformers to 16 Billion Parameters | Jul 16, 2024 | Attribute, Conditional Image Generation | Code Available | 3 |
| RoFormer: Enhanced Transformer with Rotary Position Embedding | Apr 20, 2021 | Position, Semantic Text Matching | Code Available | 3 |
| Rotary Position Embedding for Vision Transformer | Mar 20, 2024 | Position | Code Available | 3 |
| Transformers Can Do Arithmetic with the Right Embeddings | May 27, 2024 | GPU, Position | Code Available | 3 |
| PETR: Position Embedding Transformation for Multi-View 3D Object Detection | Mar 10, 2022 | 3D Object Detection, Object | Code Available | 3 |
| PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images | Jun 2, 2022 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| ElasTST: Towards Robust Varied-Horizon Forecasting with Elastic Time-Series Transformer | Nov 4, 2024 | Position, Time Series | Code Available | 3 |
| Position: Graph Foundation Models are Already Here | Feb 3, 2024 | Position | Code Available | 3 |
| Relation DETR: Exploring Explicit Position Relation Prior for Object Detection | Jul 16, 2024 | 2D Object Detection, object-detection | Code Available | 3 |
| FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization | Apr 21, 2024 | Anomaly Detection, Position | Code Available | 2 |