| Moonshine: Speech Recognition for Live Transcription and Voice Commands | Oct 21, 2024 | Decoder, Position | Code Available | 9 |
| Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | May 18, 2023 | Image Manipulation, Point Tracking | Code Available | 7 |
| YaRN: Efficient Context Window Extension of Large Language Models | Aug 31, 2023 | Position | Code Available | 6 |
| Extending Context Window of Large Language Models via Positional Interpolation | Jun 27, 2023 | Document Summarization, Language Modeling | Code Available | 6 |
| Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models | May 24, 2025 | Position | Code Available | 5 |
| Cosmos World Foundation Model Platform for Physical AI | Jan 7, 2025 | model, Position | Code Available | 5 |
| LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Oct 10, 2023 | Code Completion, Few-Shot Learning | Code Available | 5 |
| MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis | Jul 2, 2024 | Attribute, Image Generation | Code Available | 4 |
| GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction | May 27, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 4 |
| KeyPoint Relative Position Encoding for Face Recognition | Mar 21, 2024 | Face Recognition, Gait Recognition | Code Available | 4 |
| Programming Is Hard -- Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation | Dec 2, 2022 | Code Generation, Position | Code Available | 4 |
| Desiderata for next generation of ML model serving | Oct 26, 2022 | model, Position | Code Available | 4 |
| VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Feb 7, 2025 | Hallucination, Position | Code Available | 3 |
| When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Nov 20, 2024 | Computational Efficiency, Position | Code Available | 3 |
| ElasTST: Towards Robust Varied-Horizon Forecasting with Elastic Time-Series Transformer | Nov 4, 2024 | Position, Time Series | Code Available | 3 |
| Relation DETR: Exploring Explicit Position Relation Prior for Object Detection | Jul 16, 2024 | 2D Object Detection, object-detection | Code Available | 3 |
| Scaling Diffusion Transformers to 16 Billion Parameters | Jul 16, 2024 | Attribute, Conditional Image Generation | Code Available | 3 |
| Transformers Can Do Arithmetic with the Right Embeddings | May 27, 2024 | GPU, Position | Code Available | 3 |
| Rotary Position Embedding for Vision Transformer | Mar 20, 2024 | Position | Code Available | 3 |
| VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | Feb 27, 2024 | Contrastive Learning, Medical Image Analysis | Code Available | 3 |
| Position: Graph Foundation Models are Already Here | Feb 3, 2024 | Position | Code Available | 3 |
| PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images | Jun 2, 2022 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| PETR: Position Embedding Transformation for Multi-View 3D Object Detection | Mar 10, 2022 | 3D Object Detection, Object | Code Available | 3 |
| RoFormer: Enhanced Transformer with Rotary Position Embedding | Apr 20, 2021 | Position, Semantic Text Matching | Code Available | 3 |
| Shifting AI Efficiency From Model-Centric to Data-Centric Compression | May 25, 2025 | Position | Code Available | 2 |
| Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs | Apr 17, 2025 | Position | Code Available | 2 |
| Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images | Mar 21, 2025 | Image Segmentation, Mamba | Code Available | 2 |
| An Approach for Air Drawing Using Background Subtraction and Contour Extraction | Mar 3, 2025 | Hand Detection, Optical Character Recognition (OCR) | Code Available | 2 |
| FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization | Jan 17, 2025 | Anomaly Detection, Image-text matching | Code Available | 2 |
| Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization | Dec 23, 2024 | Position | Code Available | 2 |
| V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding | Dec 12, 2024 | Position | Code Available | 2 |
| Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis | Nov 6, 2024 | 3DGS, NeRF | Code Available | 2 |
| TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention | Oct 7, 2024 | Position | Code Available | 2 |
| 1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024 | Sep 28, 2024 | Position | Code Available | 2 |
| PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders | Aug 16, 2024 | 3D Object Classification, 3D Point Cloud Classification | Code Available | 2 |
| OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection | Jul 15, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration | Jul 14, 2024 | Inductive Bias, Point Cloud Registration | Code Available | 2 |
| Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | Jul 12, 2024 | Position | Code Available | 2 |
| PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer | Jul 10, 2024 | Decoder, Handwritten Mathmatical Expression Recognition | Code Available | 2 |
| PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance | Jun 13, 2024 | Motion Generation, Position | Code Available | 2 |
| GLACE: Global Local Accelerated Coordinate Encoding | Jun 6, 2024 | Camera Pose Estimation, Pose Estimation | Code Available | 2 |
| GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats | Jun 5, 2024 | 3D-Aware Image Synthesis, 3D Generation | Code Available | 2 |
| Position: Foundation Agents as the Paradigm Shift for Decision Making | May 27, 2024 | Decision Making, Position | Code Available | 2 |
| CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario | May 6, 2024 | Position, Prediction | Code Available | 2 |
| Commonsense Prototype for Outdoor Unsupervised 3D Object Detection | Apr 25, 2024 | 3D Object Detection, Object | Code Available | 2 |
| FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization | Apr 21, 2024 | Anomaly Detection, Position | Code Available | 2 |
| LongEmbed: Extending Embedding Models for Long Context Retrieval | Apr 18, 2024 | 4k, 8k | Code Available | 2 |
| An End-to-End Structure with Novel Position Mechanism and Improved EMD for Stock Forecasting | Mar 25, 2024 | Position, Time Series | Code Available | 2 |
| Counting-Stars: A Multi-evidence, Position-aware, and Scalable Benchmark for Evaluating Long-Context Large Language Models | Mar 18, 2024 | 4k, Position | Code Available | 2 |
| ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents | Feb 21, 2024 | Active Learning, Position | Code Available | 2 |