| Moonshine: Speech Recognition for Live Transcription and Voice Commands | Oct 21, 2024 | Decoder, Position | Code Available | 9 | 5 |
| Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | May 18, 2023 | Image Manipulation, Point Tracking | Code Available | 7 | 5 |
| Extending Context Window of Large Language Models via Positional Interpolation | Jun 27, 2023 | Document Summarization, Language Modeling | Code Available | 6 | 5 |
| YaRN: Efficient Context Window Extension of Large Language Models | Aug 31, 2023 | Position | Code Available | 6 | 5 |
| Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models | May 24, 2025 | Position | Code Available | 5 | 5 |
| Cosmos World Foundation Model Platform for Physical AI | Jan 7, 2025 | model, Position | Code Available | 5 | 5 |
| LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Oct 10, 2023 | Code Completion, Few-Shot Learning | Code Available | 5 | 5 |
| Desiderata for next generation of ML model serving | Oct 26, 2022 | model, Position | Code Available | 4 | 5 |
| MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis | Jul 2, 2024 | Attribute, Image Generation | Code Available | 4 | 5 |
| KeyPoint Relative Position Encoding for Face Recognition | Mar 21, 2024 | Face Recognition, Gait Recognition | Code Available | 4 | 5 |
| Programming Is Hard -- Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation | Dec 2, 2022 | Code Generation, Position | Code Available | 4 | 5 |
| GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction | May 27, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 4 | 5 |
| PETR: Position Embedding Transformation for Multi-View 3D Object Detection | Mar 10, 2022 | 3D Object Detection, Object | Code Available | 3 | 5 |
| PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images | Jun 2, 2022 | 3D Lane Detection, 3D Object Detection | Code Available | 3 | 5 |
| ElasTST: Towards Robust Varied-Horizon Forecasting with Elastic Time-Series Transformer | Nov 4, 2024 | Position, Time Series | Code Available | 3 | 5 |
| When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Nov 20, 2024 | Computational Efficiency, Position | Code Available | 3 | 5 |
| Position: Graph Foundation Models are Already Here | Feb 3, 2024 | Position | Code Available | 3 | 5 |
| Transformers Can Do Arithmetic with the Right Embeddings | May 27, 2024 | GPU, Position | Code Available | 3 | 5 |
| Rotary Position Embedding for Vision Transformer | Mar 20, 2024 | Position | Code Available | 3 | 5 |
| VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Feb 7, 2025 | Hallucination, Position | Code Available | 3 | 5 |
| Relation DETR: Exploring Explicit Position Relation Prior for Object Detection | Jul 16, 2024 | 2D Object Detection, object-detection | Code Available | 3 | 5 |
| RoFormer: Enhanced Transformer with Rotary Position Embedding | Apr 20, 2021 | Position, Semantic Text Matching | Code Available | 3 | 5 |
| VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | Feb 27, 2024 | Contrastive Learning, Medical Image Analysis | Code Available | 3 | 5 |
| Scaling Diffusion Transformers to 16 Billion Parameters | Jul 16, 2024 | Attribute, Conditional Image Generation | Code Available | 3 | 5 |
| Point Transformer V2: Grouped Vector Attention and Partition-based Pooling | Oct 11, 2022 | 3D Point Cloud Classification, 3D Semantic Segmentation | Code Available | 2 | 5 |
| PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders | Aug 16, 2024 | 3D Object Classification, 3D Point Cloud Classification | Code Available | 2 | 5 |
| PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration | Jul 14, 2024 | Inductive Bias, Point Cloud Registration | Code Available | 2 | 5 |
| PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training | Sep 19, 2023 | 2k, Position | Code Available | 2 | 5 |
| MPNet: Masked and Permuted Pre-training for Language Understanding | Apr 20, 2020 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance | Jun 13, 2024 | Motion Generation, Position | Code Available | 2 | 5 |
| A Length-Extrapolatable Transformer | Dec 20, 2022 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training | Nov 15, 2023 | Passage Retrieval, Position | Code Available | 2 | 5 |
| Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | Mar 17, 2020 | image-classification, Image Classification | Code Available | 2 | 5 |
| Lost in the Middle: How Language Models Use Long Contexts | Jul 6, 2023 | Language Modelling, Position | Code Available | 2 | 5 |
| Mega: Moving Average Equipped Gated Attention | Sep 21, 2022 | Image Classification, Inductive Bias | Code Available | 2 | 5 |
| OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection | Jul 15, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 | 5 |
| PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer | Jul 10, 2024 | Decoder, Handwritten Mathematical Expression Recognition | Code Available | 2 | 5 |
| How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning | Feb 5, 2024 | In-Context Learning, Metric Learning | Code Available | 2 | 5 |
| GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats | Jun 5, 2024 | 3D-Aware Image Synthesis, 3D Generation | Code Available | 2 | 5 |
| LayoutDM: Discrete Diffusion Model for Controllable Layout Generation | Mar 14, 2023 | Layout Generation, model | Code Available | 2 | 5 |
| FLAT: Chinese NER Using Flat-Lattice Transformer | Apr 24, 2020 | Chinese Named Entity Recognition, named-entity-recognition | Code Available | 2 | 5 |
| FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization | Apr 21, 2024 | Anomaly Detection, Position | Code Available | 2 | 5 |
| Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization | Dec 23, 2024 | Position | Code Available | 2 | 5 |
| GLACE: Global Local Accelerated Coordinate Encoding | Jun 6, 2024 | Camera Pose Estimation, Pose Estimation | Code Available | 2 | 5 |
| Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster | Nov 14, 2023 | GPU, Position | Code Available | 2 | 5 |
| CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow | Nov 18, 2022 | Optical Flow Estimation, Position | Code Available | 2 | 5 |
| Extending LLMs' Context Window with 100 Samples | Jan 13, 2024 | Position | Code Available | 2 | 5 |
| Machine Learning in Asset Management—Part 1: Portfolio Construction—Trading Strategies | Feb 10, 2020 | Algorithmic Trading, Asset Management | Code Available | 2 | 5 |
| FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization | Jan 17, 2025 | Anomaly Detection, Image-text matching | Code Available | 2 | 5 |
| LongEmbed: Extending Embedding Models for Long Context Retrieval | Apr 18, 2024 | 4k, 8k | Code Available | 2 | 5 |