| Moonshine: Speech Recognition for Live Transcription and Voice Commands | Oct 21, 2024 | Decoder, Position | Code Available | 9 |
| Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | May 18, 2023 | Image Manipulation, Point Tracking | Code Available | 7 |
| YaRN: Efficient Context Window Extension of Large Language Models | Aug 31, 2023 | Position | Code Available | 6 |
| Extending Context Window of Large Language Models via Positional Interpolation | Jun 27, 2023 | Document Summarization, Language Modeling | Code Available | 6 |
| Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models | May 24, 2025 | Position | Code Available | 5 |
| Cosmos World Foundation Model Platform for Physical AI | Jan 7, 2025 | model, Position | Code Available | 5 |
| LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Oct 10, 2023 | Code Completion, Few-Shot Learning | Code Available | 5 |
| MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis | Jul 2, 2024 | Attribute, Image Generation | Code Available | 4 |
| GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction | May 27, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 4 |
| KeyPoint Relative Position Encoding for Face Recognition | Mar 21, 2024 | Face Recognition, Gait Recognition | Code Available | 4 |