| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 |
| Long-context LLMs Struggle with Long In-context Learning | Apr 2, 2024 | 2k, In-Context Learning | Code Available | 5 |
| Scaling Granite Code Models to 128K Context | Jul 18, 2024 | 2k, 4k | Code Available | 4 |
| CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets | May 30, 2024 | 2k, 3D geometry | Code Available | 4 |
| MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | Apr 26, 2024 | 2k, Question Answering | Code Available | 4 |
| SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Mar 11, 2024 | 2D Object Detection, 2k | Code Available | 4 |
| Highly Accurate Dichotomous Image Segmentation | Mar 6, 2022 | 2k, 3D Reconstruction | Code Available | 4 |
| FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Apr 9, 2025 | 2k, Decision Making | Code Available | 3 |
| MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction | Feb 17, 2025 | 2k, Autonomous Driving | Code Available | 3 |
| 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Aug 7, 2024 | 16k, 2k | Code Available | 3 |
| Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis | Jun 10, 2024 | 2k, 3DGS | Code Available | 3 |
| CAMixerSR: Only Details Need More "Attention" | Feb 29, 2024 | 2k, 8k | Code Available | 3 |
| MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Jul 14, 2025 | 2k, Image Generation | Code Available | 2 |
| MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Jul 10, 2025 | 2k, Quantization | Code Available | 2 |
| TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes | Mar 30, 2025 | 2k, Image Generation | Code Available | 2 |
| FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning | Mar 30, 2025 | 2k, GPU | Code Available | 2 |
| Ultra-Resolution Adaptation with Ease | Mar 20, 2025 | 2k, 4k | Code Available | 2 |
| Elevating Flow-Guided Video Inpainting with Reference Generation | Dec 12, 2024 | 2k, Video Inpainting | Code Available | 2 |
| VFIMamba: Video Frame Interpolation with State Space Models | Jul 2, 2024 | 2k, 4k | Code Available | 2 |
| Task Me Anything | Jun 17, 2024 | 2k, Attribute | Code Available | 2 |
| Linear Attention Sequence Parallelism | Apr 3, 2024 | 2k | Code Available | 2 |
| AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Feb 12, 2024 | 2k, Automatic Speech Recognition | Code Available | 2 |
| STICKERCONV: Generating Multimodal Empathetic Responses from Scratch | Jan 20, 2024 | 2k, Empathetic Response Generation | Code Available | 2 |
| Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models | Dec 21, 2023 | 2k | Code Available | 2 |
| HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | Dec 21, 2023 | 2k, Image Inpainting | Code Available | 2 |