| Long-context LLMs Struggle with Long In-context Learning | Apr 2, 2024 | 2k, In-Context Learning | Code Available | 5 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 |
| MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | Apr 26, 2024 | 2k, Question Answering | Code Available | 4 |
| CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets | May 30, 2024 | 2k, 3D geometry | Code Available | 4 |
| Highly Accurate Dichotomous Image Segmentation | Mar 6, 2022 | 2k, 3D Reconstruction | Code Available | 4 |
| SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Mar 11, 2024 | 2D Object Detection, 2k | Code Available | 4 |
| Scaling Granite Code Models to 128K Context | Jul 18, 2024 | 2k, 4k | Code Available | 4 |
| Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis | Jun 10, 2024 | 2k, 3DGS | Code Available | 3 |
| CAMixerSR: Only Details Need More "Attention" | Feb 29, 2024 | 2k, 8k | Code Available | 3 |
| FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Apr 9, 2025 | 2k, Decision Making | Code Available | 3 |
| 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Aug 7, 2024 | 16k, 2k | Code Available | 3 |
| MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction | Feb 17, 2025 | 2k, Autonomous Driving | Code Available | 3 |
| AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Feb 12, 2024 | 2k, Automatic Speech Recognition | Code Available | 2 |
| Linear Attention Sequence Parallelism | Apr 3, 2024 | 2k | Code Available | 2 |
| Task Me Anything | Jun 17, 2024 | 2k, Attribute | Code Available | 2 |
| Elevating Flow-Guided Video Inpainting with Reference Generation | Dec 12, 2024 | 2k, Video Inpainting | Code Available | 2 |
| HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | Dec 21, 2023 | 2k, Image Inpainting | Code Available | 2 |
| GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis | Dec 4, 2023 | 2k, Depth Estimation | Code Available | 2 |
| STICKERCONV: Generating Multimodal Empathetic Responses from Scratch | Jan 20, 2024 | 2k, Empathetic Response Generation | Code Available | 2 |
| FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset | Mar 26, 2022 | 2k, 3D Face Reconstruction | Code Available | 2 |
| HHAvatar: Gaussian Head Avatar with Dynamic Hairs | Dec 5, 2023 | 2k | Code Available | 2 |
| XGen-7B Technical Report | Sep 7, 2023 | 2k, 8k | Code Available | 2 |
| 360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Jan 1, 2022 | 2k, Depth Estimation | Code Available | 2 |
| VFIMamba: Video Frame Interpolation with State Space Models | Jul 2, 2024 | 2k, 4k | Code Available | 2 |
| Hyena Hierarchy: Towards Larger Convolutional Language Models | Feb 21, 2023 | 2k, 8k | Code Available | 2 |
| Any-resolution Training for High-resolution Image Synthesis | Apr 14, 2022 | 2k, Image Generation | Code Available | 2 |
| High-fidelity 3D Human Digitization from Single 2K Resolution Images | Mar 27, 2023 | 2k, 3D Human Reconstruction | Code Available | 2 |
| TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes | Mar 30, 2025 | 2k, Image Generation | Code Available | 2 |
| Towards Metrical Reconstruction of Human Faces | Apr 13, 2022 | 2k, 3D Face Reconstruction | Code Available | 2 |
| PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training | Sep 19, 2023 | 2k, Position | Code Available | 2 |
| MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Jul 14, 2025 | 2k, Image Generation | Code Available | 2 |
| MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Jul 10, 2025 | 2k, Quantization | Code Available | 2 |
| RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars | May 22, 2023 | 2k, Image Matting | Code Available | 2 |
| Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models | Dec 21, 2023 | 2k | Code Available | 2 |
| Ultra-Resolution Adaptation with Ease | Mar 20, 2025 | 2k, 4k | Code Available | 2 |
| FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning | Mar 30, 2025 | 2k, GPU | Code Available | 2 |
| LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models | Aug 30, 2023 | 2k, 4k | Code Available | 1 |
| MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes | May 18, 2022 | 2k, CPU | Code Available | 1 |
| How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs | Oct 24, 2024 | 2k, Machine Translation | Code Available | 1 |
| Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking | Jul 13, 2018 | 2k, Entity Linking | Code Available | 1 |
| 360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Nov 30, 2021 | 2k, Depth Estimation | Code Available | 1 |
| Identifying concept libraries from language about object structure | May 11, 2022 | 2k, Machine Translation | Code Available | 1 |
| Meticulous Object Segmentation | Dec 13, 2020 | 2k, 4k | Code Available | 1 |
| ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic | Mar 6, 2021 | 2k, 8k | Code Available | 1 |
| Gated Linear Attention Transformers with Hardware-Efficient Training | Dec 11, 2023 | 2k, Language Modeling | Code Available | 1 |
| CascadeV: An Implementation of Wurstchen Architecture for Video Generation | Jan 28, 2025 | 2k, Video Generation | Code Available | 1 |
| ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression | Dec 4, 2024 | 2k, Logical Reasoning | Code Available | 1 |
| CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input | Apr 13, 2023 | 2k, 4k | Code Available | 1 |
| BuildingNet: Learning to Label 3D Buildings | Oct 11, 2021 | 2k, 3D Building Mesh Labeling | Code Available | 1 |
| End-to-End Speech Recognition from Federated Acoustic Models | Apr 29, 2021 | 2k, 4k | Code Available | 1 |