| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jun 17, 2024 | 16k, Language Modeling | Code Available | 9 |
| Global Structure-from-Motion Revisited | Jul 29, 2024 | 16k | Code Available | 7 |
| Code Llama: Open Foundation Models for Code | Aug 24, 2023 | 16k, Code Generation | Code Available | 6 |
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | May 27, 2022 | 16k, 4k | Code Available | 6 |
| Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Jul 5, 2024 | 16k, 8k | Code Available | 5 |
| Long-form factuality in large language models | Mar 27, 2024 | 16k, Form | Code Available | 4 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16k, CPU | Code Available | 3 |
| M+: Extending MemoryLLM with Scalable Long-Term Memory | Feb 1, 2025 | 16k, GPU | Code Available | 3 |
| SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Oct 4, 2024 | 16k, Code Generation | Code Available | 3 |
| LinFusion: 1 GPU, 1 Minute, 16K Image | Sep 3, 2024 | 16k, Causal Inference | Code Available | 3 |
| 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Aug 7, 2024 | 16k, 2k | Code Available | 3 |
| Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | May 17, 2024 | 16k, Benchmarking | Code Available | 3 |
| SnapKV: LLM Knows What You are Looking for Before Generation | Apr 22, 2024 | 16k, GPU | Code Available | 3 |
| Training-Free Long-Context Scaling of Large Language Models | Feb 27, 2024 | 16k | Code Available | 3 |
| LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding | Aug 28, 2023 | 16k, Code Completion | Code Available | 3 |
| Investigating Efficiently Extending Transformers for Long Input Summarization | Aug 8, 2022 | 16k, Long-range modeling | Code Available | 3 |
| UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents | May 27, 2025 | 16k | Code Available | 2 |
| Training Long-Context LLMs Efficiently via Chunk-wise Optimization | May 22, 2025 | 16k, GPU | Code Available | 2 |
| Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Jan 16, 2025 | 16k, Hallucination | Code Available | 2 |
| LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K | Feb 6, 2024 | 16k, Benchmarking | Code Available | 2 |
| Giraffe: Adventures in Expanding Context Lengths in LLMs | Aug 21, 2023 | 16k, 4k | Code Available | 2 |
| LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding | Jun 29, 2023 | 16k, Image Captioning | Code Available | 2 |
| MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention | May 24, 2025 | 16k, 4k | Code Available | 1 |
| Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs | Feb 4, 2025 | 16k, Descriptive | Code Available | 1 |
| Denial-of-Service Poisoning Attacks against Large Language Models | Oct 14, 2024 | 16k, Speech-to-Text | Code Available | 1 |
| Neural Fourier Modelling: A Highly Compact Approach to Time-Series Analysis | Oct 7, 2024 | 16k, Anomaly Detection | Code Available | 1 |
| LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs | Sep 3, 2024 | 16k, Benchmarking | Code Available | 1 |
| SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images | Jul 16, 2024 | 16k | Code Available | 1 |
| LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Jun 20, 2024 | 16k, Instruction Following | Code Available | 1 |
| NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention | Mar 2, 2024 | 16k, CPU | Code Available | 1 |
| Hydragen: High-Throughput LLM Inference with Shared Prefixes | Feb 7, 2024 | 16k, Chatbot | Code Available | 1 |
| Analyzing the Effectiveness of Large Language Models on Text-to-SQL Synthesis | Jan 22, 2024 | 16k, Program Synthesis | Code Available | 1 |
| Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers | Oct 16, 2023 | 16k, Hallucination | Code Available | 1 |
| Scaling Laws of RoPE-based Extrapolation | Oct 8, 2023 | 16k | Code Available | 1 |
| Home Electricity Data Generator (HEDGE): An open-access tool for the generation of electric vehicle, residential demand, and PV generation profiles | Oct 2, 2023 | 16k | Code Available | 1 |
| Detecting and Preventing Hallucinations in Large Vision Language Models | Aug 11, 2023 | 16k, Hallucination | Code Available | 1 |
| The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks | Jun 14, 2023 | 16k, Classification | Code Available | 1 |
| Faster Causal Attention Over Large Sequences Through Sparse Flash Attention | Jun 1, 2023 | 16k, 8k | Code Available | 1 |
| In-Context Learning with Many Demonstration Examples | Feb 9, 2023 | 16k, 8k | Code Available | 1 |
| An In-Depth Exploration of Person Re-Identification and Gait Recognition in Cloth-Changing Conditions | Jan 1, 2023 | 16k, Gait Recognition | Code Available | 1 |
| CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions | Aug 29, 2022 | 16k, Fairness | Code Available | 1 |
| There’s a Time and Place for Reasoning Beyond the Image | May 1, 2022 | 16k, Articles | Code Available | 1 |
| Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction | Mar 24, 2022 | 16k, Data Augmentation | Code Available | 1 |
| There is a Time and Place for Reasoning Beyond the Image | Mar 1, 2022 | 16k, Articles | Code Available | 1 |
| MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale | Nov 30, 2021 | 16k, Image Classification | Code Available | 1 |
| Complex Temporal Question Answering on Knowledge Graphs | Sep 18, 2021 | 16k, Entity Embeddings | Code Available | 1 |
| DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in Darts using a Single Camera | May 20, 2021 | 16k, Data Augmentation | Code Available | 1 |
| BNLP: Natural language processing toolkit for Bengali language | Jan 31, 2021 | 16k, NER | Code Available | 1 |
| Long Range Arena: A Benchmark for Efficient Transformers | Nov 8, 2020 | 16k, Benchmarking | Code Available | 1 |
| COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval | Oct 24, 2020 | 16k, Retrieval | Code Available | 1 |