| Training-Free Exponential Context Extension via Cascading KV Cache | Jun 24, 2024 | Book summarization, Computational Efficiency | Code Available | 0 |
| Void in Language Models | May 20, 2025 | MMLU, Response Generation | Code Available | 0 |
| DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors | May 29, 2025 | MMLU, Multiple-choice | Code Available | 0 |
| WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging | Feb 25, 2025 | MMLU, Multiple-choice | Code Available | 0 |
| RoToR: Towards More Reliable Responses for Order-Invariant Inputs | Feb 10, 2025 | Graph Question Answering, MMLU | Code Available | 0 |
| Inconsistencies in Masked Language Models | Dec 30, 2022 | LAMBADA, MMLU | Code Available | 0 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARC, GSM8K | Code Available | 0 |
| LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning | May 24, 2025 | Computational Efficiency, MMLU | Code Available | 0 |
| OpenGrok: Enhancing SNS Data Processing with Distilled Knowledge and Mask-like Mechanisms | Feb 11, 2025 | Knowledge Distillation, MMLU | Code Available | 0 |
| CHAIR -- Classifier of Hallucination as Improver | Jan 5, 2025 | Hallucination, MMLU | Code Available | 0 |