| Title | Date | Tags | Code |
| --- | --- | --- | --- |
| Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate | Jul 8, 2025 | Continual Learning, Mixture-of-Experts | Code Available |
| Training-Free Exponential Context Extension via Cascading KV Cache | Jun 24, 2024 | Book Summarization, Computational Efficiency | Code Available |
| The Price of Format: Diversity Collapse in LLMs | May 25, 2025 | Diversity, GSM8K | Code Available |
| Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Jul 15, 2025 | Diversity, MMLU | Code Available |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARC, GSM8K | Code Available |
| TODO: Enhancing LLM Alignment with Ternary Preferences | Nov 2, 2024 | ARC, MMLU | Code Available |
| Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon | Feb 11, 2025 | MMLU | Code Available |
| Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy | Jan 20, 2025 | MMLU | Code Available |
| Evaluation of Large Language Models via Coupled Token Generation | Feb 3, 2025 | Chatbot, Large Language Model | Code Available |
| SHA256 at SemEval-2025 Task 4: Selective Amnesia -- Constrained Unlearning for Large Language Models via Knowledge Isolation | Apr 17, 2025 | Attribute, Machine Unlearning | Code Available |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Jun 16, 2024 | Continual Learning, GSM8K | Code Available |
| Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | May 30, 2025 | Continual Pretraining, Fairness | Code Available |
| CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions | Oct 4, 2024 | Instruction Following, MMLU | Code Available |
| Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following Demonstrations | Aug 27, 2023 | Instruction Following, MMLU | Code Available |
| RoToR: Towards More Reliable Responses for Order-Invariant Inputs | Feb 10, 2025 | Graph Question Answering, MMLU | Code Available |
| EmPO: Emotion Grounding for Empathetic Response Generation through Preference Optimization | Jun 27, 2024 | Diversity, Empathetic Response Generation | Code Available |
| Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations | Jul 7, 2025 | Attribute, MMLU | Code Available |
| Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Sep 27, 2023 | HumanEval, Language Modeling | Code Available |
| Void in Language Models | May 20, 2025 | MMLU, Response Generation | Code Available |
| ChatBench: From Static Benchmarks to Human-AI Evaluation | Mar 22, 2025 | Math, MMLU | Code Available |
| QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning | Feb 3, 2025 | Data Valuation, Language Modeling | Code Available |
| CHAIR -- Classifier of Hallucination as Improver | Jan 5, 2025 | Hallucination, MMLU | Code Available |
| Effective Skill Unlearning through Intervention and Abstention | Mar 27, 2025 | General Knowledge, Math | Code Available |
| ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling | Feb 21, 2024 | MMLU, Retrieval | Code Available |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available |
| Capability-Based Scaling Laws for LLM Red-Teaming | May 26, 2025 | MMLU, Prompt Engineering | Code Available |
| DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors | May 29, 2025 | MMLU, Multiple-choice | Code Available |
| Post-Hoc Reversal: Are We Selecting Models Prematurely? | Apr 11, 2024 | Language Modelling, MMLU | Code Available |
| Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function | Jun 3, 2024 | Diversity, MMLU | Code Available |
| Review-Instruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models | May 16, 2025 | Diversity, MMLU | Code Available |
| Probing then Editing Response Personality of Large Language Models | Apr 14, 2025 | MMLU | Code Available |
| Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs | May 31, 2025 | MMLU | Code Available |
| OpenGrok: Enhancing SNS Data Processing with Distilled Knowledge and Mask-like Mechanisms | Feb 11, 2025 | Knowledge Distillation, MMLU | Code Available |
| BnMMLU: Measuring Massive Multitask Language Understanding in Bengali | May 25, 2025 | General Knowledge, MMLU | Code Available |
| ORBIT: Cost-Effective Dataset Curation for Large Language Model Domain Adaptation with an Astronomy Case Study | Dec 19, 2024 | Astronomy, Domain Adaptation | Code Available |
| Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning | Oct 14, 2024 | In-Context Learning, MMLU | Code Available |
| MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs | Sep 3, 2024 | MMLU | Code Available |
| Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models | Dec 2, 2024 | MMLU, Multiple-choice | Code Available |
| DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance | Jan 29, 2025 | Diversity, MMLU | Code Available |
| Inconsistencies in Masked Language Models | Dec 30, 2022 | LAMBADA, MMLU | Code Available |
| LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning | May 24, 2025 | Computational Efficiency, MMLU | Code Available |
| LM-Cocktail: Resilient Tuning of Language Models via Model Merging | Nov 22, 2023 | Language Modeling | Code Available |
| Instruction Tuning with Human Curriculum | Oct 14, 2023 | ARC, MMLU | Code Available |
| BenTo: Benchmark Task Reduction with In-Context Transferability | Oct 17, 2024 | In-Context Learning, MMLU | Code Available |
| Input Conditioned Graph Generation for Language Agents | Jun 17, 2024 | Graph Generation, MMLU | Code Available |
| Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation | Jun 20, 2024 | GSM8K, Language Model Evaluation | Code Available |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available |
| Llama 3 Meets MoE: Efficient Upcycling | Dec 13, 2024 | Mixture-of-Experts, MMLU | Code Available |
| LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient | Feb 2, 2025 | MMLU | Code Available |
| An Empirical Study of Mamba-based Language Models | Jun 12, 2024 | 16k, In-Context Learning | Code Available |