| Inference-Time-Compute: More Faithful? A Research Note | Jan 14, 2025 | AttributeMMLU | —Unverified | 0 |
| CHAIR -- Classifier of Hallucination as Improver | Jan 5, 2025 | HallucinationMMLU | CodeCode Available | 0 |
| Unraveling Indirect In-Context Learning Using Influence Functions | Jan 1, 2025 | In-Context LearningInformativeness | —Unverified | 0 |
| Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | Dec 31, 2024 | Language Model EvaluationLanguage Modeling | —Unverified | 0 |
| Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs | Dec 31, 2024 | Conformal PredictionDecision Making | —Unverified | 0 |
| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | Dec 30, 2024 | BenchmarkingCode Generation | —Unverified | 0 |
| MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design | Dec 19, 2024 | MMLUQuantization | —Unverified | 0 |
| MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Dec 19, 2024 | MMLUMultiple-choice | CodeCode Available | 2 |
| ORBIT: Cost-Effective Dataset Curation for Large Language Model Domain Adaptation with an Astronomy Case Study | Dec 19, 2024 | AstronomyDomain Adaptation | CodeCode Available | 0 |
| ChainRank-DPO: Chain Rank Direct Preference Optimization for LLM Rankers | Dec 18, 2024 | MMLUReranking | —Unverified | 0 |