| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARC, GSM8K | Code Available | 0 |
| The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale | Jun 25, 2024 | ARC, Language Modeling | Code Available | 1 |
| LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic | Jun 25, 2024 | ARC, Logical Reasoning | Unverified | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARC, Benchmarking | Code Available | 0 |
| PORT: Preference Optimization on Reasoning Traces | Jun 23, 2024 | ARC, GSM8K | Unverified | 0 |
| AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Jun 19, 2024 | ARC, Mixture-of-Experts | Code Available | 1 |
| Circular transformation of the European steel industry renders scrap metal a strategic resource | Jun 17, 2024 | ARC | Unverified | 0 |
| Promises, Outlooks and Challenges of Diffusion Language Modeling | Jun 17, 2024 | ARC, HellaSwag | Unverified | 0 |
| Cross-Modal Learning for Anomaly Detection in Complex Industrial Process: Methodology and Benchmark | Jun 13, 2024 | Anomaly Detection, ARC | Code Available | 1 |
| Regularizing Numerical Extremals Along Singular Arcs: A Lie-Theoretic Approach | Jun 11, 2024 | ARC | Unverified | 0 |