| IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | Apr 1, 2024 | Benchmarking, Math | Unverified | 0 |
| Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models | Apr 1, 2024 | In-Context Learning, Math | Code Available | 0 |
| What is in Your Safe Data? Identifying Benign Data that Breaks Safety | Apr 1, 2024 | Math | Code Available | 1 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 |
| ML2SC: Deploying Machine Learning Models as Smart Contracts on the Blockchain | Mar 28, 2024 | Math | Unverified | 0 |
| Large Language Models Are Struggle to Cope with Unreasonability in Math Problems | Mar 28, 2024 | Math | Unverified | 0 |
| Scaling up ridge regression for brain encoding in a massive individual fMRI dataset | Mar 28, 2024 | CPU, Math | Code Available | 0 |
| Few-Shot Recalibration of Language Models | Mar 27, 2024 | Math, MMLU | Unverified | 0 |
| The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian | Mar 27, 2024 | Language Modelling, Math | Unverified | 0 |
| Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization | Mar 26, 2024 | Automated Theorem Proving, GSM8K | Code Available | 1 |