| MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation | Mar 13, 2025 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| Tempest: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search | Mar 13, 2025 | Language Modeling | Unverified | 0 |
| Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | Mar 12, 2025 | CPU, GPU | Unverified | 0 |
| PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs | Mar 12, 2025 | Language Modeling | Unverified | 0 |
| Reinforcement Learning is all You Need | Mar 12, 2025 | All, Language Modeling | Unverified | 0 |
| Membership Inference Attacks fueled by Few-Short Learning to detect privacy leakage tackling data integrity | Mar 12, 2025 | Deep Learning, Few-Shot Learning | Unverified | 0 |
| NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model | Mar 12, 2025 | Hallucination, Language Modeling | Code Available | 0 |
| SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability | Mar 12, 2025 | Disentanglement, Language Modeling | Unverified | 0 |
| Language-Enhanced Representation Learning for Single-Cell Transcriptomics | Mar 12, 2025 | Language Modeling | Code Available | 0 |
| Why LLMs Cannot Think and How to Fix It | Mar 12, 2025 | Language Modeling | Unverified | 0 |