| Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models | Feb 9, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| Surprise Potential as a Measure of Interactivity in Driving Scenarios | Feb 8, 2025 | Benchmarking | Unverified | 0 |
| Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Feb 8, 2025 | Benchmarking, Drug Design | Code Available | 0 |
| Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization | Feb 6, 2025 | Benchmarking, Uncertainty Quantification | Unverified | 0 |
| EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models | Feb 6, 2025 | Benchmarking, Emotional Intelligence | Unverified | 0 |
| Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs | Feb 6, 2025 | Benchmarking, Epidemiology | Code Available | 0 |
| LUND-PROBE -- LUND Prostate Radiotherapy Open Benchmarking and Evaluation dataset | Feb 6, 2025 | Benchmarking, Computed Tomography (CT) | Unverified | 0 |
| Verifiable Format Control for Large Language Model Generations | Feb 6, 2025 | Benchmarking, Instruction Following | Unverified | 0 |
| PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data | Feb 6, 2025 | Benchmarking, Time Series | Code Available | 0 |
| Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Feb 6, 2025 | Benchmarking, DeepFake Detection | Code Available | 0 |