| 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Mar 22, 2025 | Benchmarking, Object | Code Available | 0 |
| Benchmark Dataset for Pore-Scale CO2-Water Interaction | Mar 22, 2025 | Benchmarking | —Unverified | 0 |
| CausalRivers -- Scaling up benchmarking of causal discovery for real-world time-series | Mar 21, 2025 | Anomaly Detection, Benchmarking | —Unverified | 0 |
| Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Mar 21, 2025 | Benchmarking, Video Generation | Code Available | 2 |
| ContextGNN goes to Elliot: Towards Benchmarking Relational Deep Learning for Static Link Prediction (aka Personalized Item Recommendation) | Mar 20, 2025 | Benchmarking, Link Prediction | Code Available | 0 |
| QCPINN: Quantum-Classical Physics-Informed Neural Networks for Solving PDEs | Mar 20, 2025 | Benchmarking, Physics-informed machine learning | Code Available | 1 |
| A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: How Many Repeats Are Enough? | Mar 20, 2025 | Benchmarking | —Unverified | 0 |
| Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Mar 20, 2025 | Benchmarking, Reinforcement Learning (RL) | Code Available | 4 |
| The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination | Mar 20, 2025 | Benchmarking, Large Language Model | Code Available | 1 |
| DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs | Mar 20, 2025 | Benchmarking, Hallucination | —Unverified | 0 |