| FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | May 12, 2025 | 16k, Benchmarking | —Unverified | 0 |
| Benchmarking projective simulation in navigation problems | Apr 23, 2018 | Benchmarking, Q-Learning | —Unverified | 0 |
| Foundations for learning from noisy quantum experiments | Apr 28, 2022 | Benchmarking | —Unverified | 0 |
| Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate | May 28, 2025 | Benchmarking | —Unverified | 0 |
| A Survey on LLM-based News Recommender Systems | Feb 13, 2025 | Benchmarking, Fairness | —Unverified | 0 |
| HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects | Jul 17, 2024 | Benchmarking, Human-Object Interaction Detection | —Unverified | 0 |
| Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning | Nov 20, 2023 | Benchmarking, Inverse Rendering | —Unverified | 0 |
| Framework and Benchmarks for Combinatorial and Mixed-variable Bayesian Optimization | Jun 16, 2023 | Bayesian Optimization, Benchmarking | —Unverified | 0 |
| FRED: The Florence RGB-Event Drone Dataset | Jun 5, 2025 | Benchmarking, Trajectory Forecasting | —Unverified | 0 |
| Benchmarking Processor Performance by Multi-Threaded Machine Learning Algorithms | Sep 11, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |