| Better Bill GPT: Comparing Large Language Models against Legal Invoice Reviewers | Apr 2, 2025 | BenchmarkingManagement | —Unverified | 0 |
| When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks | Apr 2, 2025 | BenchmarkingLanguage Modeling | —Unverified | 0 |
| Horizon Scans can be accelerated using novel information retrieval and artificial intelligence tools | Apr 2, 2025 | Active LearningArticles | —Unverified | 0 |
| FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking | Apr 2, 2025 | 3D Scene ReconstructionBenchmarking | —Unverified | 0 |
| BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Apr 2, 2025 | 3D ReconstructionBenchmarking | CodeCode Available | 1 |
| Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions | Apr 2, 2025 | BenchmarkingSegmentation | —Unverified | 0 |
| Accelerating IoV Intrusion Detection: Benchmarking GPU-Accelerated vs CPU-Based ML Libraries | Apr 2, 2025 | BenchmarkingComputational Efficiency | —Unverified | 0 |
| Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework | Apr 2, 2025 | BenchmarkingSynthetic Data Generation | CodeCode Available | 2 |
| Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models | Apr 1, 2025 | Benchmarking | —Unverified | 0 |
| TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images | Apr 1, 2025 | Autonomous NavigationBenchmarking | CodeCode Available | 0 |