| Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Jan 22, 2025 | Benchmarking | CodeCode Available | 0 |
| CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization | Jan 22, 2025 | Benchmarkingregression | —Unverified | 0 |
| Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs) | Jan 21, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes | Jan 21, 2025 | Benchmarking | —Unverified | 0 |
| Optimally-Weighted Maximum Mean Discrepancy Framework for Continual Learning | Jan 21, 2025 | BenchmarkingContinual Learning | —Unverified | 0 |
| Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems | Jan 21, 2025 | Autonomous VehiclesBenchmarking | CodeCode Available | 0 |
| Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing | Jan 20, 2025 | BenchmarkingEvolutionary Algorithms | —Unverified | 0 |
| Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model | Jan 20, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking Large Language Models via Random Variables | Jan 20, 2025 | BenchmarkingMathematical Reasoning | —Unverified | 0 |
| InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models | Jan 19, 2025 | BenchmarkingQuestion Answering | CodeCode Available | 1 |