| Multi-Fidelity Methods for Optimization: A Survey | Feb 15, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator | Feb 15, 2024 | Benchmarking, Diagnostic | Code Available | 2 |
| Large-scale Benchmarking of Metaphor-based Optimization Heuristics | Feb 15, 2024 | Benchmarking, Experimental Design | Unverified | 0 |
| The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Feb 15, 2024 | Benchmarking, Model Editing | Code Available | 0 |
| Recommendations for Baselines and Benchmarking Approximate Gaussian Processes | Feb 15, 2024 | Benchmarking, Gaussian Processes | Unverified | 0 |
| Evaluation of simulation methods for tumor subclonal reconstruction | Feb 14, 2024 | Benchmarking | Unverified | 0 |
| Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking | Feb 14, 2024 | Benchmarking, Language Modelling | Code Available | 1 |
| MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models | Feb 14, 2024 | Benchmarking, Diversity | Code Available | 2 |
| Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms | Feb 14, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Benchmarking multi-component signal processing methods in the time-frequency plane | Feb 13, 2024 | Benchmarking, Denoising | Code Available | 0 |