| Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra | Sep 22, 2024 | Benchmarking | Unverified | 0 |
| The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | Sep 22, 2024 | Benchmarking | Unverified | 0 |
| A Survey on Multimodal Benchmarks: In the Era of Large AI Models | Sep 21, 2024 | Benchmarking, Survey | Code Available | 2 |
| Efficient and Effective Model Extraction | Sep 21, 2024 | Benchmarking, model | Code Available | 0 |
| CONGRA: Benchmarking Automatic Conflict Resolution | Sep 21, 2024 | Benchmarking | Code Available | 0 |
| @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Sep 21, 2024 | Benchmarking, Depth Estimation | Unverified | 0 |
| Present and Future Generalization of Synthetic Image Detectors | Sep 21, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators | Sep 21, 2024 | Benchmarking | Code Available | 0 |
| An Evolutionary Algorithm For the Vehicle Routing Problem with Drones with Interceptions | Sep 21, 2024 | Benchmarking, Scheduling | Unverified | 0 |
| Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection | Sep 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |