| Benchmarking Generative AI Models for Deep Learning Test Input Generation | Dec 23, 2024 | Benchmarking, Deep Learning | Code Available | 0 |
| Chumor 2.0: Towards Benchmarking Chinese Humor Understanding | Dec 23, 2024 | Benchmarking | Code Available | 0 |
| Multimodal Deep Reinforcement Learning for Portfolio Optimization | Dec 23, 2024 | Articles, Benchmarking | Unverified | 0 |
| SCBench: A Sports Commentary Benchmark for Video LLMs | Dec 23, 2024 | Benchmarking | Unverified | 0 |
| StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs | Dec 23, 2024 | Benchmarking, Logical Reasoning | Unverified | 0 |
| Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations | Dec 23, 2024 | Benchmarking, Question Answering | Unverified | 0 |
| HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios | Dec 21, 2024 | Benchmarking | Code Available | 0 |
| First-frame Supervised Video Polyp Segmentation via Propagative and Semantic Dual-teacher Network | Dec 21, 2024 | Benchmarking, Transfer Learning | Code Available | 0 |
| Patherea: Cell Detection and Classification for the 2020s | Dec 21, 2024 | Benchmarking, Cell Detection | Unverified | 0 |
| Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Dec 20, 2024 | Benchmarking, Optical Character Recognition | Code Available | 0 |