| Video Quality Assessment: A Comprehensive Survey | Dec 4, 2024 | Benchmarking, Survey | Code Available | 2 |
| Commit0: Library Generation from Scratch | Dec 2, 2024 | Benchmarking, Code Generation | Code Available | 2 |
| OpenQDC: Open Quantum Data Commons | Nov 29, 2024 | Benchmarking | Code Available | 2 |
| GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Nov 28, 2024 | Benchmarking, Object Counting | Code Available | 2 |
| HourVideo: 1-Hour Video-Language Understanding | Nov 7, 2024 | Benchmarking, counterfactual | Code Available | 2 |
| Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Nov 5, 2024 | Benchmarking, Code Generation | Code Available | 2 |
| LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators | Oct 31, 2024 | Benchmarking, Text Generation | Code Available | 2 |
| InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models | Oct 30, 2024 | Benchmarking | Code Available | 2 |
| CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | Oct 30, 2024 | Benchmarking, Passage Retrieval | Code Available | 2 |
| PC-Gym: Benchmark Environments For Process Control Problems | Oct 29, 2024 | Benchmarking, Chemical Process | Code Available | 2 |