| SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models | Jun 13, 2024 | Benchmarking | Code Available | 1 |
| BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics | Jun 13, 2024 | Benchmarking | Code Available | 2 |
| ECBD: Evidence-Centered Benchmark Design for NLP | Jun 13, 2024 | Benchmarking | Code Available | 0 |
| Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition | Jun 13, 2024 | Benchmarking | Unverified | 0 |
| A Review of 315 Benchmark and Test Functions for Machine Learning Optimization Algorithms and Metaheuristics with Mathematical and Visual Descriptions | Jun 13, 2024 | Benchmarking | Unverified | 0 |
| Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT | Jun 13, 2024 | Benchmarking, LLM-generated Text Detection | Code Available | 1 |
| LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | Jun 13, 2024 | Benchmarking, Human-Object Interaction Detection | Unverified | 0 |
| Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Jun 13, 2024 | Benchmarking, Question Answering | Code Available | 2 |
| Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Jun 13, 2024 | Benchmarking, GPU | Code Available | 2 |
| SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution | Jun 13, 2024 | Benchmarking, Image Super-Resolution | Code Available | 1 |