| AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Nov 26, 2024 | Benchmarking, Text-to-Video Generation | Code Available | 1 | 5 |
| Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Feb 27, 2025 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Mar 6, 2020 | Benchmarking, Image Reconstruction | Code Available | 1 | 5 |
| Benchmarking Cognitive Biases in Large Language Models as Evaluators | Sep 29, 2023 | Benchmarking, In-Context Learning | Code Available | 1 | 5 |
| EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence | Nov 1, 2023 | Benchmarking, Cryogenic Electron Microscopy (cryo-EM) | Code Available | 1 | 5 |
| Recent Advances on Neural Network Pruning at Initialization | Mar 11, 2021 | Benchmarking, Network Pruning | Code Available | 1 | 5 |
| EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models | Jun 9, 2024 | Benchmarking | Code Available | 1 | 5 |
| An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks | Feb 7, 2025 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 1 | 5 |
| LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Nov 4, 2024 | Benchmarking, Graph Generation | Code Available | 1 | 5 |
| EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Oct 31, 2024 | Benchmarking, Electromyography (EMG) | Code Available | 1 | 5 |