| From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | Oct 4, 2023 | Benchmarking, GPU | Unverified | 0 |
| On the Performance of Multimodal Language Models | Oct 4, 2023 | Benchmarking, Binary Classification | Unverified | 0 |
| T^3Bench: Benchmarking Current Progress in Text-to-3D Generation | Oct 4, 2023 | 3D Generation, Benchmarking | Code Available | 3 |
| PGDQN: Preference-Guided Deep Q-Network | Oct 3, 2023 | Atari Games, Benchmarking | Code Available | 1 |
| CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery | Oct 3, 2023 | Benchmarking, Causal Discovery | Code Available | 1 |
| EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations | Oct 3, 2023 | Atomic Forces, Benchmarking | Unverified | 0 |
| EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods | Oct 3, 2023 | Benchmarking, text-guided-image-editing | Unverified | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | Oct 3, 2023 | Benchmarking, Instruction Following | Unverified | 0 |
| GNNX-BENCH: Unravelling the Utility of Perturbation-based GNN Explainers through In-depth Benchmarking | Oct 3, 2023 | Benchmarking, counterfactual | Code Available | 1 |
| Learning Quantum Processes with Quantum Statistical Queries | Oct 3, 2023 | Benchmarking, Cryptanalysis | Code Available | 0 |