Benchmarking

Papers

Showing 2911–2920 of 5548 papers

Title	Date	Tasks	Status	Hype
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference	Oct 4, 2023	Benchmarking, GPU	Unverified	0
On the Performance of Multimodal Language Models	Oct 4, 2023	Benchmarking, Binary Classification	Unverified	0
T^3Bench: Benchmarking Current Progress in Text-to-3D Generation	Oct 4, 2023	3D Generation, Benchmarking	Code Available	3
PGDQN: Preference-Guided Deep Q-Network	Oct 3, 2023	Atari Games, Benchmarking	Code Available	1
CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery	Oct 3, 2023	Benchmarking, Causal Discovery	Code Available	1
EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations	Oct 3, 2023	Atomic Forces, Benchmarking	Unverified	0
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods	Oct 3, 2023	Benchmarking, text-guided-image-editing	Unverified	0
Benchmarking and Improving Generator-Validator Consistency of Language Models	Oct 3, 2023	Benchmarking, Instruction Following	Unverified	0
GNNX-BENCH: Unravelling the Utility of Perturbation-based GNN Explainers through In-depth Benchmarking	Oct 3, 2023	Benchmarking, counterfactual	Code Available	1
Learning Quantum Processes with Quantum Statistical Queries	Oct 3, 2023	Benchmarking, Cryptanalysis	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified