SOTAVerified

Benchmarking

Papers

Showing 2911–2920 of 5548 papers

Title | Status | Hype
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference |  | 0
On the Performance of Multimodal Language Models |  | 0
T^3Bench: Benchmarking Current Progress in Text-to-3D Generation | Code | 3
PGDQN: Preference-Guided Deep Q-Network | Code | 1
CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery | Code | 1
EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations |  | 0
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods |  | 0
Benchmarking and Improving Generator-Validator Consistency of Language Models |  | 0
GNNX-BENCH: Unravelling the Utility of Perturbation-based GNN Explainers through In-depth Benchmarking | Code | 1
Learning Quantum Processes with Quantum Statistical Queries | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified