SOTAVerified

Benchmarking

Papers

Showing 211-220 of 5548 papers

Title | Status | Hype
EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking | Code | 2
EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models | Code | 2
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models | Code | 2
EvalGIM: A Library for Evaluating Generative Image Models | Code | 2
FedGraph: A Research Library and Benchmark for Federated Graph Learning | Code | 2
EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Code | 2
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models | Code | 2
InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Code | 2
Benchmarking Agentic Workflow Generation | Code | 2
Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified