SOTAVerified

Benchmarking

Papers

Showing 901-910 of 5548 papers

Title | Status | Hype
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? | Code | 1
MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data | Code | 1
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things | Code | 1
G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks | Code | 1
LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Code | 1
The Trickle-down Impact of Reward (In-)consistency on RLHF | Code | 1
Revisiting Neural Program Smoothing for Fuzzing | Code | 1
FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding | Code | 1
Unified Long-Term Time-Series Forecasting Benchmark | Code | 1
NLPBench: Evaluating Large Language Models on Solving NLP Problems | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified