SOTAVerified

Benchmarking

Papers

Showing 1051–1060 of 5548 papers

Title | Status | Hype
Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1
A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images | Code | 1
ArtFID: Quantitative Evaluation of Neural Style Transfer | Code | 1
Benchmarking LLMs for Political Science: A United Nations Perspective | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning | Code | 1
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Code | 1
ClearPose: Large-scale Transparent Object Dataset and Benchmark | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified