SOTAVerified

Benchmarking

Papers

Showing 2831–2840 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function | | 0 |
| AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI | | 0 |
| Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems | | 0 |
| AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models | | 0 |
| Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | | 0 |
| BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | | 0 |
| gSuite: A Flexible and Framework Independent Benchmark Suite for Graph Neural Network Inference on GPUs | | 0 |
| GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation | | 0 |
| Benchmarks as Microscopes: A Call for Model Metrology | | 0 |
| The Curious Case of Integrator Reach Sets, Part I: Basic Theory | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |