SOTAVerified

Benchmarking

Papers

Showing 861-870 of 5548 papers

Title | Status | Hype
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval |  | 0
BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction | Code | 3
MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering |  | 0
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs | Code | 1
Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers |  | 0
CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs |  | 0
Safe Multi-Agent Navigation guided by Goal-Conditioned Safe Reinforcement Learning | Code | 0
OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation |  | 0
A Real-time Spatio-Temporal Trajectory Planner for Autonomous Vehicles with Semantic Graph Optimization |  | 0
Overconfident Oracles: Limitations of In Silico Sequence Design Benchmarking |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified