SOTAVerified

Benchmarking

Papers

Showing 1621-1630 of 5548 papers

Title | Status | Hype
Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra | - | 0
The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | - | 0
A Survey on Multimodal Benchmarks: In the Era of Large AI Models | Code | 2
Efficient and Effective Model Extraction | Code | 0
CONGRA: Benchmarking Automatic Conflict Resolution | Code | 0
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | - | 0
Present and Future Generalization of Synthetic Image Detectors | Code | 0
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators | Code | 0
An Evolutionary Algorithm For the Vehicle Routing Problem with Drones with Interceptions | - | 0
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified