SOTAVerified

Benchmarking

Papers

Showing 821-830 of 5548 papers

Title | Status | Hype
Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems | Code | 0
Technical report of a DMD-based Characterization Method for Vision Sensors | | 0
Optimizing open-domain question answering with graph-based retrieval augmented generation | | 0
A2Perf: Real-World Autonomous Agents Benchmark | | 0
Evaluation of Architectural Synthesis Using Generative AI | | 0
One ruler to measure them all: Benchmarking multilingual long-context language models | Code | 1
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics | | 0
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses | Code | 1
Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models | | 0
From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified