SOTAVerified

Benchmarking

Papers

Showing 3281–3290 of 5548 papers

Title | Status | Hype
Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | Code | 1
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests | | 0
When the Music Stops: Tip-of-the-Tongue Retrieval for Music | Code | 0
Benchmarking Machine Translation with Cultural Awareness | Code | 0
Robust Model-Based Optimization for Challenging Fitness Landscapes | Code | 0
Exploring Large Language Models for Classical Philology | Code | 1
Multilingual Large Language Models Are Not (Yet) Code-Switchers | | 0
How Fragile is Relation Extraction under Entity Replacements? | Code | 0
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | Code | 1
A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified