SOTAVerified

Benchmarking

Papers

Showing 1421–1430 of 5548 papers

Title | Status | Hype
Sequential Large Language Model-Based Hyper-parameter Optimization | Code | 0
Multi-input Multi-output Loewner Framework for Vibration-based Damage Detection on a Trainer Jet | | 0
OGBench: Benchmarking Offline Goal-Conditioned RL | Code | 3
SFTrack: A Robust Scale and Motion Adaptive Algorithm for Tracking Small and Fast Moving Objects | | 0
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels | Code | 0
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | | 0
OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery | | 0
A Survey of Small Language Models | | 0
An Auditing Test To Detect Behavioral Shift in Language Models | Code | 0
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified