SOTAVerified

Benchmarking

Papers

Showing 3226-3250 of 5548 papers

Title | Status | Hype
Benchmarking Pedestrian Odometry: The Brown Pedestrian Odometry Dataset (BPOD) | | 0
Benchmarking PathCLIP for Pathology Image Analysis | | 0
Kolmogorov-Arnold Network for Transistor Compact Modeling | | 0
Koopman Theory-Inspired Method for Learning Time Advancement Operators in Unstable Flame Front Evolution | | 0
Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex | | 0
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models | | 0
KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning | | 0
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences | | 0
Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection | | 0
Benchmarking Open-Source Large Language Models on Healthcare Text Classification Tasks | | 0
L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi | | 0
L3 Fusion: Fast Transformed Convolutions on CPUs | | 0
Advocating Character Error Rate for Multilingual ASR Evaluation | | 0
Label Anchored Contrastive Learning for Language Understanding | | 0
Comparison of Open-Source and Proprietary LLMs for Machine Reading Comprehension: A Practical Analysis for Industrial Applications | | 0
Label-Efficient Point Cloud Semantic Segmentation: An Active Learning Approach | | 0
Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models | | 0
AI Cyber Risk Benchmark: Automated Exploitation Capabilities | | 0
λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics | | 0
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs | | 0
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection | | 0
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama | | 0
Benchmarking Online Sequence-to-Sequence and Character-based Handwriting Recognition from IMU-Enhanced Pens | | 0
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time | | 0
Benchmarking Online Object Trackers for Underwater Robot Position Locking Applications | | 0
Page 130 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified