SOTAVerified

Benchmarking

Papers

Showing 2361–2370 of 5548 papers

Title | Status | Hype
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition | | 0
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning | | 0
CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing | | 0
Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models | | 0
AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI | | 0
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation | | 0
Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization | | 0
An Analysis of Model Robustness across Concurrent Distribution Shifts | | 0
Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks | | 0
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified