SOTAVerified

Benchmarking

Papers

Showing 2361–2370 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition	Jan 10, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning	Jan 9, 2025	Benchmarking, Question Answering	Unverified	0
CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing	Jan 9, 2025	Benchmarking, Chatbot	Unverified	0
Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models	Jan 9, 2025	Benchmarking, Philosophical Reflection	Unverified	0
AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI	Jan 9, 2025	Benchmarking, named-entity-recognition	Unverified	0
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation	Jan 9, 2025	2k, 8k	Unverified	0
Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization	Jan 8, 2025	Benchmarking, General Knowledge	Unverified	0
An Analysis of Model Robustness across Concurrent Distribution Shifts	Jan 8, 2025	Benchmarking	Unverified	0
Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks	Jan 8, 2025	Benchmarking, Deep Learning	Unverified	0
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	Code Available	0

Page 237 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified