SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1091–1100 of 5548 papers

Title	Date	Tasks	Status	Hype
Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models	Jan 9, 2025	BenchmarkingPhilosophical Reflection	—Unverified	0
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation	Jan 9, 2025	2k8k	—Unverified	0
Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks	Jan 8, 2025	BenchmarkingDeep Learning	—Unverified	0
Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization	Jan 8, 2025	BenchmarkingGeneral Knowledge	—Unverified	0
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	CodeCode Available	0
An Analysis of Model Robustness across Concurrent Distribution Shifts	Jan 8, 2025	Benchmarking	—Unverified	0
Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding	Jan 7, 2025	BenchmarkingCode Generation	—Unverified	0
Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices	Jan 7, 2025	BenchmarkingClustering	—Unverified	0
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input	Jan 6, 2025	BenchmarkingForm	—Unverified	0
Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis	Jan 6, 2025	BenchmarkingImage Enhancement	CodeCode Available	1

Show:10 25 50

← PrevPage 110 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified