SOTAVerified

Benchmarking

Papers

Showing 1091-1100 of 5548 papers

Title | Status | Hype
Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models | | 0
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation | | 0
Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks | | 0
Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization | | 0
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Code | 0
An Analysis of Model Robustness across Concurrent Distribution Shifts | | 0
Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding | | 0
Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices | | 0
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input | | 0
Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified