SOTAVerified

Benchmarking

Papers

Showing 21–30 of 5548 papers

Title | Status | Hype
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Code | 5
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics | Code | 4
TerraTorch: The Geospatial Foundation Models Toolkit | Code | 4
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Code | 4
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation | Code | 4
Building reliable sim driving agents by scaling self-play | Code | 4
A deep learning framework for efficient pathology image analysis | Code | 4
Accelerating Data Processing and Benchmarking of AI Models for Pathology | Code | 4
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound | Code | 4
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation | Code | 4

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified