SOTAVerified

Benchmarking

Papers

Showing 1001–1025 of 5548 papers

Title | Status | Hype
NetPress: Dynamically Generated LLM Benchmarks for Network Applications | Code | 1
Neural Methods for Logical Reasoning Over Knowledge Graphs | Code | 1
Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1
Neural Regression, Representational Similarity, Model Zoology & Neural Taskonomy at Scale in Rodent Visual Cortex | Code | 1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1
NICO++: Towards Better Benchmarking for Domain Generalization | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
nnOOD: A Framework for Benchmarking Self-supervised Anomaly Localisation Methods | Code | 1
Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Code | 1
A Comparison of Image Denoising Methods | Code | 1
CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1
NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results | Code | 1
NuCLS: A scalable crowdsourcing, deep learning approach and dataset for nucleus classification, localization and segmentation | Code | 1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1
Object Shape Error Response Using Bayesian 3-D Convolutional Neural Networks for Assembly Systems With Compliant Parts | Code | 1
CODEMENV: Benchmarking Large Language Models on Code Migration | Code | 1
ClearPose: Large-scale Transparent Object Dataset and Benchmark | Code | 1
AI Agents That Matter | Code | 1
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Code | 1
AI Accelerator Survey and Trends | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified