SOTAVerified

Benchmarking

Papers

Showing 501-510 of 5548 papers

Title | Status | Hype
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
CODEMENV: Benchmarking Large Language Models on Code Migration | Code | 1
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Code | 1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Code | 1
Benchmarking Differential Privacy and Federated Learning for BERT Models | Code | 1
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Code | 1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified