SOTAVerified

Benchmarking

Papers

Showing 41-50 of 5548 papers

Title | Status | Hype
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | Code | 4
Aequitas Flow: Streamlining Fair ML Experimentation | Code | 4
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit | Code | 4
Benchmarking Retrieval-Augmented Generation for Medicine | Code | 4
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench | Code | 4
Pearl: A Production-ready Reinforcement Learning Agent | Code | 4
Benchmarking Neural Network Training Algorithms | Code | 4
OpenAGI: When LLM Meets Domain Experts | Code | 4
Vision-Language Models for Vision Tasks: A Survey | Code | 4
MTEB: Massive Text Embedding Benchmark | Code | 4

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified