SOTAVerified

Benchmarking

Papers

Showing 51–60 of 5548 papers

Title | Status | Hype
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench | Code | 4
Molecular-driven Foundation Model for Oncologic Pathology | Code | 4
shapiq: Shapley Interactions for Machine Learning | Code | 4
Benchmarking Automatic Machine Learning Frameworks | Code | 3
Advancing LLM Reasoning Generalists with Preference Trees | Code | 3
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3
Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving | Code | 3
DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks | Code | 3
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents | Code | 3
CORL: Research-oriented Deep Offline Reinforcement Learning Library | Code | 3
Page 6 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified