Benchmarking

Papers

Showing 471-480 of 5548 papers

Title | Status | Hype
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement | Code | 1
LEMUR Neural Network Dataset: Towards Seamless AutoML | Code | 1
TinyverseGP: Towards a Modular Cross-domain Benchmarking Framework for Genetic Programming | Code | 1
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs | Code | 1
Evolutionary Generation of Random Surreal Numbers for Benchmarking | Code | 1
An Empirical Study of GPT-4o Image Generation Capabilities | Code | 1
V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models | Code | 1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1
Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified