SOTAVerified

Benchmarking

Papers

Showing 1391–1400 of 5548 papers

Title | Status | Hype
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Code | 1
CALE: Continuous Arcade Learning Environment | Code | 7
Low-Density 3D Point Cloud Classification | - | 0
Survey of Cultural Awareness in Language Models: Text and Beyond | Code | 1
NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation | Code | 0
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | - | 0
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | - | 0
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models | Code | 2
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | Code | 2
Evaluating Cultural and Social Awareness of LLM Web Agents | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified