SOTAVerified

Benchmarking

Papers

Showing 1301-1310 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Code | 1 |
| CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) | Code | 1 |
| Mukayese: Turkish NLP Strikes Back | Code | 1 |
| Benchmarking Robustness of Machine Reading Comprehension Models | Code | 1 |
| Benchmarking Robustness of Text-Image Composed Retrieval | Code | 1 |
| Benchmarking Robustness to Adversarial Image Obfuscations | Code | 1 |
| Benchmarking the Spectrum of Agent Capabilities | Code | 1 |
| Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset | Code | 1 |
| Illuminating Darkness: Enhancing Real-world Low-light Scenes with Smartphone Images | Code | 1 |
| ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |