SOTAVerified

Benchmarking

Papers

Showing 1301–1325 of 5548 papers

Title | Status | Hype
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Code | 1
IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Code | 1
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices | Code | 1
Benchmarking Robustness of Machine Reading Comprehension Models | Code | 1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Code | 1
Benchmarking Robustness to Adversarial Image Obfuscations | Code | 1
Intrinsic Image Harmonization | Code | 1
CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) | Code | 1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Code | 1
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs | Code | 1
ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging | Code | 1
New Protocols and Negative Results for Textual Entailment Data Collection | Code | 1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
A Ladder of Causal Distances | Code | 1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Code | 1
A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking | Code | 1
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets | Code | 1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Code | 1
JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes | Code | 1
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning | Code | 1
Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking | Code | 1
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified