SOTAVerified

Benchmarking

Papers

Showing 951-975 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Code | 1 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1 |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1 |
| Benchmarking Distribution Shift in Tabular Data with TableShift | Code | 1 |
| Benchmarking Object Detectors with COCO: A New Path Forward | Code | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1 |
| Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Code | 1 |
| The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation | Code | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1 |
| CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1 |
| Benchmarking Econometric and Machine Learning Methodologies in Nowcasting | Code | 1 |
| MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | Code | 1 |
| Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Code | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1 |
| Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Code | 1 |
| Benchmarking Graph Neural Networks for FMRI analysis | Code | 1 |
| Benchmarking End-to-End Behavioural Cloning on Video Games | Code | 1 |
| Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1 |
| Mitigating Gender Bias in Captioning Systems | Code | 1 |
| AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Code | 1 |
| MLLM-DataEngine: An Iterative Refinement Approach for MLLM | Code | 1 |
| 3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Code | 1 |
| Clinical Prompt Learning with Frozen Language Models | Code | 1 |
| Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |