SOTAVerified

Benchmarking

Papers

Showing 1076–1100 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Hyperparameter optimization in deep multi-target prediction | Code | 1 |
| EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs | Code | 1 |
| Benchmarking Adversarial Patch Against Aerial Detection | Code | 1 |
| Benchmarking Language Models for Code Syntax Understanding | Code | 1 |
| A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images | Code | 1 |
| ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition | Code | 1 |
| SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks | Code | 1 |
| A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Code | 1 |
| RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control | Code | 1 |
| Graphs, Constraints, and Search for the Abstraction and Reasoning Corpus | Code | 1 |
| An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition | Code | 1 |
| iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations | Code | 1 |
| KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents | Code | 1 |
| WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments | Code | 1 |
| CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling | Code | 1 |
| A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking | Code | 1 |
| DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation | Code | 1 |
| Benchmarking saliency methods for chest X-ray interpretation | Code | 1 |
| Benchmarking Reinforcement Learning Techniques for Autonomous Navigation | Code | 1 |
| ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints | Code | 1 |
| Neural Methods for Logical Reasoning Over Knowledge Graphs | Code | 1 |
| Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms | Code | 1 |
| Sanity Check for External Clustering Validation Benchmarks using Internal Validation Measures | Code | 1 |
| A framework for benchmarking clustering algorithms | Code | 1 |
| Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods | Code | 1 |
Page 44 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |