SOTAVerified

Benchmarking

Papers

Showing 1051-1100 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| The CropAndWeed Dataset: A Multi-Modal Learning Approach for Efficient Crop and Weed Manipulation | Code | 1 |
| Trace Encoding in Process Mining: a survey and benchmarking | Code | 1 |
| Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation | Code | 1 |
| MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs | Code | 1 |
| SQAD: Automatic Smartphone Camera Quality Assessment and Benchmarking | Code | 1 |
| Benchmarking Robustness of 3D Object Detection to Common Corruptions | Code | 1 |
| Benchmarking Spatial Relationships in Text-to-Image Generation | Code | 1 |
| A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks | Code | 1 |
| Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1 |
| Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Code | 1 |
| On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline | Code | 1 |
| Ego-Body Pose Estimation via Ego-Head Pose Estimation | Code | 1 |
| Benchmarking Self-Supervised Learning on Diverse Pathology Datasets | Code | 1 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1 |
| RLogist: Fast Observation Strategy on Whole-slide Images with Deep Reinforcement Learning | Code | 1 |
| Towards Scene Understanding for Autonomous Operations on Airport Aprons | Code | 1 |
| Geoclidean: Few-Shot Generalization in Euclidean Geometry | Code | 1 |
| AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Code | 1 |
| A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification | Code | 1 |
| Multi-Mask Aggregators for Graph Neural Networks | Code | 1 |
| This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish | Code | 1 |
| fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms | Code | 1 |
| PIC4rl-gym: a ROS2 modular framework for Robots Autonomous Navigation with Deep Reinforcement Learning | Code | 1 |
| CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) | Code | 1 |
| Benchmarking Graph Neural Networks for FMRI analysis | Code | 1 |
| Hyperparameter optimization in deep multi-target prediction | Code | 1 |
| EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs | Code | 1 |
| Benchmarking Adversarial Patch Against Aerial Detection | Code | 1 |
| Benchmarking Language Models for Code Syntax Understanding | Code | 1 |
| A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images | Code | 1 |
| ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition | Code | 1 |
| SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks | Code | 1 |
| A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Code | 1 |
| RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control | Code | 1 |
| Graphs, Constraints, and Search for the Abstraction and Reasoning Corpus | Code | 1 |
| An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition | Code | 1 |
| iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations | Code | 1 |
| KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents | Code | 1 |
| WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments | Code | 1 |
| CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling | Code | 1 |
| A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking | Code | 1 |
| DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation | Code | 1 |
| Benchmarking saliency methods for chest X-ray interpretation | Code | 1 |
| Benchmarking Reinforcement Learning Techniques for Autonomous Navigation | Code | 1 |
| ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints | Code | 1 |
| Neural Methods for Logical Reasoning Over Knowledge Graphs | Code | 1 |
| Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms | Code | 1 |
| Sanity Check for External Clustering Validation Benchmarks using Internal Validation Measures | Code | 1 |
| A framework for benchmarking clustering algorithms | Code | 1 |
| Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |