SOTAVerified

Benchmarking

Papers

Showing 1301–1350 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Data-Driven Denoising of Stationary Accelerometer Signals | Code | 1 |
| Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data | Code | 1 |
| Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments | Code | 1 |
| Benchmarking Robustness of Machine Reading Comprehension Models | Code | 1 |
| Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1 |
| Benchmarking Robustness to Adversarial Image Obfuscations | Code | 1 |
| DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation | Code | 1 |
| MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data | Code | 1 |
| NAS-Bench-101: Towards Reproducible Neural Architecture Search | Code | 1 |
| Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets | Code | 1 |
| Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification | Code | 1 |
| Benchmarking saliency methods for chest X-ray interpretation | Code | 1 |
| Beyond neural scaling laws: beating power law scaling via data pruning | Code | 1 |
| Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation | Code | 1 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1 |
| A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Code | 1 |
| Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Code | 1 |
| Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Code | 1 |
| A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking | Code | 1 |
| Benchmarking Self-Supervised Learning on Diverse Pathology Datasets | Code | 1 |
| NEORL: NeuroEvolution Optimization with Reinforcement Learning | Code | 1 |
| Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities | Code | 1 |
| Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset | Code | 1 |
| Neural Multi-Hop Reasoning With Logical Rules on Biomedical Knowledge Graphs | Code | 1 |
| Attention, Please! Revisiting Attentive Probing for Masked Image Modeling | Code | 1 |
| Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all | Code | 1 |
| IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL | Code | 1 |
| Benchmarking Simulation-Based Inference | Code | 1 |
| DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Code | 1 |
| Neuro-Symbolic Inductive Logic Programming with Logical Neural Networks | Code | 1 |
| ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Code | 1 |
| A Large-Scale Dataset for Benchmarking Elevator Button Segmentation and Character Recognition | Code | 1 |
| Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL | Code | 1 |
| Deep learning model solves change point detection for multiple change types | Code | 1 |
| Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model | Code | 1 |
| nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | Code | 1 |
| Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency | Code | 1 |
| AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Code | 1 |
| Improving and Benchmarking Offline Reinforcement Learning Algorithms | Code | 1 |
| Benchmarking TinyML Systems: Challenges and Direction | Code | 1 |
| Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Code | 1 |
| IMGTB: A Framework for Machine-Generated Text Detection Benchmarking | Code | 1 |
| Benchmarking the Spectrum of Agent Capabilities | Code | 1 |
| IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds | Code | 1 |
| Benchmarking Image Retrieval for Visual Localization | Code | 1 |
| ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1 |
| ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | Code | 1 |
| Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection | Code | 1 |
| Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Code | 1 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |