SOTAVerified

Benchmarking

Papers

Showing 3501-3550 of 5548 papers

Title | Status | Hype
AERF: Adaptive ensemble random fuzzy algorithm for anomaly detection in cloud computing | - | 0
Logically at Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval techniques and Transformer Encoder Architecture | - | 0
"It's a Match!" -- A Benchmark of Task Affinity Scores for Joint Learning | - | 0
The CropAndWeed Dataset: A Multi-Modal Learning Approach for Efficient Crop and Weed Manipulation | Code | 1
The Evolutionary Computation Methods No One Should Use | - | 0
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions | Code | 0
Trace Encoding in Process Mining: a survey and benchmarking | Code | 1
HaN-Seg: The head and neck organ-at-risk CT and MR segmentation dataset | - | 0
Improving Sequential Recommendation Models with an Enhanced Loss Function | Code | 0
Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise | - | 0
Benchmarking the Robustness of LiDAR Semantic Segmentation Models | Code | 2
Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation | Code | 1
SQAD: Automatic Smartphone Camera Quality Assessment and Benchmarking | Code | 1
Tree Instance Segmentation With Temporal Contour Graph | - | 0
Benchmarking Robustness of 3D Object Detection to Common Corruptions | Code | 1
MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs | Code | 1
Comparison of tree-based ensemble algorithms for merging satellite and earth-observed precipitation data at the daily time scale | - | 0
4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions | - | 0
Biologically Plausible Learning on Neuromorphic Hardware Architectures | - | 0
MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing | - | 0
AER: Auto-Encoder with Regression for Time Series Anomaly Detection | Code | 3
Quality at the Tail of Machine Learning Inference | - | 0
Benchmarking Machine Learning Models to Predict Corporate Bankruptcy | - | 0
Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method | Code | 2
A Seven-Layer Model for Standardising AI Fairness Assessment | - | 0
Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias | Code | 0
Distributed Software-Defined Network Architecture for Smart Grid Resilience to Denial-of-Service Attacks | - | 0
AI applications in forest monitoring need remote sensing benchmark datasets | - | 0
Benchmarking person re-identification datasets and approaches for practical real-world implementations | Code | 0
A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks | Code | 1
AnyTOD: A Programmable Task-Oriented Dialog System | - | 0
Benchmarking Spatial Relationships in Text-to-Image Generation | Code | 1
Trial-Based Dominance Enables Non-Parametric Tests to Compare both the Speed and Accuracy of Stochastic Optimizers | - | 0
GiCCS: A German in-Context Conversational Similarity Benchmark | - | 0
Biomedical image analysis competitions: The state of current participation practice | - | 0
Automatic vehicle trajectory data reconstruction at scale | - | 0
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Code | 1
Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction | - | 0
PyPop7: A Pure-Python Library for Population-Based Black-Box Optimization | Code | 2
On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline | Code | 1
Momentum Contrastive Pre-training for Question Answering | - | 0
Progressive Multi-view Human Mesh Recovery with Self-Supervision | - | 0
Ego-Body Pose Estimation via Ego-Head Pose Estimation | Code | 1
On Distribution Grid Optimal Power Flow Development and Integration | - | 0
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets | Code | 1
Is Bio-Inspired Learning Better than Backprop? Benchmarking Bio Learning vs. Backprop | - | 0
Model-based trajectory stitching for improved behavioural cloning and its applications | - | 0
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
An open unified deep graph learning framework for discovering drug leads | Code | 0
Page 71 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified