SOTAVerified

Benchmarking

Papers

Showing 3401-3450 of 5548 papers

Title | Status | Hype
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches | - | 0
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Code | 1
A Multi-Task Deep Learning Approach for Sensor-based Human Activity Recognition and Segmentation | - | 0
Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving | Code | 0
Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering Regularized Self-Training | Code | 1
COVID-19 event extraction from Twitter via extractive question answering with continuous prompts | Code | 1
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images | Code | 1
NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models | - | 0
DeAR: Debiasing Vision-Language Models with Additive Residuals | - | 0
Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+ | Code | 3
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Code | 0
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Code | 0
ShabbyPages: A Reproducible Document Denoising and Binarization Dataset | - | 0
DACOS-A Manually Annotated Dataset of Code Smells | - | 0
TransNetR: Transformer-based Residual Network for Polyp Segmentation with Multi-Center Out-of-Distribution Testing | Code | 1
Aux-Drop: Handling Haphazard Inputs in Online Learning Using Auxiliary Dropouts | Code | 0
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset | Code | 0
Towards Self-adaptive Mutation in Evolutionary Multi-Objective Algorithms | - | 0
Using Affine Combinations of BBOB Problems for Performance Assessment | - | 0
Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test | Code | 0
Continuous Function Structured in Multilayer Perceptron for Global Optimization | - | 0
Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study | - | 0
OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception | Code | 2
Continuous-Time Gaussian Process Motion-Compensation for Event-vision Pattern Tracking with Distance Fields | - | 0
Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis | Code | 2
FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation | Code | 2
Benchmarking framework for machine learning classification from fNIRS data | Code | 0
Benchmarking White Blood Cell Classification Under Domain Shift | Code | 0
Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective | Code | 0
POPGym: Benchmarking Partially Observable Reinforcement Learning | Code | 2
Structure-Based Experimental Datasets for Benchmarking Protein Simulation Force Fields | - | 0
Learning to Adapt to Online Streams with Distribution Shifts | - | 0
Benchmarking Self-Supervised Contrastive Learning Methods for Image-Based Plant Phenotyping | Code | 0
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking | - | 0
Benchmarking Deepart Detection | - | 0
Predicting the Performance of a Computing System with Deep Networks | - | 0
Benchmarking of Cancelable Biometrics for Deep Templates | - | 0
STA: Self-controlled Text Augmentation for Improving Text Classifications | Code | 0
Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views | - | 0
What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers | Code | 1
Revisiting the Gumbel-Softmax in MADDPG | Code | 1
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Code | 1
Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks | Code | 0
MultiRobustBench: Benchmarking Robustness Against Multiple Attacks | - | 0
An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation | Code | 0
Time to Embrace Natural Language Processing (NLP)-based Digital Pathology: Benchmarking NLP- and Convolutional Neural Network-based Deep Learning Pipelines | - | 0
Determinants of Performance in European ATM -- How to Analyze a Diverse Industry | - | 0
Arena-Rosnav 2.0: A Development and Benchmarking Platform for Robot Navigation in Highly Dynamic Environments | Code | 0
Fuzzy Knowledge Distillation from High-Order TSK to Low-Order TSK | - | 0
Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified