SOTAVerified

Benchmarking

Papers

Showing 1326-1350 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all | Code | 1 |
| IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL | Code | 1 |
| Benchmarking Simulation-Based Inference | Code | 1 |
| DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Code | 1 |
| Neuro-Symbolic Inductive Logic Programming with Logical Neural Networks | Code | 1 |
| ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Code | 1 |
| A Large-Scale Dataset for Benchmarking Elevator Button Segmentation and Character Recognition | Code | 1 |
| Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL | Code | 1 |
| Deep learning model solves change point detection for multiple change types | Code | 1 |
| Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model | Code | 1 |
| nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | Code | 1 |
| Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency | Code | 1 |
| AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Code | 1 |
| Improving and Benchmarking Offline Reinforcement Learning Algorithms | Code | 1 |
| Benchmarking TinyML Systems: Challenges and Direction | Code | 1 |
| Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Code | 1 |
| IMGTB: A Framework for Machine-Generated Text Detection Benchmarking | Code | 1 |
| Benchmarking the Spectrum of Agent Capabilities | Code | 1 |
| IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds | Code | 1 |
| Benchmarking Image Retrieval for Visual Localization | Code | 1 |
| ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1 |
| ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | Code | 1 |
| Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection | Code | 1 |
| Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Code | 1 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Code | 1 |
Page 54 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |