SOTAVerified

Benchmarking

Papers

Showing 701-725 of 5548 papers

Title | Status | Hype
Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection | Code | 1
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Code | 1
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers | Code | 1
A Ladder of Causal Distances | Code | 1
Automatic sleep stage classification with deep residual networks in a mixed-cohort setting | Code | 1
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit | Code | 1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Code | 1
Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | Code | 1
ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging | Code | 1
Autonomous Microscopy Experiments through Large Language Model Agents | Code | 1
Atom-Level Optical Chemical Structure Recognition with Limited Supervision | Code | 1
Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks | Code | 1
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Code | 1
Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers | Code | 1
A Critical Assessment of State-of-the-Art in Entity Alignment | Code | 1
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1
DFGC 2021: A DeepFake Game Competition | Code | 1
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Code | 1
Benchmarking Language Models for Code Syntax Understanding | Code | 1
Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking | Code | 1
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Code | 1
Benchmarking: Past, Present and Future | Code | 1
Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Code | 1
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | Code | 1
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1
Page 29 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified