SOTAVerified

Benchmarking

Papers

Showing 701–725 of 5548 papers

Title | Status | Hype
Attention, Please! Revisiting Attentive Probing for Masked Image Modeling | Code | 1
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Code | 1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Code | 1
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Code | 1
Automatic sleep stage classification with deep residual networks in a mixed-cohort setting | Code | 1
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT | Code | 1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
D2S: Document-to-Slide Generation Via Query-Based Text Summarization | Code | 1
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Code | 1
Autonomous Microscopy Experiments through Large Language Model Agents | Code | 1
A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking | Code | 1
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Code | 1
A Ladder of Causal Distances | Code | 1
ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging | Code | 1
A Critical Assessment of State-of-the-Art in Entity Alignment | Code | 1
DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation | Code | 1
Atom-Level Optical Chemical Structure Recognition with Limited Supervision | Code | 1
dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing | Code | 1
Benchmarking Adversarial Patch Against Aerial Detection | Code | 1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
Deep learning model solves change point detection for multiple change types | Code | 1
ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization | Code | 1
Benchmarking and Analyzing Point Cloud Classification under Corruptions | Code | 1
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified