SOTAVerified

Benchmarking

Papers

Showing 751-800 of 5548 papers

Title | Status | Hype
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Code | 1
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA | Code | 1
A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models | Code | 1
BeHonest: Benchmarking Honesty in Large Language Models | Code | 1
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Code | 1
A Comprehensive Overview of Large Language Models | Code | 1
Benchmarking TinyML Systems: Challenges and Direction | Code | 1
AirSim Drone Racing Lab | Code | 1
Bench4KE: Benchmarking Automated Competency Question Generation | Code | 1
A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1
Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Code | 1
Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology | Code | 1
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection | Code | 1
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | Code | 1
Benchmarking LLMs for Political Science: A United Nations Perspective | Code | 1
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models | Code | 1
Benchmarking LLMs' Swarm intelligence | Code | 1
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations? | Code | 1
Anabranch Network for Camouflaged Object Segmentation | Code | 1
G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks | Code | 1
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection | Code | 1
Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset | Code | 1
DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime | Code | 1
Ego-Body Pose Estimation via Ego-Head Pose Estimation | Code | 1
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests | Code | 1
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow | Code | 1
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Code | 1
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Code | 1
Generalizable deep learning for photoplethysmography-based blood pressure estimation -- A Benchmarking Study | Code | 1
Generating a Doppelganger Graph: Resembling but Distinct | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | Code | 1
Benchmark on Drug Target Interaction Modeling from a Structure Perspective | Code | 1
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Code | 1
DocuMint: Docstring Generation for Python using Small Language Models | Code | 1
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Code | 1
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1
BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
Benchmarking Adversarial Patch Against Aerial Detection | Code | 1
dMelodies: A Music Dataset for Disentanglement Learning | Code | 1
GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks | Code | 1
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1
Benchmarking Adversarial Robustness on Image Classification | Code | 1
Benchmarking of DL Libraries and Models on Mobile Devices | Code | 1
GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras | Code | 1
DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training | Code | 1
Does your model understand genes? A benchmark of gene properties for biological and text models | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified