SOTAVerified

Benchmarking

Papers

Showing 776–800 of 5548 papers

Title | Status | Hype
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests | Code | 1
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow | Code | 1
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Code | 1
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Code | 1
Generalizable deep learning for photoplethysmography-based blood pressure estimation -- A Benchmarking Study | Code | 1
Generating a Doppelganger Graph: Resembling but Distinct | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | Code | 1
Benchmark on Drug Target Interaction Modeling from a Structure Perspective | Code | 1
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Code | 1
DocuMint: Docstring Generation for Python using Small Language Models | Code | 1
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Code | 1
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1
BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
Benchmarking Adversarial Patch Against Aerial Detection | Code | 1
dMelodies: A Music Dataset for Disentanglement Learning | Code | 1
GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks | Code | 1
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1
Benchmarking Adversarial Robustness on Image Classification | Code | 1
Benchmarking of DL Libraries and Models on Mobile Devices | Code | 1
GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras | Code | 1
DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training | Code | 1
Does your model understand genes? A benchmark of gene properties for biological and text models | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified