SOTAVerified

Benchmarking

Papers

Showing 801-825 of 5548 papers

Title | Status | Hype
BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning | Code | 1
BLADE: Benchmarking Language Model Agents for Data-Driven Science | Code | 1
Ego-Body Pose Estimation via Ego-Head Pose Estimation | Code | 1
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Code | 1
Benchmarking AI scientists in omics data-driven biological research | Code | 1
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing | Code | 1
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Code | 1
Benchmarking Algorithms for Federated Domain Generalization | Code | 1
Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler | Code | 1
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning | Code | 1
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images | Code | 1
Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Code | 1
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Code | 1
Benchmarking LLMs for Political Science: A United Nations Perspective | Code | 1
B-Pref: Benchmarking Preference-Based Reinforcement Learning | Code | 1
Benchmarking and Analyzing Point Cloud Classification under Corruptions | Code | 1
Benchmarking LLMs' Swarm intelligence | Code | 1
Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Code | 1
Benchmarking of DL Libraries and Models on Mobile Devices | Code | 1
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Code | 1
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Code | 1
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks | Code | 1
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified