SOTAVerified

Benchmarking

Papers

Showing 901–925 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Cognitive Biases in Large Language Models as Evaluators | Code | 1 |
| MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data | Code | 1 |
| G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks | Code | 1 |
| FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things | Code | 1 |
| Revisiting Neural Program Smoothing for Fuzzing | Code | 1 |
| FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding | Code | 1 |
| The Trickle-down Impact of Reward (In-)consistency on RLHF | Code | 1 |
| LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Code | 1 |
| NLPBench: Evaluating Large Language Models on Solving NLP Problems | Code | 1 |
| OceanBench: The Sea Surface Height Edition | Code | 1 |
| Node-Aligned Graph-to-Graph (NAG2G): Elevating Template-Free Deep Learning Approaches in Single-Step Retrosynthesis | Code | 1 |
| Unified Long-Term Time-Series Forecasting Benchmark | Code | 1 |
| Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Code | 1 |
| Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction | Code | 1 |
| Grad DFT: a software library for machine learning enhanced density functional theory | Code | 1 |
| Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation | Code | 1 |
| An Image Dataset for Benchmarking Recommender Systems with Raw Pixels | Code | 1 |
| Formalizing Multimedia Recommendation through Multimodal Deep Learning | Code | 1 |
| FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions | Code | 1 |
| RecAD: Towards A Unified Library for Recommender Attack and Defense | Code | 1 |
| Evaluation of large language models for discovery of gene set function | Code | 1 |
| A skeletonization algorithm for gradient-based optimization | Code | 1 |
| Benchmarking Autoregressive Conditional Diffusion Models for Turbulent Flow Simulation | Code | 1 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1 |
| Benchmarking the Generation of Fact Checking Explanations | Code | 1 |
Page 37 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |