SOTAVerified

Benchmarking

Papers

Showing 1151-1175 of 5548 papers

Title | Status | Hype
Hierarchical graph neural nets can capture long-range interactions | Code | 1
IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics | Code | 1
A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images | Code | 1
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | Code | 1
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data | Code | 1
Image Colorization: A Survey and Dataset | Code | 1
German Text Embedding Clustering Benchmark | Code | 1
A Closer Look at Mortality Risk Prediction from Electrocardiograms | Code | 1
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Code | 1
A global analysis of metrics used for measuring performance in natural language processing | Code | 1
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Code | 1
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1
Benchmarking Large Language Models for News Summarization | Code | 1
Benchmarking Multidomain English-Indonesian Machine Translation | Code | 1
AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction | Code | 1
Geoclidean: Few-Shot Generalization in Euclidean Geometry | Code | 1
German's Next Language Model | Code | 1
GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras | Code | 1
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Code | 1
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models | Code | 1
A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization | Code | 1
BiBench: Benchmarking and Analyzing Network Binarization | Code | 1
Benchmarking Robustness of Text-Image Composed Retrieval | Code | 1
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities | Code | 1
Benchmarking Robustness to Adversarial Image Obfuscations | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified