SOTAVerified

Benchmarking

Papers

Showing 3026–3050 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Causal Study to Interpret Large Language Models for Source Code | | 0 |
| Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Code | 0 |
| LLMRec: Benchmarking Large Language Models on Recommendation Task | Code | 1 |
| Efficient Benchmarking of Language Models | | 0 |
| Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection | Code | 0 |
| Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process | Code | 0 |
| Beyond MD17: the reactive xxMD dataset | Code | 0 |
| Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models | | 0 |
| UGSL: A Unified Framework for Benchmarking Graph Structure Learning | | 0 |
| VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations | Code | 1 |
| Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks | Code | 0 |
| Benchmarking Neural Network Generalization for Grammar Induction | Code | 1 |
| Benchmarking Adversarial Robustness of Compressed Deep Learning Models | | 0 |
| IoT Data Trust Evaluation via Machine Learning | Code | 0 |
| Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions | | 0 |
| A Survey on Model Compression for Large Language Models | | 0 |
| Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation | Code | 0 |
| Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1 |
| BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | Code | 2 |
| Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields | | 0 |
| DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Code | 1 |
| A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization | Code | 1 |
| Spintronics for image recognition: performance benchmarking via ultrafast data-driven simulations | | 0 |
| Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation | | 0 |
| Enhancing Architecture Frameworks by Including Modern Stakeholders and their Views/Viewpoints | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |