SOTAVerified

Benchmarking

Papers

Showing 1451–1500 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Explainable Benchmarking for Iterative Optimization Heuristics | Code | 1 |
| Explainable Global Wildfire Prediction Models using Graph Neural Networks | Code | 1 |
| Learning Representations with Contrastive Self-Supervised Learning for Histopathology Applications | Code | 1 |
| BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Code | 1 |
| scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1 |
| Bag of Tricks for Adversarial Training | Code | 1 |
| Biomedical Data-to-Text Generation via Fine-Tuning Transformers | Code | 1 |
| Exploring Large Language Models for Classical Philology | Code | 1 |
| BioRED: A Rich Biomedical Relation Extraction Dataset | Code | 1 |
| BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning | Code | 1 |
| Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data | Code | 1 |
| S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations | Code | 1 |
| LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Code | 1 |
| Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts | Code | 1 |
| AQuA: A Benchmarking Tool for Label Quality Assessment | Code | 1 |
| Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed | Code | 1 |
| Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models | Code | 1 |
| Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking | Code | 1 |
| ScandEval: A Benchmark for Scandinavian Natural Language Processing | Code | 1 |
| APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Code | 1 |
| Benchmarking large language models for biomedical natural language processing applications and recommendations | Code | 1 |
| Quantum machine learning of large datasets using randomized measurements | Code | 1 |
| MatTools: Benchmarking Large Language Models for Materials Science Tools | Code | 1 |
| FineSurE: Fine-grained Summarization Evaluation using LLMs | Code | 1 |
| LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies | Code | 1 |
| Fast hyperboloid decision tree algorithms | Code | 1 |
| BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation | Code | 1 |
| Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1 |
| BiBench: Benchmarking and Analyzing Network Binarization | Code | 1 |
| FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Code | 1 |
| ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots | Code | 1 |
| ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory | Code | 1 |
| Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1 |
| Benchmarking Graph Neural Networks for FMRI analysis | Code | 1 |
| Beyond neural scaling laws: beating power law scaling via data pruning | Code | 1 |
| Beyond Normal: On the Evaluation of Mutual Information Estimators | Code | 1 |
| Formalizing Multimedia Recommendation through Multimodal Deep Learning | Code | 1 |
| LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Code | 1 |
| Large Language Models for Multi-Robot Systems: A Survey | Code | 1 |
| LEAF: A Benchmark for Federated Settings | Code | 1 |
| LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Code | 1 |
| Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Code | 1 |
| MIRFLEX: Music Information Retrieval Feature Library for Extraction | Code | 1 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1 |
| Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking | Code | 1 |
| FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging | Code | 1 |
| FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1 |
| ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness | Code | 0 |
| Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs | Code | 0 |
| Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |