SOTAVerified

Benchmarking

Papers

Showing 5326–5350 of 5548 papers

Title | Status | Hype
PixelHop: A Successive Subspace Learning (SSL) Method for Object Classification | Code | 0
pke: an open source python-based keyphrase extraction toolkit | Code | 0
Benchmarking Educational Program Repair | Code | 0
A Benchmarking Study of Vision-based Robotic Grasping Algorithms | Code | 0
CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization | Code | 0
CREPO: An Open Repository to Benchmark Credal Network Algorithms | Code | 0
A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making | Code | 0
Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI | Code | 0
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models | Code | 0
ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets | Code | 0
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison | Code | 0
Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework | Code | 0
pmuBAGE: The Benchmarking Assortment of Generated PMU Data for Power System Events -- Part I: Overview and Results | Code | 0
pmuBAGE: The Benchmarking Assortment of Generated PMU Data for Power System Events | Code | 0
Continuous Optimization Benchmarks by Simulation | Code | 0
Continual Learning Strategies for 3D Engineering Regression Problems: A Benchmarking Study | Code | 0
Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems | Code | 0
Structured Prediction Problem Archive | Code | 0
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking | Code | 0
Benchmarking down-scaled (not so large) pre-trained language models | Code | 0
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Code | 0
ContextGNN goes to Elliot: Towards Benchmarking Relational Deep Learning for Static Link Prediction (aka Personalized Item Recommendation) | Code | 0
Selected Languages are All You Need for Cross-lingual Truthfulness Transfer | Code | 0
Content-Aware Differential Privacy with Conditional Invertible Neural Networks | Code | 0
Population-wise Labeling of Sulcal Graphs using Multi-graph Matching | Code | 0
Page 214 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | – | Unverified