SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 951–960 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Differential Privacy and Federated Learning for BERT Models	Jun 26, 2021	BenchmarkingFederated Learning	CodeCode Available	1	5
Accelerated and interpretable oblique random survival forests	Aug 1, 2022	BenchmarkingComputational Efficiency	CodeCode Available	1	5
Explainable Benchmarking for Iterative Optimization Heuristics	Jan 31, 2024	BenchmarkingEvolutionary Algorithms	CodeCode Available	1	5
Benchmarking Distribution Shift in Tabular Data with TableShift	Dec 10, 2023	BenchmarkingBinary Classification	CodeCode Available	1	5
DataRec: A Python Library for Standardized and Reproducible Data Management in Recommender Systems	Oct 30, 2024	BenchmarkingManagement	CodeCode Available	1	5
Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception	Jan 24, 2024	Benchmarking	CodeCode Available	1	5
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	BenchmarkingLanguage Modeling	CodeCode Available	1	5
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets	Apr 11, 2022	Action Triplet RecognitionBenchmarking	CodeCode Available	1	5
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models	Oct 17, 2023	BenchmarkingLanguage Modelling	CodeCode Available	1	5
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models	May 26, 2025	BenchmarkingRAG	CodeCode Available	1	5

Show:10 25 50

← PrevPage 96 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified