Benchmarking

Papers

Showing 2821–2830 of 5548 papers

Title	Date	Tasks	Status	Hype
Evaluating LLP Methods: Challenges and Approaches	Oct 29, 2023	Benchmarking, Model Selection	Code Available	0
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness	Oct 28, 2023	Benchmarking, image-classification	Code Available	0
OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression	Oct 27, 2023	Benchmarking, GPU	Code Available	0
On General Language Understanding	Oct 27, 2023	Benchmarking, Ethics	Unverified	0
OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User	Oct 26, 2023	Anomaly Detection, Benchmarking	Unverified	0
Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting	Oct 25, 2023	Benchmarking, Hyperparameter Optimization	Unverified	0
RDBench: ML Benchmark for Relational Databases	Oct 25, 2023	Benchmarking	Unverified	0
ConDefects: A New Dataset to Address the Data Leakage Concern for LLM-based Fault Localization and Program Repair	Oct 25, 2023	Benchmarking, Fault localization	Unverified	0
XFEVER: Exploring Fact Verification across Languages	Oct 25, 2023	Benchmarking, Fact Verification	Code Available	0
MLFMF: Data Sets for Machine Learning for Mathematical Formalization	Oct 24, 2023	Benchmarking, Recommendation Systems	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified