SOTAVerified

Benchmarking

Papers

Showing 4551–4560 of 5548 papers

Title	Date	Tasks	Status	Hype
Improvements & Evaluations on the MLCommons CloudMask Benchmark	Mar 7, 2024	Benchmarking	Code Available	0
The current state of single-cell proteomics data analysis	Oct 3, 2022	Benchmarking	Code Available	0
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation	Nov 11, 2024	16k, Benchmarking	Code Available	0
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation	Jan 27, 2021	Benchmarking, Text Generation	Code Available	0
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II	Sep 17, 2024	Benchmarking, Descriptive	Code Available	0
BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts	Dec 3, 2024	Age And Gender Classification, Age and Gender Estimation	Code Available	0
LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages	Mar 24, 2025	Benchmarking	Code Available	0
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I	Sep 12, 2024	Benchmarking, CPU	Code Available	0
LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts	Sep 5, 2024	Benchmarking	Code Available	0
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads	Nov 6, 2022	Benchmarking, Opinion Mining	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified