SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2521–2530 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
DFEE: Interactive DataFlow Execution and Evaluation Kit	Dec 4, 2022	BenchmarkingScheduling	CodeCode Available	0	5
A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild	Jun 11, 2025	Age EstimationBenchmarking	CodeCode Available	0	5
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning	Mar 16, 2023	BenchmarkingContinual Learning	CodeCode Available	0	5
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations	Jun 17, 2024	BenchmarkingDataset Generation	CodeCode Available	0	5
Flexible Generation of Preference Data for Recommendation Analysis	Jul 23, 2024	BenchmarkingRecommendation Systems	CodeCode Available	0	5
FALCON: Feature-Label Constrained Graph Net Collapse for Memory Efficient GNNs	Dec 27, 2023	BenchmarkingGPU	CodeCode Available	0	5
Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations	Dec 7, 2020	BenchmarkingGoal-Oriented Dialog	CodeCode Available	0	5
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering	May 27, 2025	BenchmarkingQuestion Answering	CodeCode Available	0	5
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters	Sep 8, 2022	Benchmarkingcontinuous-control	CodeCode Available	0	5
FR-MRInet: A Deep Convolutional Encoder-Decoder for Brain Tumor Segmentation with Relu-RGB and Sliding-window	Jul 26, 2018	BenchmarkingBrain Tumor Segmentation	CodeCode Available	0	5

Show:10 25 50

← PrevPage 253 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified