Benchmarking

Papers

Showing 2691–2700 of 5548 papers

Title	Date	Tasks	Status	Hype
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers	Oct 8, 2024	Benchmarking	Code Available	0
Named Clinical Entity Recognition Benchmark	Oct 7, 2024	Benchmarking, Decoder	Code Available	0
Precise Model Benchmarking with Only a Few Observations	Oct 7, 2024	Benchmarking, model	Unverified	0
Rule-based Data Selection for Large Language Models	Oct 7, 2024	Benchmarking, Math	Unverified	0
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models	Oct 7, 2024	Benchmarking, Segmentation	Code Available	0
Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems	Oct 7, 2024	Benchmarking, Machine Translation	Unverified	0
Adjusting Pretrained Backbones for Performativity	Oct 6, 2024	Benchmarking, Deep Learning	Code Available	0
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection	Oct 6, 2024	Benchmarking, Mathematical Reasoning	Unverified	0
Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels	Oct 5, 2024	Benchmarking	Unverified	0
Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends	Oct 5, 2024	Benchmarking, Chart Understanding	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified