Benchmarking

Papers

Showing 2801–2810 of 5548 papers

Title	Date	Tasks	Status	Hype
Grounded Intuition of GPT-Vision's Abilities with Scientific Images	Nov 3, 2023	Benchmarking, counterfactual	Code Available	0
An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction	Nov 3, 2023	Benchmarking, Sentence	Unverified	0
Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval	Nov 3, 2023	Benchmarking, Fairness	Code Available	0
Decentralized Federated Learning on the Edge over Wireless Mesh Networks	Nov 2, 2023	Benchmarking, Federated Learning	Unverified	0
Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia	Nov 2, 2023	Benchmarking, Machine Translation	Code Available	0
Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO	Nov 2, 2023	Benchmarking, Edge-computing	Code Available	1
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence	Nov 1, 2023	Benchmarking, Cryogenic Electron Microscopy (cryo-EM)	Code Available	1
Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs	Nov 1, 2023	Benchmarking, Question Answering	Unverified	0
SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization	Nov 1, 2023	Benchmarking, reinforcement-learning	Unverified	0
UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges	Oct 31, 2023	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified