Benchmarking

Papers

Showing 4881–4890 of 5548 papers

Title	Date	Tasks	Status	Hype
Unraveling the Capabilities of Language Models in News Summarization	Jan 30, 2025	Benchmarking, Few-Shot Learning	Code Available	0
mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale	Jun 26, 2025	Anomaly Detection, Benchmarking	Code Available	0
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks	Apr 18, 2021	Benchmarking, Federated Learning	Code Available	0
MUBen: Benchmarking the Uncertainty of Molecular Representation Models	Jun 14, 2023	Benchmarking, Drug Discovery	Code Available	0
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection	Sep 17, 2024	Benchmarking, Event Detection	Code Available	0
WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection	Mar 13, 2020	Abuse Detection, Benchmarking	Code Available	0
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs	Jun 8, 2023	Benchmarking, Federated Learning	Code Available	0
Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning	May 27, 2025	Benchmarking	Code Available	0
Feature interpretability in BCIs: exploring the role of network lateralization	Jul 16, 2024	Benchmarking, EEG	Code Available	0
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?	Oct 28, 2024	Benchmarking, Question Answering	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified