SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 521–530 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking LLMs for Political Science: A United Nations Perspective	Feb 19, 2025	BenchmarkingDecision Making	CodeCode Available	1
Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?	Feb 18, 2025	BenchmarkingBlocking	CodeCode Available	1
ILIAS: Instance-Level Image retrieval At Scale	Feb 17, 2025	BenchmarkingImage Retrieval	CodeCode Available	1
Positional Encoding in Transformer-Based Time Series Models: A Survey	Feb 17, 2025	Anomaly DetectionBenchmarking	CodeCode Available	1
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims	Feb 17, 2025	BenchmarkingFact Checking	CodeCode Available	1
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs	Feb 13, 2025	BenchmarkingRetrieval	CodeCode Available	1
LOB-Bench: Benchmarking Generative AI for Finance -- an Application to Limit Order Book Data	Feb 13, 2025	BenchmarkingState Space Models	CodeCode Available	1
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation	Feb 10, 2025	Benchmarking	CodeCode Available	1
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments	Feb 10, 2025	BenchmarkingOptical Character Recognition	CodeCode Available	1
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts	Feb 8, 2025	BenchmarkingSelf-Supervised Learning	CodeCode Available	1

Show:10 25 50

← PrevPage 53 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified