SOTAVerified

Benchmarking

Papers

Showing 1141–1150 of 5548 papers

Title	Date	Tasks	Status	Hype
A Deep Reinforcement Learning Framework for Dynamic Portfolio Optimization: Evidence from China's Stock Market	Dec 24, 2024	Benchmarking, Decision Making	Code Available	0
Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations	Dec 23, 2024	Benchmarking, Question Answering	Unverified	0
StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs	Dec 23, 2024	Benchmarking, Logical Reasoning	Unverified	0
Benchmarking Generative AI Models for Deep Learning Test Input Generation	Dec 23, 2024	Benchmarking, Deep Learning	Code Available	0
Multimodal Deep Reinforcement Learning for Portfolio Optimization	Dec 23, 2024	Articles, Benchmarking	Unverified	0
SCBench: A Sports Commentary Benchmark for Video LLMs	Dec 23, 2024	Benchmarking	Unverified	0
SMAC-Hard: Enabling Mixed Opponent Strategy Script and Self-play on SMAC	Dec 23, 2024	Benchmarking, Multi-agent Reinforcement Learning	Code Available	1
On the Generalization Ability of Machine-Generated Text Detectors	Dec 23, 2024	Benchmarking, Misinformation	Code Available	1
Chumor 2.0: Towards Benchmarking Chinese Humor Understanding	Dec 23, 2024	Benchmarking	Code Available	0
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders	Dec 23, 2024	3D Shape Modeling, Benchmarking	Code Available	4

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified