
Benchmarking

Papers


Showing 721–730 of 5548 papers

Title	Date	Tasks	Status	Hype
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding	Mar 22, 2025	Benchmarking, Object	Code Available	0
Benchmark Dataset for Pore-Scale CO2-Water Interaction	Mar 22, 2025	Benchmarking	Unverified	0
CausalRivers -- Scaling up benchmarking of causal discovery for real-world time-series	Mar 21, 2025	Anomaly Detection, Benchmarking	Unverified	0
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer	Mar 21, 2025	Benchmarking, Video Generation	Code Available	2
ContextGNN goes to Elliot: Towards Benchmarking Relational Deep Learning for Static Link Prediction (aka Personalized Item Recommendation)	Mar 20, 2025	Benchmarking, Link Prediction	Code Available	0
QCPINN: Quantum-Classical Physics-Informed Neural Networks for Solving PDEs	Mar 20, 2025	Benchmarking, Physics-informed machine learning	Code Available	1
A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: How Many Repeats Are Enough?	Mar 20, 2025	Benchmarking	Unverified	0
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models	Mar 20, 2025	Benchmarking, Reinforcement Learning (RL)	Code Available	4
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination	Mar 20, 2025	Benchmarking, Large Language Model	Code Available	1
DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs	Mar 20, 2025	Benchmarking, Hallucination	Unverified	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified