Benchmarking

Papers

Showing 21–30 of 5548 papers

Title	Date	Tasks	Status	Hype
Hyperspectral Anomaly Detection Methods: A Survey and Comparative Study	Jul 8, 2025	Anomaly Detection, Benchmarking	Unverified	0
SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations	Jul 8, 2025	6D Pose Estimation, 6D Pose Estimation using RGB	Code Available	0
Inaugural MOASEI Competition at AAMAS'2025: A Technical Report	Jul 7, 2025	Benchmarking, Decision Making	Unverified	0
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models	Jul 5, 2025	Benchmarking, GPU	Code Available	1
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning	Jul 4, 2025	Benchmarking, Graph Generation	Code Available	2
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking	Jul 4, 2025	Benchmarking, Navigate	Code Available	0
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction	Jul 3, 2025	Benchmarking	Code Available	0
CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks	Jul 3, 2025	Benchmarking, Code Generation	Unverified	0
Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data	Jul 3, 2025	Benchmarking, Representation Learning	Code Available	1
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation	Jul 1, 2025	Benchmarking, Machine Translation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified