Benchmarking

Papers

Showing 561–570 of 5548 papers

Title	Date	Tasks	Status	Hype
Unreal Robotics Lab: A High-Fidelity Robotics Simulator with Advanced Physics and Rendering	Apr 19, 2025	Benchmarking, Dataset Generation	Unverified	0
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale	Apr 19, 2025	Benchmarking	Code Available	2
CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations	Apr 19, 2025	Benchmarking	Unverified	0
AI Idea Bench 2025: AI Research Idea Generation Benchmark	Apr 19, 2025	Benchmarking, scientific discovery	Unverified	0
Integrated Super-resolution Sensing and Symbiotic Communication with 3D Sparse MIMO for Low-Altitude UAV Swarm	Apr 18, 2025	Benchmarking, Super-Resolution	Unverified	0
OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation	Apr 18, 2025	Benchmarking	Unverified	0
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models	Apr 17, 2025	Benchmarking, Math	Unverified	0
Benchmarking LLM-based Relevance Judgment Methods	Apr 17, 2025	Benchmarking, Open-Domain Question Answering	Code Available	0
Benchmarking Multi-National Value Alignment for Large Language Models	Apr 17, 2025	Benchmarking	Unverified	0
Enhancing Explainability and Reliable Decision-Making in Particle Swarm Optimization through Communication Topologies	Apr 17, 2025	Benchmarking, Decision Making	Unverified	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified