SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1521–1530 of 5548 papers

Title	Date	Tasks	Status	Hype
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making	Oct 9, 2024	BenchmarkingDecision Making	CodeCode Available	3
M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes	Oct 9, 2024	BenchmarkingMotion Generation	—Unverified	0
Active Evaluation Acquisition for Efficient LLM Benchmarking	Oct 8, 2024	Benchmarking	—Unverified	0
Manual Verbalizer Enrichment for Few-Shot Text Classification	Oct 8, 2024	BenchmarkingClassification	—Unverified	0
Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective	Oct 8, 2024	AttributeBenchmarking	CodeCode Available	1
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers	Oct 8, 2024	Benchmarking	CodeCode Available	0
FedGraph: A Research Library and Benchmark for Federated Graph Learning	Oct 8, 2024	BenchmarkingFederated Learning	CodeCode Available	2
Benchmarking of a new data splitting method on volcanic eruption data	Oct 8, 2024	Benchmarking	—Unverified	0
Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems	Oct 7, 2024	BenchmarkingMachine Translation	—Unverified	0
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild	Oct 7, 2024	BenchmarkingMixture-of-Experts	CodeCode Available	1

Show:10 25 50

← PrevPage 153 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified