SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2511–2520 of 5548 papers

Title	Date	Tasks	Status	Hype
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents	Feb 13, 2024	BenchmarkingModel Selection	CodeCode Available	2
Privacy-Preserving Language Model Inference with Instance Obfuscation	Feb 13, 2024	BenchmarkingLanguage Modeling	—Unverified	0
BdSLW60: A Word-Level Bangla Sign Language Dataset	Feb 13, 2024	BenchmarkingGesture Recognition	CodeCode Available	0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem ProvingBenchmarking	—Unverified	0
Customizable Perturbation Synthesis for Robust SLAM Benchmarking	Feb 12, 2024	BenchmarkingSimultaneous Localization and Mapping	CodeCode Available	2
Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems	Feb 12, 2024	Benchmarking	—Unverified	0
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT	Feb 12, 2024	BenchmarkingChunking	—Unverified	0
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension	Feb 12, 2024	2kAutomatic Speech Recognition	CodeCode Available	2
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study	Feb 11, 2024	Anomaly DetectionBenchmarking	CodeCode Available	0
Explainable Global Wildfire Prediction Models using Graph Neural Networks	Feb 11, 2024	BenchmarkingCommunity Detection	CodeCode Available	1

Show:10 25 50

← PrevPage 252 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified