SOTAVerified

Benchmarking

Papers

Showing 1381–1390 of 5548 papers

Title	Date	Tasks	Status	Hype
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nov 1, 2024	Benchmarking, Mixture-of-Experts	Code Available	1
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators	Oct 31, 2024	Benchmarking, Text Generation	Code Available	2
LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction	Oct 31, 2024	Benchmarking, Prediction	Code Available	1
IdeaBench: Benchmarking Large Language Models for Research Idea Generation	Oct 31, 2024	Benchmarking, scientific discovery	Code Available	0
Pedestrian Trajectory Prediction with Missing Data: Datasets, Imputation, and Benchmarking	Oct 31, 2024	Benchmarking, Imputation	Code Available	1
Benchmark Data Repositories for Better Benchmarking	Oct 31, 2024	Benchmarking	Unverified	0
XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM	Oct 31, 2024	3DGS, Benchmarking	Code Available	3
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography	Oct 31, 2024	Benchmarking, Electromyography (EMG)	Code Available	1
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents	Oct 31, 2024	Benchmarking	Code Available	3
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery	Oct 31, 2024	Benchmarking, Cloud Removal	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified