SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2171–2180 of 5548 papers

Title	Date	Tasks	Status	Hype
Full-stack evaluation of Machine Learning inference workloads for RISC-V systems	May 24, 2024	BenchmarkingDeep Learning	—Unverified	0
Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks	May 24, 2024	BenchmarkingDecoder	—Unverified	0
MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model	May 24, 2024	BenchmarkingDemand Forecasting	—Unverified	0
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling	May 23, 2024	Benchmarking	CodeCode Available	1
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models	May 23, 2024	Benchmarking	CodeCode Available	2
An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models	May 23, 2024	Autonomous DrivingBenchmarking	—Unverified	0
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents	May 23, 2024	Benchmarking	CodeCode Available	4
GCondenser: Benchmarking Graph Condensation	May 23, 2024	BenchmarkingGraph Representation Learning	CodeCode Available	1
A Gap in Time: The Challenge of Processing Heterogeneous IoT Data in Digitalized Buildings	May 23, 2024	BenchmarkingData Integration	—Unverified	0
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models	May 22, 2024	BenchmarkingHallucination	—Unverified	0

Show:10 25 50

← PrevPage 218 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified