Benchmarking

Papers

Showing 2971–2980 of 5548 papers

Title	Date	Tasks	Status	Hype
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation	Jul 4, 2024	Benchmarking, Chatbot	Unverified	0
Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias	Jul 3, 2024	Benchmarking, Bias Detection	Code Available	0
Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms	Jul 3, 2024	Benchmarking, CPU	Unverified	0
TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations	Jul 2, 2024	Benchmarking, text-to-speech	Unverified	0
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks	Jul 2, 2024	Activity Prediction, Anomaly Detection	Code Available	0
Open foundation models for Azerbaijani language	Jul 2, 2024	Benchmarking	Unverified	0
ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions	Jul 1, 2024	Benchmarking, Question Generation	Unverified	0
EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting	Jul 1, 2024	3D Reconstruction, Benchmarking	Unverified	0
Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration	Jul 1, 2024	Benchmarking	Code Available	0
Modified CMA-ES Algorithm for Multi-Modal Optimization: Incorporating Niching Strategies and Dynamic Adaptation Mechanism	Jul 1, 2024	Benchmarking, Diversity	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified