SOTAVerified

Benchmarking

Papers


Showing 3471–3480 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking LLM Guardrails in Handling Multilingual Toxicity	Oct 29, 2024	Benchmarking	Unverified	0	0
Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3	Apr 22, 2025	Benchmarking, Language Modeling	Unverified	0	0
Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins	Apr 4, 2025	Benchmarking	Unverified	0	0
Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios	Mar 31, 2025	Adversarial Attack, Autonomous Driving	Unverified	0	0
Making Sense of Data in the Wild: Data Analysis Automation at Scale	Jan 27, 2025	Benchmarking, Diversity	Unverified	0	0
OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User	Oct 26, 2023	Anomaly Detection, Benchmarking	Unverified	0	0
A Deep Q-Learning Method for Downlink Power Allocation in Multi-Cell Networks	Apr 30, 2019	Benchmarking, Deep Reinforcement Learning	Unverified	0	0
Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages	Sep 1, 2024	Benchmarking, Code Generation	Unverified	0	0
Benchmarking LiDAR Sensors for Development and Evaluation of Automotive Perception	Apr 28, 2020	Benchmarking, Systematic Literature Review	Unverified	0	0
Towards Benchmarking and Evaluating Deepfake Detection	Mar 4, 2022	Benchmarking, DeepFake Detection	Unverified	0	0


Page 348 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	Accuracy	0.56	—	Unverified