Benchmarking

Papers

Showing 1601–1610 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation	May 16, 2025	Benchmarking, Ethics	Code Available	0	5
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems	Jun 1, 2021	Benchmarking, Goal-Oriented Dialogue Systems	Code Available	0	5
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering	May 21, 2025	Benchmarking, Language Modeling	Code Available	0	5
KhabarChin: Automatic Detection of Important News in the Persian Language	Dec 6, 2023	Articles, Benchmarking	Code Available	0	5
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen	Mar 3, 2022	Benchmarking	Code Available	0	5
An implementation of the "Guess who?" game using CLIP	Nov 30, 2021	Benchmarking	Code Available	0	5
KArSL: Arabic Sign Language Database	Jan 1, 2021	Benchmarking, Sign Language Recognition	Code Available	0	5
Adjusting Pretrained Backbones for Performativity	Oct 6, 2024	Benchmarking, Deep Learning	Code Available	0	5
Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis	Mar 18, 2025	Benchmarking, Drug Response Prediction	Code Available	0	5
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations	Jun 29, 2022	Benchmarking	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified