SOTAVerified

Benchmarking

Papers

Showing 1141–1150 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns	Jan 28, 2025	Adversarial Attack, Benchmarking	Code Available	1	5
Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding	Nov 6, 2023	Benchmarking, Data Compression	Code Available	1	5
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data	Feb 27, 2024	Benchmarking	Code Available	1	5
Benchmarking Meaning Representations in Neural Semantic Parsing	Nov 1, 2020	Benchmarking, Semantic Parsing	Code Available	1	5
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning	Sep 27, 2024	AutoML, Benchmarking	Code Available	1	5
Benchmarking Meta-embeddings: What Works and What Does Not	Nov 1, 2021	Benchmarking, Embeddings Evaluation	Code Available	1	5
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios	Oct 25, 2024	Benchmarking, Diversity	Code Available	1	5
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Mar 8, 2024	Action Recognition, Benchmarking	Code Available	1	5
Generative Wind Power Curve Modeling Via Machine Vision: A Self-learning Deep Convolutional Network Based Method	Aug 19, 2021	Benchmarking, Synthetic Data Generation	Code Available	1	5
Benchmarking Large Language Models for News Summarization	Jan 31, 2023	Benchmarking, News Summarization	Code Available	1	5

Page 115 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified