Benchmarking

Papers

Showing 1861–1870 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples	Feb 6, 2025	Benchmarking, DeepFake Detection	Code Available	0	5
Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test	Mar 8, 2023	Benchmarking, Time Series	Code Available	0	5
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion	May 28, 2023	Benchmarking, Decision Making	Code Available	0	5
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences	Jun 25, 2025	Benchmarking	Code Available	0	5
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation	Mar 2, 2021	Benchmarking, Deep Learning	Code Available	0	5
MineRL: A Large-Scale Dataset of Minecraft Demonstrations	Jul 29, 2019	Benchmarking, Deep Reinforcement Learning	Code Available	0	5
OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression	Oct 27, 2023	Benchmarking, GPU	Code Available	0	5
Advancing and Benchmarking Personalized Tool Invocation for LLMs	May 7, 2025	Benchmarking, World Knowledge	Code Available	0	5
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge	Jun 17, 2025	Benchmarking, Retrieval	Code Available	0	5
Impact of ImageNet Model Selection on Domain Adaptation	Feb 6, 2020	Benchmarking, Domain Adaptation	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified