SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2551–2560 of 5548 papers

Title	Date	Tasks	Status	Hype
Coherent Feed Forward Quantum Neural Network	Feb 1, 2024	BenchmarkingDiagnostic	—Unverified	0
We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline	Feb 1, 2024	BenchmarkingDomain Adaptation	CodeCode Available	1
Benchmarking Transferable Adversarial Attacks	Feb 1, 2024	Adversarial AttackBenchmarking	CodeCode Available	1
Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition	Jan 31, 2024	Action RecognitionBenchmarking	—Unverified	0
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench	Jan 31, 2024	BenchmarkingMultiple-choice	CodeCode Available	4
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	BenchmarkingChange Detection	CodeCode Available	0
Explainable Benchmarking for Iterative Optimization Heuristics	Jan 31, 2024	BenchmarkingEvolutionary Algorithms	CodeCode Available	1
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels	Jan 30, 2024	Benchmarkingimage-classification	CodeCode Available	1
Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios	Jan 30, 2024	Benchmarking	CodeCode Available	2
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks	Jan 29, 2024	BenchmarkingCross-Lingual Transfer	CodeCode Available	0

Show:10 25 50

← PrevPage 256 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified