SOTAVerified

Benchmarking

Papers


Showing 3311–3320 of 5548 papers

Title	Date	Tasks	Status	Hype
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models	Mar 1, 2024	Benchmarking, Mathematical Reasoning	Unverified	0
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance	Mar 1, 2024	Benchmarking, Stance Detection	Unverified	0
The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition	Feb 29, 2024	Action Unit Detection, Arousal Estimation	Unverified	0
FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking	Feb 28, 2024	Benchmarking, Inductive Learning	Code Available	0
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models	Feb 28, 2024	Benchmarking, Hallucination	Code Available	0
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies	Feb 27, 2024	Benchmarking, Systematic Generalization	Unverified	0
The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns	Feb 27, 2024	Benchmarking, Binary Classification	Code Available	0
A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images	Feb 27, 2024	Benchmarking, Defect Detection	Unverified	0
The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection	Feb 27, 2024	Benchmarking	Unverified	0
Performance Comparison of Surrogate-Assisted Evolutionary Algorithms on Computational Fluid Dynamics Problems	Feb 26, 2024	Benchmarking, Evolutionary Algorithms	Unverified	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified