SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2931–2940 of 5548 papers

Title	Date	Tasks	Status	Hype
Enhancing Biomedical Knowledge Discovery for Diseases: An Open-Source Framework Applied on Rett Syndrome and Alzheimer's Disease	Jul 18, 2024	Benchmarking	CodeCode Available	0
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle	Jul 18, 2024	BenchmarkingLanguage Modeling	—Unverified	0
RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark	Jul 18, 2024	3D Human Pose EstimationBenchmarking	—Unverified	0
Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance	Jul 18, 2024	Benchmarking	—Unverified	0
Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks	Jul 17, 2024	Adversarial RobustnessBenchmarking	CodeCode Available	0
FETCH: A Memory-Efficient Replay Approach for Continual Learning in Image Classification	Jul 17, 2024	BenchmarkingContinual Learning	—Unverified	0
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?	Jul 17, 2024	BenchmarkingSarcasm Detection	—Unverified	0
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models	Jul 17, 2024	BenchmarkingLanguage Modelling	—Unverified	0
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships	Jul 17, 2024	Benchmarking	CodeCode Available	0
HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects	Jul 17, 2024	BenchmarkingHuman-Object Interaction Detection	—Unverified	0

Show:10 25 50

← PrevPage 294 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified