Benchmarking

Papers

Showing 2321–2330 of 5548 papers

Title	Date	Tasks	Status	Hype
AEON: Adaptive Estimation of Instance-Dependent In-Distribution and Out-of-Distribution Label Noise for Robust Learning	Jan 23, 2025	Benchmarking, image-classification	Unverified	0
You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain	Jan 23, 2025	Benchmarking, Domain Adaptation	Unverified	0
DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale	Jan 23, 2025	Benchmarking	Unverified	0
Leveraging LLMs to Create a Haptic Devices' Recommendation System	Jan 22, 2025	Benchmarking	Unverified	0
CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization	Jan 22, 2025	Benchmarking, regression	Unverified	0
Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities	Jan 22, 2025	Benchmarking, Referring Expression	Unverified	0
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF	Jan 22, 2025	Benchmarking, Hallucination	Unverified	0
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning	Jan 22, 2025	Benchmarking	Code Available	0
Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs)	Jan 21, 2025	Benchmarking	Unverified	0
Optimally-Weighted Maximum Mean Discrepancy Framework for Continual Learning	Jan 21, 2025	Benchmarking, Continual Learning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified