SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1391–1400 of 5548 papers

Title	Date	Tasks	Status	Hype
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery	Oct 31, 2024	BenchmarkingCloud Removal	CodeCode Available	1
CALE: Continuous Arcade Learning Environment	Oct 31, 2024	Atari GamesBenchmarking	CodeCode Available	7
Low-Density 3D Point Cloud Classification	Oct 30, 2024	3D Point Cloud ClassificationAutonomous Driving	—Unverified	0
Survey of Cultural Awareness in Language Models: Text and Beyond	Oct 30, 2024	Benchmarking	CodeCode Available	1
NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation	Oct 30, 2024	BenchmarkingContinual Learning	CodeCode Available	0
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning	Oct 30, 2024	BenchmarkingHallucination	—Unverified	0
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes	Oct 30, 2024	Benchmarking	—Unverified	0
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models	Oct 30, 2024	Benchmarking	CodeCode Available	2
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation	Oct 30, 2024	BenchmarkingPassage Retrieval	CodeCode Available	2
Evaluating Cultural and Social Awareness of LLM Web Agents	Oct 30, 2024	BenchmarkingNavigate	—Unverified	0

Show:10 25 50

← PrevPage 140 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified