SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2591–2600 of 5548 papers

Title	Date	Tasks	Status	Hype
Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing	Nov 1, 2024	BenchmarkingSemantic Segmentation	CodeCode Available	0
Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model	Nov 1, 2024	BenchmarkingCross-Domain Named Entity Recognition	—Unverified	0
A Review of Reinforcement Learning in Financial Applications	Nov 1, 2024	BenchmarkingDecision Making	—Unverified	0
IdeaBench: Benchmarking Large Language Models for Research Idea Generation	Oct 31, 2024	Benchmarkingscientific discovery	CodeCode Available	0
Benchmark Data Repositories for Better Benchmarking	Oct 31, 2024	Benchmarking	—Unverified	0
NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation	Oct 30, 2024	BenchmarkingContinual Learning	CodeCode Available	0
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning	Oct 30, 2024	BenchmarkingHallucination	—Unverified	0
Evaluating Cultural and Social Awareness of LLM Web Agents	Oct 30, 2024	BenchmarkingNavigate	—Unverified	0
Low-Density 3D Point Cloud Classification	Oct 30, 2024	3D Point Cloud ClassificationAutonomous Driving	—Unverified	0
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes	Oct 30, 2024	Benchmarking	—Unverified	0

Show:10 25 50

← PrevPage 260 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified