SOTAVerified

Benchmarking

Papers

Showing 2591–2600 of 5548 papers

Title | Status | Hype
Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Code | 0
Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model | | 0
A Review of Reinforcement Learning in Financial Applications | | 0
IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Code | 0
Benchmark Data Repositories for Better Benchmarking | | 0
NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation | Code | 0
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | | 0
Evaluating Cultural and Social Awareness of LLM Web Agents | | 0
Low-Density 3D Point Cloud Classification | | 0
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | | 0
Page 260 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified