SOTAVerified

Benchmarking

Papers

Showing 1176-1200 of 5548 papers

Title | Status | Hype
Graph Neural Network-Based Anomaly Detection for River Network Systems | Code | 1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1
BLADE: Benchmarking Language Model Agents for Data-Driven Science | Code | 1
Benchmarking Simulation-Based Inference | Code | 1
Benchmarking Visual Localization for Autonomous Navigation | Code | 1
A skeletonization algorithm for gradient-based optimization | Code | 1
Benchmarking Multi-Scene Fire and Smoke Detection | Code | 1
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses | Code | 1
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | Code | 1
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
Boosting Neural Image Compression for Machines Using Latent Space Masking | Code | 1
GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Code | 1
Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning | Code | 1
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Code | 1
Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Code | 1
AI Accelerator Survey and Trends | Code | 1
ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset | Code | 1
Benchmarking Neural Network Generalization for Grammar Induction | Code | 1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Code | 1
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Code | 1
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Code | 1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1
GNNX-BENCH: Unravelling the Utility of Perturbation-based GNN Explainers through In-depth Benchmarking | Code | 1
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking | Code | 1
GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified