SOTAVerified

Benchmarking

Papers

Showing 4151–4175 of 5548 papers

Title | Status | Hype
PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs | | 0
Pitfalls of topology-aware image segmentation | | 0
pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild | | 0
A Computer Vision System to Localize and Classify Wastes on the Streets | | 0
Benchmarking performance, explainability, and evaluation strategies of vision-language models for surgery: Challenges and opportunities | | 0
A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects | | 0
PKLot-A robust dataset for parking lot classification | | 0
PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI | | 0
BEADs: Bias Evaluation Across Domains | | 0
BEACON: A Benchmark for Efficient and Accurate Counting of Subgraphs | | 0
Plant in Cupboard, Orange on Rably, Inat Aphone. Benchmarking Incremental Learning of Situation and Language Model using a Text-Simulated Situated Environment | | 0
BBOB Instance Analysis: Landscape Properties and Algorithm Performance across Problem Instances | | 0
Bayesian Neural Networks at Scale: A Performance Analysis and Pruning Study | | 0
Bayesian Multi-type Mean Field Multi-agent Imitation Learning | | 0
White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs | | 0
A Bayesian Model for Bivariate Causal Inference | | 0
A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking | | 0
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking | | 0
Barkour: Benchmarking Animal-level Agility with Quadruped Robots | | 0
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali | | 0
Point Cloud Compression and Objective Quality Assessment: A Survey | | 0
Point Cloud Objective Quality: Benchmarking Features and Quality Evaluation | | 0
Polarization and Index Modulations: a Theoretical and Practical Perspective | | 0
Policy Entropy for Out-of-Distribution Classification | | 0
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding | | 0
Page 167 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified