SOTAVerified

Benchmarking

Papers

Showing 4076-4100 of 5548 papers

Title | Status | Hype
Parsing Any Domain English text to CoNLL dependencies |  | 0
Trust but Verify: Programmatic VLM Evaluation in the Wild |  | 0
Participatory Personalization in Classification |  | 0
'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems |  | 0
When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques |  | 0
Benchmarking a Benchmark: How Reliable is MS-COCO? |  | 0
PASTA: A Dataset for Modeling Participant States in Narratives |  | 0
Yambda-5B -- A Large-Scale Multi-modal Dataset for Ranking And Retrieval |  | 0
PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image Database |  | 0
PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms |  | 0
PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology |  | 0
Patherea: Cell Detection and Classification for the 2020s |  | 0
A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis |  | 0
A Continuously Growing Dataset of Sentential Paraphrases |  | 0
Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications |  | 0
Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite |  | 0
PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints |  | 0
Object Pose Estimation in Robotics Revisited |  | 0
Benchmarking 3D multi-coil NC-PDNet MRI reconstruction |  | 0
Benchmarking 3D Human Pose Estimation Models Under Occlusions |  | 0
IN-Sight: Interactive Navigation through Sight |  | 0
Benchmarking 2D Egocentric Hand Pose Datasets |  | 0
Benchmark for Antibody Binding Affinity Maturation and Design |  | 0
Perception Test 2023: A Summary of the First Challenge And Outcome |  | 0
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified