SOTAVerified

Benchmarking

Papers

Showing 4051-4100 of 5548 papers

Title | Status | Hype
Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks |  | 0
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents |  | 0
oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving |  | 0
Benchmarking Adversarial Robustness of Compressed Deep Learning Models |  | 0
Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms |  | 0
Out of Distribution Performance of State of Art Vision Model |  | 0
Benchmarking Adversarial Robustness |  | 0
Overconfident Oracles: Limitations of In Silico Sequence Design Benchmarking |  | 0
Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling |  | 0
Overview of Todai Robot Project and Evaluation Framework of its NLP-based Problem Solving |  | 0
Benchmarking Adversarially Robust Quantum Machine Learning at Scale |  | 0
OVQA: A Clinically Generated Visual Question Answering Dataset |  | 0
Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking |  | 0
Benchmarking adversarial attacks and defenses for time-series data |  | 0
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms |  | 0
Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches |  | 0
Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration |  | 0
Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances |  | 0
Paradigm Shift in Sustainability Disclosure Analysis: Empowering Stakeholders with CHATREPORT, a Language Model-Based Tool |  | 0
Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis |  | 0
Benchmarking Active Learning Strategies for Materials Optimization and Discovery |  | 0
A critical analysis of metrics used for measuring progress in artificial intelligence |  | 0
True Online TD-Replan(lambda) Achieving Planning through Replaying |  | 0
Benchmarking Active Learning for NILM |  | 0
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles |  | 0
Parsing Any Domain English text to CoNLL dependencies |  | 0
Trust but Verify: Programmatic VLM Evaluation in the Wild |  | 0
Participatory Personalization in Classification |  | 0
'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems |  | 0
When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques |  | 0
Benchmarking a Benchmark: How Reliable is MS-COCO? |  | 0
PASTA: A Dataset for Modeling Participant States in Narratives |  | 0
Yambda-5B -- A Large-Scale Multi-modal Dataset for Ranking And Retrieval |  | 0
PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image Database |  | 0
PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms |  | 0
PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology |  | 0
Patherea: Cell Detection and Classification for the 2020s |  | 0
A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis |  | 0
A Continuously Growing Dataset of Sentential Paraphrases |  | 0
Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications |  | 0
Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite |  | 0
PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints |  | 0
Object Pose Estimation in Robotics Revisited |  | 0
Benchmarking 3D multi-coil NC-PDNet MRI reconstruction |  | 0
Benchmarking 3D Human Pose Estimation Models Under Occlusions |  | 0
IN-Sight: Interactive Navigation through Sight |  | 0
Benchmarking 2D Egocentric Hand Pose Datasets |  | 0
Benchmark for Antibody Binding Affinity Maturation and Design |  | 0
Perception Test 2023: A Summary of the First Challenge And Outcome |  | 0
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified