SOTAVerified

Benchmarking

Papers

Showing 3126–3150 of 5548 papers

Title | Status | Hype
The Russian practice of applying cluster approach in regional development | | 0
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images | | 0
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | | 0
Investigating the Vision Transformer Model for Image Retrieval Tasks | | 0
Benchmarking Robustness in Neural Radiance Fields | | 0
The Principle of Unchanged Optimality in Reinforcement Learning Generalization | | 0
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting | | 0
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO | | 0
Benchmarking Robot Manipulation with the Rubik's Cube | | 0
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | | 0
The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection | | 0
4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions | | 0
IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models | | 0
IO-VNBD: Inertial and Odometry Benchmark Dataset for Ground Vehicle Positioning | | 0
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks | | 0
Iris Liveness Detection Competition (LivDet-Iris) -- The 2020 Edition | | 0
Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies | | 0
Is Bio-Inspired Learning Better than Backprop? Benchmarking Bio Learning vs. Backprop | | 0
Benchmarking Retrieval-Augmented Generation for Chemistry | | 0
A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data | | 0
Evaluating Ising Processing Units with Integer Programming | | 0
Benchmarking Resource Usage for Efficient Distributed Deep Learning | | 0
Benchmarking Reinforcement Learning Methods for Dexterous Robotic Manipulation with a Three-Fingered Gripper | | 0
ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data | | 0
Benchmarking Reasoning Robustness in Large Language Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified