SOTAVerified

Benchmarking

Papers

Showing 3101–3150 of 5548 papers

Title | Status | Hype
Benchmarking Sample Selection Strategies for Batch Reinforcement Learning | | 0
InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset | | 0
InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method | | 0
InternalInspector I^2: Robust Confidence Estimation in LLMs through Internal States | | 0
Interpretable Feature Construction for Time Series Extrinsic Regression | | 0
Interpretable graph-based models on multimodal biomedical data integration: A technical review and benchmarking | | 0
Interpretable machine learning applied to on-farm biosecurity and porcine reproductive and respiratory syndrome virus | | 0
Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation | | 0
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition | | 0
The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search | | 0
Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management | | 0
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation | | 0
Intrinsic uncertainties and where to find them | | 0
Introducing a new benchmarked dataset for activity monitoring | | 0
Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning | | 0
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval | | 0
Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction | | 0
7th AI Driving Olympics: 1st Place Report for Panoptic Tracking | | 0
Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities | | 0
Introduction to Voice Presentation Attack Detection and Recent Advances | | 0
Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts | | 0
InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences | | 0
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents | | 0
Investigating Deep-Learning NLP for Automating the Extraction of Oncology Efficacy Endpoints from Scientific Literature | | 0
Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings | | 0
The Russian practice of applying cluster approach in regional development | | 0
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images | | 0
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | | 0
Investigating the Vision Transformer Model for Image Retrieval Tasks | | 0
Benchmarking Robustness in Neural Radiance Fields | | 0
The Principle of Unchanged Optimality in Reinforcement Learning Generalization | | 0
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting | | 0
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO | | 0
Benchmarking Robot Manipulation with the Rubik's Cube | | 0
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | | 0
The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection | | 0
4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions | | 0
IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models | | 0
IO-VNBD: Inertial and Odometry Benchmark Dataset for Ground Vehicle Positioning | | 0
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks | | 0
Iris Liveness Detection Competition (LivDet-Iris) -- The 2020 Edition | | 0
Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies | | 0
Is Bio-Inspired Learning Better than Backprop? Benchmarking Bio Learning vs. Backprop | | 0
Benchmarking Retrieval-Augmented Generation for Chemistry | | 0
A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data | | 0
Evaluating Ising Processing Units with Integer Programming | | 0
Benchmarking Resource Usage for Efficient Distributed Deep Learning | | 0
Benchmarking Reinforcement Learning Methods for Dexterous Robotic Manipulation with a Three-Fingered Gripper | | 0
ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data | | 0
Benchmarking Reasoning Robustness in Large Language Models | | 0
Page 63 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified