SOTAVerified

Benchmarking

Papers

Showing 1301-1325 of 5548 papers

Title | Status | Hype
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Code | 0
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series | Code | 0
Multi-Agent Environments for Vehicle Routing Problems | Code | 1
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking | | 0
Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | | 0
Delta-Influence: Unlearning Poisons via Influence Functions | Code | 0
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Code | 5
BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | | 0
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | | 0
The Moral Mind(s) of Large Language Models | | 0
Integrating Dynamic Correlation Shifts and Weighted Benchmarking in Extreme Value Analysis | | 0
Benchmarking Positional Encodings for GNNs and Graph Transformers | Code | 0
DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models | Code | 1
Introducing Milabench: Benchmarking Accelerators for AI | Code | 1
Benchmarking pre-trained text embedding models in aligning built asset information | Code | 0
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Code | 0
Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies | | 0
FastDraft: How to Train Your Draft | | 0
Reinforcing Competitive Multi-Agents for Playing So Long Sucker | | 0
Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML | | 0
Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections | Code | 0
The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods | | 0
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering | | 0
Automated Coding of Communications in Collaborative Problem-solving Tasks Using ChatGPT | | 0
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | | 0
Page 53 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified