SOTAVerified

Benchmarking

Papers

Showing 2276-2300 of 5548 papers

Title | Status | Hype
Bayesian Multi-type Mean Field Multi-agent Imitation Learning | | 0
A Bayesian Model for Bivariate Causal Inference | | 0
Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | | 0
Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | | 0
Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | | 0
Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada | | 0
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving | | 0
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models | | 0
Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems | | 0
Finance Language Model Evaluation (FLaME) | | 0
Beyond Benchmarks: On The False Promise of AI Regulation | | 0
Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models | | 0
Active Learning for Community Detection in Stochastic Block Models | | 0
Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark | | 0
Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art | | 0
FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures | | 0
Better Practices for Domain Adaptation | | 0
Barkour: Benchmarking Animal-level Agility with Quadruped Robots | | 0
Active Evaluation Acquisition for Efficient LLM Benchmarking | | 0
AMLgentex: Mobilizing Data-Driven Research to Combat Money Laundering | | 0
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | | 0
Few-Shot Defect Segmentation Leveraging Abundant Normal Training Samples Through Normal Background Regularization and Crop-and-Paste Operation | | 0
Better Bill GPT: Comparing Large Language Models against Legal Invoice Reviewers | | 0
BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures | | 0
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified