SOTAVerified

Benchmarking

Papers

Showing 3776–3800 of 5548 papers

Title | Status | Hype
VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning | Code | 2
Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems |  | 0
Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding | Code | 2
Identifying the Context Shift between Test Benchmarks and Production Data |  | 0
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Code | 1
Less Is More: A Comparison of Active Learning Strategies for 3D Medical Image Segmentation | Code | 1
HATE-ITA: New Baselines for Hate Speech Detection in Italian | Code | 0
SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features |  | 0
Towards Toxic Positivity Detection |  | 0
Benchmarking Intersectional Biases in NLP | Code | 0
Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding |  | 0
DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles |  | 0
Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking |  | 0
Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Code | 0
Local manifold learning and its link to domain-based physics knowledge | Code | 0
Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations |  | 0
DFGC 2022: The Second DeepFake Game Competition | Code | 1
Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology | Code | 1
Computer-aided diagnosis and prediction in brain disorders |  | 0
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations | Code | 0
Beyond neural scaling laws: beating power law scaling via data pruning | Code | 1
Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames | Code | 1
Toward an ImageNet Library of Functions for Global Optimization Benchmarking |  | 0
Benchopt: Reproducible, efficient and collaborative optimization benchmarks | Code | 4
The DEBS 2022 Grand Challenge: Detecting Trading Trends in Financial Tick Data | Code | 1
Page 152 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified