SOTAVerified

Benchmarking

Papers

Showing 4326–4350 of 5548 papers

Title | Status | Hype
Yet Another ADNI Machine Learning Paper? Paving The Way Towards Fully-reproducible Research on Classification of Alzheimer's Disease |  | 0
Understanding the Limits of Lifelong Knowledge Editing in LLMs |  | 0
Who Wins the Game of Thrones? How Sentiments Improve the Prediction of Candidate Choice |  | 0
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective |  | 0
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture |  | 0
A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain |  | 0
R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models |  | 0
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests |  | 0
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation |  | 0
R3L: Connecting Deep Reinforcement Learning to Recurrent Neural Networks for Image Denoising via Residual Recovery |  | 0
A Two-Stage Neural-Filter Pareto Front Extractor and the need for Benchmarking |  | 0
RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR |  | 0
A tutorial on multi-view autoencoders using the multi-view-AE library |  | 0
Understanding the User: An Intent-Based Ranking Dataset |  | 0
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems |  | 0
Attention versus Contrastive Learning of Tabular Data -- A Data-centric Benchmarking |  | 0
A Theory of Dynamic Benchmarks |  | 0
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF |  | 0
Rail-5k: a Real-World Dataset for Rail Surface Defects Detection |  | 0
On the Evaluation of Engineering Artificial General Intelligence |  | 0
A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality |  | 0
RAN-GNNs: breaking the capacity limits of graph neural networks |  | 0
ATG: Benchmarking Automated Theorem Generation for Generative Language Models |  | 0
A Comparison of Cryptocurrency Volatility-benchmarking New and Mature Asset Classes |  | 0
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games |  | 0
Page 174 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified