SOTAVerified

Benchmarking

Papers

Showing 1001-1025 of 5548 papers

Title | Status | Hype
Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Code | 1
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks | Code | 1
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Code | 1
featsel: A framework for benchmarking of feature selection algorithms and cost functions | Code | 1
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things | Code | 1
RADAR: Benchmarking Language Models on Imperfect Tabular Data | Code | 1
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1
DomainLab: A modular Python package for domain generalization in deep learning | Code | 1
Federated Learning Under Intermittent Client Availability and Time-Varying Communication Constraints | Code | 1
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations? | Code | 1
Benchmarking: Past, Present and Future | Code | 1
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Code | 1
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension | Code | 1
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms | Code | 1
A Comparison of Image Denoising Methods | Code | 1
Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation | Code | 1
Fast hyperboloid decision tree algorithms | Code | 1
AI Agents That Matter | Code | 1
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1
AI Accelerator Survey and Trends | Code | 1
EXPObench: Benchmarking Surrogate-based Optimisation Algorithms on Expensive Black-box Functions | Code | 1
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Code | 1
Benchmarking Object Detectors with COCO: A New Path Forward | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified