SOTAVerified

Benchmarking

Papers

Showing 1201-1225 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking of GPU-optimized Quantum-Inspired Evolutionary Optimization Algorithm using Functional Analysis | | 0 |
| JuStRank: Benchmarking LLM Judges for System Ranking | | 0 |
| Neptune: The Long Orbit to Benchmarking Long Video Understanding | Code | 2 |
| Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction | | 0 |
| Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation | Code | 0 |
| Koopman Theory-Inspired Method for Learning Time Advancement Operators in Unstable Flame Front Evolution | | 0 |
| Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Code | 0 |
| Learn How to Query from Unlabeled Data Streams in Federated Learning | Code | 0 |
| Benchmarking learned algorithms for computed tomography image reconstruction tasks | | 0 |
| Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Code | 1 |
| LCFO: Long Context and Long Form Output Dataset and Benchmarking | | 0 |
| A quantum-classical reinforcement learning model to play Atari games | Code | 0 |
| Light Field Image Quality Assessment With Auxiliary Learning Based on Depthwise and Anglewise Separable Convolutions | | 0 |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | | 0 |
| Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings | | 0 |
| OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Code | 5 |
| Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments | | 0 |
| MO-IOHinspector: Anytime Benchmarking of Multi-Objective Algorithms using IOHprofiler | | 0 |
| Bilingual BSARD: Extending Statutory Article Retrieval to Dutch | Code | 0 |
| Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective | Code | 0 |
| Multi-Behavior Recommendation with Personalized Directed Acyclic Behavior Graphs | Code | 1 |
| PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models | Code | 0 |
| PowerMamba: A Deep State Space Model and Comprehensive Benchmark for Time Series Prediction in Electric Power Systems | Code | 1 |
| ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities | | 0 |
| On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |