SOTAVerified

Benchmarking

Papers

Showing 891-900 of 5548 papers

Title | Status | Hype
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Code | 1
Benchmarking Cognitive Biases in Large Language Models as Evaluators | Code | 1
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence | Code | 1
Recent Advances on Neural Network Pruning at Initialization | Code | 1
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models | Code | 1
An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks | Code | 1
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Code | 1
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified