SOTAVerified

Benchmarking

Papers

Showing 876–900 of 5548 papers

Title | Status | Hype
End-to-end Knowledge Retrieval with Multi-modal Queries | Code | 1
Enhancing spatial and textual analysis with EUPEG: an extensible and unified platform for evaluating geoparsers | Code | 1
Knodle: Modular Weakly Supervised Learning with PyTorch | Code | 1
SHARP: Environment and Person Independent Activity Recognition with Commodity IEEE 802.11 Access Points | Code | 1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy | Code | 1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
A Closer Look at Mortality Risk Prediction from Electrocardiograms | Code | 1
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Code | 1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset | Code | 1
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study | Code | 1
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Code | 1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Code | 1
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Code | 1
Benchmarking Cognitive Biases in Large Language Models as Evaluators | Code | 1
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence | Code | 1
Recent Advances on Neural Network Pruning at Initialization | Code | 1
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models | Code | 1
An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks | Code | 1
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Code | 1
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified