Benchmarking

Papers

Showing 876–900 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
End-to-end Knowledge Retrieval with Multi-modal Queries	Jun 1, 2023	Benchmarking, Cross-Modal Retrieval	Code Available	1	5
Enhancing spatial and textual analysis with EUPEG: an extensible and unified platform for evaluating geoparsers	Jul 9, 2020	Benchmarking	Code Available	1	5
Knodle: Modular Weakly Supervised Learning with PyTorch	Apr 23, 2021	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
SHARP: Environment and Person Independent Activity Recognition with Commodity IEEE 802.11 Access Points	Mar 17, 2021	Activity Recognition, Benchmarking	Code Available	1	5
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarking, object-detection	Code Available	1	5
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	Code Available	1	5
Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy	Oct 23, 2020	Benchmarking, Diagnostic	Code Available	1	5
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework	Dec 7, 2022	Benchmarking	Code Available	1	5
A Closer Look at Mortality Risk Prediction from Electrocardiograms	Jun 24, 2024	Benchmarking, Prediction	Code Available	1	5
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations	Mar 21, 2024	Benchmarking, Memorization	Code Available	1	5
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	Benchmarking, Code Generation	Code Available	1	5
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset	Jul 3, 2024	Benchmarking, Diversity	Code Available	1	5
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study	Dec 30, 2021	Attribute, Benchmarking	Code Available	1	5
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care	Sep 16, 2022	Benchmarking, Deep Learning	Code Available	1	5
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	Code Available	1	5
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM	Nov 26, 2024	Benchmarking, Text-to-Video Generation	Code Available	1	5
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	Code Available	1	5
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets	Mar 6, 2020	Benchmarking, Image Reconstruction	Code Available	1	5
Benchmarking Cognitive Biases in Large Language Models as Evaluators	Sep 29, 2023	Benchmarking, In-Context Learning	Code Available	1	5
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence	Nov 1, 2023	Benchmarking, Cryogenic Electron Microscopy (cryo-EM)	Code Available	1	5
Recent Advances on Neural Network Pruning at Initialization	Mar 11, 2021	Benchmarking, Network Pruning	Code Available	1	5
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models	Jun 9, 2024	Benchmarking	Code Available	1	5
An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks	Feb 7, 2025	Benchmarking, Multi-agent Reinforcement Learning	Code Available	1	5
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation	Nov 4, 2024	Benchmarking, Graph Generation	Code Available	1	5
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography	Oct 31, 2024	Benchmarking, Electromyography (EMG)	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified