Benchmarking

Papers

Showing 4751–4775 of 5548 papers

Title	Date	Tasks	Status
Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks	Jan 7, 2024	Benchmarking, Graph Neural Network	Code Available
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search	Jan 26, 2025	Benchmarking, Diversity	Code Available
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties	Feb 24, 2025	Benchmarking	Code Available
Safe Trajectory Generation for Complex Urban Environments Using Spatio-temporal Semantic Corridor	Jun 24, 2019	Autonomous Vehicles, Benchmarking	Code Available
Natural Image Noise Dataset	Jun 1, 2019	Benchmarking, Denoising	Code Available
Benchmarking Suite for Synthetic Aperture Radar Imagery Anomaly Detection (SARIAD) Algorithms	Apr 10, 2025	Anomaly Detection, Benchmarking	Code Available
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration	Sep 17, 2024	Benchmarking, counterfactual	Code Available
Geological Inference from Textual Data using Word Embeddings	Apr 10, 2025	Benchmarking, Word Embeddings	Code Available
Flexible Generation of Preference Data for Recommendation Analysis	Jul 23, 2024	Benchmarking, Recommendation Systems	Code Available
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels	Oct 26, 2024	Benchmarking, Information Retrieval	Code Available
MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages	Mar 3, 2025	Benchmarking	Code Available
The LOCATA Challenge: Acoustic Source Localization and Tracking	Sep 3, 2019	Benchmarking, Sound Source Localization	Code Available
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider	Apr 26, 2025	Benchmarking, GPU	Code Available
A Meta-Analysis of the Anomaly Detection Problem	Mar 3, 2015	Anomaly Detection, Benchmarking	Code Available
On the Measure of Intelligence	Nov 5, 2019	ARC, Benchmarking	Code Available
Generalization and Regularization in DQN	Sep 29, 2018	Atari Games, Benchmarking	Code Available
Automatic Resolution of Domain Name Disputes	Nov 1, 2021	Benchmarking	Code Available
Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI	Jun 13, 2025	Benchmarking, In-Context Learning	Code Available
Automatic benchmarking of large multimodal models via iterative experiment programming	Jun 18, 2024	Benchmarking, Language Modeling	Code Available
GenderBench: Evaluation Suite for Gender Biases in LLMs	May 17, 2025	Benchmarking	Code Available
MineRL: A Large-Scale Dataset of Minecraft Demonstrations	Jul 29, 2019	Benchmarking, Deep Reinforcement Learning	Code Available
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data	Feb 22, 2024	Benchmarking	Code Available
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations	Jun 17, 2024	Benchmarking, Dataset Generation	Code Available
Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling	Mar 24, 2025	Benchmarking, OpenAI Gym	Code Available
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma	Oct 4, 2023	Benchmarking, Segmentation	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified