Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 451–475 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
An Exploration of Embodied Visual Exploration	Jan 7, 2020	Benchmarking	CodeCode Available	1	5
AD-LLM: Benchmarking Large Language Models for Anomaly Detection	Dec 15, 2024	Anomaly DetectionBenchmarking	CodeCode Available	1	5
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models	Mar 15, 2024	BenchmarkingDrug Discovery	CodeCode Available	1	5
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials	Nov 29, 2022	Benchmarking	CodeCode Available	1	5
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph	Nov 15, 2023	Benchmarking	CodeCode Available	1	5
Benchmarking Econometric and Machine Learning Methodologies in Nowcasting	May 6, 2022	BenchmarkingBIG-bench Machine Learning	CodeCode Available	1	5
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments	Oct 18, 2024	Autonomous NavigationBenchmarking	CodeCode Available	1	5
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarkingobject-detection	CodeCode Available	1	5
Benchmarking Detection Transfer Learning with Vision Transformers	Nov 22, 2021	Benchmarkingobject-detection	CodeCode Available	1	5
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark	Sep 16, 2020	BenchmarkingKnowledge Graph Completion	CodeCode Available	1	5
Large Scale MRI Collection and Segmentation of Cirrhotic Liver	Oct 6, 2024	BenchmarkingDiagnostic	CodeCode Available	1	5
Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective	Jul 10, 2024	BenchmarkingDiagnostic	CodeCode Available	1	5
ClearPose: Large-scale Transparent Object Dataset and Benchmark	Mar 8, 2022	BenchmarkingDepth Completion	CodeCode Available	1	5
AnomalyHop: An SSL-based Image Anomaly Localization Method	May 8, 2021	Anomaly LocalizationBenchmarking	CodeCode Available	1	5
Benchmarking Deep Models for Salient Object Detection	Feb 7, 2022	BenchmarkingObject	CodeCode Available	1	5
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics	May 6, 2025	Benchmarking	CodeCode Available	1	5
Benchmarking Deep Learning Interpretability in Time Series Predictions	Oct 26, 2020	BenchmarkingDeep Learning	CodeCode Available	1	5
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods	Aug 2, 2022	BenchmarkingCausal Discovery	CodeCode Available	1	5
Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding	May 21, 2024	BenchmarkingKeypoint Detection	CodeCode Available	1	5
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarkingenergy management	CodeCode Available	1	5
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization	Nov 15, 2023	BenchmarkingInstruction Following	CodeCode Available	1	5
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089	Nov 6, 2023	BenchmarkingKnowledge Base Question Answering	CodeCode Available	1	5
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	CodeCode Available	1	5
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action RecognitionAttribute	CodeCode Available	1	5
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning	Nov 29, 2024	BenchmarkingDeepFake Detection	CodeCode Available	1	5

Show:10 25 50

← PrevPage 19 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified