Benchmarking

Papers

Showing 451–500 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels	Sep 13, 2023	Benchmarking, Recommendation Systems	Code Available	1	5
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	Code Available	1	5
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models	Mar 15, 2024	Benchmarking, Drug Discovery	Code Available	1	5
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials	Nov 29, 2022	Benchmarking	Code Available	1	5
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph	Nov 15, 2023	Benchmarking	Code Available	1	5
Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT	Jul 9, 2021	Benchmarking, Document Classification	Code Available	1	5
New Protocols and Negative Results for Textual Entailment Data Collection	Apr 24, 2020	Benchmarking, Diversity	Code Available	1	5
An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks	Feb 7, 2025	Benchmarking, Multi-agent Reinforcement Learning	Code Available	1	5
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization	Nov 15, 2023	Benchmarking, Instruction Following	Code Available	1	5
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM	Mar 28, 2024	Benchmarking	Code Available	1	5
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs	Nov 2, 2020	Benchmarking	Code Available	1	5
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies	Jun 15, 2025	Benchmarking	Code Available	1	5
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking	Jul 3, 2024	Benchmarking, Object	Code Available	1	5
AnomalyHop: An SSL-based Image Anomaly Localization Method	May 8, 2021	Anomaly Localization, Benchmarking	Code Available	1	5
Contemporary Symbolic Regression Methods and their Relative Performance	Jul 29, 2021	Benchmarking, parameter estimation	Code Available	1	5
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer	Dec 2, 2021	Benchmarking, Ordinal Classification	Code Available	1	5
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4	Mar 20, 2023	Benchmarking, De-identification	Code Available	1	5
An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition	Oct 17, 2022	Benchmarking	Code Available	1	5
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089	Nov 6, 2023	Benchmarking, Knowledge Base Question Answering	Code Available	1	5
An Exploration of Embodied Visual Exploration	Jan 7, 2020	Benchmarking	Code Available	1	5
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	Benchmarking, Code Generation	Code Available	1	5
Benchmarking Graph Neural Networks for FMRI analysis	Nov 16, 2022	Benchmarking	Code Available	1	5
CODEMENV: Benchmarking Large Language Models on Code Migration	Jun 1, 2025	Benchmarking	Code Available	1	5
Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective	Jul 10, 2024	Benchmarking, Diagnostic	Code Available	1	5
Benchmarking Econometric and Machine Learning Methodologies in Nowcasting	May 6, 2022	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Dec 10, 2021	Benchmarking	Code Available	1	5
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarking, object-detection	Code Available	1	5
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	Code Available	1	5
Benchmarking Distribution Shift in Tabular Data with TableShift	Dec 10, 2023	Benchmarking, Binary Classification	Code Available	1	5
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT	Jun 13, 2024	Benchmarking, LLM-generated Text Detection	Code Available	1	5
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform	Oct 12, 2021	Benchmarking	Code Available	1	5
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models	Jan 2, 2025	Benchmarking, Computer Security	Code Available	1	5
Benchmarking Differential Privacy and Federated Learning for BERT Models	Jun 26, 2021	Benchmarking, Federated Learning	Code Available	1	5
Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction	Sep 24, 2023	3D Shape Reconstruction, Anatomy	Code Available	1	5
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring	Jul 11, 2023	Benchmarking	Code Available	1	5
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation	Feb 18, 2024	Benchmarking, Language Modeling	Code Available	1	5
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization	Apr 6, 2025	Benchmarking, Combinatorial Optimization	Code Available	1	5
Benchmarking Language Model Creativity: A Case Study on Code Generation	Jul 12, 2024	Benchmarking, Code Generation	Code Available	1	5
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework	Dec 7, 2022	Benchmarking	Code Available	1	5
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	Code Available	1	5
CLoG: Benchmarking Continual Learning of Image Generation Models	Jun 7, 2024	Benchmarking, Continual Learning	Code Available	1	5
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	Benchmarking, General Knowledge	Code Available	1	5
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs	Feb 23, 2024	Benchmarking, slot-filling	Code Available	1	5
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs	Feb 21, 2025	Benchmarking	Code Available	1	5
Clinical Prompt Learning with Frozen Language Models	May 11, 2022	Benchmarking, GPU	Code Available	1	5
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	Benchmarking, Cloud Computing	Code Available	1	5
A Platform for the Biomedical Application of Large Language Models	May 10, 2023	Benchmarking, Privacy Preserving	Code Available	1	5
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory	Jul 20, 2023	Benchmarking, Decision Making	Code Available	1	5
Benchmarking Detection Transfer Learning with Vision Transformers	Nov 22, 2021	Benchmarking, object-detection	Code Available	1	5
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments	Oct 18, 2024	Autonomous Navigation, Benchmarking	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified