Benchmarking

Papers

Showing 1351–1375 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
A Computed Tomography Vertebral Segmentation Dataset with Anatomical Variations and Multi-Vendor Scanner Data	Mar 10, 2021	Anatomy, Benchmarking	Code Available	1	5
DFGC 2021: A DeepFake Game Competition	Jun 2, 2021	Benchmarking, DeepFake Detection	Code Available	1	5
DFGC 2022: The Second DeepFake Game Competition	Jun 30, 2022	Benchmarking, Face Swapping	Code Available	1	5
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification	Jul 6, 2023	Benchmarking, Domain Adaptation	Code Available	1	5
A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games	Aug 25, 2021	Benchmarking, Video Classification	Code Available	1	5
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?	Sep 29, 2023	Benchmarking, Knowledge Graph Completion	Code Available	1	5
DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information	Jan 10, 2025	Benchmarking, Data Augmentation	Code Available	1	5
Online Learning with Optimism and Delay	Jun 13, 2021	Benchmarking, Weather Forecasting	Code Available	1	5
SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It)	Jun 25, 2024	Benchmarking, Experimental Design	Code Available	1	5
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments	May 8, 2025	Benchmarking, Prompt Engineering	Code Available	1	5
Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments	Jul 19, 2022	Benchmarking, Experimental Design	Code Available	1	5
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs	Sep 18, 2021	Benchmarking, Complex Query Answering	Code Available	1	5
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA	Dec 29, 2023	Anatomy, Benchmarking	Code Available	1	5
OpenCIL: Benchmarking Out-of-Distribution Detection in Class-Incremental Learning	Jul 8, 2024	Benchmarking, class-incremental learning	Code Available	1	5
InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models	Jan 19, 2025	Benchmarking, Question Answering	Code Available	1	5
OpenFWI: Large-Scale Multi-Structural Benchmark Datasets for Seismic Full Waveform Inversion	Nov 4, 2021	2k, Benchmarking	Code Available	1	5
Benchmarking Image Retrieval for Visual Localization	Nov 24, 2020	Autonomous Driving, Benchmarking	Code Available	1	5
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering	Mar 26, 2024	Benchmarking, Machine Reading Comprehension	Code Available	1	5
IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding	Sep 11, 2020	Benchmarking, Diversity	Code Available	1	5
OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents	Aug 6, 2024	Benchmarking, Retrieval-augmented Generation	Code Available	1	5
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds	Apr 25, 2023	Benchmarking, Pose Estimation	Code Available	1	5
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Dec 10, 2021	Benchmarking	Code Available	1	5
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models	Jun 24, 2024	Benchmarking, Data Augmentation	Code Available	1	5
DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models	Nov 19, 2024	Benchmarking, Deep Learning	Code Available	1	5
RGB-D Indiscernible Object Counting in Underwater Scenes	Apr 23, 2023	Benchmarking, Depth Estimation	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified