Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1401–1450 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Image Retrieval for Visual Localization	Nov 24, 2020	Autonomous DrivingBenchmarking	CodeCode Available	1	5
Quantitative Certification of Bias in Large Language Models	May 29, 2024	Benchmarking	CodeCode Available	1	5
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering	Mar 26, 2024	BenchmarkingMachine Reading Comprehension	CodeCode Available	1	5
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning	Dec 11, 2023	BenchmarkingHuman-Object Interaction Detection	CodeCode Available	1	5
Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all	Oct 17, 2024	AllBenchmarking	CodeCode Available	1	5
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery	Oct 31, 2024	BenchmarkingCloud Removal	CodeCode Available	1	5
Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems	Jun 28, 2021	3D ReconstructionBenchmarking	CodeCode Available	1	5
Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset	Aug 12, 2024	Benchmarking	CodeCode Available	1	5
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters	Oct 13, 2023	BenchmarkingFairness	CodeCode Available	1	5
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography	Oct 31, 2024	BenchmarkingElectromyography (EMG)	CodeCode Available	1	5
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Dec 10, 2021	Benchmarking	CodeCode Available	1	5
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration	May 25, 2023	BenchmarkingFace Recognition	CodeCode Available	1	5
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence	Nov 1, 2023	BenchmarkingCryogenic Electron Microscopy (cryo-EM)	CodeCode Available	1	5
KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi	Oct 23, 2020	ArticlesBenchmarking	CodeCode Available	1	5
Beyond neural scaling laws: beating power law scaling via data pruning	Jun 29, 2022	Benchmarking	CodeCode Available	1	5
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning	May 30, 2024	Autonomous DrivingBenchmarking	CodeCode Available	1	5
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models	Dec 15, 2023	BenchmarkingCode Summarization	CodeCode Available	1	5
Beyond Normal: On the Evaluation of Mutual Information Estimators	Jun 19, 2023	BenchmarkingDomain Generalization	CodeCode Available	1	5
Enhancing Biomedical Relation Extraction with Directionality	Jan 23, 2025	BenchmarkingDocument-level Relation Extraction	CodeCode Available	1	5
Enhancing Ligand Pose Sampling for Molecular Docking	Nov 30, 2023	BenchmarkingMolecular Docking	CodeCode Available	1	5
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data	Jun 10, 2025	BenchmarkingData Augmentation	CodeCode Available	1	5
Just Rank: Rethinking Evaluation with Word and Sentence Similarities	Mar 5, 2022	BenchmarkingSemantic Similarity	CodeCode Available	1	5
Autonomous Reinforcement Learning: Formalism and Benchmarking	Dec 17, 2021	Benchmarkingreinforcement-learning	CodeCode Available	1	5
Labelling unlabelled videos from scratch with multi-modal self-supervision	Jun 24, 2020	BenchmarkingClustering	CodeCode Available	1	5
JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching	Feb 5, 2024	BenchmarkingSentence	CodeCode Available	1	5
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry	Apr 1, 2023	3D Reconstruction3D Scene Reconstruction	CodeCode Available	1	5
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments	May 8, 2025	BenchmarkingPrompt Engineering	CodeCode Available	1	5
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks	Nov 4, 2024	Action GenerationBenchmarking	CodeCode Available	1	5
Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking	Jun 17, 2024	BenchmarkingDemand Forecasting	CodeCode Available	1	5
SHARP: Environment and Person Independent Activity Recognition with Commodity IEEE 802.11 Access Points	Mar 17, 2021	Activity RecognitionBenchmarking	CodeCode Available	1	5
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning	Jul 21, 2023	BenchmarkingCombinatorial Optimization	CodeCode Available	1	5
A Critical Assessment of State-of-the-Art in Entity Alignment	Oct 30, 2020	BenchmarkingEntity Alignment	CodeCode Available	1	5
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset	Nov 5, 2024	BenchmarkingLanguage Modeling	CodeCode Available	1	5
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems	Mar 19, 2024	Benchmarkingfeature selection	CodeCode Available	1	5
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models	Jul 16, 2024	BenchmarkingCode Generation	CodeCode Available	1	5
JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes	May 10, 2025	BenchmarkingGPU	CodeCode Available	1	5
Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking	Jun 1, 2022	BenchmarkingSentence	CodeCode Available	1	5
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks	May 13, 2021	BenchmarkingDrug Discovery	CodeCode Available	1	5
AQuA: A Benchmarking Tool for Label Quality Assessment	Jun 15, 2023	BenchmarkingLabel Error Detection	CodeCode Available	1	5
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond	Dec 25, 2023	Animal Pose EstimationBenchmarking	CodeCode Available	1	5
Evaluating histopathology transfer learning with ChampKit	Jun 14, 2022	BenchmarkingCell Detection	CodeCode Available	1	5
Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking	Jun 18, 2023	BenchmarkingLink Prediction	CodeCode Available	1	5
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models	Jun 2, 2023	BenchmarkingLanguage Acquisition	CodeCode Available	1	5
Evaluating Multimodal Representations on Visual Semantic Textual Similarity	Apr 4, 2020	BenchmarkingImage Captioning	CodeCode Available	1	5
ISSAFE: Improving Semantic Segmentation in Accidents by Fusing Event-based Data	Aug 20, 2020	Autonomous VehiclesBenchmarking	CodeCode Available	1	5
Rethinking Machine Unlearning in Image Generation Models	Jun 3, 2025	BenchmarkingImage Generation	CodeCode Available	1	5
JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds	Nov 5, 2023	Autonomous NavigationAutonomous Vehicles	CodeCode Available	1	5
Benchmark on Drug Target Interaction Modeling from a Structure Perspective	Jul 4, 2024	BenchmarkingDrug Discovery	CodeCode Available	1	5
ClinicRealm: Re-evaluating Large Language Models with Conventional Machine Learning for Non-Generative Clinical Prediction Tasks	Jul 26, 2024	BenchmarkingModel Selection	CodeCode Available	1	5
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms	Jul 8, 2021	Benchmarking	CodeCode Available	1	5

Show:10 25 50

← PrevPage 29 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified