Benchmarking

Papers

Showing 1351–1375 of 5548 papers

Title	Date	Tasks	Status	Hype
Contemporary Symbolic Regression Methods and their Relative Performance	Jul 29, 2021	Benchmarking, parameter estimation	Code Available	1
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms	Nov 30, 2023	Benchmarking, OpenAI Gym	Code Available	1
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs	Apr 11, 2025	Benchmarking, Image Generation	Code Available	1
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification	Jul 6, 2023	Benchmarking, Domain Adaptation	Code Available	1
A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games	Aug 25, 2021	Benchmarking, Video Classification	Code Available	1
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?	Sep 29, 2023	Benchmarking, Knowledge Graph Completion	Code Available	1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models	Nov 27, 2024	Benchmarking, Earth Observation	Code Available	1
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond	Oct 13, 2024	Autonomous Driving, Autonomous Vehicles	Code Available	1
Benchmarking Image Retrieval for Visual Localization	Nov 24, 2020	Autonomous Driving, Benchmarking	Code Available	1
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering	Mar 26, 2024	Benchmarking, Machine Reading Comprehension	Code Available	1
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models	Apr 22, 2024	Benchmarking, World Knowledge	Code Available	1
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs	Sep 18, 2021	Benchmarking, Complex Query Answering	Code Available	1
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA	Dec 29, 2023	Anatomy, Benchmarking	Code Available	1
Comprehensive benchmarking of large language models for RNA secondary structure prediction	Oct 21, 2024	Benchmarking	Code Available	1
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Dec 10, 2021	Benchmarking	Code Available	1
ReMeDi: Resources for Multi-domain, Multi-service, Medical Dialogues	Sep 1, 2021	Benchmarking, Contrastive Learning	Code Available	1
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies	Jun 15, 2025	Benchmarking	Code Available	1
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection	Apr 25, 2024	Benchmarking, object-detection	Code Available	1
Boosting Neural Image Compression for Machines Using Latent Space Masking	Dec 15, 2021	Benchmarking, Image Compression	Code Available	1
Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets	Jan 29, 2024	Benchmarking, Machine Translation	Code Available	1
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection	May 30, 2022	3D Object Detection, Autonomous Driving	Code Available	1
MALPOLON: A Framework for Deep Species Distribution Modeling	Sep 26, 2024	Benchmarking, GPU	Code Available	1
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models	Jun 24, 2024	Benchmarking, Data Augmentation	Code Available	1
High-Dimensional Inference in Bayesian Networks	Dec 16, 2021	Benchmarking, Vocal Bursts Intensity Prediction	Code Available	1
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning	May 30, 2024	Autonomous Driving, Benchmarking	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified