Benchmarking

Papers

Showing 751–775 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica	Sep 6, 2021	Benchmarking	Code Available	1	5
Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters	Jul 5, 2024	Benchmarking, valid	Code Available	1	5
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis	Aug 12, 2021	Benchmarking, Medical Image Analysis	Code Available	1	5
BeHonest: Benchmarking Honesty in Large Language Models	Jun 19, 2024	Benchmarking, Misinformation	Code Available	1	5
Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones	Nov 5, 2023	Benchmarking	Code Available	1	5
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations	Apr 15, 2024	Benchmarking, Bias Detection	Code Available	1	5
A Comprehensive Overview of Large Language Models	Jul 12, 2023	Benchmarking	Code Available	1	5
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity	Aug 11, 2023	Benchmarking, Diversity	Code Available	1	5
Bench4KE: Benchmarking Automated Competency Question Generation	May 30, 2025	Benchmarking, Question Generation	Code Available	1	5
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?	Sep 29, 2023	Benchmarking, Knowledge Graph Completion	Code Available	1	5
A multi-schematic classifier-independent oversampling approach for imbalanced datasets	Jul 15, 2021	Benchmarking	Code Available	1	5
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs	Sep 18, 2021	Benchmarking, Complex Query Answering	Code Available	1	5
AirSim Drone Racing Lab	Mar 12, 2020	Benchmarking, Optical Flow Estimation	Code Available	1	5
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization	May 27, 2025	Benchmarking	Code Available	1	5
A SWAT-based Reinforcement Learning Framework for Crop Management	Feb 10, 2023	Benchmarking, Decision Making	Code Available	1	5
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models	Dec 5, 2023	Benchmarking, Visual Question Answering	Code Available	1	5
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets	Mar 6, 2020	Benchmarking, Image Reconstruction	Code Available	1	5
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	Benchmarking, Image to text	Code Available	1	5
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning	Dec 11, 2024	Attribute, Benchmarking	Code Available	1	5
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions	Oct 13, 2021	Benchmarking, Computational Efficiency	Code Available	1	5
Disentangled Feature Representation for Few-shot Image Classification	Sep 26, 2021	Benchmarking, Classification	Code Available	1	5
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis	Mar 9, 2021	Benchmarking, Classification	Code Available	1	5
Does your model understand genes? A benchmark of gene properties for biological and text models	Dec 5, 2024	Benchmarking, Multi-class Classification	Code Available	1	5
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous Driving, Benchmarking	Code Available	1	5
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry	Apr 1, 2023	3D Reconstruction, 3D Scene Reconstruction	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified