Benchmarking

Papers

Showing 901–925 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels	Sep 13, 2023	Benchmarking, Recommendation Systems	Code Available	1	5
Comprehensive benchmarking of large language models for RNA secondary structure prediction	Oct 21, 2024	Benchmarking	Code Available	1	5
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models	Oct 17, 2023	Benchmarking, Language Modelling	Code Available	1	5
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems	Mar 19, 2024	Benchmarking, feature selection	Code Available	1	5
AD-LLM: Benchmarking Large Language Models for Anomaly Detection	Dec 15, 2024	Anomaly Detection, Benchmarking	Code Available	1	5
LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction	Oct 31, 2024	Benchmarking, Prediction	Code Available	1	5
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models	Mar 15, 2024	Benchmarking, Drug Discovery	Code Available	1	5
Benchmarking Counterfactual Image Generation	Mar 29, 2024	Benchmarking, Conditional Image Generation	Code Available	1	5
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency	Apr 24, 2025	Benchmarking, Math	Code Available	1	5
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild	May 30, 2024	Benchmarking	Code Available	1	5
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness	Mar 24, 2025	Benchmarking, Semantic Segmentation	Code Available	1	5
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition	Oct 24, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1	5
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets	Mar 6, 2020	Benchmarking, Image Reconstruction	Code Available	1	5
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models	Jul 5, 2025	Benchmarking, GPU	Code Available	1	5
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry	Apr 1, 2023	3D Reconstruction, 3D Scene Reconstruction	Code Available	1	5
Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective	Oct 8, 2024	Attribute, Benchmarking	Code Available	1	5
Benchmarking Data Science Agents	Feb 27, 2024	Benchmarking, Code Generation	Code Available	1	5
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models	Aug 28, 2024	Benchmarking, Logical Reasoning	Code Available	1	5
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms	Nov 30, 2023	Benchmarking, OpenAI Gym	Code Available	1	5
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models	Nov 27, 2024	Benchmarking, Earth Observation	Code Available	1	5
A Closer Look at Mortality Risk Prediction from Electrocardiograms	Jun 24, 2024	Benchmarking, Prediction	Code Available	1	5
MC-Blur: A Comprehensive Benchmark for Image Deblurring	Dec 1, 2021	Benchmarking, Deblurring	Code Available	1	5
Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics	Aug 2, 2024	Adversarial Attack, Adversarial Purification	Code Available	1	5
Benchmarking Multidomain English-Indonesian Machine Translation	May 1, 2020	Benchmarking, Machine Translation	Code Available	1	5
EntQA: Entity Linking as Question Answering	Oct 5, 2021	Benchmarking, Entity Linking	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified