Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1226–1250 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning	Feb 20, 2024	Atomic number classificationBenchmarking	CodeCode Available	1	5
HINT3: Raising the bar for Intent Detection in the Wild	Sep 29, 2020	BenchmarkingIntent Detection	CodeCode Available	1	5
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action RecognitionAttribute	CodeCode Available	1	5
Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images	Mar 15, 2024	BenchmarkingKnowledge Distillation	CodeCode Available	1	5
Large Scale MRI Collection and Segmentation of Cirrhotic Liver	Oct 6, 2024	BenchmarkingDiagnostic	CodeCode Available	1	5
Benchmarking Large Language Models for Automated Verilog RTL Code Generation	Dec 13, 2022	BenchmarkingCode Generation	CodeCode Available	1	5
Hierarchical graph neural nets can capture long-range interactions	Jul 15, 2021	BenchmarkingMolecular Property Prediction	CodeCode Available	1	5
Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection	Jun 28, 2023	BenchmarkingData Augmentation	CodeCode Available	1	5
A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems	May 11, 2021	BenchmarkingEdge-computing	CodeCode Available	1	5
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation	Dec 12, 2023	Anomaly DetectionAutonomous Driving	CodeCode Available	1	5
Benchmarking Language Models for Code Syntax Understanding	Oct 26, 2022	Benchmarking	CodeCode Available	1	5
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild	May 30, 2024	Benchmarking	CodeCode Available	1	5
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction	Nov 16, 2023	BenchmarkingEvent Extraction	CodeCode Available	1	5
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM	Nov 26, 2024	BenchmarkingText-to-Video Generation	CodeCode Available	1	5
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models	Nov 29, 2021	BenchmarkingPhysical Simulations	CodeCode Available	1	5
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care	Sep 16, 2022	BenchmarkingDeep Learning	CodeCode Available	1	5
HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis	Feb 13, 2021	BenchmarkingClustering	CodeCode Available	1	5
Benchmarking Language Model Creativity: A Case Study on Code Generation	Jul 12, 2024	BenchmarkingCode Generation	CodeCode Available	1	5
CLoG: Benchmarking Continual Learning of Image Generation Models	Jun 7, 2024	BenchmarkingContinual Learning	CodeCode Available	1	5
Clinical Prompt Learning with Frozen Language Models	May 11, 2022	BenchmarkingGPU	CodeCode Available	1	5
Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters	Jul 5, 2024	Benchmarkingvalid	CodeCode Available	1	5
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing	Sep 25, 2024	BenchmarkingImage Dehazing	CodeCode Available	1	5
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning	Jul 22, 2024	BenchmarkingHallucination	CodeCode Available	1	5
Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency	Jun 14, 2024	Benchmarking	CodeCode Available	1	5
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns	Jan 28, 2025	Adversarial AttackBenchmarking	CodeCode Available	1	5

Show:10 25 50

← PrevPage 50 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified