Benchmarking

Papers

Showing 5101–5125 of 5548 papers

Title	Date	Tasks	Status
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning	Jan 22, 2025	Benchmarking	Code Available
Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks	Jan 5, 2025	Adversarial Robustness, Benchmarking	Code Available
Benchmarking LLM-based Relevance Judgment Methods	Apr 17, 2025	Benchmarking, Open-Domain Question Answering	Code Available
Toward 3D Object Reconstruction from Stereo Images	Oct 18, 2019	3D Object Reconstruction, Benchmarking	Code Available
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models	Jun 8, 2023	Benchmarking, Fairness	Code Available
Skelite: Compact Neural Networks for Efficient Iterative Skeletonization	Mar 10, 2025	Benchmarking, Computational Efficiency	Code Available
Divergent Creativity in Humans and Large Language Models	May 13, 2024	Benchmarking	Code Available
A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series	Jun 4, 2025	Benchmarking, Irregular Time Series	Code Available
A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric	Jan 22, 2021	Benchmarking, Sentence	Code Available
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems	Oct 8, 2023	Benchmarking	Code Available
User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks	Aug 9, 2018	Benchmarking, Colorization	Code Available
Towards a Benchmark for Large Language Models for Business Process Management Tasks	Oct 4, 2024	Benchmarking, Management	Code Available
Weighting-Based Treatment Effect Estimation via Distribution Learning	Dec 26, 2020	Benchmarking	Code Available
Slot Filling for Extracting Reskilling and Upskilling Options from the Web	Jul 11, 2022	Benchmarking, Entity Linking	Code Available
On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective	Apr 26, 2023	Benchmarking, Feature Importance	Code Available
Distributional Depth-Based Estimation of Object Articulation Models	Aug 12, 2021	Benchmarking, Object	Code Available
Benchmarking Linguistic Diversity of Large Language Models	Dec 13, 2024	Benchmarking, Diversity	Code Available
On Recurrent Neural Networks for Sequence-based Processing in Communications	May 24, 2019	Benchmarking, Decoder	Code Available
Benchmarking Learning Efficiency in Deep Reservoir Computing	Sep 29, 2022	Benchmarking	Code Available
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation	Apr 21, 2025	Benchmarking	Code Available
Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections	Nov 16, 2024	Benchmarking, Diagnostic	Code Available
Benchmarking Large Language Model Uncertainty for Prompt Optimization	Sep 16, 2024	Benchmarking, Diversity	Code Available
Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets	May 23, 2022	Argument Mining, Benchmarking	Code Available
On the Evaluation Consistency of Attribution-based Explanations	Jul 28, 2024	Benchmarking	Code Available
On the Evaluation of Conditional GANs	Jul 11, 2019	Benchmarking, Diversity	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified