Benchmarking

Papers

Showing 4551–4600 of 5548 papers

Title	Date	Tasks	Status
Salient Object Detection: A Benchmark	Jan 5, 2015	Benchmarking, Object	Unverified
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models	May 24, 2025	Benchmarking, Video Grounding	Unverified
SAM-based instance segmentation models for the automation of structural damage detection	Jan 27, 2024	Benchmarking, Instance Segmentation	Unverified
A Real Benchmark Swell Noise Dataset for Performing Seismic Data Denoising via Deep Learning	Oct 2, 2024	Benchmarking, Denoising	Unverified
Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics	Nov 3, 2023	Benchmarking, quantile regression	Unverified
Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection	Sep 29, 2023	Benchmarking, Diversity	Unverified
SASSE: Scalable and Adaptable 6-DOF Pose Estimation	Feb 5, 2019	Benchmarking, Pose Estimation	Unverified
SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas	May 20, 2025	Benchmarking, Logical Reasoning	Unverified
Wildfire Forecasting with Satellite Images and Deep Generative Model	Aug 19, 2022	Benchmarking, Video Prediction	Unverified
User Profile with Large Language Models: Construction, Updating, and Benchmarking	Feb 15, 2025	Benchmarking, Profile Generation	Unverified
SAWNet: A Spatially Aware Deep Neural Network for 3D Point Cloud Processing	May 18, 2019	Benchmarking, Scene Segmentation	Unverified
Scaffold Splits Overestimate Virtual Screening Performance	Jun 2, 2024	Benchmarking, Clustering	Unverified
Scalable and Customizable Benchmark Problems for Many-Objective Optimization	Jan 26, 2020	Benchmarking, Position	Unverified
Scalable and Hybrid Ensemble-Based Causality Discovery	Dec 24, 2020	Benchmarking, Distributed Computing	Unverified
ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection	Oct 17, 2020	Benchmarking, Fact Checking	Unverified
Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency	Apr 26, 2023	Benchmarking, Cloud Computing	Unverified
ARBiBench: Benchmarking Adversarial Robustness of Binarized Neural Networks	Dec 21, 2023	Adversarial Robustness, Benchmarking	Unverified
AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects	Dec 31, 2024	Benchmarking, Multiple-choice	Unverified
Scalable Psychological Momentum Forecasting in Esports	Jan 30, 2020	Benchmarking	Unverified
Using Affine Combinations of BBOB Problems for Performance Assessment	Mar 8, 2023	Benchmarking	Unverified
Using generative adversarial networks to synthesize artificial financial datasets	Feb 6, 2020	Benchmarking	Unverified
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis	Aug 27, 2024	Benchmarking, Large Language Model	Unverified
Using Multi-Temporal Sentinel-1 and Sentinel-2 data for water bodies mapping	Jan 5, 2024	Benchmarking	Unverified
Automated Coding of Communications in Collaborative Problem-solving Tasks Using ChatGPT	Nov 15, 2024	Benchmarking	Unverified
Using Neural Architecture Search for Improving Software Flaw Detection in Multimodal Deep Learning Models	Sep 22, 2020	Benchmarking, BIG-bench Machine Learning	Unverified
AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP	Jun 10, 2025	Benchmarking, Sentiment Analysis	Unverified
ScanNeRF: a Scalable Benchmark for Neural Radiance Fields	Nov 24, 2022	Benchmarking, NeRF	Unverified
SCBench: A Sports Commentary Benchmark for Video LLMs	Dec 23, 2024	Benchmarking	Unverified
AraBench: Benchmarking Dialectal Arabic-English Machine Translation	Dec 1, 2020	Benchmarking, Data Augmentation	Unverified
Using PCA to Efficiently Represent State Spaces	May 2, 2015	Benchmarking, Dimensionality Reduction	Unverified
Scenarios and Approaches for Situated Natural Language Explanations	Jun 7, 2024	Benchmarking, In-Context Learning	Unverified
A quantitative method for benchmarking fair income distribution	Feb 2, 2022	Benchmarking	Unverified
A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video	Oct 22, 2023	3D Reconstruction, Anatomy	Unverified
ScholarSearch: Benchmarking Scholar Searching Ability of LLMs	Jun 11, 2025	Benchmarking, Information Retrieval	Unverified
Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures	Aug 15, 2018	Benchmarking	Unverified
A Probabilistic Framework for Lexicon-based Keyword Spotting in Handwritten Text Images	Apr 9, 2021	Benchmarking, Keyword Spotting	Unverified
A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection	Jun 11, 2024	Benchmarking, Defect Detection	Unverified
SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement	Sep 28, 2024	Benchmarking, Code Generation	Unverified
Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers	Feb 25, 2025	Articles, Benchmarking	Unverified
Scientific Machine Learning Benchmarks	Oct 25, 2021	Benchmarking, BIG-bench Machine Learning	Unverified
Using Well-Understood Single-Objective Functions in Multiobjective Black-Box Optimization Test Suites	Apr 1, 2016	Benchmarking, Multiobjective Optimization	Unverified
uTHCD: A New Benchmarking for Tamil Handwritten OCR	Mar 13, 2021	Benchmarking, Optical Character Recognition (OCR)	Unverified
A practical generalization metric for deep networks benchmarking	Sep 2, 2024	Benchmarking, Diversity	Unverified
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models	Mar 12, 2025	Benchmarking, Fairness	Unverified
Approaches for benchmarking single-cell gene regulatory network inference methods	Jul 17, 2023	Benchmarking	Unverified
Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models	Jun 6, 2023	Benchmarking, Ethics	Unverified
Applications in CityLearn Gym Environment for Multi-Objective Control Benchmarking in Grid-Interactive Buildings and Districts	Aug 27, 2024	Benchmarking, Model Predictive Control	Unverified
Application of Machine Learning for Online Reputation Systems	Sep 10, 2022	Benchmarking, Recommendation Systems	Unverified
Utility-Optimized Synthesis of Differentially Private Location Traces	Sep 14, 2020	Bayesian Optimization, Benchmarking	Unverified
scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection	Jun 25, 2025	Benchmarking, Contrastive Learning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified