Benchmarking

Papers

Showing 3001–3050 of 5548 papers

Title	Date	Tasks	Status
Benchmarking the Gerchberg-Saxton Algorithm	May 18, 2020	Benchmarking	Unverified
Benchmarking the Fidelity and Utility of Synthetic Relational Data	Oct 4, 2024	Benchmarking, Feature Importance	Unverified
Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web	May 1, 2014	Benchmarking, Entity Linking	Unverified
ImageNet performance correlates with pose estimation robustness and generalization on out-of-domain data	Jul 17, 2020	Animal Pose Estimation, Benchmarking	Unverified
ImagePairs: Realistic Super Resolution Dataset via Beam Splitter Camera Rig	Apr 18, 2020	Benchmarking, BIG-bench Machine Learning	Unverified
Imagining and building wise machines: The centrality of AI metacognition	Nov 4, 2024	Benchmarking, Navigate	Unverified
Benchmarking the Effectiveness of Classification Algorithms and SVM Kernels for Dry Beans	Jul 15, 2023	Benchmarking, Dimensionality Reduction	Unverified
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World	Dec 5, 2023	Benchmarking, Diversity	Unverified
Imitation Learning Datasets: A Toolkit For Creating Datasets, Training Agents and Benchmarking	Mar 1, 2024	Benchmarking, Imitation Learning	Unverified
Imitation Learning from Pixel Observations for Continuous Control	Sep 29, 2021	Benchmarking, continuous-control	Unverified
Practical Guidelines for Cell Segmentation Models Under Optical Aberrations in Microscopy	Apr 12, 2024	Benchmarking, Cell Segmentation	Unverified
A Functional Analysis Approach to Symbolic Regression	Feb 9, 2024	Benchmarking, regression	Unverified
Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors	Aug 15, 2024	Benchmarking, Management	Unverified
A Framework for Large Scale Synthetic Graph Dataset Generation	Oct 4, 2022	Benchmarking, Dataset Generation	Unverified
Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms	Mar 15, 2025	Benchmarking, Brain Morphometry	Unverified
A Framework for Evaluating Predictive Models Using Synthetic Image Covariates and Longitudinal Data	Oct 21, 2024	Benchmarking	Unverified
Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems	Feb 12, 2024	Benchmarking	Unverified
Implementing and Benchmarking the Locally Competitive Algorithm on the Loihi 2 Neuromorphic Processor	Jul 25, 2023	Benchmarking, CPU	Unverified
Implementing hosting capacity analysis in distribution networks: Practical considerations, advancements and future directions	Dec 11, 2023	Benchmarking, Capacity Estimation	Unverified
Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets	Apr 19, 2021	Benchmarking, Intrusion Detection	Unverified
Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities	Jan 22, 2025	Benchmarking, Referring Expression	Unverified
Benchmarking the Accuracy and Robustness of Feedback Alignment Algorithms	Aug 30, 2021	Benchmarking	Unverified
Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels	Oct 5, 2024	Benchmarking	Unverified
The Moral Mind(s) of Large Language Models	Nov 19, 2024	Benchmarking, Decision Making	Unverified
Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices	Mar 21, 2022	Benchmarking, GPU	Unverified
Ward: Provable RAG Dataset Inference via LLM Watermarks	Oct 4, 2024	Benchmarking, RAG	Unverified
The Multi-speaker Multi-style Voice Cloning Challenge 2021	Apr 5, 2021	Benchmarking, Voice Cloning	Unverified
PAWS-VMK: A Unified Approach To Semi-Supervised Learning And Out-of-Distribution Detection	Nov 28, 2023	Benchmarking, image-classification	Unverified
Improved statistical benchmarking of digital pathology models using pairwise frames evaluation	Jun 7, 2023	Benchmarking, Classification	Unverified
The Neural Painter: Multi-Turn Image Generation	Jun 16, 2018	Benchmarking, Conditional Image Generation	Unverified
Improved YOLOv12 with LLM-Generated Synthetic Data for Enhanced Apple Detection and Benchmarking Against YOLOv11 and YOLOv10	Feb 26, 2025	Benchmarking, object-detection	Unverified
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects	Jun 1, 2023	Benchmarking, Object	Unverified
A 28-nm Convolutional Neuromorphic Processor Enabling Online Learning with Spike-Based Retinas	May 13, 2020	Benchmarking, Edge-computing	Unverified
Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests	Oct 31, 2023	Benchmarking	Unverified
Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation	Feb 9, 2024	6D Pose Estimation using RGB, Benchmarking	Unverified
Benchmarking terminology building capabilities of ChatGPT on an English-Russian Fashion Corpus	Dec 4, 2024	Benchmarking	Unverified
Improving Augmentation and Evaluation Schemes for Semantic Image Synthesis	Nov 25, 2020	Benchmarking, Data Augmentation	Unverified
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary	Jun 20, 2024	Benchmarking, In-Context Learning	Unverified
Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model	Nov 1, 2024	Benchmarking, Cross-Domain Named Entity Recognition	Unverified
The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods	Nov 15, 2024	3D Reconstruction, Benchmarking	Unverified
Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation	Apr 11, 2023	Benchmarking, Conversational Recommendation	Unverified
Improving Medical Image Classification with Label Noise Using Dual-uncertainty Estimation	Feb 28, 2021	Benchmarking, General Classification	Unverified
Improving Model Generalization: A Chinese Named Entity Recognition Case Study	Aug 1, 2021	Benchmarking, Chinese Named Entity Recognition	Unverified
Improving Named Entity Linking Corpora Quality	Sep 1, 2019	Benchmarking, Entity Linking	Unverified
Improving plant disease classification by adaptive minimal ensembling	Sep 8, 2022	Benchmarking, Classification	Unverified
The Paradox of Success in Evolutionary and Bioinspired Optimization: Revisiting Critical Issues, Key Studies, and Methodological Pathways	Jan 13, 2025	Benchmarking, Metaheuristic Optimization	Unverified
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards	Jun 25, 2023	Benchmarking, Contrastive Learning	Unverified
Improving seasonal forecast using probabilistic deep learning	Oct 27, 2020	Benchmarking, Deep Learning	Unverified
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering	Nov 15, 2024	Benchmarking, Clustering	Unverified
Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework	Jun 14, 2024	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified