Benchmarking

Papers

Title	Date	Tasks	Status
Benchmarking Model Predictive Control Algorithms in Building Optimization Testing Framework (BOPTEST)	Jan 31, 2023	Benchmarking, Model Predictive Control	Unverified
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches	Oct 12, 2023	Benchmarking, Colorization	Unverified
Exploration of TPUs for AI Applications	Sep 16, 2023	Benchmarking, Edge-computing	Unverified
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation	May 30, 2025	Benchmarking, Machine Translation	Unverified
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography	Apr 14, 2025	Benchmarking, Visual Reasoning	Unverified
Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability	Jul 9, 2024	Benchmarking, Decoder	Unverified
CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing	Jan 9, 2025	Benchmarking, Chatbot	Unverified
Call for Action: towards the next generation of symbolic regression benchmark	May 6, 2025	Benchmarking, Diversity	Unverified
Benchmarking Agility and Reconfigurability in Satellite Systems for Tropical Cyclone Monitoring	Nov 27, 2024	Benchmarking, Earth Observation	Unverified
A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations	May 20, 2025	Benchmarking	Unverified
Benchmarking Aggression Identification in Social Media	Aug 1, 2018	Aggression Identification, Benchmarking	Unverified
Calibrating chemical multisensory devices for real world applications: An in-depth comparison of quantitative Machine Learning approaches	Aug 30, 2017	Benchmarking	Unverified
Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline	Aug 6, 2024	Benchmarking	Unverified
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift	Jul 12, 2025	Benchmarking, Transfer Learning	Unverified
Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations	Jul 1, 2022	Benchmarking, Combinatorial Optimization	Unverified
Explicitly Multi-Modal Benchmarks for Multi-Objective Optimization	Oct 7, 2021	Benchmarking	Unverified
Exploitation-Guided Exploration for Semantic Embodied Navigation	Nov 6, 2023	Benchmarking	Unverified
Exploring and Benchmarking the Planning Capabilities of Large Language Models	Jun 18, 2024	Benchmarking, In-Context Learning	Unverified
Extensible Logging and Empirical Attainment Function for IOHexperimenter	Sep 28, 2021	Benchmarking	Unverified
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations	Oct 2, 2024	Benchmarking, Long Form Question Answering	Unverified
Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report	Oct 5, 2023	Benchmarking	Unverified
CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods	Oct 10, 2023	Benchmarking, Prediction	Unverified
Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic	Oct 23, 2023	Benchmarking, Instruction Following	Unverified
Quantum Similarity Testing with Convolutional Neural Networks	Nov 3, 2022	Benchmarking	Unverified
Explainable AI using expressive Boolean formulas	Jun 6, 2023	Benchmarking, Explainable Artificial Intelligence (XAI)	Unverified
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering	Sep 13, 2024	Benchmarking, Binary Classification	Unverified
Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks	Mar 15, 2024	Adversarial Attack, Adversarial Robustness	Unverified
Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view	May 4, 2023	Benchmarking, Graph Generation	Unverified
Benchmarking Adversarial Robustness of Compressed Deep Learning Models	Aug 16, 2023	Adversarial Robustness, Benchmarking	Unverified
Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs	May 24, 2025	Benchmarking	Unverified
A Benchmarking Protocol for Pansharpening: Dataset, Preprocessing, and Quality Assessment	Jun 7, 2021	Benchmarking, Pansharpening	Unverified
Benchmarking Adversarial Robustness	Dec 26, 2019	Adversarial Attack, Adversarial Robustness	Unverified
Experimenting with robotic intra-logistics domains	Apr 26, 2018	Benchmarking, valid	Unverified
Building benchmarking frameworks for supporting replicability and reproducibility: spatial and textual analysis as an example	Jul 4, 2020	Benchmarking, Position	Unverified
Experimental robustness benchmark of quantum neural network on a superconducting quantum processor	May 22, 2025	Adversarial Attack, Adversarial Robustness	Unverified
Benchmarking Adversarially Robust Quantum Machine Learning at Scale	Nov 23, 2022	Adversarial Attack, Adversarial Attack Detection	Unverified
Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite	May 24, 2023	Benchmarking	Unverified
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists	Jun 2, 2025	Benchmarking, Form	Unverified
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP)	Oct 14, 2024	Benchmarking, Multi-Task Learning	Unverified
Benchmarking adversarial attacks and defenses for time-series data	Aug 30, 2020	Adversarial Defense, Benchmarking	Unverified
Analysis of different disparity estimation techniques on aerial stereo image datasets	Oct 9, 2024	Benchmarking, Depth Estimation	Unverified
Building a De-identification System for Real Swedish Clinical Text Using Pseudonymised Clinical Text	Nov 1, 2019	Benchmarking, De-identification	Unverified
Building a continuous benchmarking ecosystem in bioinformatics	Sep 23, 2024	Benchmarking	Unverified
Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches	Apr 22, 2024	Benchmarking, Diversity	Unverified
Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration	Sep 30, 2024	Benchmarking, Intent Detection	Unverified
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer	May 24, 2023	Benchmarking, Cross-Lingual Transfer	Unverified
AT-Drone: Benchmarking Adaptive Teaming in Multi-Drone Pursuit	Feb 13, 2025	Benchmarking, Edge-computing	Unverified
BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes	Nov 11, 2024	Benchmarking, Multi-Object Tracking	Unverified
Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances	Aug 3, 2023	Benchmarking	Unverified
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark	Jun 4, 2018	Benchmarking, BIG-bench Machine Learning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified