Benchmarking

Papers

Showing 4101–4150 of 5548 papers

Title	Date	Tasks	Status
The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests	Sep 22, 2024	Benchmarking	Unverified
The ACL RD-TEC: A Dataset for Benchmarking Terminology Extraction and Classification in Computational Linguistics	Aug 1, 2014	Benchmarking, General Classification	Unverified
The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking	Apr 22, 2024	Benchmarking, Misinformation	Unverified
The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence	May 14, 2019	Benchmarking, speech-recognition	Unverified
Language Models as a Service: Overview of a New Paradigm and its Challenges	Sep 28, 2023	Benchmarking	Unverified
The Benchmark Lottery	Jul 14, 2021	Benchmarking, BIG-bench Machine Learning	Unverified
The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI	Jun 1, 2023	Benchmarking, Brain Tumor Segmentation	Unverified
The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal	Sep 12, 2024	Benchmarking, Language Modeling	Unverified
The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach	Apr 27, 2025	Benchmarking, Decision Making	Unverified
The Curious Case of Integrator Reach Sets, Part I: Basic Theory	Feb 23, 2021	Benchmarking	Unverified
The Design and Implementation of a Scalable DL Benchmarking Platform	Nov 19, 2019	Benchmarking	Unverified
The Disagreement Problem in Faithfulness Metrics	Nov 13, 2023	Benchmarking, Explainable artificial intelligence	Unverified
The DLV System for Knowledge Representation and Reasoning	Nov 4, 2002	Benchmarking	Unverified
The Dota 2 Bot Competition	Mar 4, 2021	Benchmarking, Dota 2	Unverified
The Effect of Domain and Diacritics in Yoruba–English Neural Machine Translation	Aug 1, 2021	Benchmarking, Machine Translation	Unverified
The EuroCity Persons Dataset: A Novel Benchmark for Object Detection	May 18, 2018	Benchmarking, Object	Unverified
The Evolutionary Computation Methods No One Should Use	Jan 5, 2023	Benchmarking	Unverified
The Expressive Power of Word Embeddings	Jan 15, 2013	Benchmarking, Sentence	Unverified
The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models	Jul 20, 2023	Benchmarking	Unverified
The FaceChannelS: Strike of the Sequences for the AffWild 2 Challenge	Oct 4, 2020	Benchmarking, BIG-bench Machine Learning	Unverified
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input	Jan 6, 2025	Benchmarking, Form	Unverified
The Forchheim Image Database for Camera Identification in the Wild	Nov 4, 2020	Benchmarking, Fact Checking	Unverified
The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech	Apr 17, 2021	Benchmarking	Unverified
The Impact of Genomic Variation on Function (IGVF) Consortium	Jul 24, 2023	Benchmarking	Unverified
The iNaturalist Sounds Dataset	May 31, 2025	Benchmarking	Unverified
The Interactive Effects of Operators and Parameters to GA Performance Under Different Problem Sizes	Aug 1, 2015	Benchmarking	Unverified
The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine	Sep 12, 2024	Autonomous Driving, Benchmarking	Unverified
The Jungle of Generative Drug Discovery: Traps, Treasures, and Ways Out	Dec 24, 2024	Benchmarking, Deep Learning	Unverified
The Karp Dataset	Jan 24, 2025	Benchmarking, Mathematical Reasoning	Unverified
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Oct 2, 2024	Benchmarking, Hallucination	Unverified
The Leaderboard Illusion	Apr 29, 2025	Benchmarking, Chatbot	Unverified
The Liouville Generator for Producing Integrable Expressions	Jun 17, 2024	Benchmarking	Unverified
The Low Emission Oil&Gas Open (LEOGO) Reference Platform of an Off-Grid Energy System for Renewable Integration Studies	Aug 16, 2022	Benchmarking, Management	Unverified
The Moral Mind(s) of Large Language Models	Nov 19, 2024	Benchmarking, Decision Making	Unverified
The Multi-speaker Multi-style Voice Cloning Challenge 2021	Apr 5, 2021	Benchmarking, Voice Cloning	Unverified
The Neural Painter: Multi-Turn Image Generation	Jun 16, 2018	Benchmarking, Conditional Image Generation	Unverified
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects	Jun 1, 2023	Benchmarking, Object	Unverified
Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests	Oct 31, 2023	Benchmarking	Unverified
The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods	Nov 15, 2024	3D Reconstruction, Benchmarking	Unverified
The Paradox of Success in Evolutionary and Bioinspired Optimization: Revisiting Critical Issues, Key Studies, and Methodological Pathways	Jan 13, 2025	Benchmarking, Metaheuristic Optimization	Unverified
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering	Nov 15, 2024	Benchmarking, Clustering	Unverified
The Partial Response Network: a neural network nomogram	Aug 16, 2019	Additive models, Benchmarking	Unverified
The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong	May 12, 2025	Benchmarking	Unverified
The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design	Sep 18, 2023	Benchmarking	Unverified
Thermal Image-based Fault Diagnosis in Induction Machines via Self-Organized Operational Neural Networks	Dec 8, 2024	Benchmarking, Diagnostic	Unverified
The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search	Jul 17, 2019	Benchmarking, Diversity	Unverified
The Russian practice of applying cluster approach in regional development	Jun 8, 2021	Benchmarking	Unverified
The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection	Feb 27, 2024	Benchmarking	Unverified
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks	Sep 30, 2023	Benchmarking	Unverified
The Trap of Presumed Equivalence: Artificial General Intelligence Should Not Be Assessed on the Scale of Human Intelligence	Oct 14, 2024	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified