Benchmarking

Papers


Showing 1701–1750 of 5548 papers

Title	Date	Tasks	Status	Score
QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules	Jun 20, 2024	Benchmarking	Code Available	5
Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification	Sep 21, 2022	Benchmarking, Management	Code Available	5
Learn How to Query from Unlabeled Data Streams in Federated Learning	Dec 11, 2024	Benchmarking, Decision Making	Code Available	5
Light Field Saliency Detection with Deep Convolutional Networks	Jun 19, 2019	Benchmarking, Saliency Detection	Code Available	5
Machine learning classification of non-Markovian noise disturbing quantum dynamics	Jan 8, 2021	Benchmarking, BIG-bench Machine Learning	Code Available	5
KhabarChin: Automatic Detection of Important News in the Persian Language	Dec 6, 2023	Articles, Benchmarking	Code Available	5
A Benchmarking Study of Vision-based Robotic Grasping Algorithms	Mar 14, 2025	Benchmarking, Robotic Grasping	Code Available	5
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals	Jun 7, 2023	Benchmarking, Machine Reading Comprehension	Code Available	5
Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments	Apr 16, 2025	Benchmarking, Causal Inference	Code Available	5
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs	May 26, 2025	Benchmarking, Fault localization	Code Available	5
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series	Nov 21, 2024	Anomaly Detection, Benchmarking	Code Available	5
KArSL: Arabic Sign Language Database	Jan 1, 2021	Benchmarking, Sign Language Recognition	Code Available	5
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning	May 19, 2025	Benchmarking	Code Available	5
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen	Mar 3, 2022	Benchmarking	Code Available	5
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering	May 21, 2025	Benchmarking, Language Modeling	Code Available	5
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems	Jun 1, 2021	Benchmarking, Goal-Oriented Dialogue Systems	Code Available	5
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement	Mar 16, 2023	Benchmarking, Demosaicking	Code Available	5
Anchor Points: Benchmarking Models with Much Fewer Examples	Sep 14, 2023	Benchmarking, Language Modeling	Code Available	5
An Auditing Test To Detect Behavioral Shift in Language Models	Oct 25, 2024	Benchmarking, Change Detection	Code Available	5
VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks	May 16, 2025	Benchmarking, Link Prediction	Code Available	5
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards	Feb 16, 2025	Benchmarking, GPU	Code Available	5
Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy	Aug 9, 2024	Benchmarking, Medical Image Analysis	Code Available	5
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs	May 29, 2025	Benchmarking, Fairness	Code Available	5
Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing	Nov 1, 2024	Benchmarking, Semantic Segmentation	Code Available	5
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science	Feb 23, 2025	Benchmarking, Code Generation	Code Available	5
Benchmarking AutoML algorithms on a collection of synthetic classification problems	Dec 6, 2022	AutoML, Benchmarking	Code Available	5
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models	May 23, 2025	Benchmarking, Diversity	Code Available	5
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study	Feb 11, 2024	Anomaly Detection, Benchmarking	Code Available	5
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs	Apr 10, 2024	Benchmarking, knowledge editing	Code Available	5
JATE 2.0: Java Automatic Term Extraction with Apache Solr	May 1, 2016	Benchmarking, Term Extraction	Code Available	5
Knowledge Enhanced Conditional Imputation for Healthcare Time-series	Dec 27, 2023	Benchmarking, Imputation	Code Available	5
IoT Data Trust Evaluation via Machine Learning	Aug 15, 2023	Benchmarking, Time Series	Code Available	5
IPC: A Benchmark Data Set for Learning with Graph-Structured Data	May 15, 2019	Benchmarking, Graph Classification	Code Available	5
Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench	Apr 1, 2025	Benchmarking	Code Available	5
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions	Oct 18, 2023	Benchmarking, Visual Grounding	Code Available	5
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	Code Available	5
ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images	Oct 22, 2024	Benchmarking, Self-Supervised Learning	Code Available	5
Inverse Contextual Bandits: Learning How Behavior Evolves over Time	Jul 13, 2021	Benchmarking, Decision Making	Code Available	5
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance	Sep 22, 2024	AutoML, Benchmarking	Code Available	5
Can geometric combinatorics improve RNA branching predictions?	Mar 26, 2025	Benchmarking	Code Available	5
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM	Oct 8, 2014	Benchmarking	Code Available	5
Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation	Jun 2, 2019	Benchmarking, Deep Reinforcement Learning	Code Available	5
Can a single neuron learn predictive uncertainty?	Jun 7, 2021	Benchmarking, Conformal Prediction	Code Available	5
Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim Evidence Reasoning	Jun 9, 2025	Benchmarking, Diagnostic	Code Available	5
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models	Mar 11, 2025	Benchmarking, Hyperparameter Optimization	Code Available	5
Integrating Expert Knowledge into Logical Programs via LLMs	Feb 17, 2025	Benchmarking, Logical Reasoning	Code Available	5
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	Benchmarking, Code Generation	Code Available	5
Analyzing the Feature Extractor Networks for Face Image Synthesis	Jun 4, 2024	Benchmarking, Image Generation	Code Available	5
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition	Dec 23, 2021	Benchmarking, Deep Learning	Code Available	5
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model	Jul 31, 2024	Benchmarking, Large Language Model	Code Available	5



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified