Benchmarking

Papers

Showing 4451–4500 of 5548 papers

Title	Date	Tasks	Status
Large-scale Ridesharing DARP Instances Based on Real Travel Demand	May 30, 2023	Benchmarking	Code Available
Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement	May 26, 2025	Benchmarking	Code Available
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards	Feb 16, 2025	Benchmarking, GPU	Code Available
Anchor Points: Benchmarking Models with Much Fewer Examples	Sep 14, 2023	Benchmarking, Language Modeling	Code Available
Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?	May 19, 2021	Benchmarking, Sentence	Code Available
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models	Sep 17, 2024	Benchmarking, Binary Classification	Code Available
JATE 2.0: Java Automatic Term Extraction with Apache Solr	May 1, 2016	Benchmarking, Term Extraction	Code Available
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models	May 23, 2025	Benchmarking, Diversity	Code Available
Calibrated Adaptive Probabilistic ODE Solvers	Dec 15, 2020	Benchmarking, Descriptive	Code Available
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs	May 29, 2025	Benchmarking, Fairness	Code Available
Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations	Jun 12, 2024	Benchmarking, Deep Reinforcement Learning	Code Available
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs	Apr 10, 2024	Benchmarking, knowledge editing	Code Available
AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms	Oct 16, 2019	Bayesian Inference, Benchmarking	Code Available
An Auditing Test To Detect Behavioral Shift in Language Models	Oct 25, 2024	Benchmarking, Change Detection	Code Available
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods	Apr 29, 2024	Benchmarking, Drug Discovery	Code Available
Learnability and Complexity of Quantum Samples	Oct 22, 2020	Benchmarking	Code Available
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks	Feb 2, 2025	Benchmarking	Code Available
Learned Sorted Table Search and Static Indexes in Small Model Space	Jul 19, 2021	Benchmarking, Open-Ended Question Answering	Code Available
Learn How to Query from Unlabeled Data Streams in Federated Learning	Dec 11, 2024	Benchmarking, Decision Making	Code Available
Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration	Jul 1, 2024	Benchmarking	Code Available
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking	Jul 30, 2018	Benchmarking, feature selection	Code Available
Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo	Oct 1, 2019	Benchmarking	Code Available
Adjusting Pretrained Backbones for Performativity	Oct 6, 2024	Benchmarking, Deep Learning	Code Available
Cable Tree Wiring -- Benchmarking Solvers on a Real-World Scheduling Problem with a Variety of Precedence Constraints	Nov 25, 2020	Benchmarking, Scheduling	Code Available
Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk	May 21, 2019	Bayesian Inference, Benchmarking	Code Available
REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching	Jul 16, 2024	Benchmarking	Code Available
Learning collective multi-cellular dynamics from temporal scRNA-seq via a transformer-enhanced Neural SDE	May 22, 2025	Benchmarking, Time Series	Code Available
Using representation balancing to learn conditional-average dose responses from clustered data	Sep 7, 2023	Benchmarking, Causal Inference	Code Available
Beemo: Benchmark of Expert-edited Machine-generated Outputs	Nov 6, 2024	Benchmarking	Code Available
B-XAIC Dataset: Benchmarking Explainable AI for Graph Neural Networks Using Chemical Data	May 28, 2025	Benchmarking, Drug Discovery	Code Available
Building Conformal Prediction Intervals with Approximate Message Passing	Oct 21, 2024	Benchmarking, Conformal Prediction	Code Available
Learning Dynamic Selection and Pricing of Out-of-Home Deliveries	Nov 23, 2023	Benchmarking, Decision Making	Code Available
UAV Trajectory Planning for Data Collection from Time-Constrained IoT Devices	Sep 17, 2019	Benchmarking, Trajectory Planning	Code Available
Learning from Integral Losses in Physics Informed Neural Networks	May 27, 2023	Benchmarking	Code Available
Removing Geometric Bias in One-Class Anomaly Detection with Adaptive Feature Perturbation	Mar 7, 2025	Anomaly Detection, Benchmarking	Code Available
The Arcade Learning Environment: An Evaluation Platform for General Agents	Jul 19, 2012	Atari Games, Benchmarking	Code Available
Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting	May 7, 2021	Benchmarking, Deep Learning	Code Available
Learning protein constitutive motifs from sequence data	Mar 23, 2018	Benchmarking, Specificity	Code Available
Learning Quantum Processes with Quantum Statistical Queries	Oct 3, 2023	Benchmarking, Cryptanalysis	Code Available
ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images	Oct 22, 2024	Benchmarking, Self-Supervised Learning	Code Available
UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions	Jun 18, 2024	Benchmarking, Multiple-choice	Code Available
BED: Bi-Encoder-Based Detectors for Out-of-Distribution Detection	Jun 15, 2023	Benchmarking, Out-of-Distribution Detection	Code Available
Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia	Nov 2, 2023	Benchmarking, Machine Translation	Code Available
RUHSNet: 3D Object Detection Using Lidar Data in Real Time	May 9, 2020	3D Object Detection, Autonomous Vehicles	Code Available
Replication Study and Benchmarking of Real-Time Object Detection Models	May 11, 2024	Benchmarking, object-detection	Code Available
IPC: A Benchmark Data Set for Learning with Graph-Structured Data	May 15, 2019	Benchmarking, Graph Classification	Code Available
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content	Jun 17, 2024	Benchmarking, General Knowledge	Code Available
Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark	May 9, 2016	Benchmarking, Emotion Recognition	Code Available
IoT Data Trust Evaluation via Machine Learning	Aug 15, 2023	Benchmarking, Time Series	Code Available
Representation Learning of Limit Order Book: A Comprehensive Study and Benchmarking	May 4, 2025	Benchmarking, Representation Learning	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified