Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 4451–4475 of 5548 papers

Title	Date	Tasks	Status
Large-scale Ridesharing DARP Instances Based on Real Travel Demand	May 30, 2023	Benchmarking	CodeCode Available
Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement	May 26, 2025	Benchmarking	CodeCode Available
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards	Feb 16, 2025	BenchmarkingGPU	CodeCode Available
Anchor Points: Benchmarking Models with Much Fewer Examples	Sep 14, 2023	BenchmarkingLanguage Modeling	CodeCode Available
Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?	May 19, 2021	BenchmarkingSentence	CodeCode Available
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models	Sep 17, 2024	BenchmarkingBinary Classification	CodeCode Available
JATE 2.0: Java Automatic Term Extraction with Apache Solr	May 1, 2016	BenchmarkingTerm Extraction	CodeCode Available
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models	May 23, 2025	BenchmarkingDiversity	CodeCode Available
Calibrated Adaptive Probabilistic ODE Solvers	Dec 15, 2020	BenchmarkingDescriptive	CodeCode Available
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs	May 29, 2025	BenchmarkingFairness	CodeCode Available
Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations	Jun 12, 2024	BenchmarkingDeep Reinforcement Learning	CodeCode Available
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs	Apr 10, 2024	Benchmarkingknowledge editing	CodeCode Available
AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms	Oct 16, 2019	Bayesian InferenceBenchmarking	CodeCode Available
An Auditing Test To Detect Behavioral Shift in Language Models	Oct 25, 2024	BenchmarkingChange Detection	CodeCode Available
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods	Apr 29, 2024	BenchmarkingDrug Discovery	CodeCode Available
Learnability and Complexity of Quantum Samples	Oct 22, 2020	Benchmarking	CodeCode Available
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks	Feb 2, 2025	Benchmarking	CodeCode Available
Learned Sorted Table Search and Static Indexes in Small Model Space	Jul 19, 2021	BenchmarkingOpen-Ended Question Answering	CodeCode Available
Learn How to Query from Unlabeled Data Streams in Federated Learning	Dec 11, 2024	BenchmarkingDecision Making	CodeCode Available
Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration	Jul 1, 2024	Benchmarking	CodeCode Available
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking	Jul 30, 2018	Benchmarkingfeature selection	CodeCode Available
Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo	Oct 1, 2019	Benchmarking	CodeCode Available
Adjusting Pretrained Backbones for Performativity	Oct 6, 2024	BenchmarkingDeep Learning	CodeCode Available
Cable Tree Wiring -- Benchmarking Solvers on a Real-World Scheduling Problem with a Variety of Precedence Constraints	Nov 25, 2020	BenchmarkingScheduling	CodeCode Available
Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk	May 21, 2019	Bayesian InferenceBenchmarking	CodeCode Available

Show:10 25 50

← PrevPage 179 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified