Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1651–1700 of 5548 papers

Title	Date	Tasks	Status	Score
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction	Jul 3, 2025	Benchmarking	CodeCode Available	5
Laparoscopic Image Desmoking Using the U-Net with New Loss Function and Integrated Differentiable Wiener Filter	May 27, 2025	Benchmarking	CodeCode Available	5
Benchmarking a transformer-FREE model for ad-hoc retrieval	Apr 1, 2021	BenchmarkingCPU	CodeCode Available	5
Benchmarking Approximate Inference Methods for Neural Structured Prediction	Apr 1, 2019	BenchmarkingPrediction	CodeCode Available	5
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods	Jul 21, 2023	Benchmarking	CodeCode Available	5
Language-based Image Colorization: A Benchmark and Beyond	Mar 19, 2025	BenchmarkingColorization	CodeCode Available	5
Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification	Sep 21, 2022	BenchmarkingManagement	CodeCode Available	5
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification	Mar 3, 2024	BenchmarkingSpeaker Verification	CodeCode Available	5
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers	Mar 31, 2023	Benchmarkingimage-classification	CodeCode Available	5
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval	Aug 4, 2023	BenchmarkingInformation Retrieval	CodeCode Available	5
Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System	Jul 28, 2023	Anomaly DetectionAutonomous Driving	CodeCode Available	5
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions	Nov 19, 2023	Bayesian OptimizationBenchmarking	CodeCode Available	5
Benchmarking and Understanding Compositional Relational Reasoning of LLMs	Dec 17, 2024	BenchmarkingRelational Reasoning	CodeCode Available	5
Characterizing SLAM Benchmarks and Methods for the Robust Perception Age	May 19, 2019	Benchmarking	CodeCode Available	5
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios	Mar 8, 2025	BenchmarkingDiagnostic	CodeCode Available	5
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems	Jun 1, 2021	BenchmarkingGoal-Oriented Dialogue Systems	CodeCode Available	5
Benchmarking and Rethinking Knowledge Editing for Large Language Models	May 24, 2025	Benchmarkingknowledge editing	CodeCode Available	5
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals	Jun 7, 2023	BenchmarkingMachine Reading Comprehension	CodeCode Available	5
Knowledge Enhanced Conditional Imputation for Healthcare Time-series	Dec 27, 2023	BenchmarkingImputation	CodeCode Available	5
KhabarChin: Automatic Detection of Important News in the Persian Language	Dec 6, 2023	ArticlesBenchmarking	CodeCode Available	5
An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines	Apr 2, 2021	Benchmarking	CodeCode Available	5
Benchmarking and optimizing organism wide single-cell RNA alignment methods	Mar 26, 2025	BenchmarkingDecoder	CodeCode Available	5
Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs	Apr 21, 2022	BenchmarkingChange Point Detection	CodeCode Available	5
An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations	Aug 26, 2019	BenchmarkingClustering	CodeCode Available	5
A Dataset for Web-Scale Knowledge Base Population	Jun 3, 2018	BenchmarkingKnowledge Base Population	CodeCode Available	5
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen	Mar 3, 2022	Benchmarking	CodeCode Available	5
Benchmarking and Improving Text-to-SQL Generation under Ambiguity	Oct 20, 2023	BenchmarkingDiversity	CodeCode Available	5
An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation	Feb 21, 2023	BenchmarkingState Estimation	CodeCode Available	5
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement	Mar 16, 2023	BenchmarkingDemosaicking	CodeCode Available	5
KArSL: Arabic Sign Language Database	Jan 1, 2021	BenchmarkingSign Language Recognition	CodeCode Available	5
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models	May 23, 2025	BenchmarkingDiversity	CodeCode Available	5
A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches	May 22, 2023	BenchmarkingClassification	CodeCode Available	5
JATE 2.0: Java Automatic Term Extraction with Apache Solr	May 1, 2016	BenchmarkingTerm Extraction	CodeCode Available	5
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence	Apr 10, 2023	Benchmarkingspeech-recognition	CodeCode Available	5
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs	Apr 10, 2024	Benchmarkingknowledge editing	CodeCode Available	5
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines	Jun 20, 2024	BenchmarkingDecision Making	CodeCode Available	5
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation	Apr 5, 2024	AttributeBenchmarking	CodeCode Available	5
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs	May 29, 2025	BenchmarkingFairness	CodeCode Available	5
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards	Feb 16, 2025	BenchmarkingGPU	CodeCode Available	5
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese	May 28, 2025	Benchmarking	CodeCode Available	5
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering	May 21, 2025	BenchmarkingLanguage Modeling	CodeCode Available	5
Large-scale Ridesharing DARP Instances Based on Real Travel Demand	May 30, 2023	Benchmarking	CodeCode Available	5
IPC: A Benchmark Data Set for Learning with Graph-Structured Data	May 15, 2019	BenchmarkingGraph Classification	CodeCode Available	5
ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images	Oct 22, 2024	BenchmarkingSelf-Supervised Learning	CodeCode Available	5
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	CodeCode Available	5
A Benchmarking Study of Vision-based Robotic Grasping Algorithms	Mar 14, 2025	BenchmarkingRobotic Grasping	CodeCode Available	5
IoT Data Trust Evaluation via Machine Learning	Aug 15, 2023	BenchmarkingTime Series	CodeCode Available	5
Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments	Apr 16, 2025	BenchmarkingCausal Inference	CodeCode Available	5
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs	May 26, 2025	BenchmarkingFault localization	CodeCode Available	5
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series	Nov 21, 2024	Anomaly DetectionBenchmarking	CodeCode Available	5

Show:10 25 50

← PrevPage 34 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified