Benchmarking

Papers

Showing 5526–5548 of 5548 papers

Title	Date	Tasks	Status
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval	Aug 4, 2023	Benchmarking, Information Retrieval	Code Available
RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions	Jan 1, 2025	Benchmarking	Code Available
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models	Oct 7, 2024	Benchmarking, Segmentation	Code Available
RDF-star2Vec: RDF-star Graph Embeddings for Data Mining	Dec 25, 2023	Benchmarking, Graph Embedding	Code Available
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding	Mar 22, 2025	Benchmarking, Object	Code Available
An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines	Apr 2, 2021	Benchmarking	Code Available
Characterizing SLAM Benchmarks and Methods for the Robust Perception Age	May 19, 2019	Benchmarking	Code Available
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge	Apr 10, 2025	Adversarial Robustness, Benchmarking	Code Available
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese	May 28, 2025	Benchmarking	Code Available
Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs	Apr 21, 2022	Benchmarking, Change Point Detection	Code Available
TuringQ: Benchmarking AI Comprehension in Theory of Computation	Oct 9, 2024	Benchmarking	Code Available
An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations	Aug 26, 2019	Benchmarking, Clustering	Code Available
TweetNERD -- End to End Entity Linking Benchmark for Tweets	Oct 14, 2022	Benchmarking, Entity Linking	Code Available
Real-time cryo-EM data pre-processing with Warp	Jun 14, 2018	Benchmarking, Image Reconstruction	Code Available
Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets	Jul 19, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences	Nov 30, 2024	Benchmarking, Classification	Code Available
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence	Apr 10, 2023	Benchmarking, speech-recognition	Code Available
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective	May 28, 2025	Benchmarking, Memorization	Code Available
An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation	Feb 21, 2023	Benchmarking, State Estimation	Code Available
ACCORD: Closing the Commonsense Measurability Gap	Jun 4, 2024	Benchmarking, Common Sense Reasoning	Code Available
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions	Jul 28, 2017	Autonomous Vehicles, Benchmarking	Code Available
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness	Oct 28, 2023	Benchmarking, image-classification	Code Available
Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box	Mar 4, 2022	Benchmarking, counterfactual	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified