Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2301–2325 of 5548 papers

Title	Date	Tasks	Status	Score
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs	Apr 7, 2025	BenchmarkingFairness	CodeCode Available	5
Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps	Jan 8, 2019	BenchmarkingCPU	CodeCode Available	5
Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces	May 31, 2023	BenchmarkingRecommendation Systems	CodeCode Available	5
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios	Jun 11, 2025	Action RecognitionAction Segmentation	CodeCode Available	5
Grounded Intuition of GPT-Vision's Abilities with Scientific Images	Nov 3, 2023	Benchmarkingcounterfactual	CodeCode Available	5
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics	Mar 7, 2019	BenchmarkingClustering	CodeCode Available	5
Improving Sequential Recommendation Models with an Enhanced Loss Function	Jan 3, 2023	BenchmarkingRecommendation Systems	CodeCode Available	5
Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models	Feb 21, 2025	BenchmarkingDiagnostic	CodeCode Available	5
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis	Mar 18, 2022	BenchmarkingObject Recognition	CodeCode Available	5
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora	May 13, 2025	BenchmarkingDiagnostic	CodeCode Available	5
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models	Feb 28, 2024	BenchmarkingHallucination	CodeCode Available	5
Benchmarking Long-tail Generalization with Likelihood Splits	Oct 13, 2022	BenchmarkingLanguage Modeling	CodeCode Available	5
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective	Dec 10, 2024	Benchmarking	CodeCode Available	5
Learning Conjoint Attentions for Graph Neural Nets	Feb 5, 2021	BenchmarkingGraph Attention	CodeCode Available	5
Graph-theoretical approach to robust 3D normal extraction of LiDAR data	May 23, 2022	Benchmarking	CodeCode Available	5
HRNET: AI on Edge for mask detection and social distancing	Nov 30, 2021	BenchmarkingEdge-computing	CodeCode Available	5
Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere	Mar 27, 2019	Benchmarking	CodeCode Available	5
GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking	May 24, 2023	BenchmarkingGraph Mining	CodeCode Available	5
ECBD: Evidence-Centered Benchmark Design for NLP	Jun 13, 2024	Benchmarking	CodeCode Available	5
Benchmarking LLMs' Judgments with No Gold Standard	Nov 11, 2024	BenchmarkingMachine Translation	CodeCode Available	5
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024)	Dec 2, 2024	BenchmarkingHigh-Level Synthesis	CodeCode Available	5
A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models	Apr 28, 2022	BenchmarkingDiversity	CodeCode Available	5
GOAL: Towards Benchmarking Few-Shot Sports Game Summarization	Jul 18, 2022	Benchmarking	CodeCode Available	5
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder	Sep 20, 2023	BenchmarkingClustering	CodeCode Available	5
A Review of Testing Object-Based Environment Perception for Safe Automated Driving	Feb 16, 2021	BenchmarkingSensor Modeling	CodeCode Available	5

Show:10 25 50

← PrevPage 93 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified