Benchmarking

Papers

Showing 4701–4750 of 5548 papers

Title	Date	Tasks	Status
Hard-Label Cryptanalytic Extraction of Neural Network Models	Sep 18, 2024	Benchmarking	Code Available
Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces	May 31, 2023	Benchmarking, Recommendation Systems	Code Available
Benchmarking Top-K Keyword and Top-K Document Processing with T^2K^2 and T^2K^2D^2	Apr 20, 2018	Benchmarking	Code Available
HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios	Dec 21, 2024	Benchmarking	Code Available
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators	May 28, 2025	Benchmarking, Chatbot	Code Available
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks	May 6, 2025	Benchmarking, Multiple-choice	Code Available
Benchmarking tools for a priori identifiability analysis	Jul 20, 2022	Benchmarking	Code Available
MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book	Jun 1, 2025	Benchmarking	Code Available
Benchmarking time series classification -- Functional data vs machine learning approaches	Nov 18, 2019	Additive models, Benchmarking	Code Available
Benchmarking the Robustness of UAV Tracking Against Common Corruptions	Mar 18, 2024	Benchmarking	Code Available
Roughness Index and Roughness Distance for Benchmarking Medical Segmentation	Mar 23, 2021	Benchmarking, Image Segmentation	Code Available
The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns	Feb 27, 2024	Benchmarking, Binary Classification	Code Available
MEDFAIR: Benchmarking Fairness for Medical Imaging	Oct 4, 2022	Benchmarking, Fairness	Code Available
Benchmarking the Robustness of Optical Flow Estimation to Corruptions	Nov 22, 2024	Autonomous Driving, Benchmarking	Code Available
Adaptive Power System Emergency Control using Deep Reinforcement Learning	Mar 9, 2019	Benchmarking, Deep Reinforcement Learning	Code Available
Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch	Feb 20, 2022	Benchmarking	Code Available
gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo	Mar 14, 2019	Benchmarking, OpenAI Gym	Code Available
Benchmarking the Hooke-Jeeves Method, MTS-LS1, and BSrr on the Large-scale BBOB Function Set	Apr 28, 2022	Benchmarking	Code Available
Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps	Jan 8, 2019	Benchmarking, CPU	Code Available
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora	May 13, 2025	Benchmarking, Diagnostic	Code Available
The KiTS19 Challenge Data: 300 Kidney Tumor Cases with Clinical Context, CT Semantic Segmentations, and Surgical Outcomes	Mar 31, 2019	Benchmarking, Computed Tomography (CT)	Code Available
RTSeg: Real-time Semantic Segmentation Comparative Study	Mar 7, 2018	Autonomous Driving, Benchmarking	Code Available
Meet Spinky: An Open-Source Spindle and K-Complex Detection Toolbox Validated on the Open-Access Montreal Archive of Sleep Studies (MASS).	Mar 2, 2017	Benchmarking, EEG	Code Available
Benchmarking the Hill-Valley Evolutionary Algorithm for the GECCO 2018 Competition on Niching Methods Multimodal Optimization	Jun 30, 2018	Benchmarking	Code Available
Grounded Intuition of GPT-Vision's Abilities with Scientific Images	Nov 3, 2023	Benchmarking, counterfactual	Code Available
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics	Mar 7, 2019	Benchmarking, Clustering	Code Available
Understanding the World's Museums through Vision-Language Reasoning	Dec 2, 2024	Benchmarking, Question Answering	Code Available
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models	Jun 16, 2024	Benchmarking	Code Available
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis	Mar 18, 2022	Benchmarking, Object Recognition	Code Available
Benchmarking the Fairness of Image Upsampling Methods	Jan 24, 2024	Benchmarking, Diversity	Code Available
Graph-theoretical approach to robust 3D normal extraction of LiDAR data	May 23, 2022	Benchmarking	Code Available
A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations	Dec 16, 2021	Benchmarking	Code Available
Messing Up 3D Virtual Environments: Transferable Adversarial 3D Objects	Sep 17, 2021	Benchmarking, BIG-bench Machine Learning	Code Available
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective	Dec 10, 2024	Benchmarking	Code Available
Meta-Black-Box-Optimization through Offline Q-function Learning	May 4, 2025	Benchmarking, Mamba	Code Available
Learning Conjoint Attentions for Graph Neural Nets	Feb 5, 2021	Benchmarking, Graph Attention	Code Available
Graph Convolutional Networks Meet with High Dimensionality Reduction	Nov 7, 2019	Benchmarking, Dimensionality Reduction	Code Available
Benchmarking the Attribution Quality of Vision Models	Jul 16, 2024	Benchmarking, Explainable Models	Code Available
MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs	May 30, 2025	Benchmarking	Code Available
GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking	May 24, 2023	Benchmarking, Graph Mining	Code Available
MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication	Jun 22, 2024	Benchmarking, Meta-Learning	Code Available
S3Simulator: A benchmarking Side Scan Sonar Simulator dataset for Underwater Image Analysis	Aug 23, 2024	Benchmarking	Code Available
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available
GOAL: Towards Benchmarking Few-Shot Sports Game Summarization	Jul 18, 2022	Benchmarking	Code Available
SACoD: Sensor Algorithm Co-Design Towards Efficient CNN-powered Intelligent PhlatCam	Jan 1, 2021	Benchmarking, Model Compression	Code Available
GNNMerge: Merging of GNN Models Without Accessing Training Data	Mar 5, 2025	Benchmarking, Computational Efficiency	Code Available
Meta-survey on outlier and anomaly detection	Dec 12, 2023	Anomaly Detection, Benchmarking	Code Available
The Legal Argument Reasoning Task in Civil Procedure	Nov 5, 2022	Benchmarking	Code Available
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning	Jan 29, 2019	Benchmarking, Deep Learning	Code Available
Safe Multi-Agent Navigation guided by Goal-Conditioned Safe Reinforcement Learning	Feb 25, 2025	Benchmarking, Reinforcement Learning (RL)	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified