Benchmarking

Papers

Showing 2301–2350 of 5548 papers

Title	Date	Tasks	Status	Score
HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction	Jul 9, 2024	Benchmarking	Code Available	5
Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients	Jul 17, 2023	Benchmarking, GPU	Code Available	5
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs	Apr 7, 2025	Benchmarking, Fairness	Code Available	5
Harmonization Benchmarking Tool for Neuroimaging Datasets	Nov 15, 2022	Benchmarking, Diffusion MRI	Code Available	5
Harnessing Orthogonality to Train Low-Rank Neural Networks	Jan 16, 2024	Benchmarking	Code Available	5
HATE-ITA: New Baselines for Hate Speech Detection in Italian	Jul 1, 2022	Benchmarking, Hate Speech Detection	Code Available	5
gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo	Mar 14, 2019	Benchmarking, OpenAI Gym	Code Available	5
HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios	Dec 21, 2024	Benchmarking	Code Available	5
Improving Sequential Recommendation Models with an Enhanced Loss Function	Jan 3, 2023	Benchmarking, Recommendation Systems	Code Available	5
Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps	Jan 8, 2019	Benchmarking, CPU	Code Available	5
Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models	Feb 21, 2025	Benchmarking, Diagnostic	Code Available	5
Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces	May 31, 2023	Benchmarking, Recommendation Systems	Code Available	5
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models	Feb 28, 2024	Benchmarking, Hallucination	Code Available	5
Benchmarking Long-tail Generalization with Likelihood Splits	Oct 13, 2022	Benchmarking, Language Modeling	Code Available	5
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora	May 13, 2025	Benchmarking, Diagnostic	Code Available	5
Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance	Jun 4, 2024	Benchmarking, Drug Discovery	Code Available	5
Grounded Intuition of GPT-Vision's Abilities with Scientific Images	Nov 3, 2023	Benchmarking, counterfactual	Code Available	5
Hard-Label Cryptanalytic Extraction of Neural Network Models	Sep 18, 2024	Benchmarking	Code Available	5
Graph-theoretical approach to robust 3D normal extraction of LiDAR data	May 23, 2022	Benchmarking	Code Available	5
Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere	Mar 27, 2019	Benchmarking	Code Available	5
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis	Mar 18, 2022	Benchmarking, Object Recognition	Code Available	5
ECBD: Evidence-Centered Benchmark Design for NLP	Jun 13, 2024	Benchmarking	Code Available	5
Benchmarking LLMs' Judgments with No Gold Standard	Nov 11, 2024	Benchmarking, Machine Translation	Code Available	5
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024)	Dec 2, 2024	Benchmarking, High-Level Synthesis	Code Available	5
Benchmarking Machine Translation with Cultural Awareness	May 23, 2023	Benchmarking, In-Context Learning	Code Available	5
EmProx: Neural Network Performance Estimation For Neural Architecture Search	Jun 13, 2022	Benchmarking, Decoder	Code Available	5
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics	Mar 7, 2019	Benchmarking, Clustering	Code Available	5
Learning Conjoint Attentions for Graph Neural Nets	Feb 5, 2021	Benchmarking, Graph Attention	Code Available	5
A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models	Apr 28, 2022	Benchmarking, Diversity	Code Available	5
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder	Sep 20, 2023	Benchmarking, Clustering	Code Available	5
A Review of Testing Object-Based Environment Perception for Safe Automated Driving	Feb 16, 2021	Benchmarking, Sensor Modeling	Code Available	5
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective	Dec 10, 2024	Benchmarking	Code Available	5
Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique	Dec 6, 2023	Benchmarking, Knowledge Graphs	Code Available	5
DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning	Mar 9, 2025	Benchmarking, Decision Making	Code Available	5
Hardware Aware Neural Network Architectures using FbNet	Jun 17, 2019	Benchmarking, Neural Architecture Search	Code Available	5
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction	Jun 25, 2025	Benchmarking, Person Identification	Code Available	5
DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems	Jun 8, 2023	Benchmarking, Descriptive	Code Available	5
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available	5
GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking	May 24, 2023	Benchmarking, Graph Mining	Code Available	5
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program	Apr 9, 2025	Benchmarking	Code Available	5
Effective Stabilized Self-Training on Few-Labeled Graph Data	Oct 7, 2019	Benchmarking, Model Selection	Code Available	5
Enhancing Biomedical Knowledge Discovery for Diseases: An Open-Source Framework Applied on Rett Syndrome and Alzheimer's Disease	Jul 18, 2024	Benchmarking	Code Available	5
GOAL: Towards Benchmarking Few-Shot Sports Game Summarization	Jul 18, 2022	Benchmarking	Code Available	5
GNNMerge: Merging of GNN Models Without Accessing Training Data	Mar 5, 2025	Benchmarking, Computational Efficiency	Code Available	5
A Deep Reinforcement Learning Framework for Dynamic Portfolio Optimization: Evidence from China's Stock Market	Dec 24, 2024	Benchmarking, Decision Making	Code Available	5
Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks	Jan 7, 2024	Benchmarking, Graph Neural Network	Code Available	5
Enhancing Hyper-To-Real Space Projections Through Euclidean Norm Meta-Heuristic Optimization	Jan 31, 2023	Benchmarking	Code Available	5
Geological Inference from Textual Data using Word Embeddings	Apr 10, 2025	Benchmarking, Word Embeddings	Code Available	5
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search	Jan 26, 2025	Benchmarking, Diversity	Code Available	5
Benchmarking LLM-based Relevance Judgment Methods	Apr 17, 2025	Benchmarking, Open-Domain Question Answering	Code Available	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified