Benchmarking

Papers


Showing 2551–2600 of 5548 papers

Title	Date	Tasks	Status	Score
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider	Apr 26, 2025	Benchmarking, GPU	Code Available	5
Benchmarking Intersectional Biases in NLP	Jul 1, 2022	Benchmarking, BIG-bench Machine Learning	Code Available	5
DFEE: Interactive DataFlow Execution and Evaluation Kit	Dec 4, 2022	Benchmarking, Scheduling	Code Available	5
A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild	Jun 11, 2025	Age Estimation, Benchmarking	Code Available	5
Graph-theoretical approach to robust 3D normal extraction of LiDAR data	May 23, 2022	Benchmarking	Code Available	5
Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations	Dec 7, 2020	Benchmarking, Goal-Oriented Dialog	Code Available	5
GenderBench: Evaluation Suite for Gender Biases in LLMs	May 17, 2025	Benchmarking	Code Available	5
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations	Jun 17, 2024	Benchmarking, Dataset Generation	Code Available	5
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data	Feb 22, 2024	Benchmarking	Code Available	5
Generalization and Regularization in DQN	Sep 29, 2018	Atari Games, Benchmarking	Code Available	5
Arabic Speech Recognition by End-to-End, Modular Systems and Human	Jan 21, 2021	Arabic Speech Recognition, Automatic Speech Recognition	Code Available	5
Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings	Apr 4, 2025	Benchmarking	Code Available	5
Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks	Sep 12, 2019	Affordance Detection, Affordance Recognition	Code Available	5
Detecting critical treatment effect bias in small subgroups	Apr 29, 2024	Benchmarking, Decision Making	Code Available	5
From Variability to Stability: Advancing RecSys Benchmarking Practices	Feb 15, 2024	Benchmarking, Collaborative Filtering	Code Available	5
Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems	Jan 21, 2025	Autonomous Vehicles, Benchmarking	Code Available	5
From raw affiliations to organization identifiers	May 12, 2025	Benchmarking, Metadata quality	Code Available	5
Affine Non-negative Collaborative Representation Based Pattern Classification	Jul 10, 2020	Benchmarking, Classification	Code Available	5
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design	Oct 23, 2023	Benchmarking, Image Generation	Code Available	5
From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories	Apr 23, 2025	Benchmarking	Code Available	5
Design and implementation of intelligent packet filtering in IoT microcontroller-based devices	May 30, 2023	Benchmarking	Code Available	5
Accurate Peak Detection in Multimodal Optimization via Approximated Landscape Learning	Mar 23, 2025	Benchmarking	Code Available	5
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation	Apr 14, 2024	Benchmarking, Diversity	Code Available	5
A quantum-classical reinforcement learning model to play Atari games	Dec 11, 2024	Atari Games, Benchmarking	Code Available	5
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology	Apr 11, 2022	Benchmarking, Cancer Classification	Code Available	5
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering	May 11, 2025	Benchmarking, General Knowledge	Code Available	5
Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks	Feb 23, 2023	Benchmarking, Medical Diagnosis	Code Available	5
Benchmarking Human and Automated Prompting in the Segment Anything Model	Oct 29, 2024	Benchmarking, Image Segmentation	Code Available	5
Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms	Apr 19, 2023	Benchmarking, Descriptive	Code Available	5
Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping	Jun 23, 2025	Benchmarking, Diversity	Code Available	5
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning	Mar 16, 2023	Benchmarking, Continual Learning	Code Available	5
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma	Oct 4, 2023	Benchmarking, Segmentation	Code Available	5
Benchmarking HillVallEA for the GECCO 2019 Competition on Multimodal Optimization	Jul 25, 2019	Benchmarking	Code Available	5
Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark	Jun 14, 2025	Benchmarking, Graph Learning	Code Available	5
Benchmarking Hierarchical Script Knowledge	Jun 1, 2019	Benchmarking	Code Available	5
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering	May 27, 2025	Benchmarking, Question Answering	Code Available	5
Delta-Influence: Unlearning Poisons via Influence Functions	Nov 20, 2024	Attribute, Benchmarking	Code Available	5
Forecasting time series with constraints	Feb 14, 2025	Additive models, Benchmarking	Code Available	5
FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare	Apr 15, 2025	Benchmarking, Diagnostic	Code Available	5
Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming	Jul 17, 2019	Autonomous Driving, Benchmarking	Code Available	5
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling	Nov 21, 2024	Articles, Benchmarking	Code Available	5
Aesthetic Image Captioning From Weakly-Labelled Photographs	Aug 29, 2019	Aesthetic Image Captioning, Benchmarking	Code Available	5
Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty	Nov 5, 2020	Adversarial Attack, Benchmarking	Code Available	5
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation	Jun 13, 2024	Benchmarking, Hallucination	Code Available	5
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach	Oct 9, 2017	Benchmarking, Clustering	Code Available	5
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters	Sep 8, 2022	Benchmarking, continuous-control	Code Available	5
Fluorescence Reference Target Quantitative Analysis Library	Apr 22, 2025	Benchmarking	Code Available	5
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0	Aug 23, 2023	Benchmarking, regression	Code Available	5
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem	Mar 6, 2024	Benchmarking, Hallucination	Code Available	5
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification	Jan 14, 2025	Benchmarking, Graph Representation Learning	Code Available	5



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	Accuracy	0.56	—	Unverified