Benchmarking

Papers

Showing 4751–4800 of 5548 papers

Title	Date	Tasks	Status
Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks	Jan 7, 2024	Benchmarking, Graph Neural Network	Code Available
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search	Jan 26, 2025	Benchmarking, Diversity	Code Available
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties	Feb 24, 2025	Benchmarking	Code Available
Safe Trajectory Generation for Complex Urban Environments Using Spatio-temporal Semantic Corridor	Jun 24, 2019	Autonomous Vehicles, Benchmarking	Code Available
Natural Image Noise Dataset	Jun 1, 2019	Benchmarking, Denoising	Code Available
Benchmarking Suite for Synthetic Aperture Radar Imagery Anomaly Detection (SARIAD) Algorithms	Apr 10, 2025	Anomaly Detection, Benchmarking	Code Available
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration	Sep 17, 2024	Benchmarking, counterfactual	Code Available
Geological Inference from Textual Data using Word Embeddings	Apr 10, 2025	Benchmarking, Word Embeddings	Code Available
Flexible Generation of Preference Data for Recommendation Analysis	Jul 23, 2024	Benchmarking, Recommendation Systems	Code Available
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels	Oct 26, 2024	Benchmarking, Information Retrieval	Code Available
MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages	Mar 3, 2025	Benchmarking	Code Available
The LOCATA Challenge: Acoustic Source Localization and Tracking	Sep 3, 2019	Benchmarking, Sound Source Localization	Code Available
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider	Apr 26, 2025	Benchmarking, GPU	Code Available
A Meta-Analysis of the Anomaly Detection Problem	Mar 3, 2015	Anomaly Detection, Benchmarking	Code Available
On the Measure of Intelligence	Nov 5, 2019	ARC, Benchmarking	Code Available
Generalization and Regularization in DQN	Sep 29, 2018	Atari Games, Benchmarking	Code Available
Automatic Resolution of Domain Name Disputes	Nov 1, 2021	Benchmarking	Code Available
Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI	Jun 13, 2025	Benchmarking, In-Context Learning	Code Available
Automatic benchmarking of large multimodal models via iterative experiment programming	Jun 18, 2024	Benchmarking, Language Modeling	Code Available
GenderBench: Evaluation Suite for Gender Biases in LLMs	May 17, 2025	Benchmarking	Code Available
MineRL: A Large-Scale Dataset of Minecraft Demonstrations	Jul 29, 2019	Benchmarking, Deep Reinforcement Learning	Code Available
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data	Feb 22, 2024	Benchmarking	Code Available
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations	Jun 17, 2024	Benchmarking, Dataset Generation	Code Available
Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling	Mar 24, 2025	Benchmarking, OpenAI Gym	Code Available
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma	Oct 4, 2023	Benchmarking, Segmentation	Code Available
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding	Sep 10, 2024	Benchmarking, Language Modeling	Code Available
Mirage: Model-Agnostic Graph Distillation for Graph Classification	Oct 14, 2023	Benchmarking, Classification	Code Available
Benchmarking Subset Selection from Large Candidate Solution Sets in Evolutionary Multi-objective Optimization	Jan 18, 2022	Benchmarking	Code Available
Sanity Simulations for Saliency Methods	May 13, 2021	Benchmarking	Code Available
From Variability to Stability: Advancing RecSys Benchmarking Practices	Feb 15, 2024	Benchmarking, Collaborative Filtering	Code Available
ALTIS: Modernizing GPGPU Benchmarking	Jun 25, 2019	Benchmarking, GPU	Code Available
From raw affiliations to organization identifiers	May 12, 2025	Benchmarking, Metadata quality	Code Available
Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights	May 26, 2025	Benchmarking, Question Answering	Code Available
3D Face Reconstruction Error Decomposed: A Modular Benchmark for Fair and Fast Method Evaluation	May 23, 2025	3D Face Reconstruction, Benchmarking	Code Available
MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning	Dec 24, 2024	Benchmarking	Code Available
From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories	Apr 23, 2025	Benchmarking	Code Available
The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by Isolating Task-Specific Subnetworks in Feedforward Neural Networks	Jul 18, 2022	Benchmarking	Code Available
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models	Apr 7, 2024	Benchmarking, knowledge editing	Code Available
SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks	Jun 16, 2022	Benchmarking, Dynamic neural networks	Code Available
MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios	Jun 15, 2025	Benchmarking	Code Available
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning	Mar 16, 2023	Benchmarking, Continual Learning	Code Available
SAWEC: Sensing-Assisted Wireless Edge Computing	Feb 15, 2024	Benchmarking, Edge-computing	Code Available
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering	May 11, 2025	Benchmarking, General Knowledge	Code Available
Vote'n'Rank: Revision of Benchmarking with Social Choice Theory	Oct 11, 2022	Benchmarking, Result aggregation	Code Available
AlphaZip: Neural Network-Enhanced Lossless Text Compression	Sep 23, 2024	Benchmarking, Data Compression	Code Available
ML-Net: multi-label classification of biomedical texts with deep neural networks	Nov 13, 2018	Benchmarking, Classification	Code Available
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology	Apr 11, 2022	Benchmarking, Cancer Classification	Code Available
mlOSP: Towards a Unified Implementation of Regression Monte Carlo Algorithms	Dec 1, 2020	Benchmarking, BIG-bench Machine Learning	Code Available
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation	Apr 14, 2024	Benchmarking, Diversity	Code Available
MLPerf Inference Benchmark	Nov 6, 2019	Benchmarking	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified