Benchmarking

Papers


Showing 1301–1350 of 5548 papers

Title	Date	Tasks	Status	Hype
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling	Nov 21, 2024	Articles, Benchmarking	Code Available	0
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series	Nov 21, 2024	Anomaly Detection, Benchmarking	Code Available	0
Multi-Agent Environments for Vehicle Routing Problems	Nov 21, 2024	Benchmarking, reinforcement-learning	Code Available	1
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking	Nov 20, 2024	Benchmarking, Language Modeling	Unverified	0
Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver	Nov 20, 2024	Benchmarking	Unverified	0
Delta-Influence: Unlearning Poisons via Influence Functions	Nov 20, 2024	Attribute, Benchmarking	Code Available	0
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models	Nov 20, 2024	Benchmarking, Image Generation	Code Available	5
BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation	Nov 20, 2024	Benchmarking, Point Cloud Segmentation	Unverified	0
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Nov 20, 2024	Benchmarking, NetHack	Unverified	0
The Moral Mind(s) of Large Language Models	Nov 19, 2024	Benchmarking, Decision Making	Unverified	0
Integrating Dynamic Correlation Shifts and Weighted Benchmarking in Extreme Value Analysis	Nov 19, 2024	Benchmarking	Unverified	0
Benchmarking Positional Encodings for GNNs and Graph Transformers	Nov 19, 2024	Benchmarking	Code Available	0
DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models	Nov 19, 2024	Benchmarking, Deep Learning	Code Available	1
Introducing Milabench: Benchmarking Accelerators for AI	Nov 18, 2024	Benchmarking, Deep Learning	Code Available	1
Benchmarking pre-trained text embedding models in aligning built asset information	Nov 18, 2024	Asset Management, Benchmarking	Code Available	0
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts	Nov 18, 2024	Benchmarking, Multimodal Large Language Model	Code Available	0
Reinforcing Competitive Multi-Agents for Playing So Long Sucker	Nov 17, 2024	Benchmarking, Deep Reinforcement Learning	Unverified	0
Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies	Nov 17, 2024	Benchmarking	Unverified	0
Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML	Nov 17, 2024	Benchmarking, Fairness	Unverified	0
FastDraft: How to Train Your Draft	Nov 17, 2024	Benchmarking, Code Completion	Unverified	0
Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections	Nov 16, 2024	Benchmarking, Diagnostic	Code Available	0
The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods	Nov 15, 2024	3D Reconstruction, Benchmarking	Unverified	0
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering	Nov 15, 2024	Benchmarking, Clustering	Unverified	0
Automated Coding of Communications in Collaborative Problem-solving Tasks Using ChatGPT	Nov 15, 2024	Benchmarking	Unverified	0
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level	Nov 15, 2024	Benchmarking, counterfactual	Unverified	0
WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking	Nov 14, 2024	Benchmarking, Drug Discovery	Unverified	0
A survey of probabilistic generative frameworks for molecular simulations	Nov 14, 2024	Benchmarking, Denoising	Code Available	0
Caravan MultiMet: Extending Caravan with Multiple Weather Nowcasts and Forecasts	Nov 14, 2024	Benchmarking	Code Available	3
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation	Nov 14, 2024	Adversarial Attack, Adversarial Robustness	Code Available	0
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset	Nov 13, 2024	Anomaly Detection, Benchmarking	Code Available	0
A Survey on Vision Autoregressive Model	Nov 13, 2024	3D Generation, Benchmarking	Unverified	0
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere	Nov 13, 2024	Benchmarking, Dataset Generation	Unverified	0
FM-TS: Flow Matching for Time Series Generation	Nov 12, 2024	Benchmarking, Imputation	Code Available	1
Evaluating the Generation of Spatial Relations in Text and Image Generative Models	Nov 12, 2024	Benchmarking, Image Generation	Unverified	0
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation	Nov 11, 2024	16k, Benchmarking	Code Available	0
BuckTales: A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes	Nov 11, 2024	Benchmarking, Multi-Object Tracking	Unverified	0
General Geospatial Inference with a Population Dynamics Foundation Model	Nov 11, 2024	Benchmarking, Graph Neural Network	Code Available	3
Benchmarking LLMs' Judgments with No Gold Standard	Nov 11, 2024	Benchmarking, Machine Translation	Code Available	0
Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification	Nov 11, 2024	Benchmarking, Image Segmentation	Code Available	1
MolMiner: Towards Controllable, 3D-Aware, Fragment-Based Molecular Design	Nov 10, 2024	3D geometry, Benchmarking	Unverified	0
Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication	Nov 9, 2024	Benchmarking, Integrated sensing and communication	Unverified	0
Benchmarking 3D multi-coil NC-PDNet MRI reconstruction	Nov 8, 2024	3D Reconstruction, Benchmarking	Unverified	0
FactLens: Benchmarking Fine-Grained Fact Verification	Nov 8, 2024	Benchmarking, Fact Verification	Unverified	0
Open-set object detection: towards unified problem formulation and benchmarking	Nov 8, 2024	Autonomous Driving, Benchmarking	Unverified	0
Benchmarking Distributional Alignment of Large Language Models	Nov 8, 2024	Benchmarking	Code Available	0
A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics	Nov 8, 2024	Benchmarking	Unverified	0
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Nov 7, 2024	Benchmarking, Multiple-choice	Unverified	0
Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale	Nov 7, 2024	Active Learning, Benchmarking	Unverified	0
Deep Learning Models for UAV-Assisted Bridge Inspection: A YOLO Benchmark Analysis	Nov 7, 2024	Benchmarking, Model Selection	Unverified	0
HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images	Nov 7, 2024	Anatomy, Benchmarking	Unverified	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified