Benchmarking

Papers


Showing 701–750 of 5548 papers

Title	Date	Tasks	Status	Hype
SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution	Jun 13, 2024	Benchmarking, Image Super-Resolution	Code Available	1
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark	Jun 12, 2024	Benchmarking, Mixture-of-Experts	Code Available	1
Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework	Jun 12, 2024	Benchmarking, Causal Inference	Code Available	1
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation	Jun 12, 2024	Benchmarking, Image Generation	Code Available	1
RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection	Jun 11, 2024	Anomaly Detection, Benchmarking	Code Available	1
AudioMarkBench: Benchmarking Robustness of Audio Watermarking	Jun 11, 2024	Benchmarking, text-to-speech	Code Available	1
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models	Jun 9, 2024	Benchmarking	Code Available	1
Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking	Jun 9, 2024	Benchmarking, Drug Discovery	Code Available	1
ICU-Sepsis: A Benchmark MDP Built from Real Medical Data	Jun 9, 2024	Benchmarking, Management	Code Available	1
QGEval: Benchmarking Multi-dimensional Evaluation for Question Generation	Jun 9, 2024	Benchmarking, Question Generation	Code Available	1
CLoG: Benchmarking Continual Learning of Image Generation Models	Jun 7, 2024	Benchmarking, Continual Learning	Code Available	1
CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark	Jun 5, 2024	Benchmarking	Code Available	1
TIDMAD: Time Series Dataset for Discovering Dark Matter with AI Denoising	Jun 5, 2024	Benchmarking, Denoising	Code Available	1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarking, energy management	Code Available	1
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders	Jun 4, 2024	Benchmarking, Clustering	Code Available	1
animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics	Jun 3, 2024	Audio Classification, Benchmarking	Code Available	1
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models	Jun 1, 2024	Benchmarking	Code Available	1
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild	May 30, 2024	Benchmarking	Code Available	1
SECURE: Benchmarking Large Language Models for Cybersecurity	May 30, 2024	Benchmarking	Code Available	1
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning	May 30, 2024	Autonomous Driving, Benchmarking	Code Available	1
Quantitative Certification of Bias in Large Language Models	May 29, 2024	Benchmarking	Code Available	1
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions	May 29, 2024	Benchmarking, Dialogue Understanding	Code Available	1
DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime	May 28, 2024	Benchmarking, Reinforcement Learning (RL)	Code Available	1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences	May 28, 2024	Benchmarking, Feature Engineering	Code Available	1
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling	May 23, 2024	Benchmarking	Code Available	1
GCondenser: Benchmarking Graph Condensation	May 23, 2024	Benchmarking, Graph Representation Learning	Code Available	1
Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding	May 21, 2024	Benchmarking, Keypoint Detection	Code Available	1
DocuMint: Docstring Generation for Python using Small Language Models	May 16, 2024	Benchmarking, Code Generation	Code Available	1
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation	May 14, 2024	Benchmarking, Multiple-choice	Code Available	1
Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration	May 10, 2024	Benchmarking, Point Cloud Registration	Code Available	1
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets	May 7, 2024	Benchmarking, Cancer Classification	Code Available	1
Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?	May 4, 2024	Anomaly Detection, Benchmarking	Code Available	1
ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging	Apr 30, 2024	Benchmarking, Image Reconstruction	Code Available	1
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?	Apr 29, 2024	Answer Generation, Benchmarking	Code Available	1
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs	Apr 28, 2024	Benchmarking	Code Available	1
Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments	Apr 27, 2024	Autonomous Vehicles, Benchmarking	Code Available	1
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection	Apr 25, 2024	Benchmarking, object-detection	Code Available	1
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction	Apr 24, 2024	Attribute, Attribute Value Extraction	Code Available	1
SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data	Apr 24, 2024	Benchmarking, Fairness	Code Available	1
TAVGBench: Benchmarking Text to Audible-Video Generation	Apr 22, 2024	Benchmarking, Contrastive Learning	Code Available	1
Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging	Apr 22, 2024	Benchmarking	Code Available	1
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models	Apr 22, 2024	Benchmarking, World Knowledge	Code Available	1
REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking	Apr 19, 2024	Benchmarking, coreference-resolution	Code Available	1
How to Benchmark Vision Foundation Models for Semantic Segmentation?	Apr 18, 2024	Benchmarking, Decoder	Code Available	1
Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data	Apr 16, 2024	Benchmarking, Face Recognition	Code Available	1
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations	Apr 15, 2024	Benchmarking, Bias Detection	Code Available	1
A Review and Efficient Implementation of Scene Graph Generation Metrics	Apr 15, 2024	Benchmarking, Graph Generation	Code Available	1
MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems	Apr 15, 2024	Benchmarking, Code Generation	Code Available	1
nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation	Apr 15, 2024	Benchmarking, Image Segmentation	Code Available	1
RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion	Apr 14, 2024	Benchmarking, Data Augmentation	Code Available	1



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified