Benchmarking

Papers

Showing 3201–3225 of 5548 papers

Title	Date	Tasks	Status	Hype
Datasets and Benchmarks for Offline Safe Reinforcement Learning	Jun 15, 2023	Autonomous Driving, Benchmarking	Code Available	2
MUBen: Benchmarking the Uncertainty of Molecular Representation Models	Jun 14, 2023	Benchmarking, Drug Discovery	Code Available	0
RRSIS: Referring Remote Sensing Image Segmentation	Jun 14, 2023	Benchmarking, Image Segmentation	Unverified	0
A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews	Jun 13, 2023	Benchmarking, Keyword Extraction	Unverified	0
detrex: Benchmarking Detection Transformers	Jun 12, 2023	Benchmarking, object-detection	Unverified	0
Benchmarking Neural Network Training Algorithms	Jun 12, 2023	Benchmarking	Code Available	4
Contribution à l'Optimisation d'un Comportement Collectif pour un Groupe de Robots Autonomes	Jun 10, 2023	Benchmarking, Diversity	Unverified	0
Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception	Jun 10, 2023	3D Object Detection, Benchmarking	Code Available	2
NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics	Jun 9, 2023	Benchmarking, Dataset Generation	Code Available	1
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration	Jun 9, 2023	Benchmarking, Time Series	Unverified	0
A Large-Scale Analysis on Self-Supervised Video Representation Learning	Jun 9, 2023	Benchmarking, Representation Learning	Unverified	0
DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems	Jun 8, 2023	Benchmarking, Descriptive	Code Available	0
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs	Jun 8, 2023	Benchmarking, Federated Learning	Code Available	0
Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML	Jun 8, 2023	Benchmarking, Kidney Function	Code Available	1
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models	Jun 8, 2023	Benchmarking, Fairness	Code Available	0
FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems	Jun 8, 2023	Benchmarking, Edge-computing	Unverified	0
Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework	Jun 8, 2023	Benchmarking	Code Available	0
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing	Jun 7, 2023	Benchmarking, Prompt Engineering	Code Available	1
Improved statistical benchmarking of digital pathology models using pairwise frames evaluation	Jun 7, 2023	Benchmarking, Classification	Unverified	0
RD-Suite: A Benchmark for Ranking Distillation	Jun 7, 2023	Benchmarking	Unverified	0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals	Jun 7, 2023	Benchmarking, Machine Reading Comprehension	Code Available	0
Benchmarking Foundation Models with Language-Model-as-an-Examiner	Jun 7, 2023	Benchmarking, Language Modeling	Unverified	0
Self-Adjusting Weighted Expected Improvement for Bayesian Optimization	Jun 7, 2023	Bayesian Optimization, Benchmarking	Code Available	0
ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection	Jun 7, 2023	Attribute, Autonomous Driving	Unverified	0
Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities	Jun 6, 2023	Benchmarking, Depth Completion	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified