Benchmarking

Papers

Showing 3526–3550 of 5548 papers

Title	Date	Tasks	Status	Hype
Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias	Dec 20, 2022	Benchmarking	Code Available	0
Distributed Software-Defined Network Architecture for Smart Grid Resilience to Denial-of-Service Attacks	Dec 20, 2022	Benchmarking	Unverified	0
AI applications in forest monitoring need remote sensing benchmark datasets	Dec 20, 2022	Benchmarking	Unverified	0
Benchmarking person re-identification datasets and approaches for practical real-world implementations	Dec 20, 2022	Benchmarking, Pedestrian Detection	Code Available	0
A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks	Dec 20, 2022	3D Object Detection, Benchmarking	Code Available	1
AnyTOD: A Programmable Task-Oriented Dialog System	Dec 20, 2022	Benchmarking, Language Modeling	Unverified	0
Benchmarking Spatial Relationships in Text-to-Image Generation	Dec 20, 2022	Benchmarking, Image Generation	Code Available	1
Trial-Based Dominance Enables Non-Parametric Tests to Compare both the Speed and Accuracy of Stochastic Optimizers	Dec 19, 2022	Benchmarking, Stochastic Optimization	Unverified	0
GiCCS: A German in-Context Conversational Similarity Benchmark	Dec 16, 2022	Benchmarking, Semantic Textual Similarity	Unverified	0
Biomedical image analysis competitions: The state of current participation practice	Dec 16, 2022	Benchmarking, Survey	Unverified	0
Automatic vehicle trajectory data reconstruction at scale	Dec 15, 2022	Benchmarking, vehicle detection	Unverified	0
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift	Dec 15, 2022	Benchmarking, Image Captioning	Code Available	1
Benchmarking Large Language Models for Automated Verilog RTL Code Generation	Dec 13, 2022	Benchmarking, Code Generation	Code Available	1
Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction	Dec 12, 2022	Benchmarking, Multi-step retrosynthesis	Unverified	0
PyPop7: A Pure-Python Library for Population-Based Black-Box Optimization	Dec 12, 2022	Benchmarking, Evolutionary Algorithms	Code Available	2
On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline	Dec 12, 2022	Benchmarking, Data Augmentation	Code Available	1
Momentum Contrastive Pre-training for Question Answering	Dec 12, 2022	Benchmarking, Contrastive Learning	Unverified	0
Progressive Multi-view Human Mesh Recovery with Self-Supervision	Dec 10, 2022	Benchmarking, Diversity	Unverified	0
Ego-Body Pose Estimation via Ego-Head Pose Estimation	Dec 9, 2022	Benchmarking, Disentanglement	Code Available	1
On Distribution Grid Optimal Power Flow Development and Integration	Dec 9, 2022	Benchmarking	Unverified	0
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets	Dec 9, 2022	Benchmarking, Classification	Code Available	1
Is Bio-Inspired Learning Better than Backprop? Benchmarking Bio Learning vs. Backprop	Dec 9, 2022	Benchmarking	Unverified	0
Model-based trajectory stitching for improved behavioural cloning and its applications	Dec 8, 2022	Behavioural cloning, Benchmarking	Unverified	0
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework	Dec 7, 2022	Benchmarking	Code Available	1
An open unified deep graph learning framework for discovering drug leads	Dec 6, 2022	Benchmarking, Drug Discovery	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified