SOTAVerified

Benchmarking

Papers

Showing 211–220 of 5548 papers

Title	Date	Tasks	Status	Hype
Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond	Oct 9, 2024	Benchmarking	Code Available	2
FedGraph: A Research Library and Benchmark for Federated Graph Learning	Oct 8, 2024	Benchmarking, Federated Learning	Code Available	2
MIBench: A Comprehensive Framework for Benchmarking Model Inversion Attack and Defense	Oct 7, 2024	Adversarial Robustness, Benchmarking	Code Available	2
dattri: A Library for Efficient Data Attribution	Oct 6, 2024	Benchmarking	Code Available	2
AutoPenBench: Benchmarking Generative Agents for Penetration Testing	Oct 4, 2024	Benchmarking	Code Available	2
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models	Sep 30, 2024	Benchmarking, Continual Learning	Code Available	2
A Survey on Graph Neural Networks for Remaining Useful Life Prediction: Methodologies, Evaluation and Future Trends	Sep 29, 2024	Benchmarking, graph construction	Code Available	2
Small Language Models: Survey, Measurements, and Insights	Sep 24, 2024	Benchmarking, Decoder	Code Available	2
GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization	Sep 24, 2024	3D geometry, 3DGS	Code Available	2
A Survey on Multimodal Benchmarks: In the Era of Large AI Models	Sep 21, 2024	Benchmarking, Survey	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified