SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 981–990 of 5548 papers

Title	Date	Tasks	Status	Hype
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples	Feb 6, 2025	BenchmarkingDeepFake Detection	CodeCode Available	0
SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning	Feb 6, 2025	BenchmarkingData Poisoning	CodeCode Available	2
PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data	Feb 6, 2025	BenchmarkingTime Series	CodeCode Available	0
TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics	Feb 5, 2025	BenchmarkingLink Prediction	CodeCode Available	0
MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf	Feb 5, 2025	BenchmarkingScheduling	—Unverified	0
Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation	Feb 5, 2025	BenchmarkingLarge Language Model	CodeCode Available	2
Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications	Feb 5, 2025	BenchmarkingFeature Engineering	—Unverified	0
Energy & Force Regression on DFT Trajectories is Not Enough for Universal Machine Learning Interatomic Potentials	Feb 5, 2025	Benchmarking	—Unverified	0
Optimal PMU Placement for Kalman Filtering of DAE Power System Models	Feb 5, 2025	BenchmarkingState Estimation	—Unverified	0
PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design	Feb 5, 2025	BenchmarkingPrompt Engineering	CodeCode Available	1

Show:10 25 50

← PrevPage 99 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified