Benchmarking

Papers

Showing 601–610 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design	Apr 14, 2025	Benchmarking, Language Modeling	Unverified	0
Trade-offs in Privacy-Preserving Eye Tracking through Iris Obfuscation: A Benchmarking Study	Apr 14, 2025	Benchmarking, Gaze Estimation	Code Available	0
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding	Apr 12, 2025	Benchmarking, Document AI	Unverified	0
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning	Apr 11, 2025	Benchmarking, Language Modeling	Unverified	0
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs	Apr 11, 2025	Benchmarking, Image Generation	Code Available	1
SortBench: Benchmarking LLMs based on their ability to sort lists	Apr 11, 2025	Benchmarking	Unverified	0
TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration	Apr 11, 2025	Audio Signal Processing, Benchmarking	Code Available	2
Adaptive Shrinkage Estimation For Personalized Deep Kernel Regression In Modeling Brain Trajectories	Apr 10, 2025	Additive models, Benchmarking	Code Available	0
Benchmarking Suite for Synthetic Aperture Radar Imagery Anomaly Detection (SARIAD) Algorithms	Apr 10, 2025	Anomaly Detection, Benchmarking	Code Available	0
SydneyScapes: Image Segmentation for Australian Environments	Apr 10, 2025	Autonomous Vehicles, Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified