Benchmarking

Papers

Title	Date	Tasks	Status
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Nov 20, 2024	Benchmarking, NetHack	Unverified
Polyp-E: Benchmarking the Robustness of Deep Segmentation Models via Polyp Editing	Oct 22, 2024	Attribute, Benchmarking	Unverified
Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data	Mar 24, 2018	Benchmarking, Prediction	Unverified
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness	May 5, 2023	Benchmarking, Dataset Distillation	Unverified
Portfolio Benchmarking under Drawdown Constraint and Stochastic Sharpe Ratio	Oct 26, 2016	Benchmarking	Unverified
PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions	Jun 20, 2024	Animal Pose Estimation, Autonomous Driving	Unverified
Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks	Sep 19, 2018	Benchmarking, Image Generation	Unverified
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving	Mar 6, 2024	Automated Theorem Proving, Benchmarking	Unverified
Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation	May 1, 2025	Benchmarking, Position	Unverified
BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text	May 22, 2025	Benchmarking, RAG	Unverified
Position: Benchmarking is Limited in Reinforcement Learning Research	Jun 23, 2024	Benchmarking, Position	Unverified
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks	Feb 20, 2025	Benchmarking, Combinatorial Optimization	Unverified
Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods	May 2, 2024	Benchmarking	Unverified
Position: There are no Champions in Long-Term Time Series Forecasting	Feb 19, 2025	Benchmarking, Position	Unverified
Post-FEC BER Benchmarking for Bit-Interleaved Coded Modulation with Probabilistic Shaping	Apr 24, 2020	Benchmarking	Unverified
Post-hoc labeling of arbitrary EEG recordings for data-efficient evaluation of neural decoding methods	Nov 22, 2017	Benchmarking, EEG	Unverified
Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions	Aug 15, 2023	Benchmarking, Computational Efficiency	Unverified
PowerGraph: A power grid benchmark dataset for graph neural networks	Feb 5, 2024	Articles, Benchmarking	Unverified
Power Line Communication vs. Talkative Power Conversion: A Benchmarking Study	Apr 16, 2025	Benchmarking	Unverified
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs	Jun 5, 2025	Benchmarking, Video Understanding	Unverified
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning	May 21, 2025	Benchmarking, Imitation Learning	Unverified
UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges	Oct 31, 2023	Benchmarking	Unverified
Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding	Jan 7, 2025	Benchmarking, Code Generation	Unverified
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval	Nov 30, 2023	Benchmarking, Retrieval	Unverified
Practical, Fast and Robust Point Cloud Registration for 3D Scene Stitching and Object Localization	Nov 8, 2021	3D Feature Matching, Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified