Benchmarking

Papers


Showing 4501–4550 of 5548 papers

Title	Date	Tasks	Status
A Roadmap for Improving Data Reliability and Sharing in Crosslinking Mass Spectrometry	Apr 9, 2025	Benchmarking	Unverified
Unsupervised Single Image Deraining with Self-supervised Constraints	Nov 21, 2018	Benchmarking, Generative Adversarial Network	Unverified
Robust 2D/3D Vehicle Parsing in CVIS	Mar 11, 2021	Benchmarking, Data Augmentation	Unverified
A Risk Taxonomy for Evaluating AI-Powered Psychotherapy Agents	May 21, 2025	Benchmarking, Decompensation	Unverified
A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater	Sep 29, 2023	Benchmarking	Unverified
Unsupervised Spectral Demosaicing with Lightweight Spectral Attention Networks	Jul 5, 2023	Benchmarking, Demosaicking	Unverified
Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM	Nov 13, 2019	Benchmarking, Pose Estimation	Unverified
Robust measurement of innovation performances in Europe with a hierarchy of interacting composite indicators	May 18, 2019	Benchmarking, Decision Making	Unverified
Robust Medical Instrument Segmentation Challenge 2019	Mar 23, 2020	Benchmarking, Instance Segmentation	Unverified
RobustMQ: Benchmarking Robustness of Quantized Models	Aug 4, 2023	Adversarial Robustness, Benchmarking	Unverified
Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition	Jun 13, 2024	Benchmarking	Unverified
Robustness of Reinforcement Learning-Based Traffic Signal Control under Incidents: A Comparative Study	Jun 16, 2025	Benchmarking, Traffic Signal Control	Unverified
A Review of Reinforcement Learning in Financial Applications	Nov 1, 2024	Benchmarking, Decision Making	Unverified
Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks	Sep 20, 2024	Benchmarking, object-detection	Unverified
A Review of Intelligent Music Generation Systems	Nov 16, 2022	Benchmarking, Music Generation	Unverified
RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo	May 14, 2025	Benchmarking, Optical Flow Estimation	Unverified
Robust Vision Challenge 2020 -- 1st Place Report for Panoptic Segmentation	Aug 23, 2020	Benchmarking, Panoptic Segmentation	Unverified
A review of faithfulness metrics for hallucination assessment in Large Language Models	Dec 31, 2024	Benchmarking, Hallucination	Unverified
A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling	Oct 5, 2023	Benchmarking, Deep Reinforcement Learning	Unverified
Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints	Apr 25, 2023	Benchmarking, Contrastive Learning	Unverified
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation	Nov 25, 2024	Active Learning, Bayesian Inference	Unverified
A Review of 315 Benchmark and Test Functions for Machine Learning Optimization Algorithms and Metaheuristics with Mathematical and Visual Descriptions	Jun 13, 2024	Benchmarking	Unverified
A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics	Nov 8, 2024	Benchmarking	Unverified
Are SNNs Truly Energy-efficient? - A Hardware Perspective	Sep 6, 2023	Benchmarking	Unverified
WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution	Apr 28, 2025	Benchmarking, Image Attribution	Unverified
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands	Aug 20, 2024	Benchmarking, Contact-rich Manipulation	Unverified
A Report on the 2020 Sarcasm Detection Shared Task	May 12, 2020	Benchmarking, Sarcasm Detection	Unverified
RRSIS: Referring Remote Sensing Image Segmentation	Jun 14, 2023	Benchmarking, Image Segmentation	Unverified
A Report on the 2018 VUA Metaphor Detection Shared Task	Jun 1, 2018	Benchmarking	Unverified
Arena-Web -- A Web-based Development and Benchmarking Platform for Autonomous Navigation Approaches	Feb 6, 2023	Autonomous Navigation, Benchmarking	Unverified
RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark	Jul 18, 2024	3D Human Pose Estimation, Benchmarking	Unverified
Unveiling the potential of large language models in generating semantic and cross-language clones	Sep 12, 2023	Benchmarking, Code Generation	Unverified
Arena 4.0: A Comprehensive ROS2 Development and Benchmarking Platform for Human-centric Navigation Using Generative-Model-based Environment Generation	Sep 19, 2024	Benchmarking, Social Navigation	Unverified
Rule-based Data Selection for Large Language Models	Oct 7, 2024	Benchmarking, Math	Unverified
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach	Mar 10, 2022	Benchmarking, Sentence	Unverified
Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs	Nov 1, 2023	Benchmarking, Question Answering	Unverified
RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy	Mar 26, 2025	Benchmarking, Representation Learning	Unverified
A Reinforcement Learning Environment for Directed Quantum Circuit Synthesis	Jan 13, 2024	Benchmarking, reinforcement-learning	Unverified
UPREVE: An End-to-End Causal Discovery Benchmarking System	Jul 25, 2023	Benchmarking, Causal Discovery	Unverified
Urania: Differentially Private Insights into AI Use	Jun 5, 2025	Benchmarking, Chatbot	Unverified
Sadeed: Advancing Arabic Diacritization Through Small Language Model	Apr 30, 2025	Arabic Text Diacritization, Benchmarking	Unverified
Safe Load Balancing in Software-Defined-Networking	Oct 22, 2024	Benchmarking, Deep Reinforcement Learning	Unverified
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces	Mar 8, 2025	Benchmarking, counterfactual	Unverified
A Real-time Spatio-Temporal Trajectory Planner for Autonomous Vehicles with Semantic Graph Optimization	Feb 25, 2025	Autonomous Vehicles, Benchmarking	Unverified
MAPS: Multi-Fidelity AI-Augmented Photonic Simulation and Inverse Design Infrastructure	Mar 2, 2025	Benchmarking	Unverified
Are All Steps Equally Important? Benchmarking Essentiality Detection of Events	Oct 8, 2022	All, Benchmarking	Unverified
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification	Jul 16, 2024	Benchmarking, Few-Shot Learning	Unverified
SAIBench: A Structural Interpretation of AI for Science Through Benchmarks	Nov 29, 2023	Benchmarking, Computational Efficiency	Unverified
SAIBench: Benchmarking AI for Science	Jun 11, 2022	Benchmarking, Friction	Unverified
Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics	Apr 27, 2017	All, Benchmarking	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified