SOTAVerified

Benchmarking

Papers

Showing 4501–4550 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Roadmap for Improving Data Reliability and Sharing in Crosslinking Mass Spectrometry | | 0 |
| Unsupervised Single Image Deraining with Self-supervised Constraints | | 0 |
| Robust 2D/3D Vehicle Parsing in CVIS | | 0 |
| A Risk Taxonomy for Evaluating AI-Powered Psychotherapy Agents | | 0 |
| A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater | | 0 |
| Unsupervised Spectral Demosaicing with Lightweight Spectral Attention Networks | | 0 |
| Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM | | 0 |
| Robust measurement of innovation performances in Europe with a hierarchy of interacting composite indicators | | 0 |
| Robust Medical Instrument Segmentation Challenge 2019 | | 0 |
| RobustMQ: Benchmarking Robustness of Quantized Models | | 0 |
| Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition | | 0 |
| Robustness of Reinforcement Learning-Based Traffic Signal Control under Incidents: A Comparative Study | | 0 |
| A Review of Reinforcement Learning in Financial Applications | | 0 |
| Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks | | 0 |
| A Review of Intelligent Music Generation Systems | | 0 |
| RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo | | 0 |
| Robust Vision Challenge 2020 -- 1st Place Report for Panoptic Segmentation | | 0 |
| A review of faithfulness metrics for hallucination assessment in Large Language Models | | 0 |
| A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling | | 0 |
| Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints | | 0 |
| A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation | | 0 |
| A Review of 315 Benchmark and Test Functions for Machine Learning Optimization Algorithms and Metaheuristics with Mathematical and Visual Descriptions | | 0 |
| A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics | | 0 |
| Are SNNs Truly Energy-efficient? - A Hardware Perspective | | 0 |
| WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution | | 0 |
| RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands | | 0 |
| A Report on the 2020 Sarcasm Detection Shared Task | | 0 |
| RRSIS: Referring Remote Sensing Image Segmentation | | 0 |
| A Report on the 2018 VUA Metaphor Detection Shared Task | | 0 |
| Arena-Web -- A Web-based Development and Benchmarking Platform for Autonomous Navigation Approaches | | 0 |
| RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark | | 0 |
| Unveiling the potential of large language models in generating semantic and cross-language clones | | 0 |
| Arena 4.0: A Comprehensive ROS2 Development and Benchmarking Platform for Human-centric Navigation Using Generative-Model-based Environment Generation | | 0 |
| Rule-based Data Selection for Large Language Models | | 0 |
| A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach | | 0 |
| Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs | | 0 |
| RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy | | 0 |
| A Reinforcement Learning Environment for Directed Quantum Circuit Synthesis | | 0 |
| UPREVE: An End-to-End Causal Discovery Benchmarking System | | 0 |
| Urania: Differentially Private Insights into AI Use | | 0 |
| Sadeed: Advancing Arabic Diacritization Through Small Language Model | | 0 |
| Safe Load Balancing in Software-Defined-Networking | | 0 |
| UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces | | 0 |
| A Real-time Spatio-Temporal Trajectory Planner for Autonomous Vehicles with Semantic Graph Optimization | | 0 |
| MAPS: Multi-Fidelity AI-Augmented Photonic Simulation and Inverse Design Infrastructure | | 0 |
| Are All Steps Equally Important? Benchmarking Essentiality Detection of Events | | 0 |
| A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification | | 0 |
| SAIBench: A Structural Interpretation of AI for Science Through Benchmarks | | 0 |
| SAIBench: Benchmarking AI for Science | | 0 |
| Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics | | 0 |
Page 91 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |