SOTAVerified

Benchmarking

Papers

Showing 2401-2450 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking the Robustness of Quantized Models | | 0 |
| Benchmarking the Robustness of Panoptic Segmentation for Automated Driving | | 0 |
| Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models | | 0 |
| A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery | | 0 |
| A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior | | 0 |
| Benchmarking the Robustness of Instance Segmentation Models | | 0 |
| Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem | | 0 |
| Generalized Conflict-directed Search for Optimal Ordering Problems | | 0 |
| Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey | | 0 |
| Alibaba’s Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT | | 0 |
| Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance | | 0 |
| Benchmarking the rationality of AI decision making using the transitivity axiom | | 0 |
| Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN) | | 0 |
| Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization | | 0 |
| Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection | | 0 |
| AutoLay: Benchmarking amodal layout estimation for autonomous driving | | 0 |
| Benchmarking the Neural Linear Model for Regression | | 0 |
| Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model | | 0 |
| Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow | | 0 |
| General Scales Unlock AI Evaluation with Explanatory and Predictive Power | | 0 |
| Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges | | 0 |
| Benchmarking the Impact of Noise on Deep Learning-based Classification of Atrial Fibrillation in 12-Lead ECG | | 0 |
| Benchmarking the human brain against computational architectures | | 0 |
| A Conformance Checking-based Approach for Drift Detection in Business Processes | | 0 |
| GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | | 0 |
| AutoAI-TS: AutoAI for Time Series Forecasting | | 0 |
| Benchmarking the Gerchberg-Saxton Algorithm | | 0 |
| ALdataset: a benchmark for pool-based active learning | | 0 |
| Benchmarking the Fidelity and Utility of Synthetic Relational Data | | 0 |
| GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | | 0 |
| Generalised Gaussian Process Latent Variable Models (GPLVM) with Stochastic Variational Inference | | 0 |
| AA3DNet: Attention Augmented Real Time 3D Object Detection | | 0 |
| Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web | | 0 |
| Benchmarking the Effectiveness of Classification Algorithms and SVM Kernels for Dry Beans | | 0 |
| A Computer Vision System to Localize and Classify Wastes on the Streets | | 0 |
| Practical Guidelines for Cell Segmentation Models Under Optical Aberrations in Microscopy | | 0 |
| Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors | | 0 |
| Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets | | 0 |
| A Universal Protocol to Benchmark Camera Calibration for Sports | | 0 |
| A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration | | 0 |
| A Unified Taylor Framework for Revisiting Attribution Methods | | 0 |
| Benchmarking the Accuracy and Robustness of Feedback Alignment Algorithms | | 0 |
| A Latent Fingerprint in the Wild Database | | 0 |
| Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices | | 0 |
| Benchmarking terminology building capabilities of ChatGPT on an English-Russian Fashion Corpus | | 0 |
| A Unified Study of Machine Learning Explanation Evaluation Metrics | | 0 |
| Benchmarking Table Comprehension In The Wild | | 0 |
| A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking | | 0 |
| A Large-scale Study on Training Sample Memorization in Generative Modeling | | 0 |
| Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models | | 0 |
Page 49 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |