SOTAVerified

Benchmarking

Papers

Showing 1001–1050 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | Code | 1 |
| Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks | Code | 1 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | Code | 1 |
| X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models | Code | 1 |
| PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | Code | 1 |
| An Empirical Study on Google Research Football Multi-agent Scenarios | Code | 1 |
| A Platform for the Biomedical Application of Large Language Models | Code | 1 |
| Benchmarking large language models for biomedical natural language processing applications and recommendations | Code | 1 |
| InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | Code | 1 |
| DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | Code | 1 |
| Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1 |
| Event-Free Moving Object Segmentation from Moving Ego Vehicle | Code | 1 |
| MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table | Code | 1 |
| IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds | Code | 1 |
| RGB-D Indiscernible Object Counting in Underwater Scenes | Code | 1 |
| Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Code | 1 |
| SCoDA: Domain Adaptive Shape Completion for Real Scans | Code | 1 |
| Graph Neural Network-Based Anomaly Detection for River Network Systems | Code | 1 |
| Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Code | 1 |
| A Comparison of Image Denoising Methods | Code | 1 |
| NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems | Code | 1 |
| Interpretable statistical representations of neural population dynamics and geometry | Code | 1 |
| MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding | Code | 1 |
| SLPerf: a Unified Framework for Benchmarking Split Learning | Code | 1 |
| Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection | Code | 1 |
| ScandEval: A Benchmark for Scandinavian Natural Language Processing | Code | 1 |
| ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Code | 1 |
| What Makes for Effective Few-shot Point Cloud Classification? | Code | 1 |
| A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Code | 1 |
| ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Code | 1 |
| MGTBench: Benchmarking Machine-Generated Text Detection | Code | 1 |
| MEGA: Multilingual Evaluation of Generative AI | Code | 1 |
| DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Code | 1 |
| Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering Regularized Self-Training | Code | 1 |
| CCTV-Gun: Benchmarking Handgun Detection in CCTV Images | Code | 1 |
| COVID-19 event extraction from Twitter via extractive question answering with continuous prompts | Code | 1 |
| TransNetR: Transformer-based Residual Network for Polyp Segmentation with Multi-Center Out-of-Distribution Testing | Code | 1 |
| What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers | Code | 1 |
| Revisiting the Gumbel-Softmax in MADDPG | Code | 1 |
| A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Code | 1 |
| A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1 |
| SurgT challenge: Benchmark of Soft-Tissue Trackers for Robotic Surgery | Code | 1 |
| CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Code | 1 |
| Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler | Code | 1 |
| Rethinking low-cost microscopy workflow: Image enhancement using deep based Extended Depth of Field methods | Code | 1 |
| Benchmarking Large Language Models for News Summarization | Code | 1 |
| Benchmarking Robustness to Adversarial Image Obfuscations | Code | 1 |
| TemporAI: Facilitating Machine Learning Innovation in Time Domain Tasks for Medicine | Code | 1 |
| BiBench: Benchmarking and Analyzing Network Binarization | Code | 1 |
| Young Labeled Faces in the Wild (YLFW): A Dataset for Children Faces Recognition | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |