SOTAVerified

Benchmarking

Papers

Showing 1001–1025 of 5548 papers

Title | Status | Hype
Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | Code | 1
Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks | Code | 1
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | Code | 1
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models | Code | 1
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | Code | 1
An Empirical Study on Google Research Football Multi-agent Scenarios | Code | 1
A Platform for the Biomedical Application of Large Language Models | Code | 1
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | Code | 1
Benchmarking large language models for biomedical natural language processing applications and recommendations | Code | 1
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | Code | 1
Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1
Event-Free Moving Object Segmentation from Moving Ego Vehicle | Code | 1
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds | Code | 1
MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table | Code | 1
RGB-D Indiscernible Object Counting in Underwater Scenes | Code | 1
Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Code | 1
SCoDA: Domain Adaptive Shape Completion for Real Scans | Code | 1
Graph Neural Network-Based Anomaly Detection for River Network Systems | Code | 1
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Code | 1
A Comparison of Image Denoising Methods | Code | 1
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems | Code | 1
Interpretable statistical representations of neural population dynamics and geometry | Code | 1
MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding | Code | 1
SLPerf: a Unified Framework for Benchmarking Split Learning | Code | 1
Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified