SOTAVerified

Benchmarking

Papers

Showing 2301–2325 of 5548 papers

Title | Status | Hype
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs | Code | 0
Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps | Code | 0
Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces | Code | 0
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Code | 0
Grounded Intuition of GPT-Vision's Abilities with Scientific Images | Code | 0
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics | Code | 0
Improving Sequential Recommendation Models with an Enhanced Loss Function | Code | 0
Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models | Code | 0
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis | Code | 0
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | Code | 0
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Code | 0
Benchmarking Long-tail Generalization with Likelihood Splits | Code | 0
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective | Code | 0
Learning Conjoint Attentions for Graph Neural Nets | Code | 0
Graph-theoretical approach to robust 3D normal extraction of LiDAR data | Code | 0
HRNET: AI on Edge for mask detection and social distancing | Code | 0
Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere | Code | 0
GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | Code | 0
ECBD: Evidence-Centered Benchmark Design for NLP | Code | 0
Benchmarking LLMs' Judgments with No Gold Standard | Code | 0
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024) | Code | 0
A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models | Code | 0
GOAL: Towards Benchmarking Few-Shot Sports Game Summarization | Code | 0
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder | Code | 0
A Review of Testing Object-Based Environment Perception for Safe Automated Driving | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified