SOTAVerified

Benchmarking

Papers

Showing 2301-2350 of 5548 papers

Title | Status | Hype
HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction | Code | 0
Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients | Code | 0
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs | Code | 0
Harmonization Benchmarking Tool for Neuroimaging Datasets | Code | 0
Harnessing Orthogonality to Train Low-Rank Neural Networks | Code | 0
HATE-ITA: New Baselines for Hate Speech Detection in Italian | Code | 0
gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo | Code | 0
HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios | Code | 0
Improving Sequential Recommendation Models with an Enhanced Loss Function | Code | 0
Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps | Code | 0
Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models | Code | 0
Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces | Code | 0
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Code | 0
Benchmarking Long-tail Generalization with Likelihood Splits | Code | 0
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | Code | 0
Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance | Code | 0
Grounded Intuition of GPT-Vision's Abilities with Scientific Images | Code | 0
Hard-Label Cryptanalytic Extraction of Neural Network Models | Code | 0
Graph-theoretical approach to robust 3D normal extraction of LiDAR data | Code | 0
Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere | Code | 0
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis | Code | 0
ECBD: Evidence-Centered Benchmark Design for NLP | Code | 0
Benchmarking LLMs' Judgments with No Gold Standard | Code | 0
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024) | Code | 0
Benchmarking Machine Translation with Cultural Awareness | Code | 0
EmProx: Neural Network Performance Estimation For Neural Architecture Search | Code | 0
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics | Code | 0
Learning Conjoint Attentions for Graph Neural Nets | Code | 0
A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models | Code | 0
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder | Code | 0
A Review of Testing Object-Based Environment Perception for Safe Automated Driving | Code | 0
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective | Code | 0
Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique | Code | 0
DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning | Code | 0
Hardware Aware Neural Network Architectures using FbNet | Code | 0
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Code | 0
DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems | Code | 0
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Code | 0
GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | Code | 0
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program | Code | 0
Effective Stabilized Self-Training on Few-Labeled Graph Data | Code | 0
Enhancing Biomedical Knowledge Discovery for Diseases: An Open-Source Framework Applied on Rett Syndrome and Alzheimer's Disease | Code | 0
GOAL: Towards Benchmarking Few-Shot Sports Game Summarization | Code | 0
GNNMerge: Merging of GNN Models Without Accessing Training Data | Code | 0
A Deep Reinforcement Learning Framework for Dynamic Portfolio Optimization: Evidence from China's Stock Market | Code | 0
Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks | Code | 0
Enhancing Hyper-To-Real Space Projections Through Euclidean Norm Meta-Heuristic Optimization | Code | 0
Geological Inference from Textual Data using Word Embeddings | Code | 0
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Code | 0
Benchmarking LLM-based Relevance Judgment Methods | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified