SOTAVerified

Benchmarking

Papers

Showing 2951–2975 of 5548 papers

Title | Status | Hype
Experimental Benchmarking of Energy-saving Sub-Optimal Sliding Mode Control | | 0
NativQA: Multilingual Culturally-Aligned Natural Query for LLMs | | 0
Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem | | 0
Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment | Code | 0
Evaluating Nuanced Bias in Large Language Model Free Response Answers | | 0
A Comprehensive Survey on Retrieval Methods in Recommender Systems | | 0
Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models | | 0
How Aligned are Different Alignment Metrics? | | 0
HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction | Code | 0
Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability | | 0
SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems | | 0
GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation | | 0
Simulation-based Benchmarking for Causal Structure Learning in Gene Perturbation Experiments | Code | 0
TARGO: Benchmarking Target-driven Object Grasping under Occlusions | | 0
MERGE -- A Bimodal Audio-Lyrics Dataset for Static Music Emotion Recognition | | 0
A Benchmark for Multi-speaker Anonymization | | 0
Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs | Code | 0
From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano | | 0
Benchmarking GNNs Using Lightning Network Data | | 0
Towards Stable 3D Object Detection | | 0
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation | | 0
Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias | Code | 0
Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms | | 0
TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations | | 0
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks | Code | 0
Page 119 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified