SOTAVerified

Benchmarking

Papers

Showing 3251-3275 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells | | 0 |
| IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0 |
| Are Large Language Models Good at Utility Judgments? | Code | 0 |
| Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data | | 0 |
| GPTs and Language Barrier: A Cross-Lingual Legal QA Examination | | 0 |
| Benchmarking Video Frame Interpolation | | 0 |
| NSINA: A News Corpus for Sinhala | Code | 0 |
| DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts | | 0 |
| On the Fragility of Active Learners for Text Classification | Code | 0 |
| TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring | Code | 0 |
| Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation | | 0 |
| Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards | | 0 |
| Subjective Quality Assessment of Compressed Tone-Mapped High Dynamic Range Videos | | 0 |
| Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes | | 0 |
| ChatGPT Alternative Solutions: Large Language Models Survey | | 0 |
| Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation | | 0 |
| MARTA: a model for the automatic phonemic grouping of the parkinsonian speech | Code | 0 |
| Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset | | 0 |
| Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks | | 0 |
| A Sober Look at the Robustness of CLIPs to Spurious Features | | 0 |
| Benchmarking the Robustness of UAV Tracking Against Common Corruptions | Code | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | | 0 |
| Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking | | 0 |
| FlowMind: Automatic Workflow Generation with LLMs | | 0 |
| Depression Detection on Social Media with Large Language Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |