SOTAVerified

Benchmarking

Papers

Showing 3601-3625 of 5548 papers

Title | Status | Hype
Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data | Code | 0
Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus | Code | 0
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems | Code | 0
Simple GNNs with Low Rank Non-parametric Aggregators | Code | 0
Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction | | 0
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | | 0
Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data | | 0
Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods | | 0
CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis | | 0
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation | | 0
AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards | Code | 0
Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms | | 0
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning | | 0
Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report | | 0
A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling | | 0
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | | 0
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Code | 0
Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study | | 0
On the Performance of Multimodal Language Models | | 0
EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations | | 0
Learning Quantum Processes with Quantum Statistical Queries | Code | 0
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods | | 0
Benchmarking and Improving Generator-Validator Consistency of Language Models | | 0
CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems | | 0
A New Real-World Video Dataset for the Comparison of Defogging Algorithms | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified