SOTAVerified

Benchmarking

Papers

Showing 3601–3650 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data | Code | 0 |
| Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus | Code | 0 |
| Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems | Code | 0 |
| Simple GNNs with Low Rank Non-parametric Aggregators | Code | 0 |
| Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction | | 0 |
| FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | | 0 |
| Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data | | 0 |
| Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods | | 0 |
| CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis | | 0 |
| LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation | | 0 |
| AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards | Code | 0 |
| Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms | | 0 |
| Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning | | 0 |
| Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report | | 0 |
| A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling | | 0 |
| From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | | 0 |
| Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Code | 0 |
| Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study | | 0 |
| On the Performance of Multimodal Language Models | | 0 |
| EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations | | 0 |
| Learning Quantum Processes with Quantum Statistical Queries | Code | 0 |
| EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods | | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | | 0 |
| CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems | | 0 |
| A New Real-World Video Dataset for the Comparison of Defogging Algorithms | | 0 |
| TRAM: Benchmarking Temporal Reasoning for Large Language Models | | 0 |
| Adaptive Visual Scene Understanding: Incremental Scene Graph Generation | Code | 0 |
| The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks | | 0 |
| Adaptive Control of an Inverted Pendulum by a Reinforcement Learning-based LQR Method | | 0 |
| Benchmarking Collaborative Learning Methods Cost-Effectiveness for Prostate Segmentation | | 0 |
| A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater | | 0 |
| Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts | | 0 |
| Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection | | 0 |
| Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors | | 0 |
| Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym | | 0 |
| Language Models as a Service: Overview of a New Paradigm and its Challenges | | 0 |
| Demographic Parity: Mitigating Biases in Real-World Data | | 0 |
| On quantifying and improving realism of images generated with diffusion | | 0 |
| Advancing The Rate-Distortion-Computation Frontier For Neural Image Compression | | 0 |
| Thalamic nuclei segmentation from T_1-weighted MRI: unifying and benchmarking state-of-the-art methods with young and old cohorts | | 0 |
| Optimization Techniques for a Physical Model of Human Vocalisation | | 0 |
| Efficient Pauli channel estimation with logarithmic quantum memory | | 0 |
| VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph | | 0 |
| Categorization and analysis of 14 computational methods for estimating cell potency from single-cell RNA-seq data | | 0 |
| Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence | Code | 0 |
| Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data | | 0 |
| Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts | | 0 |
| Multimodal Deep Learning for Scientific Imaging Interpretation | | 0 |
| Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam | | 0 |
| On the relationship between Benchmarking, Standards and Certification in Robotics and AI | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |