SOTAVerified

Benchmarking

Papers

Showing 3201–3250 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | | 0 |
| From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising | | 0 |
| From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification | | 0 |
| From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution | | 0 |
| From Sound Representation to Model Robustness | | 0 |
| From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems | | 0 |
| From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | | 0 |
| FSD-10: A Dataset for Competitive Sports Content Analysis | | 0 |
| Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods | | 0 |
| Full-stack evaluation of Machine Learning inference workloads for RISC-V systems | | 0 |
| FunBench: Benchmarking Fundus Reading Skills of MLLMs | | 0 |
| Functional Code Building Genetic Programming | | 0 |
| Efficient Pauli channel estimation with logarithmic quantum memory | | 0 |
| FuzzWiz -- Fuzzing Framework for Efficient Hardware Coverage | | 0 |
| Fuzzy Knowledge Distillation from High-Order TSK to Low-Order TSK | | 0 |
| Genetic algorithm for feature selection of EEG heterogeneous data | | 0 |
| Galvatron: An Automatic Distributed System for Efficient Foundation Model Training | | 0 |
| GAN-based disentanglement learning for chest X-ray rib suppression | | 0 |
| GANmut: Generating and Modifying Facial Expressions | | 0 |
| GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR | | 0 |
| GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics | | 0 |
| Gauss-Ramanujan Functions: Constructions, Properties, and Applications in Communications and Signal Processing | | 0 |
| GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | | 0 |
| GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | | 0 |
| Generalised Gaussian Process Latent Variable Models (GPLVM) with Stochastic Variational Inference | | 0 |
| Generalization Bias in Large Language Model Summarization of Scientific Research | | 0 |
| Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization | | 0 |
| Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow | | 0 |
| Generalized Conflict-directed Search for Optimal Ordering Problems | | 0 |
| Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey | | 0 |
| General Scales Unlock AI Evaluation with Explanatory and Predictive Power | | 0 |
| Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey | | 0 |
| Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems | | 0 |
| Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems | | 0 |
| Hierarchical Data Generator based on Tree-Structured Stick Breaking Process for Benchmarking Clustering Methods | | 0 |
| Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking | | 0 |
| Generation of Large District Heating System Models Using Open-Source Data and Tools: An Exemplary Workflow | | 0 |
| Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins? | | 0 |
| Generative Adversarial Networks with Limited Data: A Survey and Benchmarking | | 0 |
| Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors | | 0 |
| Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges | | 0 |
| Learning Dynamic Feature Selection for Fast Sequential Prediction | | 0 |
| Learning Environment Models with Continuous Stochastic Dynamics | | 0 |
| Learning Graphs for Knowledge Transfer With Limited Labels | | 0 |
| Learning Hidden Physics and System Parameters with Deep Operator Networks | | 0 |
| Learning Multimorbidity Patterns from Electronic Health Records Using Non-negative Matrix Factorisation | | 0 |
| Benchmarking Augmentation Methods for Learning Robust Navigation Agents: the Winning Entry of the 2021 iGibson Challenge | | 0 |
| Learning to Adapt to Online Streams with Distribution Shifts | | 0 |
| Realistic Large-Scale Fine-Depth Dehazing Dataset from 3D Videos | | 0 |
| Learning to Disambiguate by Asking Discriminative Questions | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |