SOTAVerified

Benchmarking

Papers

Showing 2351-2400 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| SnCQA: A hardware-efficient equivariant quantum convolutional circuit architecture | | 0 |
| A Look at the Evaluation Setup of the M5 Forecasting Competition | | 0 |
| Fuzzy Knowledge Distillation from High-Order TSK to Low-Order TSK | | 0 |
| Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data | | 0 |
| A Comprehensive Survey on Retrieval Methods in Recommender Systems | | 0 |
| ALOJA-ML: A Framework for Automating Characterization and Knowledge Discovery in Hadoop Deployments | | 0 |
| Benchmarking unsupervised near-duplicate image detection | | 0 |
| Abasy Atlas v2.2: The most comprehensive and up-to-date inventory of meta-curated, historical, bacterial regulatory networks, their completeness and system-level characterization | | 0 |
| FunBench: Benchmarking Fundus Reading Skills of MLLMs | | 0 |
| Benchmarking Unsupervised Anomaly Detection and Localization | | 0 |
| Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning | | 0 |
| Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs | | 0 |
| Benchmarking Uncertainty Quantification on Biosignal Classification Tasks under Dataset Shift | | 0 |
| Automatic vehicle trajectory data reconstruction at scale | | 0 |
| ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments | | 0 |
| Functional Code Building Genetic Programming | | 0 |
| Benchmarking Ultra-Low-Power μNPUs | | 0 |
| Automatic Target Recognition on Synthetic Aperture Radar Imagery: A Survey | | 0 |
| Benchmarking Ultra-High-Definition Image Super-Resolution | | 0 |
| Almost Equivariance via Lie Algebra Convolutions | | 0 |
| Benchmarking performance, explainability, and evaluation strategies of vision-language models for surgery: Challenges and opportunities | | 0 |
| Benchmarking Twitter Sentiment Analysis Tools | | 0 |
| MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models | | 0 |
| Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives | | 0 |
| Benchmarking Transformers-based models on French Spoken Language Understanding tasks | | 0 |
| Scaling laws in global corporations as a benchmarking approach to assess environmental performance | | 0 |
| A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis | | 0 |
| Full-stack evaluation of Machine Learning inference workloads for RISC-V systems | | 0 |
| Efficient Pauli channel estimation with logarithmic quantum memory | | 0 |
| Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models | | 0 |
| Benchmarking Transformer-based Language Models for Arabic Sentiment and Sarcasm Detection | | 0 |
| Automatic Microprocessor Performance Bug Detection | | 0 |
| From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems | | 0 |
| From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | | 0 |
| Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning | | 0 |
| Automatic detection of passable roads after floods in remote sensed and social media data | | 0 |
| From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution | | 0 |
| A Line-of-Sight Channel Model for the 100-450 Gigahertz Frequency Band | | 0 |
| A Continuously Growing Dataset of Sentential Paraphrases | | 0 |
| From Sound Representation to Model Robustness | | 0 |
| FSD-10: A Dataset for Competitive Sports Content Analysis | | 0 |
| Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications | | 0 |
| Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization | | 0 |
| Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation | | 0 |
| Automated Structured Radiology Report Generation | | 0 |
| From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising | | 0 |
| Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions | | 0 |
| Benchmarking the Sim-to-Real Gap in Cloth Manipulation | | 0 |
| Automated Machine Learning on Big Data using Stochastic Algorithm Tuning | | 0 |
| From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |