SOTAVerified

Benchmarking

Papers

Showing 3251–3300 of 5548 papers

Title | Status | Hype
Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells |  | 0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0
Are Large Language Models Good at Utility Judgments? | Code | 0
Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data |  | 0
GPTs and Language Barrier: A Cross-Lingual Legal QA Examination |  | 0
Benchmarking Video Frame Interpolation |  | 0
NSINA: A News Corpus for Sinhala | Code | 0
DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts |  | 0
On the Fragility of Active Learners for Text Classification | Code | 0
TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring | Code | 0
Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation |  | 0
Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards |  | 0
Subjective Quality Assessment of Compressed Tone-Mapped High Dynamic Range Videos |  | 0
Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes |  | 0
ChatGPT Alternative Solutions: Large Language Models Survey |  | 0
Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation |  | 0
MARTA: a model for the automatic phonemic grouping of the parkinsonian speech | Code | 0
Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset |  | 0
Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks |  | 0
A Sober Look at the Robustness of CLIPs to Spurious Features |  | 0
Benchmarking the Robustness of UAV Tracking Against Common Corruptions | Code | 0
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety |  | 0
Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking |  | 0
FlowMind: Automatic Workflow Generation with LLMs |  | 0
Depression Detection on Social Media with Large Language Models |  | 0
Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks |  | 0
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Code | 0
SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages | Code | 0
Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors | Code | 0
Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows | Code | 0
An Approach to Evaluate Modeling Adequacy for Small-Signal Stability Analysis of IBR-related SSOs in Multimachine Systems |  | 0
A tutorial on multi-view autoencoders using the multi-view-AE library |  | 0
IndicSTR12: A Dataset for Indic Scene Text Recognition |  | 0
(N,K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model |  | 0
Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies | Code | 0
A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation |  | 0
Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations | Code | 0
Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume |  | 0
Synth4bench: a framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms | Code | 0
Benchmarking Large Language Models for Molecule Prediction Tasks | Code | 0
Improvements & Evaluations on the MLCommons CloudMask Benchmark | Code | 0
NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems |  | 0
Benchmarking News Recommendation in the Era of Green AI |  | 0
Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI | Code | 0
Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task | Code | 0
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving |  | 0
Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks | Code | 0
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0
A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video |  | 0
Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation |  | 0
Page 66 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified