SOTAVerified

Benchmarking

Papers

Showing 3351-3400 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| BdSLW60: A Word-Level Bangla Sign Language Dataset | Code | 0 |
| Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems | | 0 |
| Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT | | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | | 0 |
| Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Code | 0 |
| Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices | | 0 |
| ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | | 0 |
| Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | | 0 |
| A Functional Analysis Approach to Symbolic Regression | | 0 |
| LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | | 0 |
| Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction | | 0 |
| Transparent and Scrutable Recommendations Using Natural Language User Profiles | Code | 0 |
| Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset | Code | 0 |
| Towards Biologically Plausible and Private Gene Expression Data Generation | Code | 0 |
| BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception | Code | 0 |
| AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection | Code | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | | 0 |
| Quantitative Metrics for Benchmarking Medical Image Harmonization | | 0 |
| PowerGraph: A power grid benchmark dataset for graph neural networks | | 0 |
| Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation | Code | 0 |
| Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations | Code | 0 |
| Probing Critical Learning Dynamics of PLMs for Hate Speech Detection | Code | 0 |
| Can LLMs perform structured graph reasoning? | Code | 0 |
| Variational Quantum Circuits Enhanced Generative Adversarial Network | | 0 |
| Benchmarking Spiking Neural Network Learning Methods with Varying Locality | | 0 |
| Coherent Feed Forward Quantum Neural Network | | 0 |
| MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures | | 0 |
| Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Code | 0 |
| Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition | | 0 |
| ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks | Code | 0 |
| Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | | 0 |
| PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models | Code | 0 |
| Benchmarking with MIMIC-IV, an irregular, spare clinical time series dataset | | 0 |
| SAM-based instance segmentation models for the automation of structural damage detection | | 0 |
| Biological Valuation Map of Flanders: A Sentinel-2 Imagery Analysis | | 0 |
| Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs | | 0 |
| Automated legal reasoning with discretion to act using s(LAW) | | 0 |
| TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images | | 0 |
| Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding | | 0 |
| Benchmarking the Fairness of Image Upsampling Methods | Code | 0 |
| LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method | Code | 0 |
| Deep Neural Network Benchmarks for Selective Classification | Code | 0 |
| What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition | Code | 0 |
| Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials | Code | 0 |
| Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound | | 0 |
| Data Augmentation for Traffic Classification | | 0 |
| Harnessing Orthogonality to Train Low-Rank Neural Networks | Code | 0 |
| NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription | | 0 |
| OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion | | 0 |
| Large Language Models are Null-Shot Learners | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |