SOTAVerified

Benchmarking

Papers

Showing 2151–2175 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Categorization of 33 computational methods to detect spatially variable genes from spatially resolved transcriptomics data | | 0 |
| MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification | | 0 |
| Benchmarking and Improving Detail Image Caption | Code | 2 |
| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Code | 1 |
| Quantitative Certification of Bias in Large Language Models | Code | 1 |
| Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | | 0 |
| Risk-Neutral Generative Networks | | 0 |
| DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime | Code | 1 |
| Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Code | 2 |
| Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving | Code | 3 |
| A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis | | 0 |
| Benchmarking General-Purpose In-Context Learning | | 0 |
| BOLD: Boolean Logic Deep Learning | | 0 |
| GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | | 0 |
| Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum" | | 0 |
| Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification | | 0 |
| NuwaTS: a Foundation Model Mending Every Incomplete Time Series | | 0 |
| Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images | | 0 |
| Full-stack evaluation of Machine Learning inference workloads for RISC-V systems | | 0 |
| MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model | | 0 |
| Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study | | 0 |
| Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks | | 0 |
| Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling | Code | 1 |
| S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models | Code | 2 |
Page 87 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |