SOTAVerified

Benchmarking

Papers

Showing 1526-1550 of 5548 papers

Title | Status | Hype
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges | Code | 0
Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? | Code | 0
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking | Code | 0
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0
Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation | Code | 0
Laparoscopic Image Desmoking Using the U-Net with New Loss Function and Integrated Differentiable Wiener Filter | Code | 0
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods | Code | 0
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | Code | 0
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0
Adversarial Metric Attack and Defense for Person Re-identification | Code | 0
Language-based Image Colorization: A Benchmark and Beyond | Code | 0
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Code | 0
Benchmarking Feature-based Algorithm Selection Systems for Black-box Numerical Optimization | Code | 0
Benchmarking Failures in Tool-Augmented Language Models | Code | 0
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0
Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
Towards Enhancing Fault Tolerance in Neural Networks | Code | 0
KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Code | 0
Ants can orienteer a thief in their robbery | Code | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
Benchmarking Educational Program Repair | Code | 0
ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation | Code | 0
Adversarial Environment Generation for Learning to Navigate the Web | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified