SOTAVerified

Benchmarking

Papers

Showing 4401–4450 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| LAVIS: A Library for Language-Vision Intelligence | | 0 |
| LayoutXLM vs. GNN: An Empirical Evaluation of Relation Extraction for Documents | | 0 |
| LCFO: Long Context and Long Form Output Dataset and Benchmarking | | 0 |
| LEAF: A Benchmark for Federated Settings | | 0 |
| Leaf Segmentation and Counting with Deep Learning: on Model Certainty, Test-Time Augmentation, Trade-Offs | | 0 |
| Learning a CNN-based End-to-End Controller for a Formula SAE Racecar | | 0 |
| Learning a quantum computer's capability | | 0 |
| Learning a Representation with the Block-Diagonal Structure for Pattern Classification | | 0 |
| Learning a Saliency Evaluation Metric Using Crowdsourced Perceptual Judgments | | 0 |
| Learning Best Paths in Quantum Networks | | 0 |
| Learning Disentangled Audio Representations through Controlled Synthesis | | 0 |
| Learning Disentangled Speech Representations | | 0 |
| LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0 |
| SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0 |
| Benchmark data and method for real-time people counting in cluttered scenes using depth sensors | Code | 0 |
| Reassessing Layer Pruning in LLMs: New Insights and Methods | Code | 0 |
| LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0 |
| Re-Benchmarking Pool-Based Active Learning for Binary Classification | Code | 0 |
| Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0 |
| Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods | Code | 0 |
| Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0 |
| CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines | Code | 0 |
| Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments | Code | 0 |
| Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy | Code | 0 |
| Language-based Image Colorization: A Benchmark and Beyond | Code | 0 |
| TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models | Code | 0 |
| BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search | Code | 0 |
| Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0 |
| TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language | Code | 0 |
| Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Code | 0 |
| LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0 |
| Laparoscopic Image Desmoking Using the U-Net with New Loss Function and Integrated Differentiable Wiener Filter | Code | 0 |
| LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Code | 0 |
| Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench | Code | 0 |
| Recurrent Quantum Neural Networks | Code | 0 |
| KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0 |
| Can geometric combinatorics improve RNA branching predictions? | Code | 0 |
| BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search | Code | 0 |
| Can a single neuron learn predictive uncertainty? | Code | 0 |
| Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0 |
| Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges | Code | 0 |
| Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework | Code | 0 |
| KArSL: Arabic Sign Language Database | Code | 0 |
| Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim Evidence Reasoning | Code | 0 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Code | 0 |
| TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics | Code | 0 |
| Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | Code | 0 |
| KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen | Code | 0 |
| Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Code | 0 |
| Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |