SOTAVerified

Benchmarking

Papers

Showing 3851-3900 of 5548 papers

Title | Status | Hype
Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture | | 0
Benchmarking CNN on 3D Anatomical Brain MRI: Architectures, Data Augmentation and Deep Ensemble Learning | | 0
Benchmarking Clinical Decision Support Search | | 0
No Dataset Needed for Downstream Knowledge Benchmarking: Response Dispersion Inversely Correlates with Accuracy on Domain-specific QA | | 0
NODDI-SH: a computational efficient NODDI extension for fODF estimation in diffusion MRI | | 0
Benchmarking Classical, Deep, and Generative Models for Human Activity Recognition | | 0
Node Classification Meets Link Prediction on Knowledge Graphs | | 0
Nodule detection and generation on chest X-rays: NODE21 Challenge | | 0
Training Transformers with Enforced Lipschitz Constants | | 0
NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries | | 0
NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models | | 0
Noisy intermediate-scale quantum (NISQ) algorithms | | 0
Trajectory Normalized Gradients for Distributed Optimization | | 0
ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities | | 0
InferBench: Understanding Deep Learning Inference Serving with an Automatic Benchmarking System | | 0
Non-Contextual Modeling of Sarcasm using a Neural Network Benchmark | | 0
Non-Reference Quality Assessment for Medical Imaging: Application to Synthetic Brain MRIs | | 0
Nonstochastic Bandits with Infinitely Many Experts | | 0
TRAM: Benchmarking Temporal Reasoning for Large Language Models | | 0
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | | 0
Not Every Tree Is a Forest: Benchmarking Forest Types from Satellite Remote Sensing | | 0
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription | | 0
NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI | | 0
NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds | | 0
Long Short-Term Memory with Gate and State Level Fusion for Light Field-Based Face Recognition | | 0
Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies | | 0
Novel Real-Time EMT-TS Modeling Architecture for Feeder Blackstart Simulations | | 0
NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics | | 0
Now you see me: evaluating performance in long-term visual tracking | | 0
CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs | | 0
N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | | 0
Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards | | 0
Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets | | 0
NTP : A Neural Network Topology Profiler | | 0
Benchmarking changepoint detection algorithms on cardiac time series | | 0
Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions | | 0
Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models | | 0
NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks | | 0
NuwaTS: a Foundation Model Mending Every Incomplete Time Series | | 0
Benchmarking CFAR and CNN-based Peak Detection Algorithms in ISAC under Hardware Impairments | | 0
Benchmarking Causal Study to Interpret Large Language Models for Source Code | | 0
Object Detection based on LIDAR Temporal Pulses using Spiking Neural Networks | | 0
Benchmarking Burst Super-Resolution for Polarization Images: Noise Dataset and Analysis | | 0
Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment | | 0
Benchmarking BioRelEx for Entity Tagging and Relation Extraction | | 0
Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation | | 0
OctoPath: An OcTree Based Self-Supervised Learning Approach to Local Trajectory Planning for Mobile Robots | | 0
Benchmarking Biomedical Nested NER and Relation Extraction Models | | 0
OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking | | 0
Benchmarking Bias in Large Language Models during Role-Playing | | 0
Page 78 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified