SOTAVerified

Benchmarking

Papers

Showing 3501-3550 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction | | 0 |
| Next-generation MRD assays: do we have the tools to evaluate them properly? | | 0 |
| NL2KQL: From Natural Language to Kusto Query | | 0 |
| Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5 | | 0 |
| NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems | | 0 |
| No Dataset Needed for Downstream Knowledge Benchmarking: Response Dispersion Inversely Correlates with Accuracy on Domain-specific QA | | 0 |
| NODDI-SH: a computational efficient NODDI extension for fODF estimation in diffusion MRI | | 0 |
| Node Classification Meets Link Prediction on Knowledge Graphs | | 0 |
| Nodule detection and generation on chest X-rays: NODE21 Challenge | | 0 |
| NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries | | 0 |
| NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models | | 0 |
| Noisy intermediate-scale quantum (NISQ) algorithms | | 0 |
| InferBench: Understanding Deep Learning Inference Serving with an Automatic Benchmarking System | | 0 |
| Non-Contextual Modeling of Sarcasm using a Neural Network Benchmark | | 0 |
| Non-Reference Quality Assessment for Medical Imaging: Application to Synthetic Brain MRIs | | 0 |
| Nonstochastic Bandits with Infinitely Many Experts | | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | | 0 |
| Not Every Tree Is a Forest: Benchmarking Forest Types from Satellite Remote Sensing | | 0 |
| NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription | | 0 |
| NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI | | 0 |
| NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds | | 0 |
| Long Short-Term Memory with Gate and State Level Fusion for Light Field-Based Face Recognition | | 0 |
| Novel Real-Time EMT-TS Modeling Architecture for Feeder Blackstart Simulations | | 0 |
| NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics | | 0 |
| Now you see me: evaluating performance in long-term visual tracking | | 0 |
| N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | | 0 |
| NTP : A Neural Network Topology Profiler | | 0 |
| Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions | | 0 |
| Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models | | 0 |
| NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks | | 0 |
| NuwaTS: a Foundation Model Mending Every Incomplete Time Series | | 0 |
| Object Detection based on LIDAR Temporal Pulses using Spiking Neural Networks | | 0 |
| OctoPath: An OcTree Based Self-Supervised Learning Approach to Local Trajectory Planning for Mobile Robots | | 0 |
| OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking | | 0 |
| Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection | | 0 |
| Off-policy Evaluation for Payments at Adyen | | 0 |
| OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics | | 0 |
| Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | | 0 |
| Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics | | 0 |
| OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions | | 0 |
| OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | | 0 |
| On Benchmarking Code LLMs for Android Malware Analysis | | 0 |
| On Benchmarking Iris Recognition within a Head-mounted Display for AR/VR Application | | 0 |
| On Continual Model Refinement in Out-of-Distribution Data Streams | | 0 |
| On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events | | 0 |
| On Distribution Grid Optimal Power Flow Development and Integration | | 0 |
| ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities | | 0 |
| One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision | | 0 |
| One of these (Few) Things is Not Like the Others | | 0 |
| One-Shot Federated Learning with Classifier-Free Diffusion Models | | 0 |
Page 71 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |