SOTAVerified

Benchmarking

Papers

Showing 3876–3900 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies | | 0 |
| Novel Real-Time EMT-TS Modeling Architecture for Feeder Blackstart Simulations | | 0 |
| NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics | | 0 |
| Now you see me: evaluating performance in long-term visual tracking | | 0 |
| CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs | | 0 |
| N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | | 0 |
| Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards | | 0 |
| Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets | | 0 |
| NTP : A Neural Network Topology Profiler | | 0 |
| Benchmarking changepoint detection algorithms on cardiac time series | | 0 |
| Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions | | 0 |
| Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models | | 0 |
| NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks | | 0 |
| NuwaTS: a Foundation Model Mending Every Incomplete Time Series | | 0 |
| Benchmarking CFAR and CNN-based Peak Detection Algorithms in ISAC under Hardware Impairments | | 0 |
| Benchmarking Causal Study to Interpret Large Language Models for Source Code | | 0 |
| Object Detection based on LIDAR Temporal Pulses using Spiking Neural Networks | | 0 |
| Benchmarking Burst Super-Resolution for Polarization Images: Noise Dataset and Analysis | | 0 |
| Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment | | 0 |
| Benchmarking BioRelEx for Entity Tagging and Relation Extraction | | 0 |
| Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation | | 0 |
| OctoPath: An OcTree Based Self-Supervised Learning Approach to Local Trajectory Planning for Mobile Robots | | 0 |
| Benchmarking Biomedical Nested NER and Relation Extraction Models | | 0 |
| OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking | | 0 |
| Benchmarking Bias in Large Language Models during Role-Playing | | 0 |
Page 156 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |