SOTAVerified

Benchmarking

Papers

Showing 3201-3225 of 5548 papers

Title | Status | Hype
Benchmarking projective simulation in navigation problems | | 0
Benchmarking Processor Performance by Multi-Threaded Machine Learning Algorithms | | 0
JuStRank: Benchmarking LLM Judges for System Ranking | | 0
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images | | 0
Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling | | 0
AERF: Adaptive ensemble random fuzzy algorithm for anomaly detection in cloud computing | | 0
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models | | 0
Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks | | 0
KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making | | 0
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3 | | 0
KetGPT -- Dataset Augmentation of Quantum Circuits using Transformers | | 0
Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy | | 0
Classification of Single-View Object Point Clouds | | 0
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design | | 0
Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition | | 0
Benchmarking Poisoning Attacks against Retrieval-Augmented Generation | | 0
Benchmarking person re-identification approaches and training datasets for practical real-world implementations | | 0
Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations | | 0
Knowledge-aware contrastive heterogeneous molecular graph learning | | 0
AEON: Adaptive Estimation of Instance-Dependent In-Distribution and Out-of-Distribution Label Noise for Robust Learning | | 0
TIIF-Bench: How Does Your T2I Model Follow Your Instructions? | | 0
Knowledge Sharing in Manufacturing using Large Language Models: User Evaluation and Model Benchmarking | | 0
3D Compositional Zero-shot Learning with DeCompositional Consensus | | 0
Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems | | 0
Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges | | 0
Page 129 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified