SOTAVerified

Benchmarking

Papers

Showing 1476-1500 of 5548 papers

Title | Status | Hype
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
Benchmarking Graph Neural Networks for FMRI analysis | Code | 1
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Code | 1
BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation | Code | 1
ClearPose: Large-scale Transparent Object Dataset and Benchmark | Code | 1
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Code | 1
Performance Evaluation of Deep Transfer Learning on Multiclass Identification of Common Weed Species in Cotton Production Systems | Code | 1
PGDQN: Preference-Guided Deep Q-Network | Code | 1
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
Beyond neural scaling laws: beating power law scaling via data pruning | Code | 1
Beyond Normal: On the Evaluation of Mutual Information Estimators | Code | 1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing | Code | 1
PLANTAIN: Diffusion-inspired Pose Score Minimization for Fast and Accurate Molecular Docking | Code | 1
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1
ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning | Code | 1
Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy | Code | 1
RADIATE: A Radar Dataset for Automotive Perception in Bad Weather | Code | 1
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding | Code | 1
CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1
Positional Encoding in Transformer-Based Time Series Models: A Survey | Code | 1
PowerMamba: A Deep State Space Model and Comprehensive Benchmark for Time Series Prediction in Electric Power Systems | Code | 1
Benchmarking Graph Learning for Drug-Drug Interaction Prediction | | 0
A practical generalization metric for deep networks benchmarking | | 0
AERF: Adaptive ensemble random fuzzy algorithm for anomaly detection in cloud computing | | 0
Page 60 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified