SOTAVerified

Benchmarking

Papers

Showing 1576-1600 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | | 0 |
| Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis | Code | 1 |
| ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Code | 0 |
| Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration | | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | | 0 |
| Match Stereo Videos via Bidirectional Alignment | | 0 |
| Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models | Code | 2 |
| GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks | | 0 |
| Tracking Everything in Robotic-Assisted Surgery | | 0 |
| A Survey on Graph Neural Networks for Remaining Useful Life Prediction: Methodologies, Evaluation and Future Trends | Code | 2 |
| AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy | | 0 |
| Constrained Reinforcement Learning for Safe Heat Pump Control | Code | 0 |
| SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement | | 0 |
| EarthquakeNPP: Benchmark Datasets for Earthquake Forecasting with Neural Point Processes | | 0 |
| bnRep: A repository of Bayesian networks from the academic literature | | 0 |
| CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting | | 0 |
| MCUBench: A Benchmark of Tiny Object Detectors on MCUs | | 0 |
| Data Analysis in the Era of Generative AI | | 0 |
| Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study | Code | 0 |
| ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Code | 1 |
| The Elephant in the Room: Towards A Reliable Time-Series Anomaly Detection Benchmark | Code | 3 |
| Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs | Code | 0 |
| MALPOLON: A Framework for Deep Species Distribution Modeling | Code | 1 |
| Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics | | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | | 0 |
Page 64 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |