SOTAVerified

Benchmarking

Papers

Showing 2726-2750 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration | | 0 |
| ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Code | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | | 0 |
| Constrained Reinforcement Learning for Safe Heat Pump Control | Code | 0 |
| Tracking Everything in Robotic-Assisted Surgery | | 0 |
| GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks | | 0 |
| AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy | | 0 |
| SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement | | 0 |
| Data Analysis in the Era of Generative AI | | 0 |
| Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study | Code | 0 |
| CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting | | 0 |
| bnRep: A repository of Bayesian networks from the academic literature | | 0 |
| MCUBench: A Benchmark of Tiny Object Detectors on MCUs | | 0 |
| EarthquakeNPP: Benchmark Datasets for Earthquake Forecasting with Neural Point Processes | | 0 |
| Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs | Code | 0 |
| Benchmarking Domain Generalization Algorithms in Computational Pathology | Code | 0 |
| Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices | | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | | 0 |
| Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics | | 0 |
| SEN12-WATER: A New Dataset for Hydrological Applications and its Benchmarking | | 0 |
| Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework | Code | 0 |
| HLB: Benchmarking LLMs' Humanlikeness in Language Use | | 0 |
| Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data | Code | 0 |
| Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling | | 0 |
| Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |