SOTAVerified

Benchmarking

Papers

Showing 751-775 of 5548 papers

Title | Status | Hype
Towards Sim-to-Real Industrial Parts Classification with Synthetic Dataset | Code | 1
Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model | Code | 1
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Code | 1
PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model | Code | 1
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models | Code | 1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1
Atom-Level Optical Chemical Structure Recognition with Limited Supervision | Code | 1
PREGO: online mistake detection in PRocedural EGOcentric videos | Code | 1
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Code | 1
Benchmarking Counterfactual Image Generation | Code | 1
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Code | 1
RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers | Code | 1
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | Code | 1
Towards Image Ambient Lighting Normalization | Code | 1
Benchmarking Object Detectors with COCO: A New Path Forward | Code | 1
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Code | 1
Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark | Code | 1
DomainLab: A modular Python package for domain generalization in deep learning | Code | 1
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Code | 1
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models | Code | 1
Can 3D Vision-Language Models Truly Understand Natural Language? | Code | 1
Practical End-to-End Optical Music Recognition for Pianoform Music | Code | 1
MELTing point: Mobile Evaluation of Language Transformers | Code | 1
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems | Code | 1
Page 31 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified