SOTAVerified

Benchmarking

Papers

Showing 626–650 of 5548 papers

Title | Status | Hype
A Critical Assessment of State-of-the-Art in Entity Alignment | Code | 1
dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing | Code | 1
DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation | Code | 1
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation | Code | 1
Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models | Code | 1
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory | Code | 1
Benchmarking Image Retrieval for Visual Localization | Code | 1
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Code | 1
Benchmarking LLMs' Swarm intelligence | Code | 1
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Code | 1
Benchmarking Language Model Creativity: A Case Study on Code Generation | Code | 1
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets | Code | 1
Decoding the Underlying Meaning of Multimodal Hateful Memes | Code | 1
DFGC 2021: A DeepFake Game Competition | Code | 1
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Code | 1
Data-Driven Denoising of Stationary Accelerometer Signals | Code | 1
D2S: Document-to-Slide Generation Via Query-Based Text Summarization | Code | 1
DACBench: A Benchmark Library for Dynamic Algorithm Configuration | Code | 1
Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data | Code | 1
Align and Distill: Unifying and Improving Domain Adaptive Object Detection | Code | 1
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
Curious Hierarchical Actor-Critic Reinforcement Learning | Code | 1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
DataRec: A Python Library for Standardized and Reproducible Data Management in Recommender Systems | Code | 1
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified