SOTAVerified

Benchmarking

Papers

Showing 2701–2725 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Identifying patterns and recommendations of and for sustainable open data initiatives: a benchmarking-driven analysis of open government data initiatives among European countries | | 0 |
| Benchmarking and Enhancing Disentanglement in Concept-Residual Models | | 0 |
| A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval | | 0 |
| Event-based Continuous Color Video Decompression from Single Frames | | 0 |
| LucidDreaming: Controllable Object-Centric 3D Generation | | 0 |
| Enhancing Ligand Pose Sampling for Molecular Docking | Code | 1 |
| Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms | Code | 1 |
| TaskBench: Benchmarking Large Language Models for Task Automation | Code | 6 |
| Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction | | 0 |
| AlignBench: Benchmarking Chinese Alignment of Large Language Models | Code | 2 |
| Z_2 × Z_2 Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks | Code | 0 |
| Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation | Code | 1 |
| TransOpt: Transformer-based Representation Learning for Optimization Problem Classification | | 0 |
| Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices | | 0 |
| ROBBIE: Robust Bias Evaluation of Large Generative Language Models | | 0 |
| Biomedical knowledge graph-optimized prompt generation for large language models | Code | 2 |
| Enhancing Post-Hoc Explanation Benchmark Reliability for Image Classification | | 0 |
| SAIBench: A Structural Interpretation of AI for Science Through Benchmarks | | 0 |
| Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs | Code | 1 |
| UniIR: Training and Benchmarking Universal Multimodal Information Retrievers | | 0 |
| SEED-Bench-2: Benchmarking Multimodal Large Language Models | Code | 2 |
| PAWS-VMK: A Unified Approach To Semi-Supervised Learning And Out-of-Distribution Detection | | 0 |
| Riemannian Self-Attention Mechanism for SPD Networks | | 0 |
| FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections | | 0 |
| Comprehensive Benchmarking of Entropy and Margin Based Scoring Metrics for Data Selection | | 0 |
Page 109 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |