SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8201-8225 of 474278 papers

Title | Status | Hype
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph | Code | 2
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | Code | 2
SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation | Code | 2
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark | Code | 2
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Code | 2
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation | Code | 2
FIRST: Faster Improved Listwise Reranking with Single Token Decoding | Code | 2
DExter: Learning and Controlling Performance Expression with Diffusion Models | Code | 2
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms | Code | 2
MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading | Code | 2
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning | Code | 2
CodeRAG-Bench: Can Retrieval Augment Code Generation? | Code | 2
Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study | Code | 2
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework | Code | 2
How far are today's time-series models from real-world weather forecasting applications? | Code | 2
HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting? | Code | 2
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving | Code | 2
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | Code | 2
TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models | Code | 2
CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information | Code | 2
Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition | Code | 2
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design | Code | 2
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales | Code | 2
WATT: Weight Average Test-Time Adaptation of CLIP | Code | 2
Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases | Code | 2
Page 329 of 18972