SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 3451-3460 of 474,278 papers

Title | Status | Hype
A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | Code | 3
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions | Code | 3
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | Code | 3
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Code | 3
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | Code | 3
FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models | Code | 3
CRAG -- Comprehensive RAG Benchmark | Code | 3
VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography | Code | 3
Probabilistic Weather Forecasting with Hierarchical Graph Neural Networks | Code | 3
VideoTetris: Towards Compositional Text-to-Video Generation | Code | 3