SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8351-8375 of 474278 papers

| Title | Status | Hype |
| --- | --- | --- |
| Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning | Code | 0 |
| DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning | Code | 0 |
| FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies | Code | 0 |
| Multi-modal Co-learning for Earth Observation: Enhancing single-modality models via modality collaboration | | 0 |
| Seabed-Net: A multi-task network for joint bathymetry estimation and seabed classification from remote sensing imagery in shallow waters | Code | 0 |
| The Temporal Graph of Bitcoin Transactions | Code | 0 |
| KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge | Code | 0 |
| dInfer: An Efficient Inference Framework for Diffusion Language Models | Code | 0 |
| One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution | Code | 0 |
| Democratizing AI scientists using ToolUniverse | | 0 |
| kabr-tools: Automated Framework for Multi-Species Behavioral Monitoring | | 0 |
| ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints | | 0 |
| Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs | | 0 |
| MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models | Code | 0 |
| HAD: HAllucination Detection Language Models Based on a Comprehensive Hallucination Taxonomy | Code | 0 |
| KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints | | 0 |
| VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos | | 0 |
| CARES: Context-Aware Resolution Selector for VLMs | | 0 |
| Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 | | 0 |
| HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application | | 0 |
| The Massive Legal Embedding Benchmark (MLEB) | | 0 |
| Deep Research Brings Deeper Harm | | 0 |
| Data Efficient Adaptation in Large Language Models via Continuous Low-Rank Fine-Tuning | Code | 0 |
| ToMMeR -- Efficient Entity Mention Detection from Large Language Models | | 0 |
| Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing | | 0 |
Page 335 of 18972