SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 2751–2775 of 661,570 papers

Title | Status | Hype
MDCrow: Automating Molecular Dynamics Workflows with Large Language Models | Code | 3
MetaDE: Evolving Differential Evolution by Differential Evolution | Code | 3
The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis | Code | 3
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Code | 3
Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning | Code | 3
GENERator: A Long-Context Generative Genomic Foundation Model | Code | 3
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Code | 3
FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents | Code | 3
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Code | 3
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models | Code | 3
History-Guided Video Diffusion | Code | 3
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Code | 3
PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map | Code | 3
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Code | 3
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Code | 3
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Code | 3
VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Code | 3
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks | Code | 3
Multi-agent Architecture Search via Agentic Supernet | Code | 3
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot | Code | 3
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | Code | 3
Ola: Pushing the Frontiers of Omni-Modal Language Model | Code | 3
Demystifying Long Chain-of-Thought Reasoning in LLMs | Code | 3
Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries | Code | 3
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization | Code | 3
Page 111 of 26,463