SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6951-6975 of 474278 papers

Title | Status | Hype
RSL-SQL: Robust Schema Linking in Text-to-SQL Generation | Code | 2
EgoMimic: Scaling Imitation Learning via Egocentric Video | Code | 2
APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs | Code | 2
Language Models can Self-Lengthen to Generate Long Texts | Code | 2
Ada-MSHyper: Adaptive Multi-Scale Hypergraph Transformer for Time Series Forecasting | Code | 2
GPT or BERT: why not both? | Code | 2
The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains | Code | 2
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | Code | 2
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection | Code | 2
VecCity: A Taxonomy-guided Library for Map Entity Representation Learning | Code | 2
What is Wrong with Perplexity for Long-context Language Modeling? | Code | 2
End-to-End Ontology Learning with Large Language Models | Code | 2
Towards Generative Ray Path Sampling for Faster Point-to-Point Ray Tracing | Code | 2
EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models | Code | 2
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks | Code | 2
CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation | Code | 2
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Code | 2
Consistency Diffusion Bridge Models | Code | 2
Multi-Programming Language Sandbox for LLMs | Code | 2
Multi-Agent Large Language Models for Conversational Task-Solving | Code | 2
Very fast Bayesian Additive Regression Trees on GPU | Code | 2
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | Code | 2
SciPIP: An LLM-based Scientific Paper Idea Proposer | Code | 2
MassSpecGym: A benchmark for the discovery and identification of molecules | Code | 2
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources | Code | 2
Page 279 of 18972