SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 1876–1900 of 177339 papers

Title | Status | Hype
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning | Code | 4
LLMMapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources | Code | 4
KISS-Matcher: Fast and Robust Point Cloud Registration Revisited | Code | 4
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control | Code | 4
Prototypical Verbalizer for Prompt-based Few-shot Tuning | Code | 4
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning | Code | 4
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis | Code | 4
Autoregressive Video Generation without Vector Quantization | Code | 4
Best-of-N Jailbreaking | Code | 4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
Continual Learning of Large Language Models: A Comprehensive Survey | Code | 4
KTO: Model Alignment as Prospect Theoretic Optimization | Code | 4
Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Code | 4
Text2SQL is Not Enough: Unifying AI and Databases with TAG | Code | 4
Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss | Code | 4
Convolutional Differentiable Logic Gate Networks | Code | 4
Billion-scale similarity search with GPUs | Code | 4
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers | Code | 4
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level | Code | 4
OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving | Code | 4
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge | Code | 4
GLIGEN: Open-Set Grounded Text-to-Image Generation | Code | 4
Simulation-free Schrödinger bridges via score and flow matching | Code | 4
Constitutional AI: Harmlessness from AI Feedback | Code | 4
Revisiting Self-Attentive Sequential Recommendation | Code | 4