SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 1251-1300 of 659,983 papers

Title | Status | Hype
ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs | | 4
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs | | 4
SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds | | 4
SpatialTrackerV2: 3D Point Tracking Made Easy | Code | 4
Streaming 4D Visual Geometry Transformer | Code | 4
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching | Code | 4
XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL | Code | 4
Energy-Based Transformers are Scalable Learners and Thinkers | Code | 4
Kwai Keye-VL Technical Report | Code | 4
A Survey on Vision-Language-Action Models for Autonomous Driving | Code | 4
WorldVLA: Towards Autoregressive Action World Model | Code | 4
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation | Code | 4
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Code | 4
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents | Code | 4
VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Code | 4
YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework | Code | 4
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | Code | 4
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics | Code | 4
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents | Code | 4
Ming-Omni: A Unified Multimodal Model for Perception and Generation | Code | 4
Efficient Part-level 3D Object Generation via Dual Volume Packing | Code | 4
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement | Code | 4
MiMo-VL Technical Report | Code | 4
Seed-Coder: Let the Code Model Curate Data for Itself | Code | 4
Pseudo-Simulation for Autonomous Driving | Code | 4
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | Code | 4
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning | Code | 4
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding | Code | 4
RewardBench 2: Advancing Reward Model Evaluation | Code | 4
GigaAM: Efficient Self-Supervised Learner for Speech Recognition | Code | 4
AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora | Code | 4
RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination | Code | 4
Skywork Open Reasoner 1 Technical Report | Code | 4
ImgEdit: A Unified Image Editing Dataset and Benchmark | Code | 4
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution | Code | 4
DeepInverse: A Python package for solving imaging inverse problems with deep learning | Code | 4
On Path to Multimodal Historical Reasoning: HistBench and HistAgent | Code | 4
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | Code | 4
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation | Code | 4
LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders | Code | 4
Partition Generative Modeling: Masked Modeling Without Masks | Code | 4
A Survey of LLM DATA | Code | 4
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning | Code | 4
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models | Code | 4
Qiskit Machine Learning: an open-source library for quantum machine learning tasks at scale on quantum hardware and classical simulators | Code | 4
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | Code | 4
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Code | 4
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | Code | 4
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis | Code | 4
lmgame-Bench: How Good are LLMs at Playing Games? | Code | 4
Page 26 of 13200