SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers · 258,364 code links · 4,818 tasks

Papers

Showing 401–450 of 180,343 papers

Title | Status | Hype
Skywork-R1V3 Technical Report | Code | 7
Interactive Prompt Debugging with Sequence Salience | Code | 7
gsplat: An Open-Source Library for Gaussian Splatting | Code | 7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Code | 7
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
DataComp-LM: In search of the next generation of training sets for language models | Code | 7
VITA: Towards Open-Source Interactive Omni Multimodal LLM | Code | 7
Segment Anything in Medical Images and Videos: Benchmark and Deployment | Code | 7
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | Code | 7
Cradle: Empowering Foundation Agents Towards General Computer Control | Code | 7
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | Code | 7
Efficient Track Anything | Code | 7
Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach | Code | 7
Embedding Atlas: Low-Friction, Interactive Embedding Visualization | Code | 7
A Library for Learning Neural Operators | Code | 7
Kimi k1.5: Scaling Reinforcement Learning with LLMs | Code | 7
AutoCodeRover: Autonomous Program Improvement | Code | 7
S*: Test Time Scaling for Code Generation | Code | 7
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer | Code | 7
AI-Researcher: Autonomous Scientific Innovation | Code | 7
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Code | 7
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Code | 7
Large Language Model Agent: A Survey on Methodology, Applications and Challenges | Code | 7
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers | Code | 7
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Code | 7
Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting | Code | 7
DragAnything: Motion Control for Anything using Entity Representation | Code | 7
EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Code | 7
Efficient MedSAMs: Segment Anything in Medical Images on Laptop | Code | 7
Aligning Anime Video Generation with Human Feedback | Code | 7
Chronos: Learning the Language of Time Series | Code | 7
Adding Conditional Control to Text-to-Image Diffusion Models | Code | 7
OASIS: Open Agent Social Interaction Simulations with One Million Agents | Code | 7
Muon is Scalable for LLM Training | Code | 7
An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents | Code | 7
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model | Code | 7
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions | Code | 7
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Code | 7
Adaptive In-conversation Team Building for Language Model Agents | Code | 7
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models | Code | 7
MiniMax-01: Scaling Foundation Models with Lightning Attention | Code | 7
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | Code | 7
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers | Code | 7
BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO | Code | 7
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | Code | 7
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference | Code | 7
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Code | 7
AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era | Code | 7
Better than classical? The subtle art of benchmarking quantum machine learning models | Code | 7
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant | Code | 7
Page 9 of 3607