SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers · 258,018 code links · 4,818 tasks

Papers

Showing 51–100 of 658,356 papers

Title | Status | Hype
Structured 3D Latents for Scalable and Versatile 3D Generation | Code | 11
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness | Code | 11
Qwen2.5-VL Technical Report | Code | 11
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Code | 11
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model | Code | 11
Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models | Code | 11
ROMAS: A Role-Based Multi-Agent System for Database monitoring and Planning | Code | 11
Agent S: An Open Agentic Framework that Uses Computers Like a Human | Code | 11
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Code | 11
WebLLM: A High-Performance In-Browser LLM Inference Engine | Code | 11
Deep Time Series Models: A Comprehensive Survey and Benchmark | Code | 11
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | Code | 11
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Code | 11
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation | Code | 11
Wan: Open and Advanced Large-Scale Video Generative Models | Code | 11
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation | Code | 11
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models | Code | 11
Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models | Code | 11
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Code | 11
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Code | 11
Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way | Code | 11
WebDancer: Towards Autonomous Information Seeking Agency | Code | 11
YOLOE: Real-Time Seeing Anything | Code | 11
VGGT: Visual Geometry Grounded Transformer | Code | 11
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Code | 11
Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation | Code | 11
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | Code | 11
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation | Code | 11
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Code | 11
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models | Code | 11
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Code | 11
TinyLlama: An Open-Source Small Language Model | Code | 11
Open-Sora Plan: Open-Source Large Video Generation Model | Code | 11
YOLOv10: Real-Time End-to-End Object Detection | Code | 11
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis | Code | 11
Very Large-Scale Multi-Agent Simulation in AgentScope | Code | 11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Code | 11
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | Code | 11
WebSailor: Navigating Super-human Reasoning for Web Agent | Code | 11
Magika: AI-Powered Content-Type Detection | Code | 11
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Code | 11
Scaling Synthetic Data Creation with 1,000,000,000 Personas | Code | 11
AutoDev: Automated AI-Driven Development | Code | 11
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | Code | 11
HybridFlow: A Flexible and Efficient RLHF Framework | Code | 11
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | Code | 11
Qwen2.5-Coder Technical Report | Code | 11
EAP4EMSIG -- Experiment Automation Pipeline for Event-Driven Microscopy to Smart Microfluidic Single-Cells Analysis | Code | 11
AgentScope: A Flexible yet Robust Multi-Agent Platform | Code | 11
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Code | 11