SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers, 258,319 code links, 4,818 tasks

Papers

Showing 451-500 of 658,356 papers

Title | Status | Hype
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Code | 7
MambaOut: Do We Really Need Mamba for Vision? | Code | 7
AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents | Code | 7
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers | Code | 7
Mirage: A Multi-Level Superoptimizer for Tensor Programs | Code | 7
xLSTM: Extended Long Short-Term Memory | Code | 7
Labeling supervised fine-tuning data with the scaling law | Code | 7
PuLID: Pure and Lightning ID Customization via Contrastive Alignment | Code | 7
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration | Code | 7
Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models | Code | 7
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents | Code | 7
Long-form music generation with latent diffusion | Code | 7
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | Code | 7
Interactive Prompt Debugging with Sequence Salience | Code | 7
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models | Code | 7
AutoCodeRover: Autonomous Program Improvement | Code | 7
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | Code | 7
Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach | Code | 7
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation | Code | 7
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7
2D Gaussian Splatting for Geometrically Accurate Radiance Fields | Code | 7
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | Code | 7
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Code | 7
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | Code | 7
Foundation Models for Time Series Analysis: A Tutorial and Survey | Code | 7
One-Step Image Translation with Text-to-Image Models | Code | 7
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers | Code | 7
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Code | 7
GenAD: Generalized Predictive Model for Autonomous Driving | Code | 7
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation | Code | 7
DragAnything: Motion Control for Anything using Entity Representation | Code | 7
Chronos: Learning the Language of Time Series | Code | 7
Better than classical? The subtle art of benchmarking quantum machine learning models | Code | 7
DeepSeek-VL: Towards Real-World Vision-Language Understanding | Code | 7
Improving Diffusion Models for Authentic Virtual Try-on in the Wild | Code | 7
Symmetry Considerations for Learning Task Symmetric Robot Policies | Code | 7
Cradle: Empowering Foundation Agents Towards General Computer Control | Code | 7
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation | Code | 7
SoftTiger: A Clinical Foundation Model for Healthcare Workflows | Code | 7
TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables | Code | 7
StarCoder 2 and The Stack v2: The Next Generation | Code | 7
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models | Code | 7
Transparent Image Layer Diffusion using Latent Transparency | Code | 7
Dynamic Evaluation of Large Language Models by Meta Probing Agents | Code | 7
Revisiting Feature Prediction for Learning Visual Representations from Video | Code | 7
On the Vulnerability of LLM/VLM-Controlled Robotics | Code | 7
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Code | 7
Fast Timing-Conditioned Latent Audio Diffusion | Code | 7
Page 10 of 13,168