SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5201–5250 of 661,570 papers

Title | Status | Hype
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering | Code | 2
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation | Code | 2
Optimal Density Functions for Weighted Convolution in Learning Models | Code | 2
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | Code | 2
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations | Code | 2
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | Code | 2
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization | Code | 2
Tackling View-Dependent Semantics in 3D Language Gaussian Splatting | Code | 2
GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | Code | 2
Logits-Based Finetuning | Code | 2
TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores | Code | 2
Optimal Weighted Convolution for Classification and Denosing | Code | 2
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways | Code | 2
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | Code | 2
SWE-bench Goes Live! | Code | 2
ZeroGUI: Automating Online GUI Learning at Zero Human Cost | Code | 2
Diffusion Guidance Is a Controllable Policy Improvement Operator | Code | 2
D-AR: Diffusion via Autoregressive Models | Code | 2
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | Code | 2
ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | Code | 2
HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | Code | 2
MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming | Code | 2
Vision Language Models are Biased | Code | 2
UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes | Code | 2
VERINA: Benchmarking Verifiable Code Generation | Code | 2
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning | Code | 2
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model | Code | 2
OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation | Code | 2
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents | Code | 2
ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | Code | 2
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | Code | 2
Securing AI Agents with Information-Flow Control | Code | 2
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | Code | 2
Model-Preserving Adaptive Rounding | Code | 2
ZIPA: A family of efficient models for multilingual phone recognition | Code | 2
DRO: A Python Library for Distributionally Robust Optimization in Machine Learning | Code | 2
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS | Code | 2
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | Code | 2
Zero-Shot Vision Encoder Grafting via LLM Surrogates | Code | 2
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control | Code | 2
cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning | Code | 2
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials | Code | 2
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction | Code | 2
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment | Code | 2
Improved Representation Steering for Language Models | Code | 2
SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution | Code | 2
Reinforcing General Reasoning without Verifiers | Code | 2
TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state | Code | 2
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents | Code | 2