SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 3801–3850 of 177340 papers

Title | Status | Hype
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization | Code | 3
WHAC: World-grounded Humans and Cameras | Code | 3
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations | Code | 3
Generative AI Act II: Test Time Scaling Drives Cognition Engineering | Code | 3
ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models | Code | 3
Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning | Code | 3
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI | Code | 3
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Code | 3
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | Code | 3
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models | Code | 3
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models | Code | 3
Chain of Draft: Thinking Faster by Writing Less | Code | 3
Data Augmentation for Sequential Recommendation: A Survey | Code | 3
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | Code | 3
MLVU: Benchmarking Multi-task Long Video Understanding | Code | 3
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | Code | 3
ECON: Explicit Clothed humans Optimized via Normal integration | Code | 3
Partially Rewriting a Transformer in Natural Language | Code | 3
A Clean Slate for Offline Reinforcement Learning | Code | 3
MarioGPT: Open-Ended Text2Level Generation through Large Language Models | Code | 3
PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map | Code | 3
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Code | 3
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | Code | 3
HadaCore: Tensor Core Accelerated Hadamard Transform Kernel | Code | 3
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling | Code | 3
Description Boosting for Zero-Shot Entity and Relation Classification | Code | 3
LibCity: A Unified Library Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction | Code | 3
Bird-Eye Transformers for Text Generation Models | Code | 3
Lightplane: Highly-Scalable Components for Neural 3D Fields | Code | 3
Apollo: Band-sequence Modeling for High-Quality Audio Restoration | Code | 3
ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning | Code | 3
Image Quality Assessment for Magnetic Resonance Imaging | Code | 3
RoadBEV: Road Surface Reconstruction in Bird's Eye View | Code | 3
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse | Code | 3
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Code | 3
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | Code | 3
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface | Code | 3
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Code | 3
RoMa: Robust Dense Feature Matching | Code | 3
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models | Code | 3
ViTamin: Designing Scalable Vision Models in the Vision-Language Era | Code | 3
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization | Code | 3
Deep Learning for Multivariate Time Series Imputation: A Survey | Code | 3
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions | Code | 3
PathoTune: Adapting Visual Foundation Model to Pathological Specialists | Code | 3
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | Code | 3
∞Bench: Extending Long Context Evaluation Beyond 100K Tokens | Code | 3
CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving | Code | 3
MMSearch-R1: Incentivizing LMMs to Search | Code | 3
Taming Stable Diffusion for Text to 360° Panorama Image Generation | Code | 3
Page 77 of 3547