SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 6626-6650 of 474278 papers

Title | Status | Hype
PARROT: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs | | 0
TBT-Former: Learning Temporal Boundary Distributions for Action Localization | Code | 0
FreeSwim: Revisiting Sliding-Window Attention Mechanisms for Training-Free Ultra-High-Resolution Video Generation | Code | 0
PromptBridge: Cross-Model Prompt Transfer for Large Language Models | | 0
AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate | Code | 0
Measuring and Guiding Monosemanticity | | 0
Adaptive Nonlinear Vector Autoregression: Robust Forecasting for Noisy Chaotic Time Series | | 0
PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection | | 0
Extended Physics Informed Neural Network for Hyperbolic Two-Phase Flow in Porous Media | Code | 0
Real-World Reinforcement Learning of Active Perception Behaviors | | 0
EGG-Fusion: Efficient 3D Reconstruction with Geometry-aware Gaussian Surfel on the Fly | | 0
Rethinking Intracranial Aneurysm Vessel Segmentation: A Perspective from Computational Fluid Dynamics Applications | Code | 0
FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution | | 0
MDiff4STR: Mask Diffusion Model for Scene Text Recognition | | 0
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning | Code | 0
PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards | Code | 0
Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging | Code | 0
MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages | Code | 0
QuantumCanvas: A Multimodal Benchmark for Visual Learning of Atomic Interactions | Code | 0
DrawingBench: Evaluating Spatial Reasoning and UI Interaction Capabilities of Large Language Models through Mouse-Based Drawing Tasks | Code | 0
GFT: Graph Feature Tuning for Efficient Point Cloud Analysis | Code | 0
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer | Code | 0
VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering | Code | 0
Toward a benchmark for CTR prediction in online advertising: datasets, evaluation protocols and perspectives | Code | 0
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe | Code | 0
Page 266 of 18972