SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 2801-2850 of 659,983 papers

Title | Status | Hype
TxSum: User-Centered Ethereum Transaction Understanding with Micro-Level Semantic Grounding |  | 0
KeyframeFace: Language-Driven Facial Animation via Semantic Keyframes |  | 0
Speculative Decoding: Performance or Illusion? |  | 0
A Comprehensive Benchmark of Histopathology Foundation Models for Kidney Digital Pathology Images |  | 0
Trajectory-Optimized Time Reparameterization for Learning-Compatible Reduced-Order Modeling of Stiff Dynamical Systems |  | 0
When Openclaw Agents Learn from Each Other: Insights from Emergent AI Agent Communities for Human-AI Partnership in Education |  | 0
DexGrasp-Zero: A Morphology-Aligned Policy for Zero-Shot Cross-Embodiment Dexterous Grasping |  | 0
ShuttleEnv: An Interactive Data-Driven RL Environment for Badminton Strategy Modeling |  | 0
VisionNVS: Self-Supervised Inpainting for Novel View Synthesis under the Virtual-Shift Paradigm |  | 0
FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion |  | 0
KA2L: A Knowledge-Aware Active Learning Framework for LLMs |  | 0
ReLaGS: Relational Language Gaussian Splatting |  | 0
Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment |  | 0
Governed Memory: A Production Architecture for Multi-Agent Workflows |  | 0
Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding |  | 0
"I'm Not Reading All of That": Understanding Software Engineers' Level of Cognitive Engagement with Agentic Coding Assistants |  | 0
HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment |  | 0
Interpretable Context Methodology: Folder Structure as Agentic Architecture |  | 0
Simple Additions, Substantial Gains: Expanding Scripts, Languages, and Lineage Coverage in URIEL+ |  | 0
CoT-PL: Chain-of-Thought Pseudo-Labeling for Open-Vocabulary Object Detection | Code | 0
Deep learning and the rate of approximation by flows |  | 0
Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport |  | 0
AR-Flow VAE: A Structured Autoregressive Flow Prior Variational Autoencoder for Unsupervised Blind Source Separation |  | 0
The Comprehension-Gated Agent Economy: A Robustness-First Architecture for AI Economic Agency |  | 0
World Reconstruction From Inconsistent Views |  | 0
Neural Pushforward Samplers for the Fokker-Planck Equation on Embedded Riemannian Manifolds |  | 0
Attention-guided Evidence Grounding for Spoken Question Answering |  | 0
Explanations Go Linear: Post-hoc Explainability for Tabular Data with Interpretable Meta-Encoding |  | 0
Hebbian Physics Networks: A Self-Organizing Computational Architecture Based on Local Physical Laws |  | 0
ReviewScore: Misinformed Peer Review Detection with Large Language Models |  | 0
On the identifiability of causal graphs with multiple environments |  | 0
Provably Safe Model Updates |  | 0
Efficient Cross-Domain Offline Reinforcement Learning with Dynamics- and Value-Aligned Data Filtering |  | 0
The Moralization Corpus: Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse Text Genres |  | 0
Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning |  | 0
Global Optimization By Gradient From Hierarchical Score-Matching Spaces |  | 0
Federated Causal Representation Learning in State-Space Systems for Decentralized Counterfactual Reasoning |  | 0
CogGen: Cognitive-Load-Informed Fully Unsupervised Deep Generative Modeling for Compressively Sampled MRI Reconstruction |  | 0
LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis |  | 0
Event-Driven Video Generation |  | 0
Next-Frame Decoding for Ultra-Low-Bitrate Image Compression with Video Diffusion Priors |  | 0
NV-Bench: Benchmark of Nonverbal Vocalization Synthesis for Expressive Text-to-Speech Generation |  | 0
EngGPT2: Sovereign, Efficient and Open Intelligence |  | 0
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas |  | 0
HGP-Mamba: Integrating Histology and Generated Protein Features for Mamba-based Multimodal Survival Risk Prediction | Code | 0
Draft-and-Prune: Improving the Reliability of Auto-formalization for Logical Reasoning |  | 0
ConfusionBench: An Expert-Validated Benchmark for Confusion Recognition and Localization in Educational Videos |  | 0
Directing the Narrative: A Finetuning Method for Controlling Coherence and Style in Story Generation |  | 0
Embedding World Knowledge into Tabular Models: Towards Best Practices for Embedding Pipeline Design |  | 0
Physics-informed offline reinforcement learning eliminates catastrophic fuel waste in maritime routing |  | 0
Page 57 of 13,200