SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9501–9525 of 474,278 papers

Title | Status | Hype
Debunk the Myth of SFT Generalization | Code | 0
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT | Code | 0
Which Programming Language and Model Work Best With LLM-as-a-Judge For Code Retrieval? | Code | 0
Flow Autoencoders are Effective Protein Tokenizers | Code | 0
Noise-Guided Transport for Imitation Learning | Code | 0
AutoLabs: Cognitive Multi-Agent Systems with Self-Correction for Autonomous Chemical Experimentation | Code | 0
Annotation-Efficient Active Test-Time Adaptation with Conformal Prediction | Code | 0
Controlled Generation for Private Synthetic Text | | 0
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes | | 0
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers | Code | 0
MMPB: It's Time for Multi-Modal Personalization | | 0
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning | | 0
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting | | 0
Swift: An Autoregressive Consistency Model for Efficient Weather Forecasting | | 0
DeepCodeSeek: Real-Time API Retrieval for Context-Aware Code Generation | | 0
Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm | Code | 0
TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics | Code | 0
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking | Code | 0
Conda: Column-Normalized Adam for Training Large Language Models Faster | Code | 0
Boundary-to-Region Supervision for Offline Safe Reinforcement Learning | Code | 0
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs | Code | 0
DescribeEarth: Describe Anything for Remote Sensing Images | Code | 0
PIPer: On-Device Environment Setup via Online Reinforcement Learning | Code | 0
A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory | Code | 0
OIG-Bench: A Multi-Agent Annotated Benchmark for Multimodal One-Image Guides Understanding | Code | 0