SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 7801–7825 of 474278 papers

Title | Status | Hype
LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling | | 0
Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network | Code | 0
M3PD Dataset: Dual-view Photoplethysmography (PPG) Using Front-and-rear Cameras of Smartphones in Lab and Clinical Settings | Code | 0
MammoClean: Toward Reproducible and Bias-Aware AI in Mammography through Dataset Harmonization | Code | 0
Zero-Shot Multi-Animal Tracking in the Wild | Code | 0
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation | Code | 0
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning | Code | 0
Evaluating Large Language Models for Detecting Antisemitism | Code | 0
Weakly Supervised Object Segmentation by Background Conditional Divergence | Code | 0
Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios | Code | 0
WXSOD: A Benchmark for Robust Salient Object Detection in Adverse Weather Conditions | Code | 0
Exploring Human-AI Conceptual Alignment through the Prism of Chess | Code | 0
Demo: Statistically Significant Results On Biases and Errors of LLMs Do Not Guarantee Generalizable Results | Code | 0
Monocular absolute depth estimation from endoscopy via domain-invariant feature learning and latent consistency | Code | 0
A Novel Grouping-Based Hybrid Color Correction Algorithm for Color Point Clouds | Code | 0
SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration | Code | 0
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification | Code | 0
NABench: Large-Scale Benchmarks of Nucleotide Foundation Models for Fitness Prediction | Code | 0
Identity Increases Stability in Neural Cellular Automata | Code | 0
MCFCN: Multi-View Clustering via a Fusion-Consensus Graph Convolutional Network | Code | 0
Efficient Tool-Calling Multi-Expert NPC Agent for Commonsense Persona-Grounded Dialogue | Code | 0
Vote-in-Context: Turning VLMs into Zero-Shot Rank Fusers | Code | 0
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process | | 0
When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA | Code | 0
TPS-Bench: Evaluating AI Agents' Tool Planning & Scheduling Abilities in Compounding Tasks | Code | 0
Page 313 of 18972