SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8326-8350 of 474278 papers

Title | Status | Hype
LEGO: A Lightweight and Efficient Multiple-Attribute Unlearning Framework for Recommender Systems | Code | 0
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Reward Design |  | 0
Sherlock: Self-Correcting Reasoning in Vision-Language Models |  | 0
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning |  | 0
PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching | Code | 0
RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling | Code | 0
COS3D: Collaborative Open-Vocabulary 3D Segmentation | Code | 0
SynTSBench: Rethinking Temporal Pattern Learning in Deep Learning Models for Time Series | Code | 0
DesignX: Human-Competitive Algorithm Designer for Black-Box Optimization | Code | 0
FuseUNet: A Multi-Scale Feature Fusion Method for U-like Networks | Code | 0
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models? | Code | 0
A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System | Code | 0
Towards Robust Zero-Shot Reinforcement Learning | Code | 0
Illusions of reflection: open-ended task reveals systematic failures in Large Language Models' reflective reasoning | Code | 0
A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition | Code | 0
Calibrating Multimodal Consensus for Emotion Recognition | Code | 0
Learning To Defer To A Population With Limited Demonstrations | Code | 0
Revisiting Logit Distributions for Reliable Out-of-Distribution Detection | Code | 0
Teaching Language Models to Reason with Tools | Code | 0
Attentive Convolution: Unifying the Expressivity of Self-Attention with Convolutional Efficiency | Code | 0
Federated Learning via Meta-Variational Dropout | Code | 0
FedGPS: Statistical Rectification Against Data Heterogeneity in Federated Learning | Code | 0
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases | Code | 0
ROOT: Rethinking Offline Optimization as Distributional Translation via Probabilistic Bridge | Code | 0
VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning | Code | 0