SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7251–7275 of 474,278 papers

Title | Status | Hype
MIMIC-4-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp for Risk Prediction | Code | 0
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model | - | 0
Fine-grained Image Quality Assessment for Perceptual Image Restoration | - | 0
VocalBench-zh: Decomposing and Benchmarking the Speech Conversational Abilities in Mandarin Context | Code | 0
WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance | - | 0
Are Graph Transformers Necessary? Efficient Long-Range Message Passing with Fractal Nodes in MPNNs | - | 0
One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow | Code | 0
TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs | Code | 0
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance | - | 0
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image | - | 0
UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity | - | 0
Segment Anything Across Shots: A Method and Benchmark | - | 0
Uncertainty-Calibrated Prediction of Randomly-Timed Biomarker Trajectories with Conformal Bands | Code | 0
Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding | Code | 0
LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering | - | 0
CAMAR: Continuous Actions Multi-Agent Routing | - | 0
SoK: Large Language Model Copyright Auditing via Fingerprinting | Code | 0
Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework | Code | 0
Efficient SAR Vessel Detection for FPGA-Based On-Satellite Sensing | - | 0
Enhancing All-to-X Backdoor Attacks with Optimized Target Class Mapping | Code | 0
P1: Mastering Physics Olympiads with Reinforcement Learning | - | 0
ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference | Code | 0
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization | - | 0
Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors | Code | 0
Towards Methane Detection Onboard Satellites | Code | 0