SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9176-9200 of 474278 papers

Title | Status | Hype
Training-Free Time Series Classification via In-Context Reasoning with LLM Agents | Code | 0
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization | Code | 0
Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices | Code | 0
Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks | Code | 0
AutoEdit: Automatic Hyperparameter Tuning for Image Editing | Code | 0
ASPO: Asymmetric Importance Sampling Policy Optimization | Code | 0
Benchmark It Yourself (BIY): Preparing a Dataset and Benchmarking AI Models for Scatterplot-Related Tasks | Code | 0
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models | Code | 0
LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures | Code | 0
InstaGeo: Compute-Efficient Geospatial Machine Learning from Data to Deployment | Code | 0
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning | | 0
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding | | 0
GeoRemover: Removing Objects and Their Causal Visual Artifacts | Code | 0
Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned | | 0
NorMuon: Making Muon more efficient and scalable | | 0
HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video | | 0
VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation | | 0
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use | | 0
Deformable Image Registration for Self-supervised Cardiac Phase Detection in Multi-View Multi-Disease Cardiac Magnetic Resonance Images | Code | 0
AgeBooth: Controllable Facial Aging and Rejuvenation via Diffusion Models | | 0
PLSemanticsBench: Large Language Models As Programming Language Interpreters | Code | 0
SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets | Code | 0
QGraphLIME - Explaining Quantum Graph Neural Networks | Code | 0
DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision | Code | 0
D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection | Code | 0
Page 368 of 18972