SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 12151–12200 of 661,570 papers

Title | Status | Hype
Towards Improved Sentence Representations using Token Graphs | Code | 0
Think, But Don't Overthink: Reproducing Recursive Language Models | Code | 0
Privacy Risk Predictions Based on Fundamental Understanding of Personal Data and an Evolving Threat Landscape | Code | 0
An Effective Data Augmentation Method by Asking Questions about Scene Text Images | Code | 0
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward | | 1
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing | | 2
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? | | 1
WristMIR: Coarse-to-Fine Region-Aware Retrieval of Pediatric Wrist Radiographs with Radiology Report-Driven Learning | Code | 0
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing | | 1
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI | | 2
DREAM: Where Visual Understanding Meets Text-to-Image Generation | | 1
SimRecon: SimReady Compositional Scene Reconstruction from Real Videos | | 2
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory | | 3
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs | | 1
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing | | 3
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions | | 2
Human3R: Everyone Everywhere All at Once | | 3
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle | | 2
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning | | 2
CASR-Net: An Image Processing-focused Deep Learning-based Coronary Artery Segmentation and Refinement Network for X-ray Coronary Angiogram | | 0
A Short Note on a Variant of the Squint Algorithm | | 0
EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model | Code | 0
AccurateRAG: A Framework for Building Accurate Retrieval-Augmented Question-Answering Applications | | 0
Super Research: Answering Highly Complex Questions with Large Language Models through Super Deep and Super Wide Research | | 0
When Small Variations Become Big Failures: Reliability Challenges in Compute-in-Memory Neural Accelerators | | 0
Zono-Conformal Prediction: Zonotope-Based Uncertainty Quantification for Regression and Classification Tasks | | 0
E2E-GNet: An End-to-End Skeleton-based Geometric Deep Neural Network for Human Motion Recognition | | 0
Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models | | 0
PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference | | 0
ModalPatch: A Plug-and-Play Module for Robust Multi-Modal 3D Object Detection under Modality Drop | | 0
MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models | | 0
Optimizing Orbital Parameters of Satellites for a Global Quantum Network | | 0
Viability-Preserving Passive Torque Control | | 0
Exploring Teacher-Chatbot Interaction and Affect in Block-Based Programming | | 0
Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models | | 0
AgenticGEO: A Self-Evolving Agentic System for Generative Engine Optimization | | 0
Beyond Detection: Governing GenAI in Academic Peer Review as a Sociotechnical Challenge | | 0
RedacBench: Can AI Erase Your Secrets? | | 0
URAG: A Benchmark for Uncertainty Quantification in Retrieval-Augmented Large Language Models | | 0
Framing Effects in Independent-Agent Large Language Models: A Cross-Family Behavioral Analysis | | 0
One Operator to Rule Them All? On Boundary-Indexed Operator Families in Neural PDE Solvers | | 0
Machine Learning Models to Identify Promising Nested Antiresonance Nodeless Fiber Designs | | 0
PolyMon: A Unified Framework for Polymer Property Prediction | Code | 0
Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation | | 0
Not All Queries Need Rewriting: When Prompt-Only LLM Refinement Helps and Hurts Dense Retrieval | | 0
DreamReader: An Interpretability Toolkit for Text-to-Image Models | | 0
FusionCast: Enhancing Precipitation Nowcasting with Asymmetric Cross-Modal Fusion and Future Radar Priors | | 0
VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings | | 0
MuFlex: A Scalable, Physics-based Platform for Multi-Building Flexibility Analysis and Coordination | | 0
TriageSim: A Conversational Emergency Triage Simulation Framework from Structured Electronic Health Records | | 0