SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6151-6200 of 661,570 papers

Title | Status | Hype
Towards One-for-All Anomaly Detection for Tabular Data | - | 0
End-to-End Spatial-Temporal Transformer for Real-time 4D HOI Reconstruction | - | 0
An Industrial-Scale Insurance LLM Achieving Verifiable Domain Mastery and Hallucination Control without Competence Trade-offs | - | 0
Physics-Informed Policy Optimization via Analytic Dynamics Regularization | - | 0
Wi-Spike: A Low-power WiFi Human Multi-action Recognition Model with Spiking Neural Networks | - | 0
Disentangling Dynamical Systems: Causal Representation Learning Meets Local Sparse Attention | - | 0
Predicting Stress-strain Behaviors of Additively Manufactured Materials via Loss-based and Activation-based Physics-informed Machine Learning | - | 0
R3DP: Real-Time 3D-Aware Policy for Embodied Manipulation | - | 0
Unlocking the Latent Canvas: Eliciting and Benchmarking Symbolic Visual Expression in LLMs | - | 0
Interp3R: Continuous-time 3D Geometry Estimation with Frames and Events | - | 0
Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms | - | 0
IQP Born Machines under Data-dependent and Agnostic Initialization Strategies | - | 0
Texel Splatting: Perspective-Stable 3D Pixel Art | - | 0
PA^3: Policy-Aware Agent Alignment through Chain-of-Thought | - | 0
Tactile Modality Fusion for Vision-Language-Action Models | - | 0
ResearchPilot: A Local-First Multi-Agent System for Literature Synthesis and Related Work Drafting | - | 0
Early Failure Detection and Intervention in Video Diffusion Models | - | 0
Emergent Coordination in Multi-Agent Language Models | - | 0
More Agents Improve Math Problem Solving but Adversarial Robustness Gap Persists | - | 0
Vavanagi: a Community-run Platform for Documentation of the Hula Language in Papua New Guinea | - | 0
A Physically-Grounded Attack and Adaptive Defense Framework for Real-World Low-Light Image Enhancement | Code | 0
Seeking Physics in Diffusion Noise | - | 0
An End-to-end Architecture for Collider Physics and Beyond | - | 0
The Scenic Route to Deception: Dark Patterns and Explainability Pitfalls in Conversational Navigation | - | 0
Agentic DAG-Orchestrated Planner Framework for Multi-Modal, Multi-Hop Question Answering in Hybrid Data Lakes | - | 0
Deep probabilistic model synthesis enables unified modeling of whole-brain neural activity across individual subjects | - | 0
Rethinking Evaluation in Retrieval-Augmented Personalized Dialogue: A Cognitive and Linguistic Perspective | - | 0
Survey on Neural Routing Solvers | - | 0
AEX: Non-Intrusive Multi-Hop Attestation and Provenance for LLM APIs | - | 0
MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding | - | 0
How to find expressible and trainable parameterized quantum circuits? | - | 0
Top-b: Entropic Regulation of Relative Probability Bands in Autoregressive Language Processes | - | 0
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models | Code | 0
Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens | Code | 0
Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions | Code | 0
A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness | Code | 0
GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies | Code | 0
Bridging the Gap in the Responsible AI Divides | Code | 0
CangjieBench: Benchmarking LLMs on a Low-Resource General-Purpose Programming Language | Code | 0
Representation Alignment for Just Image Transformers is not Easier than You Think | Code | 0
DC-Merge: Improving Model Merging with Directional Consistency | Code | 0
OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera | Code | 0
ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation | Code | 0
HomeGuard: VLM-based Embodied Safeguard for Identifying Contextual Risk in Household Task | Code | 0
Joint Segmentation and Grading with Iterative Optimization for Multimodal Glaucoma Diagnosis | Code | 0
Self-transcendence: Is External Feature Guidance Indispensable for Accelerating Diffusion Transformer Training? | Code | 0
Unsupervised Online 3D Instance Segmentation with Synthetic Sequences and Dynamic Loss | Code | 0
PREDICT-GBM: A multi-center platform to advance personalized glioblastoma radiotherapy planning | Code | 0
Null-Space Filtering for Data-Free Continual Model Merging: Preserving Stability, Promoting Plasticity | Code | 0
Towards Understanding Valuable Preference Data for Large Language Model Alignment | Code | 0
Page 124 of 13,232