SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 10801-10850 of 661570 papers

Title | Status | Hype
Kraus Constrained Sequence Learning For Quantum Trajectories from Continuous Measurement | | 0
Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval | | 0
Merging Memory and Space: A State Space Neural Operator | | 0
Cheap Thrills: Effective Amortized Optimization Using Inexpensive Labels | | 0
A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation | Code | 0
Accelerating Text-to-Video Generation with Calibrated Sparse Attention | | 0
FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning | | 0
Transformer-Based Inpainting for Real-Time 3D Streaming in Sparse Multi-Camera Setups | | 0
Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence | | 0
HCT-QA: A Benchmark for Question Answering on Human-Centric Tables | | 0
From Raw Corpora to Domain Benchmarks: Automated Evaluation of LLM Domain Expertise | | 0
Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding | | 0
Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space | | 0
TCR-EML: Explainable Model Layers for TCR-pMHC Prediction | | 0
Just-In-Time Objectives: A General Approach for Specialized AI Interactions | | 0
Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence | | 0
MiDAS: A Multimodal Data Acquisition System and Dataset for Robot-Assisted Minimally Invasive Surgery | | 0
LA-MARRVEL: A Knowledge-Grounded, Language-Aware LLM Framework for Clinically Robust Rare Disease Gene Prioritization | | 0
CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering | | 0
SpatialMem: Metric-Aligned Long-Horizon Video Memory for Language Grounding and QA | | 0
Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection | | 0
The Compute ICE-AGE: Invariant Compute Envelope under Addressable Graph Evolution | | 0
Diverse Word Choices, Same Reference: Annotating Lexically-Rich Cross-Document Coreference | | 0
Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs | | 0
Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment | | 0
EigenData: A Self-Evolving Multi-Agent Platform for Function-Calling Data Synthesis, Auditing, and Repair | | 0
IntSeqBERT: Learning Arithmetic Structure in OEIS via Modulo-Spectrum Embeddings | | 0
Autocorrelation effects in a stochastic-process model for decision making via time series | | 0
PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions | | 0
Prediction-Powered Conditional Inference | | 0
Koopman Regularized Deep Speech Disentanglement for Speaker Verification | | 0
Tool-Genesis: A Task-Driven Tool Creation Benchmark for Self-Evolving Language Agent | | 0
A Novel Hybrid Heuristic-Reinforcement Learning Optimization Approach for a Class of Railcar Shunting Problems | | 0
From Decoupled to Coupled: Robustness Verification for Learning-based Keypoint Detection with Joint Specifications | | 0
DreamCAD: Scaling Multi-modal CAD Generation using Differentiable Parametric Surfaces | | 0
Adversarial Batch Representation Augmentation for Batch Correction in High-Content Cellular Screening | | 0
Post Fusion Bird's Eye View Feature Stabilization for Robust Multimodal 3D Detection | | 0
Behavior-dLDS: A decomposed linear dynamical systems model for neural activity partially constrained by behavior | | 0
Real-Time AI Service Economy: A Framework for Agentic Computing Across the Continuum | | 0
Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs | | 0
Rethinking Concept Bottleneck Models: From Pitfalls to Solutions | | 0
The Value of Graph-based Encoding in NBA Salary Prediction | | 0
Gabor Primitives for Accelerated Cardiac Cine MRI Reconstruction | | 0
OWL: A Novel Approach to Machine Perception During Motion | | 0
FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and Summarisation | | 0
Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression | | 0
Structured Multidimensional Representation Learning for Large Language Models | | 0
Warm Starting State-Space Models with Automata Learning | | 0
MultiHaystack: Benchmarking Multimodal Retrieval and Reasoning over 40K Images, Videos, and Documents | | 0
Reasoning Models Struggle to Control their Chains of Thought | | 0
Page 217 of 13232