SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 3201–3250 of 659,983 papers

Title | Status | Hype
An End-to-End Framework for Functionality-Embedded Provenance Graph Construction and Threat Interpretation | | 0
Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency | | 0
When the Specification Emerges: Benchmarking Faithfulness Loss in Long-Horizon Coding Agents | | 0
SENSE: Efficient EEG-to-Text via Privacy-Preserving Semantic Retrieval | | 0
Pixel-level Counterfactual Contrastive Learning for Medical Image Segmentation | | 0
Hidden Clones: Exposing and Fixing Family Bias in Vision-Language Model Ensembles | | 0
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models | | 0
Security Assessment and Mitigation Strategies for Large Language Models: A Comprehensive Defensive Framework | | 0
Topology-Preserving Deep Joint Source-Channel Coding for Semantic Communication | | 0
Personalized Fall Detection by Balancing Data with Selective Feedback Using Contrastive Learning | | 0
Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents | | 0
GazeOnce360: Fisheye-Based 360° Multi-Person Gaze Estimation with Global-Local Feature Fusion | | 0
Quadratic Surrogate Attractor for Particle Swarm Optimization | | 0
SLAM Adversarial Lab: An Extensible Framework for Visual SLAM Robustness Evaluation under Adverse Conditions | | 0
PAuth - Precise Task-Scoped Authorization For Agents | | 0
Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning | | 0
Domain-informed explainable boosting machines for trustworthy lateral spread predictions | | 0
Catching rationalization in the act: detecting motivated reasoning before and after CoT via activation probing | | 0
Visual Product Search Benchmark | | 0
Abstraction as a Memory-Efficient Inductive Bias for Continual Learning | | 0
CODMAS: A Dialectic Multi-Agent Collaborative Framework for Structured RTL Optimization | | 0
OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation | | 0
A scalable neural bundle map for multiphysics prediction in lithium-ion battery across varying configurations | | 0
AI Scientist via Synthetic Task Scaling | | 0
Alignment Makes Language Models Normative, Not Descriptive | | 0
Multilingual, Multimodal Pipeline for Creating Authentic and Structured Fact-Checked Claim Dataset | | 0
One-Shot Badminton Shuttle Detection for Mobile Robots | | 0
Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients | Code | 0
Manifold-Matching Autoencoders | | 0
RaDAR: Relation-aware Diffusion-Asymmetric Graph Contrastive Learning for Recommendation | | 0
PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction | | 0
Robust Physics-Guided Diffusion for Full-Waveform Inversion | | 0
Optimal uncertainty bounds for multivariate kernel regression under bounded noise: A Gaussian process-based dual function | | 0
Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures | | 0
VideoMatGen: PBR Materials through Joint Generative Modeling | | 0
Shielded Reinforcement Learning Under Dynamic Temporal Logic Constraints | | 0
Dual Stream Independence Decoupling for True Emotion Recognition under Masked Expressions | | 0
REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge | | 0
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting | Code | 0
Distilling Feedback into Memory-as-a-Tool | | 0
A Lensless Polarization Camera | | 0
When AI Navigates the Fog of War | | 0
Exposing Blindspots: Cultural Bias Evaluation in Generative Image Models | | 0
SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment | | 0
Adaptive Contracts for Cost-Effective AI Delegation | | 0
From Natural Language to Executable Option Strategies via Large Language Models | | 0
Tabular LLMs for Interpretable Few-Shot Alzheimer's Disease Prediction with Multimodal Biomedical Data | Code | 0
Ethical Fairness without Demographics in Human-Centered AI | | 0
The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models | | 0
Incongruent Positivity: When Miscalibrated Positivity Undermines Online Supportive Conversations | | 0