SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6,401-6,450 of 661,570 papers

Title | Status | Hype
DeceptGuard: A Constitutional Oversight Framework For Detecting Deception in LLM Agents | | 0
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring | Code | 0
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos | | 0
Boosted GFlowNets: Improving Exploration via Sequential Learning | | 0
Manifold-Orthogonal Dual-spectrum Extrapolation for Parameterized Physics-Informed Neural Networks | | 0
LabelFusion: Fusing Large Language Models with Transformer Encoders for Robust Financial News Classification | | 0
Intelligent Materials Modelling: Large Language Models Versus Partial Least Squares Regression for Predicting Polysulfone Membrane Mechanical Performance | | 0
Hierarchy of extreme-event predictability in turbulence revealed by machine learning | | 0
A Benchmark for Multi-Party Negotiation Games from Real Negotiation Data | | 0
Locally Linear Continual Learning for Time Series based on VC-Theoretical Generalization Bounds | | 0
Beyond Medical Diagnostics: How Medical Multimodal Large Language Models Think in Space | | 0
Toward Scalable Co-located Practical Learning: Assisting with Computer Vision and Multimodal Analytics | | 0
QuarkMedBench: A Real-World Scenario Driven Benchmark for Evaluating Large Language Models | | 0
Concisely Explaining the Doubt: Minimum-Size Abductive Explanations for Linear Models with a Reject Option | | 0
Faithful or Just Plausible? Evaluating the Faithfulness of Closed-Source LLMs in Medical Reasoning | | 0
Bootstrapped Physically-Primed Neural Networks for Robust T2 Distribution Estimation in Low-SNR Pancreatic MRI | | 0
VLD: Visual Language Goal Distance for Reinforcement Learning Navigation | | 0
MultiGraSCCo: A Multilingual Anonymization Benchmark with Annotations of Personal Identifiers | | 0
Not All Latent Spaces Are Flat: Hyperbolic Concept Control | | 0
Every Error has Its Magnitude: Asymmetric Mistake Severity Training for Multiclass Multiple Instance Learning | | 0
Robust Self-Training with Closed-loop Label Correction for Learning from Noisy Labels | | 0
Data-driven Progressive Discovery of Physical Laws | | 0
R3-REC: Reasoning-Driven Recommendation via Retrieval-Augmented LLMs over Multi-Granular Interest Signals | | 0
Knowledge Distillation for Large Language Models | | 0
An Interpretable and Stable Framework for Sparse Principal Component Analysis | | 0
An Alternative Trajectory for Generative AI | | 0
High-speed Imaging through Turbulence with Event-based Light Fields | | 0
EchoLVFM: One-Step Video Generation via Latent Flow Matching for Echocardiogram Synthesis | Code | 0
Dynamical Mechanisms for Coordinating Long-term Working Memory Based on the Precision of Spike-timing in Cortical Neurons | | 0
Estimating Text Temperature with Language Models | | 0
Greedy Information Projection for LLM Data Selection | | 0
Balancing Safety and Optimality in Robot Path Planning: Algorithm and Metric | | 0
AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models | | 0
Should LLMs, like, Generate How Users Talk? Building Dialect-Accurate Dialog[ue]s Beyond the American Default with MDial | | 0
Distributed Acoustic Sensing for Urban Traffic Monitoring: Spatio-Temporal Attention in Recurrent Neural Networks | | 0
Close to Reality: Interpretable and Feasible Data Augmentation for Imbalanced Learning | | 0
MOGeo: Beyond One-to-One Cross-View Object Geo-localization | | 0
A Hyperbolic Perspective on Hierarchical Structure in Object-Centric Scene Representations | Code | 0
Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors | | 0
Is He Extroverted? Identifying Missing Relevant Personas for Faithful User Simulation | | 0
Depth to Anatomy: Organ Localization from Depth Images for Automated Patient Table Positioning in Radiology Workflow | | 0
SimLens for Early Exit in Large Language Models: Eliciting Accurate Latent Predictions with One More Token | | 0
AI for Scientific Discovery is a Social Problem | | 0
Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning | | 0
Automated Genomic Interpretation via Concept Bottleneck Models for Medical Robotics | | 0
Bid2X: Revealing Dynamics of Bidding Environment in Online Advertising from A Foundation Model Lens | | 0
StreamingTOM: Streaming Token Compression for Efficient Video Understanding | | 0
PAS: A Training-Free Stabilizer for Temporal Encoding in Video LLMs | | 0
ABounD: Adversarial Boundary-Driven Few-Shot Learning for Multi-Class Anomaly Detection | | 0
MSSSeg: Learning Multi-Scale Structural Complexity for Self-Supervised Segmentation | | 0
Page 129 of 13,232