SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7051–7100 of 661,570 papers

Title | Status | Hype
SectEval: Evaluating the Latent Sectarian Preferences of Large Language Models | Code | 0
Hierarchical Reference Sets for Robust Unsupervised Detection of Scattered and Clustered Outliers | Code | 0
Thinking in Streaming Video | Code | 0
Multi-Agent Guided Policy Optimization | Code | 0
Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning | Code | 0
Referee: Reference-aware Audiovisual Deepfake Detection | Code | 0
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation | - | 2
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs | - | 2
XSkill: Continual Learning from Experience and Skills in Multimodal Agents | - | 2
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks | - | 4
Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models | - | 0
JigsawComm: Joint Semantic Feature Encoding and Transmission for Communication-Efficient Cooperative Perception | - | 0
Prompt-Driven Lightweight Foundation Model for Instance Segmentation-Based Fault Detection in Freight Trains | Code | 0
Adaptive Vision-Language Model Routing for Computer Use Agents | Code | 0
Altered Thoughts, Altered Actions: Probing Chain-of-Thought Vulnerabilities in VLA Robotic Manipulation | - | 0
Deep Learning Based Estimation of Blood Glucose Levels from Multidirectional Scleral Blood Vessel Imaging | - | 0
MRGeo: Robust Cross-View Geo-Localization of Corrupted Images via Spatial and Channel Feature Enhancement | - | 0
Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference | - | 0
GLEAM: A Multimodal Imaging Dataset and HAMM for Glaucoma Classification | - | 0
Contextual Graph Representations for Task-Driven 3D Perception and Planning | - | 0
PLDR-LLMs Reason At Self-Organized Criticality | - | 0
ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics | - | 0
Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology | - | 0
MST-Direct: Matching via Sinkhorn Transport for Multivariate Geostatistical Simulation with Complex Non-Linear Dependencies | - | 0
Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization | - | 0
Gaussian Process Regression-based Knowledge Distillation Framework for Simultaneous Prediction of Physical and Mechanical Properties of Epoxy Polymers | - | 0
An Intent of Collaboration: On Agencies between Designers and Emerging (Intelligent) Technologies | - | 0
Quantum-Secure-By-Construction (QSC): A Paradigm Shift For Post-Quantum Agentic Intelligence | - | 0
A Dynamic Survey of Fuzzy, Intuitionistic Fuzzy, Neutrosophic, Plithogenic, and Extensional Sets | - | 0
Compiled Memory: Not More Information, but More Precise Instructions for Language Agents | - | 0
Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization | - | 0
OMNIA: Closing the Loop by Leveraging LLMs for Knowledge Graph Completion | - | 0
Diabetic Retinopathy Grading with CLIP-based Ranking-Aware Adaptation:A Comparative Study on Fundus Image | - | 0
Schema First Tool APIs for LLM Agents: A Controlled Study of Tool Misuse, Recovery, and Budgeted Performance | - | 0
Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache | - | 0
Bridging the Visual-to-Physical Gap: Physically Aligned Representations for Fall Risk Analysis | - | 0
WAT: Online Video Understanding Needs Watching Before Thinking | - | 0
Neuro-Symbolic Generation and Validation of Memory-Aware Formal Function Specifications | - | 0
Bridging Protocol and Production: Design Patterns for Deploying AI Agents with Model Context Protocol | - | 0
GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models | - | 0
Diffusion Models Generalize but Not in the Way You Might Think | - | 0
Generalization and Memorization in Rectified Flow | - | 0
Projection Guided Personalized Federated Learning for Low Dose CT Denoising | - | 0
Distance-aware Soft Prompt Learning for Multimodal Valence-Arousal Estimation | - | 0
Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion | Code | 0
Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection | Code | 0
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges | - | 1
Distribution estimation via Flow Matching with Lipschitz guarantees | - | 0
Impact of Markov Decision Process Design on Sim-to-Real Reinforcement Learning | - | 0
MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization | - | 0