SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 11501-11550 of 661570 papers

Title | Status | Hype
Two-Stage Photovoltaic Forecasting: Separating Weather Prediction from Plant-Characteristics | | 0
Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization | | 0
HBRB-BoW: A Retrained Bag-of-Words Vocabulary for ORB-SLAM via Hierarchical BRB-KMeans | | 0
LISTA-Transformer Model Based on Sparse Coding and Attention Mechanism and Its Application in Fault Diagnosis | | 0
Traces of Social Competence in Large Language Models | | 0
Learning Hip Exoskeleton Control Policy via Predictive Neuromusculoskeletal Simulation | | 0
CodeTaste: Can LLMs Generate Human-Level Code Refactorings? | | 0
Architectural Proprioception in State Space Models: Thermodynamic Training Induces Anticipatory Halt Detection | | 0
REDNET-ML: A Multi-Sensor Machine Learning Pipeline for Harmful Algal Bloom Risk Detection Along the Omani Coast | | 0
Noise-aware Client Selection for carbon-efficient Federated Learning via Gradient Norm Thresholding | | 0
Stable and Steerable Sparse Autoencoders with Weight Regularization | | 0
DeepScan: A Training-Free Framework for Visually Grounded Reasoning in Large Vision-Language Models | | 0
From Threat Intelligence to Firewall Rules: Semantic Relations in Hybrid AI Agent and Expert System Architectures | | 0
Generative Recommendation for Large-Scale Advertising | | 0
Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs | | 0
Beyond the Prompt: An Empirical Study of Cursor Rules | | 0
An Adaptive KKT-Based Indicator for Convergence Assessment in Multi-Objective Optimization | | 0
Order Is Not Layout: Order-to-Space Bias in Image Generation | | 0
To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks | | 0
Mozi: Governed Autonomy for Drug Discovery LLM Agents | | 0
MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation | | 0
REVISION: Reflective Intent Mining and Online Reasoning Auxiliary for E-commerce Visual Search System Optimization | | 0
SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance | | 0
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond? | | 0
Empirical Evaluation of No Free Lunch Violations in Permutation-Based Optimization | | 0
Weakly Supervised Patch Annotation for Improved Screening of Diabetic Retinopathy | | 0
CLIP-Guided Multi-Task Regression for Multi-View Plant Phenotyping | Code | 0
MultiWikiQA: A Reading Comprehension Benchmark in 300+ Languages | | 0
MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living | | 0
A Systematic Analysis of Biases in Large Language Models | | 0
Improved MambdaBDA Framework for Robust Building Damage Assessment Across Disaster Domains | | 0
Causal Circuit Tracing Reveals Distinct Computational Architectures in Single-Cell Foundation Models: Inhibitory Dominance, Biological Coherence, and Cross-Model Convergence | | 0
DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation | | 0
LEA: Label Enumeration Attack in Vertical Federated Learning | | 0
A Bi-Stage Framework for Automatic Development of Pixel-Based Planar Antenna Structures | | 0
Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation | | 0
Training-Free Reward-Guided Image Editing via Trajectory Optimal Control | | 0
AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation | | 0
GarmentPile++: Affordance-Driven Cluttered Garments Retrieval with Vision-Language Reasoning | | 0
Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions | | 0
A Study on Building Efficient Zero-Shot Relation Extraction Models | | 0
Momentum Memory for Knowledge Distillation in Computational Pathology | | 0
FlowCorrect: Efficient Interactive Correction of Generative Flow Policies for Robotic Manipulation | | 0
ProSMA-UNet: Decoder Conditioning for Proximal-Sparse Skip Feature Selection | | 0
Polyp Segmentation Using Wavelet-Based Cross-Band Integration for Enhanced Boundary Representation | | 0
On the Learnability of Offline Model-Based Optimization: A Ranking Perspective | | 0
STEM Faculty Perspectives on Generative AI in Higher Education | | 0
A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality | | 0
Dynamic Adversarial Reinforcement Learning for Robust Multimodal Large Language Models | | 0
Extending Czech Aspect-Based Sentiment Analysis with Opinion Terms: Dataset and LLM Benchmarks | | 0
Page 231 of 13232