SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9676–9700 of 474,278 papers

Title | Status | Hype
Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces | | 0
PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation | | 0
CE-Bench: Towards a Reliable Contrastive Evaluation Benchmark of Interpretability of Sparse Autoencoders | | 0
IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing | Code | 0
MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition | Code | 0
How to Make Large Language Models Generate 100% Valid Molecules? | Code | 0
Towards Monotonic Improvement in In-Context Reinforcement Learning | Code | 0
Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection | Code | 0
No Loss, No Gain: Gated Refinement and Adaptive Compression for Prompt Optimization | Code | 0
Power Battery Detection | Code | 0
Benchmarking DINOv3 for Multi-Task Stroke Analysis on Non-Contrast CT | Code | 0
TimeExpert: Boosting Long Time Series Forecasting with Temporal Mix of Experts | Code | 0
GRAPE: Let GPRO Supervise Query Rewriting by Ranking for Retrieval | Code | 0
No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation | Code | 0
Memory-Efficient Fine-Tuning via Low-Rank Activation Compression | Code | 0
Flow Matching for Efficient and Scalable Data Assimilation | Code | 0
Rule-Based Reinforcement Learning for Document Image Classification with Vision Language Models | Code | 0
See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation | | 0
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning | | 0
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing | Code | 0
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering | Code | 0
POEM: Explore Unexplored Reliable Samples to Enhance Test-Time Adaptation | Code | 0
AutoPK: Leveraging LLMs and a Hybrid Similarity Metric for Advanced Retrieval of Pharmacokinetic Data from Complex Tables and Documents | Code | 0
A Framework for Scalable Heterogeneous Multi-Agent Adversarial Reinforcement Learning in IsaacLab | Code | 0
Pedestrian Attribute Recognition via Hierarchical Cross-Modality HyperGraph Learning | | 0
Page 388 of 18,972