SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 2651–2675 of 661570 papers

Title | Status | Hype
FluidGaussian: Propagating Simulation-Based Uncertainty Toward Functionally-Intelligent 3D Reconstruction | | 0
AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling | | 0
Benchmarking Bengali Dialectal Bias: A Multi-Stage Framework Integrating RAG-Based Translation and Human-Augmented RLAIF | | 0
AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation | | 0
TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference | | 0
Relax Forcing: Relaxed KV-Memory for Consistent Long Video Generation | | 0
Conspiracy Frame: a Semiotically-Driven Approach for Conspiracy Theories Detection | | 0
PLR: Plackett-Luce for Reordering In-Context Learning Examples | | 0
Constrained Online Convex Optimization with Memory and Predictions | | 0
HamVision: Hamiltonian Dynamics as Inductive Bias for Medical Image Analysis | | 0
An InSAR Phase Unwrapping Framework for Large-scale and Complex Events | | 0
PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost | | 0
Mitigating Objectness Bias and Region-to-Text Misalignment for Open-Vocabulary Panoptic Segmentation | | 0
Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models | | 0
A Generalised Exponentiated Gradient Approach to Enhance Fairness in Binary and Multi-class Classification Tasks | | 0
Mechanisms of Introspective Awareness | | 0
Persona Vectors in Games: Measuring and Steering Strategies via Activation Vectors | | 0
The Myhill-Nerode Theorem for Bounded Interaction: Canonical Abstractions via Agent-Bounded Indistinguishability | | 0
Multi-Perspective LLM Annotations for Valid Analyses in Subjective Tasks | | 0
Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach | | 0
Silent Commitment Failure in Instruction-Tuned Language Models: Evidence of Governability Divergence Across Architectures | | 0
Efficient Fine-Tuning Methods for Portuguese Question Answering: A Comparative Study of PEFT on BERTimbau and Exploratory Evaluation of Generative LLMs | | 0
Is the future of AI green? What can innovation diffusion models say about generative AI's environmental impact? | | 0
HyReach: Vision-Guided Hybrid Manipulator Reaching in Unseen Cluttered Environments | | 0
Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models | | 0
Page 107 of 26463