SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15151-15200 of 474278 papers

Title | Status | Hype
Transformer IMU Calibrator: Dynamic On-body IMU Calibration for Inertial Motion Capture | Code | 1
RePO: Replay-Enhanced Policy Optimization | Code | 1
Towards Practical Alzheimer's Disease Diagnosis: A Lightweight and Interpretable Spiking Neural Model | Code | 1
ScaleLSD: Scalable Deep Line Segment Detection Streamlined | Code | 1
Noise Conditional Variational Score Distillation | Code | 1
Unmasking real-world audio deepfakes: A data-centric approach | Code | 1
DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt | Code | 1
Non-Contact Health Monitoring During Daily Personal Care Routines | Code | 1
Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban | Code | 1
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Code | 1
Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs | Code | 1
BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation | Code | 1
GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras | Code | 1
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1
Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models | Code | 1
Exposure-slot: Exposure-centric representations learning with Slot-in-Slot Attention for Region-aware Exposure Correction | Code | 1
On the Similarities of Embeddings in Contrastive Learning | Code | 1
LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge | Code | 1
California Crop Yield Benchmark: Combining Satellite Image, Climate, Evapotranspiration, and Soil Data Layers for County-Level Yield Forecasting of Over 70 Crops | Code | 1
Attention, Please! Revisiting Attentive Probing for Masked Image Modeling | Code | 1
Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation | Code | 1
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios | Code | 1
Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation | Code | 1
The Four Color Theorem for Cell Instance Segmentation | Code | 1
Mutual-Supervised Learning for Sequential-to-Parallel Code Translation | Code | 1
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking | Code | 1
Rethinking Brain Tumor Segmentation from the Frequency Domain Perspective | Code | 1
Training-Free Voice Conversion with Factorized Optimal Transport | Code | 1
Resa: Transparent Reasoning Models via SAEs | Code | 1
FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation | Code | 1
InceptionMamba: An Efficient Hybrid Network with Large Band Convolution and Bottleneck Mamba | Code | 1
Intention-Conditioned Flow Occupancy Models | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning | Code | 1
RS-MTDF: Multi-Teacher Distillation and Fusion for Remote Sensing Semi-Supervised Semantic Segmentation | Code | 1
SLEEPYLAND: trust begins with fair evaluation of automatic sleep staging models | Code | 1
Token Perturbation Guidance for Diffusion Models | Code | 1
mLaSDI: Multi-stage latent space dynamics identification | Code | 1
DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs | Code | 1
Draft-based Approximate Inference for LLMs | Code | 1
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1
SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models | Code | 1
On Reasoning Strength Planning in Large Reasoning Models | Code | 1
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements | Code | 1
GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions | Code | 1
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Code | 1
Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism | Code | 1
LaDCast: A Latent Diffusion Model for Medium-Range Ensemble Weather Forecasting | Code | 1
HSG-12M: A Large-Scale Spatial Multigraph Dataset | Code | 1
On Finetuning Tabular Foundation Models | Code | 1