SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15851–15900 of 474278 papers

Title | Status | Hype
Dynadiff: Single-stage Decoding of Images from Continuously Evolving fMRI | Code | 1
EEG-to-Text Translation: A Model for Deciphering Human Brain Activity | Code | 1
Learning Concept-Driven Logical Rules for Interpretable and Generalizable Medical Image Classification | Code | 1
Safety Subspaces are Not Distinct: A Fine-Tuning Case Study | Code | 1
MGStream: Motion-aware 3D Gaussian for Streamable Dynamic Scene Reconstruction | Code | 1
Enhancing Classification with Semi-Supervised Deep Learning Using Distance-Based Sample Weights | Code | 1
Time series saliency maps: explaining models across multiple domains | Code | 1
Cross-modal feature fusion for robust point cloud registration with ambiguous geometry | Code | 1
Accelerate TarFlow Sampling with GS-Jacobi Iteration | Code | 1
TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios | Code | 1
One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling | Code | 1
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | Code | 1
Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals | Code | 1
VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation | Code | 1
WriteViT: Handwritten Text Generation with Vision Transformer | Code | 1
MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning | Code | 1
GuRE:Generative Query REwriter for Legal Passage Retrieval | Code | 1
Shadow-FT: Tuning Instruct via Base | Code | 1
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information | Code | 1
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities | Code | 1
Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues? | Code | 1
Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs) | Code | 1
Know Or Not: a library for evaluating out-of-knowledge base robustness | Code | 1
EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code | Code | 1
SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science | Code | 1
Aneumo: A Large-Scale Multimodal Aneurysm Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks | Code | 1
HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos | Code | 1
3D Visual Illusion Depth Estimation | Code | 1
Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability | Code | 1
What is Stigma Attributed to? A Theory-Grounded, Expert-Annotated Interview Corpus for Demystifying Mental-Health Stigma | Code | 1
Learning Collision Risk from Naturalistic Driving with Generalised Surrogate Safety Measures | Code | 1
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization | Code | 1
From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection | Code | 1
Role-Playing Evaluation for Large Language Models | Code | 1
What Lives? A meta-analysis of diverse opinions on the definition of life | Code | 1
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | Code | 1
CALM-PDE: Continuous and Adaptive Convolutions for Latent Space Modeling of Time-dependent PDEs | Code | 1
A Skull-Adaptive Framework for AI-Based 3D Transcranial Focused Ultrasound Simulation | Code | 1
AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use | Code | 1
TimeSeriesGym: A Scalable Benchmark for (Time Series) Machine Learning Engineering Agents | Code | 1
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards | Code | 1
FlowPure: Continuous Normalizing Flows for Adversarial Purification | Code | 1
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs | Code | 1
AGI-Elo: How Far Are We From Mastering A Task? | Code | 1
Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models | Code | 1
R3: Robust Rubric-Agnostic Reward Models | Code | 1
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone | Code | 1
Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment | Code | 1
Is Artificial Intelligence Generated Image Detection a Solved Problem? | Code | 1
Hyperspectral Image Land Cover Captioning Dataset for Vision Language Models | Code | 1
Page 318 of 9486