SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 14801-14850 of 474,278 papers

| Title | Status | Hype |
| --- | --- | --- |
| OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents | | 0 |
| DRIVE Through the Unpredictability: From a Protocol Investigating Slip to a Metric Estimating Command Uncertainty | | 0 |
| Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining | | 0 |
| Quantum Artificial Intelligence for Secure Autonomous Vehicle Navigation: An Architectural Proposal | | 0 |
| EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training | Code | 1 |
| Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models | Code | 1 |
| StoryWriter: A Multi-Agent Framework for Long Story Generation | Code | 1 |
| Subspace-Boosted Model Merging | | 0 |
| From Coarse to Continuous: Progressive Refinement Implicit Neural Representation for Motion-Robust Anisotropic MRI Reconstruction | | 0 |
| From General to Targeted Rewards: Surpassing GPT-4 in Open-Ended Long-Context Generation | | 0 |
| LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning | | 0 |
| SemAgent: A Semantics Aware Program Repair Agent | | 0 |
| Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior | Code | 1 |
| Watermarking Autoregressive Image Generation | Code | 2 |
| Wavelet-based Global Orientation and Surface Reconstruction for Point Clouds | | 0 |
| OJBench: A Competition Level Code Benchmark For Large Language Models | Code | 1 |
| Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details | Code | 3 |
| VRAIL: Vectorized Reward-based Attribution for Interpretable Learning | | 0 |
| REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing | | 0 |
| A Distributional-Lifting Theorem for PAC Learning | | 0 |
| Capturing Visualization Design Rationale | Code | 0 |
| AuraGenome: An LLM-Powered Framework for On-the-Fly Reusable and Scalable Circular Genome Visualizations | Code | 0 |
| Model Fusion via Neuron Interpolation | Code | 0 |
| Retrospective Memory for Camouflaged Object Detection | | 0 |
| CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization | Code | 0 |
| Baltimore Atlas: FreqWeaver Adapter for Semi-supervised Ultra-high Spatial Resolution Land Cover Classification | | 0 |
| RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories | | 0 |
| Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training | | 0 |
| ExtPose: Robust and Coherent Pose Estimation by Extending ViTs | | 0 |
| From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | | 0 |
| Sekai: A Video Dataset towards World Exploration | | 0 |
| WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts | | 0 |
| Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model | Code | 1 |
| SignBart -- New approach with the skeleton sequence for Isolated Sign language Recognition | Code | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0 |
| SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification | | 0 |
| Conquering the Retina: Bringing Visual in-Context Learning to OCT | Code | 0 |
| Modulated Diffusion: Accelerating Generative Modeling with Modulated Quantization | Code | 0 |
| DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement | Code | 2 |
| Fiber Signal Denoising Algorithm using Hybrid Deep Learning Networks | | 0 |
| Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute | | 0 |
| Effect of Signal Quantization on Performance Measures of a 1st Order One Dimensional Differential Microphone Array | | 0 |
| Reinforcement Learning-Based Policy Optimisation For Heterogeneous Radio Access | | 0 |
| Joint Computation Offloading and Resource Allocation for Uncertain Maritime MEC via Cooperation of UAVs and Vessels | | 0 |
| Multi-Timescale Gradient Sliding for Distributed Optimization | | 0 |
| Active Learning-Guided Seq2Seq Variational Autoencoder for Multi-target Inhibitor Generation | | 0 |
| Learning Task-Agnostic Skill Bases to Uncover Motor Primitives in Animal Behaviors | | 0 |
| Conditional Generative Modeling for Enhanced Credit Risk Management in Supply Chain Finance | | 0 |
| CopulaSMOTE: A Copula-Based Oversampling Approach for Imbalanced Classification in Diabetes Prediction | | 0 |
| Urban RIS-Assisted HAP Networks: Performance Analysis Using Stochastic Geometry | | 0 |
Page 297 of 9486