SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5051–5100 of 661,570 papers

Title | Status | Hype
Learning to See in the Extremely Dark | Code | 2
WAFT: Warping-Alone Field Transforms for Optical Flow | Code | 2
Stochastic Parameter Decomposition | Code | 2
Language Modeling by Language Models | Code | 2
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Code | 2
Video Compression for Spatiotemporal Earth System Data | Code | 2
An ab initio foundation model of wavefunctions that accurately describes chemical bond breaking | Code | 2
ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks | Code | 2
MegaFold: System-Level Optimizations for Accelerating Protein Structure Prediction Models | Code | 2
PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | Code | 2
AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing | Code | 2
Thought Anchors: Which LLM Reasoning Steps Matter? | Code | 2
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2
Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster | Code | 2
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities | Code | 2
TAB: Unified Benchmarking of Time Series Anomaly Detection Methods | Code | 2
From Tiny Machine Learning to Tiny Deep Learning: A Survey | Code | 2
Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models | Code | 2
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation | Code | 2
MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents | Code | 2
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Code | 2
RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Code | 2
Watermarking Autoregressive Image Generation | Code | 2
DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement | Code | 2
Descriptor-based Foundation Models for Molecular Property Prediction | Code | 2
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges | Code | 2
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Code | 2
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree | Code | 2
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs | Code | 2
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification | Code | 2
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents | Code | 2
Essential-Web v1.0: 24T tokens of organized web data | Code | 2
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models | Code | 2
TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Code | 2
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | Code | 2
A Comprehensive Survey on Continual Learning in Generative Models | Code | 2
SuperPoint-SLAM3: Augmenting ORB-SLAM3 with Deep Features, Adaptive NMS, and Learning-Based Loop Closure | Code | 2
LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation | Code | 2
Test3R: Learning to Reconstruct 3D at Test Time | Code | 2
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models | Code | 2
Focusing on Tracks for Online Multi-Object Tracking | Code | 2
Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition? | Code | 2
Improving spliced alignment by modeling splice sites with deep learning | Code | 2
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning | Code | 2
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis | Code | 2
BraTS orchestrator : Democratizing and Disseminating state-of-the-art brain tumor image analysis | Code | 2
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Code | 2
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes | Code | 2
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks | Code | 2