SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5,076–5,100 of 661,570 papers

Title | Status | Hype
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
Descriptor-based Foundation Models for Molecular Property Prediction | Code | 2
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges | Code | 2
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree | Code | 2
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs | Code | 2
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification | Code | 2
Essential-Web v1.0: 24T tokens of organized web data | Code | 2
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents | Code | 2
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models | Code | 2
Test3R: Learning to Reconstruct 3D at Test Time | Code | 2
TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Code | 2
LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation | Code | 2
SuperPoint-SLAM3: Augmenting ORB-SLAM3 with Deep Features, Adaptive NMS, and Learning-Based Loop Closure | Code | 2
A Comprehensive Survey on Continual Learning in Generative Models | Code | 2
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | Code | 2
Focusing on Tracks for Online Multi-Object Tracking | Code | 2
Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition? | Code | 2
Improving spliced alignment by modeling splice sites with deep learning | Code | 2
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models | Code | 2
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning | Code | 2
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes | Code | 2
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders | Code | 2
CGVQM+D: Computer Graphics Video Quality Metric and Dataset | Code | 2
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks | Code | 2
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Code | 2
Page 204 of 26,463