SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 2,876 to 2,900 of 177,340 papers

Title | Status | Hype
The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis | Code | 3
Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization | Code | 3
Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction | Code | 3
MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library | Code | 3
UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation | Code | 3
GraphNeuralNetworks.jl: Deep Learning on Graphs with Julia | Code | 3
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL | Code | 3
A Simple Framework for Open-Vocabulary Segmentation and Detection | Code | 3
LinFusion: 1 GPU, 1 Minute, 16K Image | Code | 3
CHESS: Contextual Harnessing for Efficient SQL Synthesis | Code | 3
Flexible and Scalable Deep Learning with MMLSpark | Code | 3
A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness | Code | 3
Why Transformers Need Adam: A Hessian Perspective | Code | 3
LiftFeat: 3D Geometry-Aware Local Feature Matching | Code | 3
An Empirical Study on Prompt Compression for Large Language Models | Code | 3
This Time is Different: An Observability Perspective on Time Series Foundation Models | Code | 3
Image and Video Tokenization with Binary Spherical Quantization | Code | 3
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation | Code | 3
Distilling LLM Agent into Small Models with Retrieval and Code Tools | Code | 3
Highly Compressed Tokenizer Can Generate Without Training | Code | 3
When to use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation | Code | 3
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Code | 3
Discrete Diffusion in Large Language and Multimodal Models: A Survey | Code | 3
Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Code | 3
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language | Code | 3
Page 116 of 7,094