SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 2751-2800 of 659,983 papers

Title | Status | Hype
The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis | Code | 3
MetaDE: Evolving Differential Evolution by Differential Evolution | Code | 3
MDCrow: Automating Molecular Dynamics Workflows with Large Language Models | Code | 3
Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning | Code | 3
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Code | 3
FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents | Code | 3
GENERator: A Long-Context Generative Genomic Foundation Model | Code | 3
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Code | 3
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models | Code | 3
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Code | 3
History-Guided Video Diffusion | Code | 3
PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map | Code | 3
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Code | 3
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Code | 3
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Code | 3
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Code | 3
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks | Code | 3
VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Code | 3
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot | Code | 3
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | Code | 3
Ola: Pushing the Frontiers of Omni-Modal Language Model | Code | 3
Multi-agent Architecture Search via Agentic Supernet | Code | 3
Demystifying Long Chain-of-Thought Reasoning in LLMs | Code | 3
One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation | Code | 3
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization | Code | 3
Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries | Code | 3
Flow Q-Learning | Code | 3
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Code | 3
GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation | Code | 3
Safety at Scale: A Comprehensive Survey of Large Model Safety | Code | 3
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective | Code | 3
OneForecast: A Universal Framework for Global and Regional Weather Forecasting | Code | 3
MambaGlue: Fast and Robust Local Feature Matching With Mamba | Code | 3
M+: Extending MemoryLLM with Scalable Long-Term Memory | Code | 3
Rethinking Early Stopping: Refine, Then Calibrate | Code | 3
Test-Time Training Scaling Laws for Chemical Exploration in Drug Design | Code | 3
Partially Rewriting a Transformer in Natural Language | Code | 3
Decoding-based Regression | Code | 3
Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models | Code | 3
LLMs can see and hear without any training | Code | 3
Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation | Code | 3
Molecular Fingerprints Are Strong Models for Peptide Function Prediction | Code | 3
Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting | Code | 3
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation | Code | 3
Deformable Beta Splatting | Code | 3
Parametric Retrieval Augmented Generation | Code | 3
MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents | Code | 3
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Code | 3
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Code | 3
The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities | Code | 3
Page 56 of 13,200