SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 1601–1650 of 659,983 papers

Title | Status | Hype
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search | Code | 4
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Code | 4
SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning | Code | 4
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents | Code | 4
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | Code | 4
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization | Code | 4
miniCTX: Neural Theorem Proving with (Long-)Contexts | Code | 4
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | Code | 4
ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning | Code | 4
Deep Patch Visual SLAM | Code | 4
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS | Code | 4
CitationMap: A Python Tool to Identify and Visualize Your Google Scholar Citations Around the World | Code | 4
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 | Code | 4
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Code | 4
Expressive Whole-Body 3D Gaussian Avatar | Code | 4
The Llama 3 Herd of Models | Code | 4
Generation of Training Data from HD Maps in the Lanelet2 Framework | Code | 4
LAMBDA: A Large Model Based Data Agent | Code | 4
Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4
LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Code | 4
Stable-Hair: Real-World Hair Transfer via Diffusion Model | Code | 4
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals | Code | 4
Scaling Granite Code Models to 128K Context | Code | 4
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | Code | 4
Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs | Code | 4
Halu-J: Critique-Based Hallucination Judge | Code | 4
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference | Code | 4
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments | Code | 4
Deep-TEMPEST: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations | Code | 4
SEED-Story: Multimodal Long Story Generation with Large Language Model | Code | 4
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Code | 4
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training | Code | 4
The GeometricKernels Package: Heat and Matérn Kernels for Geometric Learning on Manifolds, Meshes, and Graphs | Code | 4
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends | Code | 4
A Survey on Deep Stereo Matching in the Twenties | Code | 4
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | Code | 4
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions | Code | 4
Wavelet Convolutions for Large Receptive Fields | Code | 4
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Code | 4
MUSE: Machine Unlearning Six-Way Evaluation for Language Models | Code | 4
TALENT: A Tabular Analytics and Learning Toolbox | Code | 4
Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later | Code | 4
MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis | Code | 4
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Code | 4
Tiny-PULP-Dronets: Squeezing Neural Networks for Faster and Lighter Inference on Multi-Tasking Autonomous Nano-Drones | Code | 4
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | Code | 4
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Code | 4
A Closer Look at Deep Learning Methods on Tabular Datasets | Code | 4
Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies | Code | 4
Page 33 of 13,200