SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 801-850 of 659983 papers

Title | Status | Hype
Weakly Supervised Detection of Hallucinations in LLM Activations | Code | 5
Vectorized and performance-portable Quicksort | Code | 5
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation | Code | 5
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis | Code | 5
PaSa: An LLM Agent for Comprehensive Academic Paper Search | Code | 5
Voyager: An Open-Ended Embodied Agent with Large Language Models | Code | 5
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | Code | 5
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Code | 5
On the Computation of the Fisher Information in Continual Learning | Code | 5
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention | Code | 5
How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey | Code | 5
GRUtopia: Dream General Robots in a City at Scale | Code | 5
Fractal Generative Models | Code | 5
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations | Code | 5
Factuality Enhanced Language Models for Open-Ended Text Generation | Code | 5
Tool Learning with Foundation Models | Code | 5
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators | Code | 5
Deep Lake: a Lakehouse for Deep Learning | Code | 5
MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Code | 5
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary | Code | 5
Efficient Diffusion Model for Image Restoration by Residual Shifting | Code | 5
τ^2-Bench: Evaluating Conversational Agents in a Dual-Control Environment | Code | 5
DUSt3R: Geometric 3D Vision Made Easy | Code | 5
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling | Code | 5
EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine | Code | 5
ProPainter: Improving Propagation and Transformer for Video Inpainting | Code | 5
MedRAX: Medical Reasoning Agent for Chest X-ray | Code | 5
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Code | 5
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Code | 5
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases | Code | 5
Self-Instruct: Aligning Language Models with Self-Generated Instructions | Code | 5
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | Code | 5
ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills | Code | 5
MambaIRv2: Attentive State Space Restoration | Code | 5
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | Code | 5
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Code | 5
Trust Regions for Explanations via Black-Box Probabilistic Certification | Code | 5
MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments | Code | 5
EasyPhoto: Your Smart AI Photo Generator | Code | 5
Language Agents as Optimizable Graphs | Code | 5
Data-Juicer: A One-Stop Data Processing System for Large Language Models | Code | 5
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | Code | 5
Common 7B Language Models Already Possess Strong Math Capabilities | Code | 5
Fast On-device LLM Inference with NPUs | Code | 5
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation | Code | 5
Efficient Multimodal Learning from Data-centric Perspective | Code | 5
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation | Code | 5
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference | Code | 5
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Code | 5
A ConvNet for the 2020s | Code | 5
Page 17 of 13200