SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 1526-1550 of 661,570 papers

Title | Status | Hype
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement | Code | 4
Blendify -- Python rendering framework for Blender | Code | 4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces | Code | 4
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Code | 4
SNAC: Multi-Scale Neural Audio Codec | Code | 4
Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents | Code | 4
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | Code | 4
One Step Diffusion via Shortcut Models | Code | 4
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI | Code | 4
MoH: Multi-Head Attention as Mixture-of-Head Attention | Code | 4
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents | Code | 4
EasyRAG: Efficient Retrieval-Augmented Generation Framework for Automated Network Operations | Code | 4
Agent-as-a-Judge: Evaluate Agents with Agents | Code | 4
Generalizable Humanoid Manipulation with 3D Diffusion Policies | Code | 4
Depth Any Video with Scalable Synthetic Data | Code | 4
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Code | 4
When Does Perceptual Alignment Benefit Vision Representations? | Code | 4
LLMMapReduce: Simplified Long-Sequence Processing using Large Language Models | Code | 4
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Code | 4
Generalizable and Animatable Gaussian Head Avatar | Code | 4
Taking a turn for the better: Conversation redirection throughout the course of mental-health therapy | Code | 4
CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models | Code | 4
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Code | 4
Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching | Code | 4