SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 151-175 of 177,340 papers

Title | Status | Hype
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | Code | 9
LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model | Code | 9
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | Code | 9
NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context? | Code | 9
YuE: Scaling Open Foundation Models for Long-Form Music Generation | Code | 9
Depth Anything V2 | Code | 9
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Code | 9
Visually Descriptive Language Model for Vector Graphics Reasoning | Code | 9
KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation | Code | 9
World Model on Million-Length Video And Language With Blockwise RingAttention | Code | 9
UFO2: The Desktop AgentOS | Code | 9
LLM4Decompile: Decompiling Binary Code with Large Language Models | Code | 9
Do Large Language Models Need a Content Delivery Network? | Code | 9
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9
LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync | Code | 9
FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models | Code | 9
MiniCPM4: Ultra-Efficient LLMs on End Devices | Code | 9
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | Code | 9
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models | Code | 9
OLMo: Accelerating the Science of Language Models | Code | 9
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | Code | 9
UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation | Code | 9
Model Stock: All we need is just a few fine-tuned models | Code | 9
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion | Code | 9
Large Action Models: From Inception to Implementation | Code | 9