SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6001–6025 of 474,278 papers

Title | Status | Hype
Automatic database description generation for Text-to-SQL | Code | 2
AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies | Code | 2
SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models | Code | 2
UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection | Code | 2
Digital Player: Evaluating Large Language Models based Human-like Agent in Games | Code | 2
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology | Code | 2
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video | Code | 2
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation | Code | 2
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR | Code | 2
Image Referenced Sketch Colorization Based on Animation Creation Workflow | Code | 2
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Code | 2
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model | Code | 2
Mobius: Text to Seamless Looping Video Generation via Latent Shift | Code | 2
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion | Code | 2
Sanity Checking Causal Representation Learning on a Simple Real-World System | Code | 2
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction | Code | 2
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think | Code | 2
One-for-More: Continual Diffusion Model for Anomaly Detection | Code | 2
AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Code | 2
OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models | Code | 2
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens | Code | 2
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems | Code | 2
BIG-Bench Extra Hard | Code | 2
FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting | Code | 2