SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9726-9750 of 474,278 papers

Title | Status | Hype
KnowMT-Bench: Benchmarking Knowledge-Intensive Long-Form Question Answering in Multi-Turn Dialogues | Code | 0
Abductive Logical Rule Induction by Bridging Inductive Logic Programming and Multimodal Large Language Models | Code | 0
Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching | Code | 0
FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration | Code | 0
Beyond Textual Context: Structural Graph Encoding with Adaptive Space Alignment to alleviate the hallucination of LLMs | Code | 0
Multidimensional Uncertainty Quantification via Optimal Transport | Code | 0
DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models | Code | 0
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation | Code | 0
From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement | Code | 0
Scalable Option Learning in High-Throughput Environments | Code | 0
NIFTY: a Non-Local Image Flow Matching for Texture Synthesis | Code | 0
Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration | Code | 0
Multi-Channel Differential Transformer for Cross-Domain Sleep Stage Classification with Heterogeneous EEG and EOG | Code | 0
Chain or tree? Re-evaluating complex reasoning from the perspective of a matrix of thought | Code | 0
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them | Code | 0
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models | Code | 0
MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning | Code | 0
FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning | Code | 0
RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media | Code | 0
SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection | Code | 0
Think Right, Not More: Test-Time Scaling for Numerical Claim Verification | Code | 0
Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation | Code | 0
Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach | Code | 0
γ-Quant: Towards Learnable Quantization for Low-bit Pattern Recognition | Code | 0
Language Models Can Learn from Verbal Feedback Without Scalar Rewards | Code | 0
Page 390 of 18,972