SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 7176-7200 of 474278 papers

Title | Status | Hype
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Code | 2
JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework | Code | 2
On the State of NLP Approaches to Modeling Depression in Social Media: A Post-COVID-19 Outlook | Code | 2
Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation | Code | 2
radarODE-MTL: A Multi-Task Learning Framework with Eccentric Gradient Alignment for Robust Radar-Based ECG Reconstruction | Code | 2
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization | Code | 2
Window Function-less DFT with Reduced Noise and Latency for Real-Time Music Analysis | Code | 2
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Code | 2
Poison-splat: Computation Cost Attack on 3D Gaussian Splatting | Code | 2
Doob's Lagrangian: A Sample-Efficient Variational Approach to Transition Path Sampling | Code | 2
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory | Code | 2
Heating Up Quasi-Monte Carlo Graph Random Features: A Diffusion Kernel Perspective | Code | 2
Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Code | 2
Q-VLM: Post-training Quantization for Large Vision-Language Models | Code | 2
Deconstructing equivariant representations in molecular systems | Code | 2
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions | Code | 2
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | Code | 2
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Code | 2
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling | Code | 2
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2
MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting | Code | 2
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models | Code | 2
IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera | Code | 2
Progressive Autoregressive Video Diffusion Models | Code | 2
VoxelPrompt: A Vision-Language Agent for Grounded Medical Image Analysis | Code | 2
Page 288 of 18972