SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6176–6200 of 474278 papers

Title | Status | Hype
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Code | 2
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting | Code | 2
Skill Expansion and Composition in Parameter Space | Code | 2
Saving 77% of the Parameters in Large Language Models Technical Report | Code | 2
3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly | Code | 2
Knowledge Graph-Guided Retrieval Augmented Generation | Code | 2
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey | Code | 2
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | Code | 2
Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark | Code | 2
Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model | Code | 2
GaussRender: Learning 3D Occupancy with Gaussian Rendering | Code | 2
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Code | 2
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Code | 2
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning | Code | 2
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Code | 2
MHAF-YOLO: Multi-Branch Heterogeneous Auxiliary Fusion YOLO for accurate object detection | Code | 2
NoLiMa: Long-Context Evaluation Beyond Literal Matching | Code | 2
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Code | 2
Training Language Models to Reason Efficiently | Code | 2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Code | 2
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models | Code | 2
SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning | Code | 2
WaferLLM: Large Language Model Inference at Wafer Scale | Code | 2
CTR-Driven Advertising Image Generation with Multimodal Large Language Models | Code | 2
Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation | Code | 2
Page 248 of 18972