SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5401-5425 of 661,570 papers

Title | Status | Hype
A Tutorial on Structural Identifiability of Epidemic Models Using StructuralIdentifiability.jl | Code | 2
WorldPM: Scaling Human Preference Modeling | Code | 2
MASS: Multi-Agent Simulation Scaling for Portfolio Construction | Code | 2
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly | Code | 2
MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models | Code | 2
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt | Code | 2
Recent Advances in Medical Imaging Segmentation: A Survey | Code | 2
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators | Code | 2
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents" | Code | 2
Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation | Code | 2
MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning | Code | 2
Behind Maya: Building a Multilingual Vision Language Model | Code | 2
CodePDE: An Inference Framework for LLM-driven PDE Solver Generation | Code | 2
Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement | Code | 2
BAT: Benchmark for Auto-bidding Task | Code | 2
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent | Code | 2
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation | Code | 2
YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models | Code | 2
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Code | 2
Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection | Code | 2
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models | Code | 2
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering | Code | 2
Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | Code | 2
Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule | Code | 2
Unified Continuous Generative Models | Code | 2
Page 217 of 26463