SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5101-5150 of 661570 papers

Title | Status | Hype
CGVQM+D: Computer Graphics Video Quality Metric and Dataset | Code | 2
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders | Code | 2
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes | Code | 2
Execution Guided Line-by-Line Code Generation | Code | 2
SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Code | 2
ConTextTab: A Semantics-Aware Tabular In-Context Learner | Code | 2
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science | Code | 2
Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs | Code | 2
CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation | Code | 2
VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Code | 2
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs | Code | 2
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Code | 2
QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction | Code | 2
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks | Code | 2
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems | Code | 2
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark | Code | 2
GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2
SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending | Code | 2
A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy | Code | 2
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | Code | 2
TaskCraft: Automated Generation of Agentic Tasks | Code | 2
ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model | Code | 2
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Code | 2
Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression | Code | 2
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following | Code | 2
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments | Code | 2
Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information | Code | 2
Urban1960SatSeg: Unsupervised Semantic Segmentation of Mid-20th century Urban Landscapes with Satellite Imageries | Code | 2
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting | Code | 2
CoRT: Code-integrated Reasoning within Thinking | Code | 2
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Code | 2
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Code | 2
Do MIL Models Transfer? | Code | 2
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability | Code | 2
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning | Code | 2
Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment | Code | 2
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better | Code | 2
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams | Code | 2
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Code | 2
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation | Code | 2
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | Code | 2
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering | Code | 2
FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems | Code | 2
Snap-and-tune: combining deep learning and test-time optimization for high-fidelity cardiovascular volumetric meshing | Code | 2
Open World Scene Graph Generation using Vision Language Models | Code | 2
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning | Code | 2
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions | Code | 2
FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling | Code | 2
Play to Generalize: Learning to Reason Through Game Play | Code | 2