SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 15251 to 15300 of 474278 papers

Title | Status | Hype
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence | Code | 1
Premise Selection for a Lean Hammer | Code | 1
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Code | 1
SEED: Enhancing Text-to-SQL Performance and Practical Usability Through Automatic Evidence Generation | Code | 1
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity | Code | 1
MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models | Code | 1
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds | Code | 1
Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces | Code | 1
From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium | Code | 1
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text | Code | 1
Certified Unlearning for Neural Networks | Code | 1
Learning Compact Vision Tokens for Efficient Large Multimodal Models | Code | 1
Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings | Code | 1
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint | Code | 1
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification | Code | 1
SAFE: Finding Sparse and Flat Minima to Improve Pruning | Code | 1
Depth-Optimal Quantum Layout Synthesis as SAT | Code | 1
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration | Code | 1
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1
Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias | Code | 1
3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model | Code | 1
LETS Forecast: Learning Embedology for Time Series Forecasting | Code | 1
Revealing hidden correlations from complex spatial distributions: Adjacent Correlation Analysis | Code | 1
Mapping correlations and coherence: adjacency-based approach to data visualization and regularity discovery | Code | 1
SDS-Net: Shallow-Deep Synergism-detection Network for infrared small target detection | Code | 1
Towards an Explainable Comparison and Alignment of Feature Embeddings | Code | 1
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging | Code | 1
Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning | Code | 1
DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation | Code | 1
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties | Code | 1
KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes | Code | 1
NILMFormer: Non-Intrusive Load Monitoring that Accounts for Non-Stationarity | Code | 1
Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems | Code | 1
Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models | Code | 1
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing | Code | 1
AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification | Code | 1
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation | Code | 1
Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models | Code | 1
MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction | Code | 1
Progressive Tempering Sampler with Diffusion | Code | 1
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View | Code | 1
Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning | Code | 1
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts | Code | 1
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations | Code | 1
OpenMaskDINO3D : Reasoning 3D Segmentation via Large Language Model | Code | 1
FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation | Code | 1
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification | Code | 1
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Code | 1
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games | Code | 1
Tuning the Right Foundation Models is What you Need for Partial Label Learning | Code | 1
Page 306 of 9486