SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 20,551–20,600 of 474,278 papers

Title | Status | Hype
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities | Code | 1
QCircuitNet: A Large-Scale Hierarchical Dataset for Quantum Algorithm Design | Code | 1
StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs | Code | 1
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection | Code | 1
Minority-Focused Text-to-Image Generation via Prompt Optimization | Code | 1
OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting | Code | 1
Noether's razor: Learning Conserved Quantities | Code | 1
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment | Code | 1
Efficient Dictionary Learning with Switch Sparse Autoencoders | Code | 1
Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos | Code | 1
ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion | Code | 1
Neural Reasoning Networks: Efficient Interpretable Neural Networks With Automatic Textual Explanations | Code | 1
Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures | Code | 1
StablePrompt: Automatic Prompt Tuning using Reinforcement Learning for Large Language Models | Code | 1
Causal Image Modeling for Efficient Visual Understanding | Code | 1
Metalic: Meta-Learning In-Context with Protein Language Models | Code | 1
RayEmb: Arbitrary Landmark Detection in X-Ray Images Using Ray Embedding Subspace | Code | 1
CrackSegDiff: Diffusion Probability Model-based Multi-modal Crack Segmentation | Code | 1
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation | Code | 1
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning | Code | 1
Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines | Code | 1
Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models | Code | 1
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining | Code | 1
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning | Code | 1
Physics and Deep Learning in Computational Wave Imaging | Code | 1
Bilinear MLPs enable weight-based mechanistic interpretability | Code | 1
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs | Code | 1
Pretraining Graph Transformers with Atom-in-a-Molecule Quantum Properties for Improved ADMET Modeling | Code | 1
TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration | Code | 1
Pap2Pat: Towards Automated Paper-to-Patent Drafting using Chunk-based Outline-guided Generation | Code | 1
DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector | Code | 1
Does Spatial Cognition Emerge in Frontier Models? | Code | 1
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models | Code | 1
Dynamic Neural Potential Field: Online Trajectory Optimization in Presence of Moving Obstacles | Code | 1
Iterative Optimization Annotation Pipeline and ALSS-YOLO-Seg for Efficient Banana Plantation Segmentation in UAV Imagery | Code | 1
InstructG2I: Synthesizing Images from Multimodal Attributed Graphs | Code | 1
TextLap: Customizing Language Models for Text-to-Layout Planning | Code | 1
Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention | Code | 1
Learning Evolving Tools for Large Language Models | Code | 1
Continual Learning in the Frequency Domain | Code | 1
Personalized Visual Instruction Tuning | Code | 1
LLM Embeddings Improve Test-time Adaptation to Tabular Y|X-Shifts | Code | 1
Towards Generalisable Time Series Understanding Across Domains | Code | 1
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection | Code | 1
BiC-MPPI: Goal-Pursuing, Sampling-Based Bidirectional Rollout Clustering Path Integral for Trajectory Optimization | Code | 1
Deep Correlated Prompting for Visual Recognition with Missing Modalities | Code | 1
Retrieval-Augmented Decision Transformer: External Memory for In-context RL | Code | 1
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Code | 1
Tree of Problems: Improving structured problem solving with compositionality | Code | 1
Mitigating Time Discretization Challenges with WeatherODE: A Sandwich Physics-Driven Neural ODE for Weather Forecasting | Code | 1
Page 412 of 9,486