SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 19851–19900 of 474278 papers

Title | Status | Hype
DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes | - | 0
Keyed Chaotic Dynamics for Privacy-Preserving Neural Inference | - | 0
Securing AI Agents with Information-Flow Control | Code | 2
UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes | Code | 2
FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing | Code | 1
The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning | Code | 1
Accelerating AllReduce with a Persistent Straggler | Code | 1
HyperPointFormer: Multimodal Fusion in 3D Space with Dual-Branch Cross-Attention Transformers | Code | 0
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation | Code | 1
LADA: Scalable Label-Specific CLIP Adapter for Continual Learning | Code | 1
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents | Code | 2
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | Code | 2
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model | Code | 0
MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models | - | 0
Thinking with Generated Images | Code | 0
D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples | - | 0
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion? | - | 0
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding | - | 0
DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers | - | 0
Do You See Me : A Multidimensional Benchmark for Evaluating Visual Perception in Multimodal LLMs | Code | 1
GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning | - | 0
Continuous Evolution Pool: Taming Recurring Concept Drift in Online Time Series Forecasting | - | 0
PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms | - | 0
DeepRTL2: A Versatile Model for RTL-Related Tasks | - | 0
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model | - | 0
GUST: Quantifying Free-Form Geometric Uncertainty of Metamaterials Using Small Data | - | 0
Improving statistical learning methods via features selection without replacement sampling and random projection | - | 0
Contextual Memory Intelligence -- A Foundational Paradigm for Human-AI Collaboration and Reflective Generative AI Systems | - | 0
Limits of Disclosure in Search Markets | - | 0
On the Interplay of Privacy, Persuasion and Quantization | - | 0
EvolveSearch: An Iterative Self-Evolving Search Agent | - | 0
LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents | - | 0
Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents | - | 0
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models | - | 0
NOCL: Node-Oriented Conceptualization LLM for Graph Tasks without Message Passing | - | 0
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | - | 0
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models | - | 0
Judging LLMs on a Simplex | - | 0
Universal Visuo-Tactile Video Understanding for Embodied Interaction | - | 0
THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models | - | 0
Learning World Models for Interactive Video Generation | - | 0
Individualised Counterfactual Examples Using Conformal Prediction Intervals | - | 0
RAG-Zeval: Towards Robust and Interpretable Evaluation on RAG Responses through End-to-End Rule-Guided Reasoning | - | 0
Neuromorphic Sequential Arena: A Benchmark for Neuromorphic Temporal Processing | Code | 1
EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles | - | 0
Curse of High Dimensionality Issue in Transformer for Long-context Modeling | Code | 0
Improving Out-of-Distribution Detection with Markov Logic Networks | - | 0
From Motion to Behavior: Hierarchical Modeling of Humanoid Generative Behavior Control | - | 0
Are classical deep neural networks weakly adversarially robust? | - | 0
Page 398 of 9486