SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 14851-14900 of 474,278 papers

Title | Status | Hype
AgenticRed: Optimizing Agentic Systems for Automated Red-teaming |  | 1
MuSLR: Multimodal Symbolic Logical Reasoning |  | 1
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty |  | 1
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts |  | 1
Do Reasoning Models Enhance Embedding Models? |  | 1
Failing to Explore: Language Models on Interactive Tasks |  | 1
Epistemic Diversity and Knowledge Collapse in Large Language Models |  | 1
Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models |  | 1
Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space |  | 1
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers |  | 1
Optimal Scaling Needs Optimal Norm |  | 1
SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment |  | 1
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning |  | 1
CooperBench: Why Coding Agents Cannot be Your Teammates Yet |  | 1
TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models |  | 1
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios |  | 1
Multimodal Evaluation of Russian-language Architectures |  | 1
PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR |  | 1
One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment |  | 1
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models |  | 1
AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation |  | 1
Flow-based Extremal Mathematical Structure Discovery |  | 1
UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders |  | 1
TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors |  | 1
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG |  | 1
Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods |  | 1
WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning |  | 1
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation |  | 1
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts |  | 1
Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory |  | 1
Can Language Models Discover Scaling Laws? |  | 1
DSGym: A Holistic Framework for Evaluating and Training Data Science Agents |  | 1
A Mechanistic View on Video Generation as World Models: State and Dynamics |  | 1
Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers? |  | 1
From Charts to Code: A Hierarchical Benchmark for Multimodal Models |  | 1
Universal Reasoning Model | Verified | 1
NeuroXAI: Adaptive, robust, explainable surrogate framework for determination of channel importance in EEG application | Code | 1
Tri-Learn Graph Fusion Network for Attributed Graph Clustering | Code | 1
FLEXITOKENS: Flexible Tokenization for Evolving Language Models | Code | 1
Describe Anything Model for Visual Question Answering on Text-rich Images | Code | 1
Mitigating Object Hallucinations via Sentence-Level Early Intervention | Code | 1
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing | Code | 1
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network | Code | 1
Fairness-Aware Grouping for Continuous Sensitive Variables: Application for Debiasing Face Analysis with respect to Skin Tone | Code | 1
AdaMuon: Adaptive Muon Optimizer | Code | 1
Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation | Code | 1
Relative Entropy Pathwise Policy Optimization | Code | 1
Are Vision Foundation Models Ready for Out-of-the-Box Medical Image Registration? | Code | 1
MMOne: Representing Multiple Modalities in One Scene | Code | 1
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks | Code | 1