SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 2951-3000 of 659983 papers

Title | Status | Hype
TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction | Code | 0
Meta-Reinforcement Learning with Self-Reflection for Agentic Search | Code | 0
Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models | Code | 0
Towards Motion-aware Referring Image Segmentation | Code | 0
UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models | Code | 0
Procedural Generation of Algorithm Discovery Tasks in Machine Learning | Code | 0
Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients | Code | 0
Self-Attention And Beyond the Infinite: Towards Linear Transformers with Infinite Self-Attention | Code | 0
Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation | - | 1
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery | - | 4
Complementary Reinforcement Learning | - | 1
Stereo World Model: Camera-Guided Stereo Video Generation | - | 1
Tree Search for LLM Agent Reinforcement Learning | - | 3
Generative Refocusing: Flexible Defocus Control from a Single Image | - | 3
Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale | - | 1
FoMo X: Modular Explainability Signals for Outlier Detection Foundation Models | - | 0
Parameter-Efficient Modality-Balanced Symmetric Fusion for Multimodal Remote Sensing Semantic Segmentation | Code | 0
Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores | - | 0
Scaling Attention via Feature Sparsity | - | 0
Latent Semantic Manifolds in Large Language Models | - | 0
Research on Individual Trait Clustering and Development Pathway Adaptation Based on the K-means Algorithm | - | 0
Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models | - | 0
Mitigating Premature Discretization with Progressive Quantization for Robust Vector Tokenization | - | 0
Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs | - | 0
Learning Communication Between Heterogeneous Agents in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence | - | 0
Efficient AI-Driven Multi-Section Whole Slide Image Analysis for Biochemical Recurrence Prediction in Prostate Cancer | - | 0
Solomonoff induction | - | 0
Understanding Pruning Regimes in Vision-Language Models Through Domain-Aware Layer Selection | - | 0
Me, Myself, and π: Evaluating and Explaining LLM Introspection | - | 0
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis | - | 0
A General Deep Learning Framework for Wireless Resource Allocation under Discrete Constraints | - | 0
Prompt-tuning with Attribute Guidance for Low-resource Entity Matching | - | 0
Target Concept Tuning Improves Extreme Weather Forecasting | - | 0
An FPGA-Based SoC Architecture with a RISC-V Controller for Energy-Efficient Temporal-Coding Spiking Neural Networks | - | 0
NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference | - | 0
DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models | - | 0
Auditing the Auditors: Does Community-based Moderation Get It Right? | - | 0
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models | - | 1
TCATSeg: A Tooth Center-Wise Attention Network for 3D Dental Model Semantic Segmentation | - | 0
Beyond Accuracy: Evaluating Forecasting Models by Multi-Echelon Inventory Cost | - | 0
Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy | Code | 0
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild | Code | 0
LLM NL2SQL Robustness: Surface Noise vs. Linguistic Variation in Traditional and Agentic Settings | - | 0
Learning the Intrinsic Dimensionality of Fermi-Pasta-Ulam-Tsingou Trajectories: A Nonlinear Approach using a Deep Autoencoder Model | - | 0
Learning through Creation: A Hash-Free Framework for On-the-Fly Category Discovery | Code | 0
Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic | - | 0
SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation | - | 3
Noise-Response Calibration: A Causal Intervention Protocol for LLM-Judges | - | 0
Arabic Morphosyntactic Tagging and Dependency Parsing with Large Language Models | - | 0
Cascade-Aware Multi-Agent Routing: Spatio-Temporal Sidecars and Geometry-Switching | - | 0
Page 60 of 13200