SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 9251-9300 of 661,570 papers

Title | Status | Hype
HECTOR: Hybrid Editable Compositional Object References for Video Generation |  | 0
Bayesian Transformer for Probabilistic Load Forecasting in Smart Grids |  | 0
How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms |  | 0
Autoregressive Visual Decoding from EEG Signals |  | 0
Revisiting Gradient Staleness: Evaluating Distance Metrics for Asynchronous Federated Learning Aggregation |  | 0
OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security |  | 0
ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments |  | 0
A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations |  | 0
Latent Sculpting for Zero-Shot Generalization: A Manifold Learning Approach to Out-of-Distribution Anomaly Detection |  | 0
Semantic Risk Scoring of Aggregated Metrics: An AI-Driven Approach for Healthcare Data Governance |  | 0
MERLIN: Building Low-SNR Robust Multimodal LLMs for Electromagnetic Signals |  | 0
Beyond Attention Heatmaps: How to Get Better Explanations for Multiple Instance Learning Models in Histopathology | Code | 0
The FABRIC Strategy for Verifying Neural Feedback Systems |  | 0
Are vision-language models ready to zero-shot replace supervised classification models in agriculture? |  | 0
Context-free Self-Conditioned GAN for Trajectory Forecasting |  | 0
Double projection for reconstructing dynamical systems: between stochastic and deterministic regimes |  | 0
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data | Code | 0
The Ends Justify the Thoughts: RL-Induced Motivated Reasoning in LLM CoTs | Code | 0
ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models | Code | 0
ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments? | Code | 0
HypoSpace: Evaluating LLM Creativity as Set-Valued Hypothesis Generators under Underdetermination | Code | 0
CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints | Code | 0
SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation | Code | 0
SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans | Code | 0
Geometric Transformation-Embedded Mamba for Learned Video Compression | Code | 0
Enhancing Unregistered Hyperspectral Image Super-Resolution via Unmixing-based Abundance Fusion Learning | Code | 0
VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer | Code | 0
Missing No More: Dictionary-Guided Cross-Modal Image Fusion under Missing Infrared | Code | 0
TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery | Code | 0
Training event-based neural networks with exact gradients via Differentiable ODE Solving in JAX | Code | 0
LAMUS: A Large-Scale Corpus for Legal Argument Mining from U.S. Caselaw using LLMs | Code | 0
Local-Global Prompt Learning via Sparse Optimal Transport | Code | 0
Echo2ECG: Enhancing ECG Representations with Cardiac Morphology from Multi-View Echos | Code | 0
OccTrack360: 4D Panoptic Occupancy Tracking from Surround-View Fisheye Cameras | Code | 0
Computational Multi-Agents Society Experiments: Social Modeling Framework Based on Generative Agents | Code | 0
Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications | Code | 0
Meissa: Multi-modal Medical Agentic Intelligence | Code | 0
LEL: Lipschitz Continuity Constrained Ensemble Learning for Efficient EEG-Based Intra-subject Emotion Recognition | Code | 0
Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks | Code | 0
Unified and Semantically Grounded Domain Adaptation for Medical Image Segmentation | Code | 0
MICA: Multi-Agent Industrial Coordination Assistant | Code | 0
Mapping Overlaps in Benchmarks through Perplexity in the Wild | Code | 0
CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation | Code | 0
HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents | Code | 0
TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis | Code | 0
ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework | Code | 0
SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning | Code | 0
High-Fidelity Pruning for Large Language Models | Code | 0
Adaptive MLP Pruning for Large Vision Transformers | Code | 0
Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting | Code | 0