SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 8951-8975 of 474,278 papers

Title | Status | Hype
Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System | Code | 0
TriP-LLM: A Tri-Branch Patch-wise Large Language Model Framework for Time-Series Anomaly Detection | Code | 0
Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits | Code | 0
Reasoning through Exploration: A Reinforcement Learning Framework for Robust Function Calling | | 0
Visual Representation Alignment for Multimodal Large Language Models | | 0
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity | | 0
GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations | | 0
Online Video Depth Anything: Temporally-Consistent Depth Prediction with Low Memory Consumption | | 0
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models | | 0
LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning | | 0
Towards Safer and Understandable Driver Intention Prediction | | 0
Obstacle Avoidance using Dynamic Movement Primitives and Reinforcement Learning | Code | 0
Goal-oriented Backdoor Attack against Vision-Language-Action Models via Physical Objects | | 0
KORMo: Korean Open Reasoning Model for Everyone | | 0
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs | | 0
StatEval: A Comprehensive Benchmark for Large Language Models in Statistics | | 0
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km | | 0
Why Do Transformers Fail to Forecast Time Series In-Context? | | 0
Don't Throw Away Your Pretrained Model | | 0
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models | Code | 0
AMFT: Aligning LLM Reasoners by Meta-Learning the Optimal Imitation-Exploration Balance | Code | 0
StreamingVLM: Real-Time Understanding for Infinite Video Streams | Code | 0
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation | Code | 0
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data | | 0
InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation | Code | 0
Page 359 of 18,972